Build generative AI solutions with Amazon Bedrock

Generative AI is revolutionizing how businesses operate, interact with customers, and innovate. If you’re embarking on the journey to build a generative AI-powered solution, you might wonder how to navigate the complexities involved from selecting the right models to managing prompts and enforcing data privacy.

In this post, we show you how to build generative AI applications on Amazon Web Services (AWS) using the capabilities of Amazon Bedrock, highlighting how Amazon Bedrock can be used at each step of your generative AI journey. This guide is valuable for both experienced AI engineers and newcomers to the generative AI space, helping you use Amazon Bedrock to its fullest potential.

Amazon Bedrock is a fully managed service that provides a unified API to access a wide range of high-performing foundation models (FMs) from leading AI companies like Anthropic, Cohere, Meta, Mistral AI, AI21 Labs, Stability AI, and Amazon. It offers a robust set of tools and features designed to help you build generative AI applications efficiently while adhering to best practices in security, privacy, and responsible AI.

Calling an LLM with an API

You want to integrate a generative AI feature into your application through a straightforward, single-turn interaction with a large language model (LLM). Perhaps you need to generate text, answer a question, or provide a summary based on user input. Amazon Bedrock simplifies generative AI application development and scaling through a unified API for accessing diverse, leading FMs. With support for Amazon models and leading AI providers, you have the freedom to experiment without being locked into a single model or provider. With the rapid pace of development in AI, you can seamlessly switch models for optimized performance with no application rewrite required.

Beyond direct model access, Amazon Bedrock expands your options with the Amazon Bedrock Marketplace. This marketplace gives you access to over 100 specialized FMs; you can discover, test, and integrate new capabilities all through fully managed endpoints. Whether you need the latest innovation in text generation, image synthesis, or domain-specific AI, Amazon Bedrock provides the flexibility to adapt and scale your solution with ease.

With one API, you stay agile and can effortlessly switch between models, upgrade to the latest versions, and future-proof your generative AI applications with minimal code changes. To summarize, Amazon Bedrock offers the following benefits:

Simplicity: No need to manage infrastructure or deal with multiple APIs
Flexibility: Experiment with different models to find the best fit
Scalability: Scale your application without worrying about underlying resources

To get started, use the Chat or Text playground to experiment with different FMs, and use the Converse API to integrate FMs into your application.

After you’ve integrated a basic LLM feature, the next step is optimizing the performance and making sure you’re using the right model for your requirements. This brings us to the importance of evaluating and comparing models.

Choosing the right model for your use case

Selecting the right FM for your use case is crucial, but with so many options available, how do you know which one will give you the best performance for your application? Whether it’s for generating more relevant responses, summarizing information, or handling nuanced queries, choosing the best model is key to providing optimal performance.

You can use Amazon Bedrock model evaluation to rigorously test different FMs to find the one that delivers the best results for your use case. Whether you’re in the early stages of development or preparing for launch, selecting the right model can make a significant difference in the effectiveness of your generative AI solutions.

The model evaluation process consists of the following components:

Automatic and human evaluation: Begin by experimenting with different models using automated evaluation metrics like accuracy, robustness, or toxicity. You can also bring in human evaluators to measure more subjective factors, such as friendliness, style, or how well the model aligns with your brand voice.
Custom datasets and metrics: Evaluate the performance of models using your own datasets or pre-built options. Customize the metrics that matter most for your project, making sure the selected model aligns with your business or operational goals.
Iterative feedback: Throughout the development process, run evaluations iteratively, allowing for faster refinement. This helps you compare models side by side, so you can make a data-driven decision when selecting the FM that fits your use case.

Imagine you’re building a customer support AI assistant for an ecommerce service. You can model evaluation to test multiple FMs with real customer queries, evaluating which model provides the most accurate, friendly, and contextually appropriate responses. By comparing models side by side, you can choose the model that will deliver the best possible user experience for your customers. After you’ve evaluated and selected the ideal model, the next step is making sure it aligns with your business needs. Off-the-shelf models might perform well, but for a truly tailored experience, you need more customization. This leads to the next important step in your generative AI journey: personalizing models to reflect your business context. You need to make sure the model generates the most accurate and contextually relevant responses. Even the best FMs will not have access to the latest or domain-specific information critical to your business. To solve this, the model needs to use your proprietary data sources, making sure its outputs reflect the most up-to-date and relevant information. This is where you can use Retrieval Augmented Generation (RAG) to enrich the model’s responses by incorporating your organization’s unique knowledge base.

Enriching model responses with your proprietary data

A publicly available LLM might perform well on general knowledge tasks, but struggle with outdated information or lack context from your organization’s proprietary data. You need a way to provide the model with the most relevant, up-to-date insights to provide accuracy and contextual depth. There are two key approaches that you can use to enrich model responses:

RAG: Use RAG to dynamically retrieve relevant information at query time, enriching model responses without requiring retraining
Fine-tuning: Use RAG to customize your chosen model by training it on proprietary data, improving its ability to handle organization-specific tasks or domain knowledge

We recommend starting with RAG because of its flexible and straightforward to implement. You can then fine-tune the model for deeper domain adaptation if needed. RAG dynamically retrieves relevant information at query time, making sure model responses stay accurate and context aware. In this approach, data is first processed and indexed in a vector database or similar retrieval system. When a user submits a query, Amazon Bedrock searches this indexed data to find relevant context, which is injected into the prompt. The model then generates a response based on both the original query and the retrieved insights without requiring additional training.

Amazon Bedrock Knowledge Bases automates the RAG pipeline—including data ingestion, retrieval, prompt augmentation, and citations—reducing the complexity of setting up custom integrations. By seamlessly integrating proprietary data, you can make sure that the models generate accurate, contextually rich, and continuously updated responses.

Bedrock Knowledge Bases supports various data types to tailor AI-generated responses to business-specific needs:

Unstructured data: Extract insights from text-heavy sources like documents, PDFs, and emails
Structured data: Enable natural language queries on databases, data lakes, and warehouses without moving or preprocessing data
Multimodal data: Process both text and visual elements in documents and images using Amazon Bedrock Data Automation
GraphRAG: Enhance knowledge retrieval with graph-based relationships, enabling AI to understand entity connections for more context-aware responses

With these capabilities, Amazon Bedrock reduces data silos, making it straightforward to enrich AI applications with both real-time and historical knowledge. Whether working with text, images, structured datasets, or interconnected knowledge graphs, Amazon Bedrock provides a fully managed, scalable solution without the need for complex infrastructure. To summarize, using RAG with Amazon Bedrock offers the following benefits:

Up-to-date information: Responses include the latest data from your knowledge bases
Accuracy: Reduces the risk of incorrect or irrelevant answers
No extra infrastructure: You can avoid setting up and managing your own vector databases or custom integrations

When your model is pulling from the most accurate and relevant data, you might find that its general behavior still needs some refinement perhaps in its tone, style, or understanding of industry-specific language. This is where you can further fine-tune the model to align it even more closely with your business needs.

Tailoring models to your business needs

Out-of-the-box FMs provide a strong starting point, but they often lack the precision, brand voice, or industry-specific expertise required for real-world applications. Maybe the language doesn’t align with your brand, or the model struggles with specialized terminology. You might have experimented with prompt engineering and RAG to enhance responses with additional context. Although these techniques help, they have limitations (for example, longer prompts can increase latency and cost), and models might still lack deep domain expertise needed for domain-specific tasks. To fully harness generative AI, businesses need a way to securely adapt models, making sure AI-generated responses are not only accurate but also relevant, reliable, and aligned with business goals.

Amazon Bedrock simplifies model customization, enabling businesses to fine-tune FMs with proprietary data without building models from scratch or managing complex infrastructure.

Rather than retraining an entire model, Amazon Bedrock provides a fully managed fine-tuning process that creates a private copy of the base FM. This makes sure your proprietary data remains confidential and isn’t used to train the original model. Amazon Bedrock offers two powerful techniques to help businesses refine models efficiently:

Fine-tuning: You can train an FM with labeled datasets to improve accuracy in industry-specific terminology, brand voice, and company workflows. This allows the model to generate more precise, context-aware responses without relying on complex prompts.
Continued pre-training: If you have unlabeled domain-specific data, you can use continued pre-training to further train an FM on specialized industry knowledge without manual labeling. This approach is especially useful for regulatory compliance, domain-specific jargon, or evolving business operations.

By combining fine-tuning for core domain expertise with RAG for real-time knowledge retrieval, businesses can create highly specialized AI models that stay accurate and adaptable, and make sure the style of responses align with business goals. To summarize, Amazon Bedrock offers the following benefits:

Privacy-preserved customization: Fine-tune models securely while making sure that your proprietary data remains private
Efficiency: Achieve high accuracy and domain relevance without the complexity of building models from scratch

As your project evolves, managing and optimizing prompts becomes critical, especially when dealing with different iterations or testing multiple prompt versions. The next step is refining your prompts to maximize model performance.

Managing and optimizing prompts

As your AI projects scale, managing multiple prompts efficiently becomes a growing challenge. Tracking versions, collaborating with teams, and testing variations can quickly become complex. Without a structured approach, prompt management can slow down innovation, increase costs, and make iteration cumbersome. Optimizing a prompt for one FM doesn’t always translate well to another. A prompt that performs well with one FM might produce inconsistent or suboptimal outputs with another, requiring significant rework. This makes switching between models time-consuming and inefficient, limiting your ability to experiment with different AI capabilities effectively. Without a centralized way to manage, test, and refine prompts, AI development becomes slower, more costly, and less adaptable to evolving business needs.

Amazon Bedrock simplifies prompt engineering with Amazon Bedrock Prompt Management, an integrated system that helps teams create, refine, version, and share prompts effortlessly. Instead of manually adjusting prompts for months, Amazon Bedrock accelerates experimentation and enhances response quality without additional code. Bedrock Prompt Management introduces the following capabilities:

Versioning and collaboration: Manage prompt iterations in a shared workspace, so teams can track changes and reuse optimized prompts.
Side-by-side testing: Compare up to two prompt variations simultaneously to analyze model behavior and identify the most effective format.
Automated prompt optimization: Fine-tune and rewrite prompts based on the selected FM to improve response quality. You can select a model, apply optimization, and generate a more accurate, contextually relevant prompt.

Bedrock Prompt Management offers the following benefits:

Efficiency: Quickly iterate and optimize prompts without writing additional code
Teamwork: Enhance collaboration with shared access and version control
Insightful testing: Identify which prompts perform best for your use case

After you’ve optimized your prompts for the best results, the next challenge is optimizing your application for cost and latency by choosing the most appropriate model within a family for a given task. This is where intelligent prompt routing can help.

Optimizing efficiency with intelligent model selection

Not all prompts require the same level of AI processing. Some are straightforward and need fast responses, whereas others require deeper reasoning and more computational power. Using high-performance models for every request increases costs and latency, even when a lighter, faster model could generate an equally effective response. At the same time, relying solely on smaller models might reduce accuracy for complex queries. Without an automated approach, business must manually determine which model to use for each request, leading to higher costs, inefficiencies, and slower development cycles.

Amazon Bedrock Intelligent Prompt Routing optimizes AI performance and cost by dynamically selecting the most appropriate FM for each request. Instead of manually choosing a model, Amazon Bedrock automates model selection within a model family, making sure that each prompt is routed to the best-performing model for its complexity. Bedrock Intelligent Prompt Routing offers the following capabilities:

Adaptive model routing: Automatically directs simple prompts to lightweight models and complex queries to more advanced models, providing the right balance between speed and efficiency
Performance balance: Makes sure that you use high-performance models only when necessary, reducing AI inference costs by up to 30%
Effortless integration: Automatically selects the right model within a family, simplifying deployment

By automating model selection, Amazon Bedrock removes the need for manual decision-making, reduces operational overhead, and makes sure AI applications run efficiently at scale. With Amazon Bedrock Intelligent Prompt Routing, each query is processed by the most efficient model, delivering speed, cost savings, and high-quality responses. The next step in optimizing AI efficiency is reducing redundant computations in frequently used prompts. Many AI applications require maintaining context across multiple interactions, which can lead to performance bottlenecks, increased costs, and unnecessary processing overhead.

Reducing redundant processing for faster responses

As your generative AI applications scale, efficiency becomes just as critical as accuracy. Applications that repeatedly use the same context—such as document Q&A systems (where users ask multiple questions about the same document) or coding assistants that maintain context about code files—often face performance bottlenecks and rising costs because of redundant processing. Each time a query includes long, static context, models reprocess unchanged information, leading to increased latency as models repeatedly analyze the same content and unnecessary token usage inflates compute expenses. To keep AI applications fast, cost-effective, and scalable, optimizing how prompts are reused and processed is essential.

Amazon Bedrock Prompt Caching enhances efficiency by storing frequently used portions of prompts—reducing redundant computations and improving response times. It offers the following benefits:

Faster processing: Skips unnecessary recomputation of cached prompt prefixes, boosting overall throughput
Lower latency: Reduces processing time for long, repetitive prompts, delivering a smoother user experience, and reducing latency by up to 85% for supported models
Cost-efficiency: Minimizes compute resource usage by avoiding repeated token processing, reducing costs by up to 90%

With prompt caching, AI applications respond faster, reduce operational costs, and scale efficiently while maintaining high performance. With Bedrock Prompt Caching providing faster responses and cost-efficiency, the next step is enabling AI applications to move beyond static prompt-response interactions. This is where agentic AI comes in, empowering applications to dynamically orchestrate multistep processes, automate decision-making, and drive intelligent workflows.

Automating multistep tasks with agentic AI

As AI applications grow more sophisticated, automating complex, multistep tasks become essential. You need a solution that can interact with internal systems, APIs, and databases to execute intricate workflows autonomously. The goal is to reduce manual intervention, improve efficiency, and create more dynamic, intelligent applications. Traditional AI models are reactive; they generate responses based on inputs but lack the ability to plan and execute multistep tasks. Agentic AI refers to AI systems that act with autonomy, breaking down complex tasks into logical steps, making decisions, and executing actions without constant human input. Unlike traditional models that only respond to prompts, agentic AI models have the following capabilities:

Autonomous planning and execution: Breaks complex tasks into smaller steps, makes decisions, and plans actions to complete the workflow
Chaining capabilities: Handles sequences of actions based on a single request, enabling the AI to manage intricate tasks that would otherwise require manual intervention or multiple interactions
Interaction with APIs and systems: Connects to your enterprise systems and automatically invokes necessary APIs or databases to fetch or update data

Amazon Bedrock Agents enables AI-powered task automation by using FMs to plan, orchestrate, and execute workflows. With a fully managed orchestration layer, Amazon Bedrock simplifies the process of deploying, scaling, and managing AI agents. Bedrock Agents offers the following benefits:

Task orchestration: Uses FMs’ reasoning capabilities to break down tasks, plan execution, and manage dependencies
API integration: Automatically calls APIs within enterprise systems to interact with business applications
Memory retention: Maintains context across interactions, allowing agents to remember previous steps, providing a seamless user experience

When a task requires multiple specialized agents, Amazon Bedrock supports multi-agent collaboration, making sure agents work together efficiently while alleviating manual orchestration overhead. This unlocks the following capabilities:

Supervisor-agent coordination: A supervisor agent delegates tasks to specialized subagents, providing optimal distribution of workloads
Efficient task execution: Supports parallel task execution, enabling faster processing and improved accuracy
Flexible collaboration modes: You can choose between the following modes:
- Fully orchestrated supervisor mode: A central agent manages the full workflow, providing seamless coordination
- Routing mode: Basic tasks bypass the supervisor and go directly to subagents, reducing unnecessary orchestration
Seamless integration: Works with enterprise APIs and internal knowledge bases, making it straightforward to automate business operations across multiple domains

By using multi-agent collaboration, you can increase task success rates, reduce execution time, and improve accuracy, making AI-driven automation more effective for real-world, complex workflows. To summarize, agentic AI offers the following benefits:

Automation: Reduces manual intervention in complex processes
Flexibility: Agents can adapt to changing requirements or gather additional information as needed
Transparency: You can use the trace capability to debug and optimize agent behavior

Although automating tasks with agents can streamline operations, handling sensitive information and enforcing privacy is paramount, especially when interacting with user data and internal systems. As your application grows more sophisticated, so do the security and compliance challenges.

Maintaining security, privacy, and responsible AI practices

As you integrate generative AI into your business, security, privacy, and compliance become critical concerns. AI-generated responses must be safe, reliable, and aligned with your organization’s policies to help violating brand guidelines or regulatory policies, and must not include inaccurate or misleading responses.

Amazon Bedrock Guardrails provides a comprehensive framework to enhance security, privacy, and accuracy in AI-generated outputs. With built-in safeguards, you can enforce policies, filter content, and improve trustworthiness in AI interactions. Bedrock Guardrails offers the following capabilities:

Content filtering: Block undesirable topics and harmful content in user inputs and model responses.
Privacy protection: Detect and redact sensitive information like personally identifiable information (PII) and confidential data to help prevent data leaks.
Custom policies: Define organization-specific rules to make sure AI-generated content aligns with internal policies and brand guidelines.
Hallucination detection: Identify and filter out responses not grounded in your data sources through the following capabilities:
- Contextual grounding checks: Make sure model responses are factually correct and relevant by validating them against enterprise data source. Detect hallucinations when outputs contain unverified or irrelevant information.
- Automated reasoning for accuracy: Moves beyond trust me to prove it AI outputs by applying mathematically sound logic and structured reasoning to verify factual correctness.

With security and privacy measures in place, your AI solution is not only powerful but also responsible. However, if you’ve already made significant investments in custom models, the next step is to integrate them seamlessly into Amazon Bedrock.

Using existing custom models with Amazon Bedrock Custom Model Import

Use Amazon Bedrock Custom Model Import if you’ve already invested in custom models developed outside of Amazon Bedrock and want to integrate them into your new generative AI solution without managing additional infrastructure.

Bedrock Custom Model Import includes the following capabilities:

Seamless integration: Import your custom models into Amazon Bedrock
Unified API access: Interact with models—both base and custom—through the same API
Operational efficiency: Let Amazon Bedrock handle the model lifecycle and infrastructure management

Bedrock Custom Model Import offers the following benefits:

Cost savings: Maximize the value of your existing models
Simplified management: Reduce overhead by consolidating model operations
Consistency: Maintain a unified development experience across models

By importing custom models, you can use your prior investments. To truly unlock the potential of your models and prompt structures, you can automate more complex workflows, combining multiple prompts and integrating with other AWS services.

Automating workflows with Amazon Bedrock Flows

You need to build complex workflows that involve multiple prompts and integrate with other AWS services or business logic, but you want to avoid extensive coding.

Amazon Bedrock Flows has the following capabilities:

Visual builder: Drag-and-drop components to create workflows
Workflow automation: Link prompts with AWS services and automate sequences
Testing and versioning: Test flows directly in the console and manage versions

Amazon Bedrock Flows offers the following benefits:

No-code solution: Build workflows without writing code
Speed: Accelerate development and deployment of complex applications
Collaboration: Share and manage workflows within your team

With workflows now automated and optimized, you’re nearly ready to deploy your generative AI-powered solution. The final stage is making sure that your generative AI solution can scale efficiently and maintain high performance as demand grows.

Monitoring and logging to close the loop on AI operations

As you prepare to move your generative AI application into production, it’s critical to implement robust logging and observability to monitor system health, verify compliance, and quickly troubleshoot issues. Amazon Bedrock offers built-in observability capabilities that integrate seamlessly with AWS monitoring tools, enabling teams to track performance, understand usage patterns, and maintain operational control

Model invocation logging: You can enable detailed logging of model invocations, capturing input prompts and output responses. These logs can be streamed to Amazon CloudWatch or Amazon Simple Storage Service (Amazon S3) for real-time monitoring or long-term analysis. Logging is configurable through the AWS Management Console or the CloudWatchConfig API.
CloudWatch metrics: Amazon Bedrock provides rich operational metrics out-of-the-box, including:
- Invocation count
- Token usage (input/output)
- Response latency
- Error rates (for example, invalid input and model failures)

These capabilities are essential for running generative AI solutions at scale with confidence. By using CloudWatch, you gain visibility across the full AI pipeline from input prompts to model behavior; making it straightforward to maintain uptime, performance, and compliance as your application grows.

Finalizing and scaling your generative AI solution

You’re ready to deploy your generative AI application and need to scale it efficiently while providing reliable performance. Whether you’re handling unpredictable workloads, enhancing resilience, or needing consistent throughput, you must choose the right scaling approach. Amazon Bedrock offers three flexible scaling options that you can use to tailor your infrastructure to your workload needs:

On-demand: Start with the flexibility of on-demand scaling, where you pay only for what you use. This option is ideal for early-stage deployments or applications with variable or unpredictable traffic. It offers the following benefits:
- No commitments.
- Pay only for tokens processed (input/output).
- Great for dynamic or fluctuating workloads.
Cross-Region inference: When your traffic grows or becomes unpredictable, you can use cross-Region inference to handle bursts by distributing compute across multiple AWS Regions, enhancing availability without additional cost. It offers the following benefits:
- Up to two times larger burst capacity.
- Improved resilience and availability.
- No additional charges, you have the same pricing as your primary Region.
Provisioned Throughput: For large, consistent workloads, Provisioned Throughput maintains a fixed level of performance. This option is perfect when you need predictable throughput, particularly for custom models. It offers the following benefits:
- Consistent performance for high-demand applications.
- Required for custom models.
- Flexible commitment terms (1 month or 6 months).

Conclusion

Building generative AI solutions is a multifaceted process that requires careful consideration at every stage. Amazon Bedrock simplifies this journey by providing a unified service that supports each phase, from model selection and customization to deployment and compliance. Amazon Bedrock offers a comprehensive suite of features that you can use to streamline and enhance your generative AI development process. By using its unified tools and APIs, you can significantly reduce complexity, enabling accelerated development and smoother workflows. Collaboration becomes more efficient because team members can work seamlessly across different stages, fostering a more cohesive and productive environment. Additionally, Amazon Bedrock integrates robust security and privacy measures, helping to ensure that your solutions meet industry and organization requirements. Finally, you can use its scalable infrastructure to bring your generative AI solutions to production faster while minimizing overhead. Amazon Bedrock stands out as a one-stop solution that you can use to build sophisticated, secure, and scalable generative AI applications. Its extensive capabilities alleviate the need for multiple vendors and tools, streamlining your workflow and enhancing productivity.

Explore Amazon Bedrock and discover how you can use its features to support your needs at every stage of generative AI development. To learn more, see the Amazon Bedrock User Guide.

About the authors

Venkata Santosh Sajjan Alla is a Senior Solutions Architect at AWS Financial Services, driving AI-led transformation across North America’s FinTech sector. He partners with organizations to design and execute cloud and AI strategies that speed up innovation and deliver measurable business impact. His work has consistently translated into millions in value through enhanced efficiency and additional revenue streams. With deep expertise in AI/ML, Generative AI, and cloud-native architectures, Sajjan enables financial institutions to achieve scalable, data-driven outcomes. When not architecting the future of finance, he enjoys traveling and spending time with family. Connect with him on LinkedIn.

Axel Larsson is a Principal Solutions Architect at AWS based in the greater New York City area. He supports FinTech customers and is passionate about helping them transform their business through cloud and AI technology. Outside of work, he is an avid tinkerer and enjoys experimenting with home automation.