Amazon Strands Agents SDK: A technical deep dive into agent architectures and observability

The Amazon Strands Agents SDK is an open source framework for building AI agents that emphasizes a model-driven approach. Instead of hardcoding complex task flows, Strands uses the reasoning abilities of modern large language models (LLMs) to handle planning and tool usage autonomously. Developers can create an agent with a prompt (defining the agent’s role or behavior) and a list of tools, and the LLM-powered agent will figure out how to chain its reasoning and invoke tools as needed. This dramatically simplifies agent development compared to traditional workflow-based frameworks.

In this post, we first introduce the Strands Agents SDK and its core features. Then we explore how it integrates with AWS environments for secure, scalable deployments, and how it provides rich observability for production use. Finally, we discuss practical use cases, and present a step-by-step example to illustrate Strands in action.

What is the Strands Agents SDK?

The Strands Agents SDK is an open source framework designed to simplify the creation of robust LLM-powered AI agents. Rather than requiring developers to handcraft complex workflows, Strands embraces a model-driven approach centered around three key components: a language model, a system prompt, and a set of tools. This architecture empowers the LLM to perform the crucial reasoning, autonomously deciding the optimal actions and when to use tools based on the current context and task. This model-driven design allows agents to be flexible, intelligent, and autonomous, while minimizing the boilerplate code typically needed to support multi-step or multi-agent interactions. Its effectiveness is already proven: Strands is actively used in production by multiple AWS teams for their AI agents, including Kiro, Amazon Q, and AWS Glue.

Key capabilities of the Strands Agents SDK

The Strands Agents SDK offers the following key capabilities:

  • Lightweight, flexible agent loop – Strands implements a simple yet extensible agent loop that drives the interaction. The LLM behind the agent iteratively reads the conversation (and context), plans an action, possibly calls a tool, and then incorporates the tool’s result before deciding the next step, until it reaches a final answer. This loop is fully customizable when needed, but works out of the box for most use cases.
  • Tool use and integration – Tools are external functions or APIs the agent can call (for example, calculators, web search, or database queries). Strands makes it straightforward to define tools in Python with a @tool decorator and supply them to agents (see the tool definition sketch after this list). During development, the SDK supports hot-reloading, so you can modify or add tools and have them picked up automatically without restarting the agent. This accelerates iteration and testing. The SDK also comes with an optional library of pre-built tools (strands-agents-tools) for common functionalities like arithmetic, web requests, and more. Strands supports both the Model Context Protocol (MCP) and A2A (Agent-to-Agent). MCP is an open standard that gives agents access to thousands of external tools hosted on MCP servers, greatly expanding their capabilities without custom coding. A2A allows agents to call each other as tools, enabling powerful multi-agent collaboration and specialization with minimal overhead.
  • Model-agnostic and multi-model support – Strands is not tied to a single LLM provider. It works with models on Amazon Bedrock (for example, Anthropic’s Claude or other Amazon Bedrock models) by default, but it also supports Anthropic’s API, OpenAI, and open source models through providers such as LlamaAPI and Ollama, along with others through a pluggable provider interface. For example, you can switch the agent’s model from Anthropic’s Claude hosted on Amazon Bedrock to a local Meta Llama 3 or OpenAI GPT-4 by changing the model provider in the code. With this flexibility, you can choose the model that best fits your needs or swap models in different deployments.
  • Scalability from prototypes to production – The same Strands agent code can run locally for quick testing and then be deployed to AWS for production use. The SDK is already used internally at AWS for agent-based features in services like Amazon Q (developer assistant), AWS Glue, and VPC Reachability Analyzer. It supports running agents in various environments – including Amazon Elastic Compute Cloud (Amazon EC2), AWS Lambda, AWS Fargate, and Amazon Bedrock AgentCore – and isolating tool execution from the agent for security and reliability. Strands agents can run anywhere and integrate with cloud services, but because it’s open source, you can also run them on premises or in other clouds.
  • Advanced use cases – Although Strands excels with simple single-agent assistants, it also supports more complex agent systems. You can compose multi-agent applications where multiple agents collaborate or coordinate (for example, an agent that delegates subtasks to other specialist agents). The SDK supports patterns like agent hierarchies, agent networks, and even swarm-style cooperation (discussed more in the next section). It also allows building fully autonomous agents that loop on tasks without human input, enabling multi-step workflows where the agent’s chain-of-thought spans many tool calls or intermediate decisions. Additionally, features like streaming responses (token streaming) are supported for real-time agent interactions.
  • Open source and community contributions – The Strands Agents SDK is Apache-2.0 licensed and open to contributions. Several companies (Accenture, Anthropic, Meta, PwC, and others) have already contributed to its development. For example, Anthropic contributed integration for their API, and Meta added support for their Llama models. This community-driven approach means the tool and model ecosystem is growing beyond AWS. Developers are encouraged to join the project on GitHub to report issues, add new tools or model providers, and help expand the framework.
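
For illustration, the following minimal sketch defines a custom tool with the @tool decorator and hands it to an agent. The word_count tool, its docstring, and the sample question are invented for this example; only the Agent class and the @tool decorator come from the SDK as described above.

from strands import Agent, tool

@tool
def word_count(text: str) -> int:
    """Count the number of words in a block of text."""
    return len(text.split())

# The function name, docstring, and type hints describe the tool to the model
agent = Agent(tools=[word_count])
agent("How many words are in the sentence 'model-driven agents are easy to build'?")

The agent decides on its own whether answering a given question requires calling word_count or whether the model can respond directly.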

Strands provides a concise yet powerful way to build AI agents. With a few lines of Python, you define an agent’s role and its available tools, and the SDK handles the rest – from reasoning through a problem to invoking tools and producing answers. The next sections explore how Strands supports various agent architectures and how it makes these agents observable in production.

Agent-architectural patterns supported by Strands

Strands supports multiple agent architecture patterns, scaling from a single self-contained agent up to complex networks of cooperating agents. In this section, we explain the key patterns and how the Strands SDK enables them.

Single-agent pattern

The simplest scenario is a single AI agent endowed with an LLM and (optionally) some tools, which interacts with a user or performs a job without delegating to other agents. In Strands, a single agent is represented by the Agent class – you initialize it with a model (or accept the default) and the tools it can use. The agent runs an internal event loop to decide how to answer each query: it might respond directly using the model’s knowledge, or choose to invoke one of its tools, incorporate the result, and potentially call more tools. This loop continues until the agent produces a final answer.

In code, single-agent usage is straightforward. In the following example code, we create a basic agent with a calculator tool and ask it a question:

from strands import Agent
from strands_tools import calculator 

# Create an agent that can use a calculator tool
agent = Agent(tools=[calculator])
result = agent("What is the square root of 1764?")
print(result)

In this example, the agent uses its LLM to interpret the question. It recognizes that it might need to calculate a square root, so it calls the provided calculator tool (a simple arithmetic function) to get the result, and then returns the answer. Strands handles the prompt formatting, calling the tool, and inserting the tool’s result back into the model’s context for the final answer. With a single agent pattern, reasoning and tool use happen within one agent process.

A single agent is suitable for many tasks, such as question answering, data retrieval, and simple assistants, and it keeps the logic self-contained. However, as tasks grow in complexity, you might reach the limits of what one agent (even with tools) can effectively manage (for example, when multiple areas of expertise or concurrent actions are needed). That’s where the multi-agent patterns come in.

Multi-agent networks (swarm or peer-to-peer agents)

Strands supports agent networks where multiple agents operate and communicate to solve problems collaboratively. In such a network, there is no single orchestrator; instead, agents interact peer-to-peer or in an open topology. This pattern is sometimes referred to as a swarm of agents working together. Each agent in the network might have a specialized role or perspective, and they share information to converge on a solution.

In a swarm-style network, communication patterns can vary. One common approach is a mesh communication where agents can talk to other agents freely. This is useful for brainstorming agents or consensus-building, where agents exchange ideas. Other communication schemes include using a shared memory or blackboard (a common repository where agents post and read information), or message-passing channels between specific agent pairs.

Strands provides tools to implement these networks. You can create multiple agent instances (each with its own prompt, persona, and tools) and connect them together. For example, you might have a research agent, creative agent, and critic agent connected in a mesh. The research agent provides factual data, the creative agent proposes ideas, and the critic agent spots flaws; together they iterate towards an answer. The Strands SDK also includes a built-in agent_graph tool to help manage such networks programmatically (so you can define agents and connections, then send messages into the network).
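
As a rough sketch of such a network (built with just the Agent class and a hand-rolled message loop rather than the agent_graph tool; the prompts, agent roles, and stopping rule are all illustrative):

from strands import Agent

researcher = Agent(system_prompt="You gather factual background for a given topic.")
creative = Agent(system_prompt="You propose ideas that build on the facts you are given.")
critic = Agent(system_prompt="You point out flaws or gaps in a proposal. Reply APPROVED if none remain.")

task = "Design a low-cost experiment to detect water ice at the lunar south pole."
facts = researcher(task)
proposal = creative(f"Task: {task}\nBackground: {facts}")

for _ in range(3):  # bound the number of critique rounds
    feedback = critic(f"Proposal: {proposal}")
    if "APPROVED" in str(feedback):
        break
    proposal = creative(f"Revise the proposal using this feedback: {feedback}")

The agent_graph tool mentioned above is intended to manage this kind of wiring for larger meshes; the explicit loop here simply makes the message flow visible.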

Communication and coordination in an agent swarm can be designed with different philosophies:

  • Collaborative swarms – Agents actively build on each other’s contributions and aim for consensus. This might be ideal for creative problem solving or research, where combining perspectives yields the best result.
  • Competitive swarms – Agents work in parallel on the task (perhaps with different methods or hypotheses) and might even critique each other’s results. This could be useful in scenarios like multiple agents trying independent strategies to see which is best.
  • Hybrid approaches – A mix of cooperation on some subtasks and independent exploration on others.

Strands doesn’t force a particular style; you can implement the message exchange logic as needed (the SDK leaves the content of messages and timing up to the developer or even the agents themselves). The new agent_graph utility simplifies setting up networks by specifying a topology (for example, fully connected mesh) and then letting you broadcast or direct messages to agents. Each agent can run on a separate thread or process, providing scalability. Multi-agent networks excel in complex problem domains where different skills or viewpoints are needed concurrently.

Supervisor-agent model (orchestrator with tool agents)

Another pattern Strands supports is the supervisor-agent model, also known as the orchestrator and specialists architecture or agents as tools pattern. In this design, one agent acts as a primary orchestrator (supervisor) that interfaces with the user or high-level task, and it delegates subtasks to one or more specialist agents. Each specialist is effectively an agent wrapped as a callable tool that the orchestrator can invoke for specific needs.

The manager agent decides which specialist agent is required for a given query and forwards the request, then integrates the results back into a final answer. For example, you might design an orchestrator agent that, when asked a complex question, can call a Research Assistant agent for factual lookup, or a Math Assistant agent for calculations, or a Travel Planner agent for itinerary tasks, depending on the query. Each assistant is an LLM-powered agent with its own system prompt and tools specialized to its domain.

Strands makes it straightforward to implement this. You can create specialized agents and expose them as Python tools using the @tool decorator, as illustrated in the following code:

from strands import Agent, tool
from strands_tools import retrieve, http_request

# System prompt for a specialized research agent
RESEARCH_ASSISTANT_PROMPT = """
You are a specialized research assistant. Focus on providing factual, well-sourced information for research questions.
Always cite sources in your answers.
"""

@tool
def research_assistant(query: str) -> str:
    """Tool that uses a specialized agent to answer research queries."""
    # Create a specialized agent for research tasks
    research_agent = Agent(
        system_prompt=RESEARCH_ASSISTANT_PROMPT,
        tools=[retrieve, http_request] # this agent can use web retrieval tools
    )
    return research_agent(query) # delegate the query to the research agent

In this example, we defined research_assistant as a tool. When invoked, it spins up an agent with a specialized prompt and a couple of tools for web research (retrieve to fetch documents and http_request to call web APIs), delegates the query to that agent, and returns the result. We could similarly define other specialist agent tools, such as math_assistant or trip_planner_assistant, each with their own prompt and domain-specific toolset.
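
For completeness, a math_assistant counterpart could follow the same pattern (a sketch only; the prompt wording is ours, and the calculator tool is the one from strands_tools used earlier):

from strands import Agent, tool
from strands_tools import calculator

MATH_ASSISTANT_PROMPT = """
You are a specialized math assistant. Solve quantitative problems step by step and show your working.
"""

@tool
def math_assistant(query: str) -> str:
    """Tool that uses a specialized agent to solve math and calculation problems."""
    # Create a specialized agent for quantitative tasks
    math_agent = Agent(
        system_prompt=MATH_ASSISTANT_PROMPT,
        tools=[calculator]  # this agent can use the arithmetic tool
    )
    return math_agent(query)  # delegate the query to the math agent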

Now we can create the orchestrator agent that uses these specialist agents as its tools:

# Orchestrator agent that can delegate to specialized assistants
orchestrator_agent = Agent(
    tools=[research_assistant, math_assistant, trip_planner_assistant]
)
# When a user question comes in, the orchestrator can decide which agent to invoke
response = orchestrator_agent(
    "What are the latest NASA findings on Mars, and can you calculate the travel time to Mars at 20km/s?"
)
print(response)

When the orchestrator_agent receives the complex question, it uses the LLM’s reasoning (guided by its prompt, which we could customize as a general coordinator) to determine how to answer. It might decide this question has two parts – a research part (latest NASA findings) and a calculation part (travel time) – so it might call the research_assistant tool for the first part and the math_assistant tool for the second part. Each tool is a full agent that carries out its subtask (for example, the research assistant might use http_request to fetch data from a NASA API or knowledge base). The orchestrator then assembles the final answer. This hierarchy creates a clear delegation chain: the top-level agent offloads work to experts and then merges their outputs.

The benefits of the supervisor agent architecture include separation of concerns (each agent specializes, making the system straightforward to maintain) and modularity (you can add or remove specialist agents without rewriting the whole agent). It also mirrors human organizational structures – a manager coordinating specialists – which can be an intuitive way to scale up agent complexity. In Strands, this pattern is fully supported by treating agents as just another kind of tool. The orchestrator agent’s built-in logic (through the LLM) reads tool docstrings and decides when to use which specialist tool, especially if you provide it guidance like “Use the Research Assistant for any questions about scientific facts or current data” in its system prompt.
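
For example, the orchestrator from the earlier snippet could be given that kind of guidance explicitly (a sketch; the prompt text is illustrative, reusing the specialist tools referenced earlier):

from strands import Agent

ORCHESTRATOR_PROMPT = """
You are a coordinator. Use the Research Assistant for questions about scientific facts or current data,
the Math Assistant for calculations, and the Trip Planner Assistant for itinerary requests.
Combine their outputs into one clear, complete answer.
"""

orchestrator_agent = Agent(
    system_prompt=ORCHESTRATOR_PROMPT,
    tools=[research_assistant, math_assistant, trip_planner_assistant]
)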

Hierarchical agent architectures

The hierarchical pattern is an extension of the supervisor-agent idea to multiple levels of delegation. Instead of a single orchestrator and a list of specialists, you can have layers of agents forming a hierarchy or tree structure. For example, at the top might be an executive agent handling the broad objective; it delegates high-level tasks to a few manager agents, each of whom further breaks down tasks among their worker agents. This is useful when problems are very complex or naturally tree-structured (project management, multi-stage workflows).

Strands’s agent graph concept generalizes this. In an agent graph, nodes are agents and edges define communication or supervisory links. A hierarchical topology is one of the supported graph patterns: a tree where each parent node directs its children. Information flows down the tree as tasks and up the tree as results or reports. The star topology (one central coordinator with many direct specialists) is actually a one-level hierarchy (the supervisor-agent model), whereas a true hierarchy might have multi-level oversight.

Using the Strands SDK, you can manually create such structures by assembling agents and coordinating their calls, but the simpler route is to use the agent_graph tool or similar utilities. You can programmatically define the nodes and edges of a graph (specifying each agent’s role and prompt) and the tool will instantiate that network. Then you can send messages into the graph (for instance, give a top-level instruction to the executive agent) and the system will propagate tasks downward and solutions upward according to the defined edges. Each agent in the hierarchy can maintain its own state and focus – for example, a mid-level manager agent might keep track of the progress of its subtasks independently.

Hierarchical agent architectures are ideal when you need layered processing or decision-making. For instance, consider an autonomous software engineering agent: an executive agent decides overall what feature to implement and delegates to a design agent and a coding agent, and the coding agent might further delegate tasks to a code generation agent and a testing agent. Each level adds oversight and can catch errors or refine requirements from the level below. This reduces the cognitive load on each individual agent. Strands facilitates this by providing the mechanisms to link agents in tree or graph structures and handle message passing along those links. The built-in support for maintaining sessions and state for each agent and controlling information flow means you can fine-tune what each agent knows, providing clarity and preventing agents from interfering with each other’s context.
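
As a sketch of a small two-level hierarchy built with the same agents-as-tools pattern (all names and prompts here are illustrative, not part of the SDK):

from strands import Agent, tool

@tool
def coding_agent(task: str) -> str:
    """Worker agent that writes code for a well-scoped task."""
    return Agent(system_prompt="You write clean, well-structured Python code.")(task)

@tool
def testing_agent(task: str) -> str:
    """Worker agent that reviews code and proposes test cases."""
    return Agent(system_prompt="You review code and write unit tests for it.")(task)

@tool
def engineering_manager(task: str) -> str:
    """Mid-level agent that breaks a feature down and delegates to its workers."""
    manager = Agent(
        system_prompt="You split a feature request into coding and testing subtasks and delegate them.",
        tools=[coding_agent, testing_agent],
    )
    return manager(task)

# Top-level executive agent delegates whole features to the manager
executive_agent = Agent(
    system_prompt="You decide which feature to build next and delegate it to your engineering manager.",
    tools=[engineering_manager],
)
executive_agent("Add CSV export to the reporting module.")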

Finally, Strands doesn’t lock you into a single pattern – you can combine them. For example, you could have a swarm of peer agents at one level of the hierarchy and a supervisor above them, or an orchestrator that consults a small swarm of brainstorming agents as one of its tools. The SDK’s flexible design (with agents as first-class objects and tools) lets you mix patterns to suit the problem.

Observability in Strands agents

In production, observability is crucial to understand and trust what your agents are doing. The Strands SDK was built with observability in mind, providing built-in instrumentation hooks, telemetry collection, and support for logging and metrics out of the box.

Instrumentation and traces

Strands can record agent trajectories – the sequence of steps (for example, model calls and tool calls) an agent takes for each request. It uses OpenTelemetry (OTEL) standards to emit this data, meaning you can plug it into OTEL-compatible monitoring backends (such as AWS X-Ray, Amazon CloudWatch, and Jaeger) to visualize and analyze agent behavior.

Each run of an agent can produce a trace, which consists of spans for each significant action. For example, when the agent calls the LLM model, that’s a span, which can include metadata like the prompt, model parameters (for example, temperature or max tokens), and token usage counts. When the agent invokes a tool, that’s another span, recording which tool was called and the input and output. By stitching these spans together, you get an end-to-end timeline of how the agent arrived at its answer.

This distributed tracing extends across components – for instance, if your agent is calling a remote microservice as a tool, the trace can propagate into that service (provided it also uses OTEL), giving you a cross-system view of a request. Such insight is invaluable for debugging agent reasoning, optimizing performance, and identifying failure points in complex multi-agent workflows.
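
One possible wiring is shown below. This is a sketch, not the definitive setup: it assumes Strands emits spans through the global OpenTelemetry tracer provider and that an OTLP-compatible collector (for example, the ADOT collector forwarding to AWS X-Ray) is listening on the endpoint shown; check the SDK documentation for the exact integration point in your version.

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

from strands import Agent
from strands_tools import calculator

# Send spans to an OTLP endpoint (the collector address is an assumption for this example)
provider = TracerProvider(resource=Resource.create({"service.name": "research-agent"}))
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")))
trace.set_tracer_provider(provider)

agent = Agent(tools=[calculator])
agent("What is the square root of 1764?")  # model and tool spans should appear in the resulting trace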

Metrics tracking

Strands also tracks key metrics about agent operations. Metrics are aggregate measurements that help quantify performance and usage. You can capture metrics such as the number of times each tool was invoked (and perhaps success and failure rates of those calls), runtime of tool calls, how many turns or agent loops run per interaction, latency of model responses (time to first byte and time to complete), and token consumption (prompt tokens vs. completion tokens) per request. Strands can also surface system metrics (CPU, memory usage if relevant) and custom business metrics like how often users are happy with the agent’s answer (if feedback is provided).

By monitoring these metrics, developers and operations teams can make sure the agent is behaving efficiently and reliably. For example, a sudden spike in tool error rates or a jump in token usage per query might signal a regression that needs attention. Metrics can feed into dashboards or alerting systems to maintain the operational health of your AI agent service.
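
For example, a team could forward whatever numbers it extracts from the agent’s telemetry to CloudWatch as custom metrics (a sketch using boto3; the namespace, metric name, dimensions, and the way you obtain the token count are all choices made for this example):

import boto3

cloudwatch = boto3.client("cloudwatch")

def publish_token_usage(total_tokens: int, agent_name: str) -> None:
    """Publish a per-request token count as a custom CloudWatch metric."""
    cloudwatch.put_metric_data(
        Namespace="StrandsAgents",  # custom namespace chosen for this example
        MetricData=[{
            "MetricName": "TokensPerRequest",
            "Dimensions": [{"Name": "AgentName", "Value": agent_name}],
            "Value": float(total_tokens),
            "Unit": "Count",
        }],
    )

# total_tokens would come from the agent's telemetry (for example, trace attributes)
publish_token_usage(total_tokens=1834, agent_name="research-assistant")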

Logging

The SDK emits logs for important events – for example, the full prompt being sent to the model, the model’s raw response, decisions it made about which tool to use, and errors encountered. These logs are timestamped and can be configured at various verbosity levels (debug, info, error) similar to other applications. Logs are useful for deep debugging or audit trails, and because they might include sensitive or verbose information, Strands allows structuring or redacting logs as needed. In production, you might integrate Strands logs with standard logging infrastructure (CloudWatch Logs, ELK stack) for centralized analysis.
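
A typical setup relies on standard Python logging. The snippet below assumes the SDK logs under a logger named strands; adjust the logger name and handlers (for example, a CloudWatch Logs handler) to match your environment.

import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
# Increase verbosity for the SDK's loggers while debugging an agent
logging.getLogger("strands").setLevel(logging.DEBUG)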

End-to-end observability

Strands encourages end-to-end observability. The SDK documentation outlines a framework where agent developers, data engineers, and product owners all consume telemetry. For example, developers use traces to diagnose why an agent made a certain decision (such as visualizing a trace to see the chain of tool calls leading to a wrong answer). Data engineering teams might aggregate telemetry in a data warehouse to analyze usage patterns or costs over time. AI researchers could use logs and traces to identify failure modes and fine-tune prompts or models (treating trace data as feedback to improve the agent).

Strands provides the raw capabilities (instrumentation points and OTEL integration) to enable this, but it’s up to the implementing team to set up collectors and dashboards. Best practices include standardizing on open formats (like OTEL) for interoperability, using collectors to route telemetry to multiple sinks (for operations or business intelligence), and filtering or sampling data to manage volume and privacy.

Observability is not an afterthought in Strands – it’s built into the agent loop so that when your agent is running in production, you can monitor its reasoning and actions. This focus on instrumentation sets Strands apart, especially compared to earlier agent frameworks where developers often had to add their own logging or tracing.

Enterprise readiness and deployment best practices

The Strands Agents SDK was designed with enterprise production use in mind, providing features and guidance to help agents run reliably, at scale, and securely in business-critical environments. This section discusses how Strands addresses key enterprise requirements that are essential for adopting agentic workflows in large organizations.

Scalability and performance

Strands can scale from quick prototypes to large-scale production deployments seamlessly. The same agent code running on a developer’s laptop can be deployed to the cloud without changes. Strands is built in a lightweight manner (a Python framework orchestrating API calls to LLMs and tools), so it introduces minimal overhead. Agents can run concurrently – each agent in a multi-agent configuration can execute on its own thread or process to utilize multiple cores or machines. This means an orchestrator agent can dispatch work to specialist agents running in parallel processes, achieving concurrency and faster throughput on multi-CPU systems. When integrated into distributed environments (like microservices or serverless architectures), you can run multiple agent instances behind a load balancer to handle high request volumes. The framework’s model-agnostic nature also means you can choose more powerful model endpoints for heavier workloads or distribute calls across different model providers to avoid bottlenecks.

In practice, performance is usually dominated by the LLM’s response time and the external API calls the agent makes; Strands pipelines these operations efficiently (for example, by streaming responses when available to start delivering output as soon as possible). There is no strict upper limit in Strands on the number of tools or steps an agent can handle, aside from compute resources and model limitations – making it suitable for complex, long-running tasks if needed.
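
For instance, two specialist agents can run side by side with nothing more than the standard library (a sketch; the prompts are illustrative, and each agent call simply blocks on its own model round trips in a separate thread):

from concurrent.futures import ThreadPoolExecutor

from strands import Agent

summarizer = Agent(system_prompt="You summarize documents in three bullet points.")
classifier = Agent(system_prompt="You label documents with a single topic tag.")

document = "..."  # the text to analyze

with ThreadPoolExecutor(max_workers=2) as pool:
    summary_future = pool.submit(summarizer, document)
    label_future = pool.submit(classifier, document)
    summary, label = summary_future.result(), label_future.result()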

Security and data protection

Enterprise applications demand strong security measures, and Strands provides mechanisms and recommendations to build secure agents. Because agents can execute code or call external APIs through tools, it’s critical to manage what tools are available to an agent. Strands supports fine-grained control over tool access – you decide which tools to include for each agent.

Sensitive data handling is also emphasized: AWS recommends implementing end-to-end encryption for sensitive data that agents handle, both at rest and in transit. For example, if an agent stores conversation history or intermediate results, those could be encrypted or kept in memory only. It’s also important to sanitize inputs and outputs; using Amazon Bedrock Guardrails or custom validation to filter the agent’s responses can prevent the agent from returning confidential information or disallowed content. Strands’s logging can be configured to omit or redact sensitive details, so audit logs don’t become a source of leakage.

Authentication and authorization should be layered in front of agent endpoints: when deploying using Amazon API Gateway or Lambda, you can use AWS Identity and Access Management (IAM) roles, Amazon Cognito, or OAuth tokens to make sure only authorized systems or users can invoke the agent. Within the agent, you might also enforce role-based logic – for example, certain tools only activate if the requesting user has appropriate permissions. For multi-agent systems, isolating each agent’s context (which Strands supports using sessions and separate prompts) can enforce the principle of least privilege (each sub-agent only knows what it needs to).
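
One lightweight way to apply such role-based logic is to check the caller’s permissions inside the tool itself and to only expose sensitive tools to users who are allowed to trigger them (a sketch; the role lookup is a stand-in for your own identity system):

from strands import Agent, tool

USER_ROLES = {"alice": {"analyst"}, "bob": {"admin"}}  # stand-in for a real identity store

def build_agent_for(user: str) -> Agent:
    @tool
    def delete_customer_record(record_id: str) -> str:
        """Delete a customer record. Requires the admin role."""
        if "admin" not in USER_ROLES.get(user, set()):
            return "Permission denied: this action requires the admin role."
        # ... call the real deletion API here ...
        return f"Record {record_id} deleted."

    # Only include the sensitive tool for users who hold the admin role
    tools = [delete_customer_record] if "admin" in USER_ROLES.get(user, set()) else []
    return Agent(tools=tools)

agent = build_agent_for("alice")  # this agent never sees the deletion tool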

Threat modeling for AI agents is a new but important practice; AWS has published guidance like the MAESTRO framework for agentic AI threat modeling. This encourages teams to anticipate how an agent might be misused or attacked (for instance, prompt injection by malicious inputs, or attempts to make an agent reveal secure data) and to implement mitigations such as input validation, output filtering, and robust exception handling.

Strands gives you the building blocks for powerful autonomous agents, but enterprises should wrap those agents with the same rigor applied to any application – encryption, monitoring, access control, and testing against adversarial inputs – to run them safely and responsibly.

AWS service integration

As an AWS originated project, the Strands Agents SDK integrates naturally with the AWS ecosystem. It works with Amazon Bedrock out of the box (for access to foundation models), which simplifies using high-quality, scalable models with enterprise security (data is encrypted and is not sent to external third parties when using Amazon Bedrock).

Beyond model hosting, Strands’s use of OTEL means you can pipe trace data into AWS X-Ray for distributed tracing visuals and send metrics to CloudWatch for real-time monitoring. For example, you could set up CloudWatch alarms on metrics like tool error rate or latency per agent call to alert operations teams of anomalies.

Strands can also call AWS services as tools – either using APIs or specialized tools. Community-contributed tools for AWS are already available (for instance, tools that can run AWS SDK commands or query AWS resources). In one scenario, an agent could use an AWS SDK tool to automatically remediate an AWS infrastructure issue (making it a DevOps assistant), or use an Amazon DynamoDB tool to store and retrieve information as part of its workflow. In fact, one of the Strands example use cases demonstrates an agent storing weather data into DynamoDB, showing how seamlessly an agent can incorporate AWS data services. Because Strands is Python, it also works smoothly with Lambda for serverless deployments (more on that later in this post) and with AWS Step Functions if you need to embed an agent call as one step in a larger orchestrated business workflow. Enterprises can use existing AWS identity and networking features like virtual private cloud (VPC), IAM, or AWS Key Management Service (AWS KMS) encryption to further secure and isolate their agents when running in the cloud.
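
As an illustration, a small DynamoDB tool could look like the following (a sketch using boto3; the table name, item shape, and sample prompt are invented for this example):

import boto3
from strands import Agent, tool

weather_table = boto3.resource("dynamodb").Table("WeatherReadings")  # hypothetical table

@tool
def save_weather_reading(city: str, temperature_c: float) -> str:
    """Store a weather reading for a city in DynamoDB."""
    # Temperature stored as a string to sidestep float-to-Decimal conversion details
    weather_table.put_item(Item={"city": city, "temperature_c": str(temperature_c)})
    return f"Saved reading for {city}."

agent = Agent(tools=[save_weather_reading])
agent("Record that it is 21.5 degrees Celsius in Seattle right now.")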

Conversely, Strands remains cloud-agnostic enough that if needed, you can run it on premises or in other cloud environments – for example, using local models through Ollama or connecting to third-party APIs – giving flexibility for hybrid deployments.

Deployment best practices

There are several proven patterns for deploying Strands agents in production, and the SDK provides a deployment toolkit with reference implementations. Depending on use case, you might choose one of the following deployment methods:

  • Serverless (Lambda) – This is ideal for short-lived agent tasks or event-driven invocations. You can deploy an agent as a Lambda function, possibly using the Lambda function URL feature or API Gateway to trigger it over HTTPS. This offers scalability (Lambda will spawn concurrent executions as needed) and minimal operations overhead. It’s best for agents that complete within the Lambda runtime limit and don’t require long-lived state (though you can use external storage for state if needed). AWS provides examples for deploying Strands this way, including examples that stream responses from Lambda in real time. For interactive or streaming agents, you might prefer using WebSockets or an asynchronous pattern, because Lambda invocations are stateless. A minimal handler sketch follows this list.
  • Containers (AWS Fargate and Amazon ECS or Amazon EKS) – For long-running or stateful agent services, containerizing the agent logic is a common approach. You can host the agent loop in a container (for example, a microservice that listens for requests, invokes the agent, and returns results). Fargate (serverless containers) and Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS) are both supported in reference architectures. This approach is well-suited to streaming interactions (where an agent might keep a connection open to stream tokens) and high-concurrency scenarios. With containers, you can allocate more memory/CPU for larger models or use GPU-backed instances if running heavy local models. You can also horizontally scale the number of agent service instances and integrate with service meshes or mesh-like patterns if multiple agent services need to communicate.
  • Hybrid return-of-control pattern – In some enterprise scenarios, part of the tool execution is done on the client side or in a separate environment for security. Strands supports an architecture where the agent is hosted in one environment (for example, in AWS) but some tools are executed in a different environment (like on a user’s device or in a secured on-premises service). The return-of-control pattern lets the agent delegate certain tool calls back to the client application. For instance, a client application might register a local tool (for example, for accessing a local database or hardware device) with the agent. When the agent decides to use that tool, it returns a signal for the client to execute it and await the result. Meanwhile, other tools can be hosted in the cloud. This pattern gives maximum flexibility and can address data governance concerns (keeping some data processing local) while still benefiting from the agent’s reasoning capabilities in the cloud.
  • Monolithic vs. microservices – Strands agents can be deployed as a monolith (the agent loop and all tools in one process) or split into microservices (each tool as its own service that the agent calls through an API). Monolithic deployments are simpler and have less latency (function calls in memory), but splitting tools into separate services can improve fault isolation, allow independent scaling of expensive tools, and enable polyglot implementations (tools in other languages). A best practice is to start monolithic for simplicity, then refactor out critical tools as needed (for example, a data-intensive tool might be better as a separate service that can be scaled on its own). Strands’s observability will still capture cross-service calls if OTEL tracing is propagated, giving you a full picture of the distributed workflow.
  • Amazon Bedrock AgentCore – For production-grade deployment of Strands agents with built-in support for identity, memory, observability, and tool integration, AWS offers Amazon Bedrock AgentCore. This is a secure, serverless runtime designed specifically for running AI agents in real-world applications. You can wrap a Strands agent using the BedrockAgentCoreApp wrapper and deploy it through the AWS Command Line Interface (AWS CLI) or container workflows. Amazon Bedrock AgentCore supports long-running tasks (up to 8 hours), asynchronous tool execution, and tool interoperability using MCP, A2A, or API Gateway based services. It also includes secure identity features such as OAuth, Amazon Cognito, and IAM, as well as native observability with CloudWatch and OTEL. This approach is ideal for teams looking for a scalable, secure, and fully managed agent infrastructure that integrates seamlessly with existing AWS services. Amazon Bedrock AgentCore is currently available in public preview as of July 2025.
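
Returning to the serverless option, a minimal Lambda handler might look like the following (a sketch; the event shape assumes an API Gateway or function URL proxy integration, and the response format is up to your API contract):

import json

from strands import Agent
from strands_tools import calculator

# Created once per execution environment and reused across warm invocations
agent = Agent(tools=[calculator])

def handler(event, context):
    body = json.loads(event.get("body") or "{}")
    result = agent(body.get("prompt", ""))
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"answer": str(result)}),
    }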

When operating agents in production, it’s also recommended to implement robust error handling and monitoring. For example, you might wrap the agent invocation in a retry loop or fallback logic so that if the agent fails or returns an incomplete result, your application handles the exception and responds gracefully (perhaps returning a default answer or a message that it will follow up later). Define timeouts for tool calls and possibly limit the number of reasoning loops to avoid “runaway” agents. Use CloudWatch or a similar monitoring system to collect metrics like latency, error counts, token usage, and cost per request, and set up alerts for anomalies. In an enterprise, operational excellence is as important as the agent’s accuracy – Strands gives you the hooks (telemetry, logs, config options) to achieve this, but it’s up to your DevOps team to wire them into your existing operations toolkit.
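
A simple wrapper along those lines might look like the following (a sketch; the retry count, timeout, and fallback message are policy choices, and the timeout here covers the whole agent call rather than individual tool calls):

from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

from strands import Agent

agent = Agent()
_executor = ThreadPoolExecutor(max_workers=4)

def ask_with_guardrails(prompt: str, attempts: int = 2, timeout_s: int = 60) -> str:
    """Invoke the agent with a per-call timeout and a bounded number of retries."""
    for _ in range(attempts):
        try:
            return str(_executor.submit(agent, prompt).result(timeout=timeout_s))
        except FutureTimeout:
            continue  # the timed-out call keeps running in its worker thread; retry with a fresh call
        except Exception:
            continue  # in real code, log the error and back off before retrying
    return "Sorry, I couldn't complete that request right now. Please try again later."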

By following these enterprise deployment best practices – choosing the right architecture, enforcing security at multiple layers, and using the infrastructure of AWS – companies can confidently deploy Strands agents that meet their scalability, security, and compliance requirements while delivering advanced AI capabilities to end-users.

Practical usage example: Autonomous workflow in action

To illustrate how you might use the Strands SDK for an autonomous workflow, let’s walk through a scenario. Suppose we want to build an autonomous research assistant that can perform a multi-step task: the user asks a broad question, and the agent needs to gather information from the web, perform some analysis, and provide a well-formulated answer with citations. This involves decision-making (how to break down the task), tool usage (web search, reading documents, performing calculations), and synthesis of results.

Define tools

We need some tools for web research. Strands includes a retrieve tool (for searching and retrieving documents) and an http_request tool (for calling APIs or fetching URLs). If additional processing is needed, we could also include a Python execution tool or others.

Create the agent with a suitable prompt

We give our agent a system prompt that instructs it to be a diligent research assistant. For example: “You have access to web search and browsing. Always find factual information and cite sources in your answer. If calculations are needed, do them step by step.” This prompt sets the context so the LLM knows how to behave (defining the role and guidelines for the agent).

from strands import Agent
from strands_tools import calculator

# research_assistant is the specialist agent tool defined earlier in this post

# Orchestrator agent that can both research and calculate
smart_agent = Agent(
    system_prompt="You are an AI research assistant. You answer questions with facts and citations. You have tools for web research and math.",
    tools=[research_assistant, calculator]
)
query = "What are the latest NASA findings on Mars, and how long would it take a spacecraft to travel from Earth to Mars at 20 km/s?"
answer = smart_agent(query)
print(answer)

Agent reasoning and tool use

When a query comes in, the agent’s LLM will parse the question and decide on a plan. Thanks to Strands’s model-driven loop, the agent can internally reason something like: “The question asks for the latest NASA findings on Mars and a travel time calculation. Plan: (1) use web search tool to find recent NASA announcements about Mars, (2) summarize findings, (3) use calculator (or internal math) to compute travel time, (4) compose final answer.”

The agent will execute this plan step by step: call retrieve with a query like “latest NASA Mars findings,” get results (perhaps through http_request to read an article), extract key information, do the calculation for travel time (maybe using the calculator tool if available, or basic arithmetic), and finally synthesize everything into a coherent answer, citing the article source for the Mars information.

This happens in one agent() call, transparently. The user sees the final answer, but Strands orchestrated multiple tool calls and reasoning iterations to get there. The observability features described earlier make it possible to inspect this process after the fact (for example, we can see in the trace that the agent made two tool calls: one to retrieve for NASA info and one to calculator for computing travel time).

Compose the final answer

The agent combines the results of its tools and formulates the answer to return to the user. For instance:

## Latest NASA Findings on Mars (2023-2025)

### Atmospheric Discoveries
1. **First Visible-Light Auroras** (March 2024)
- NASA observed visible-light auroras on Mars after a solar flare and coronal mass ejection
- This event occurred near the peak of the current solar cycle

2. **MAVEN's "Sputtering" Discovery** (2024)
- After a decade of searching, NASA's MAVEN mission made the first direct observation of atmospheric "sputtering"
- This process helps explain how Mars lost its atmosphere over time

### Geological Discoveries
1. **Arsia Mons Volcano Observation** (2025)
- NASA's 2001 Mars Odyssey orbiter captured new images of Arsia Mons, one of Mars' largest volcanoes
- The images show the massive volcano peeking through morning cloud cover

2. **Perseverance Rover's New Explorations** (2025)
- Perseverance began exploring "Krokodillen," a region potentially containing some of Mars' oldest rocks
- The rover continues collecting samples for the future Mars Sample Return mission

3. **Dust Devil Captured in Perseverance Selfie** (May 2024)
- A Martian dust devil was captured in Perseverance's fifth selfie since landing

### Other Notable Findings
- Curiosity rover was photographed in motion by the Mars Reconnaissance Orbiter
- NASA's Europa Clipper spacecraft captured infrared images of Mars during a flyby
- Ongoing updates to the Mars Relay Network for communication between rovers and orbiters

## Earth to Mars Travel Time at 20 km/s
The distance between Earth and Mars varies significantly due to their elliptical orbits:
- **Minimum distance**: 54.6 million km (when planets are at their closest)
- **Maximum distance**: 401 million km (when planets are at their furthest)

At a constant speed of 20 km/s:
- **Minimum travel time**:
- 2,730,000 seconds
- 758.3 hours
- **~31.6 days**

- **Maximum travel time**:
- 20,050,000 seconds
- 5,569.4 hours
- **~232.1 days** (about 7.7 months)

This significant variation in travel time (from approximately 1 month to 7.7 months) demonstrates why launch windows to Mars are carefully planned to coincide with favorable orbital alignments that minimize travel time, fuel consumption, and mission duration.

This answer illustrates how the agent retrieved factual data, specifically the minimum and maximum distances between Earth and Mars, and used that information to calculate travel time at a constant speed of 20 km/s. It expressed the results in multiple formats – seconds, hours, and days – offering a realistic travel range from about 1 month to 7.7 months depending on planetary positions – all without the user needing to prompt those steps separately.

By using Strands for this workflow, we didn’t have to script the sequence “first do search, then do calculation” ourselves; the agent decided it intelligently. If the question had been different, the agent might have taken a different approach or called different tools, all based on its autonomous reasoning. By adjusting the provided tools and the system prompt, we can guide the agent’s behavior for various autonomous workflows (from writing code with a code-execution tool to analyzing datasets with a data analysis tool).

Error handling and reflection

In practice, autonomous agents might make mistakes or need to recover (perhaps the first web search wasn’t useful and it needs to try a different query). Strands agents have the ability to reflect and retry within the loop – for instance, if a tool’s result is unexpected or irrelevant, the agent can reformulate its approach. We can also program explicit guardrails or use the observability signals to intervene if something goes wrong (for example, if no useful answer after X loops, stop and respond with an apology or ask for clarification). The SDK provides callback hooks and the ability to integrate such logic, but by default, the LLM’s own iterative reasoning often suffices for moderate tasks.

Through this example, we see how Strands enables multi-step decision making in a single unified agent. It empowers the agent to act autonomously – deciding which steps to take and in what order – using the powerful reasoning of the LLM, while giving developers simple control points (which tools to allow, what the agent’s role and prompt are, and the ability to analyze logs and traces afterward).

Strands vs. other agent frameworks

With the rapid rise of AI agents, a number of frameworks have emerged. LangChain is one of the most popular libraries for building LLM-driven applications and agents. In this section, we provide an overview of similarities and differences, and when to consider each.

Core concept and philosophy

Both Strands and LangChain enable the pattern of using LLMs to drive actions (the ReAct paradigm of reasoning and tool use). They let you define tools and functions and have the model call them to solve tasks. The big difference is in developer experience and emphasis. Strands preaches minimal orchestration: you supply the prompt and tools and trust the model to figure out the sequence. It consciously avoids requiring developers to write complex workflow code or state machines around the agent.

LangChain started with a focus on giving developers building blocks to construct chains or sequences of LLM calls and tools. In early LangChain, you might manually stitch together a series of steps or use a predefined agent template. Over time, LangChain added agentic behavior (like its AgentExecutor with an LLM planning actions), but it still offers more low-level control if you want it. LangChain’s latest versions introduced explicit workflow orchestration tools like LangGraph for DAG-based flows and a MultiAgent orchestration engine. This means LangChain can support very fine-grained, developer-defined workflows when needed (you can script each sub-agent and how they connect). Strands chooses to abstract that away by default – the workflow emerges from the model’s decisions, not the developer’s hardcoded logic.

In short: Strands is “LLM-first” (model as planner) whereas LangChain is “developer-first” (assemble your desired chain, though it can also delegate planning to the model if you use its agent abstractions).

Tool ecosystem and integration

Both frameworks recognize the importance of tools (also called skills or functions). LangChain provides a wide array of integrations – not only generic tools like math or web search, but connectors to databases, vector stores, and APIs, often through community-contributed wrappers.

Strands’s approach to tools embraces standards like MCP (model-provided tools). By supporting MCP, Strands can access a large library of tools (for example, QA over documents, coding helpers, and cloud resource manipulators) without each tool needing a custom integration script in your code. LangChain doesn’t natively support MCP, but it has its own catalog of tools, and often you integrate by writing a Python function and adding it as a tool (similar to Strands’s @tool, which was likely inspired by patterns popularized by LangChain). LangChain has a wider array of built-in connectors (especially for things like data stores or model providers) given its maturity, whereas Strands is catching up quickly by using open protocols and community contributions (Anthropic and Meta have already added compatibility in Strands for their systems).

Strands can naturally integrate well with AWS services (for example, Amazon Bedrock models), whereas LangChain has historically been used a lot with OpenAI’s API, custom local models, and various third-party services. Both can be adapted to either use case, but the simplest mappings tend to be Amazon Bedrock with Strands and OpenAI with LangChain, even though Strands also works fine with OpenAI.

Multi-agent and structuring of agents

Strands comes with first-class support for multi-agent systems (you can make swarms, graphs, and hierarchical setups with built-ins like the agent_graph tool). LangChain has also moved into this space, providing things like MultiAgentManager/Executor and even experimental ecosystems like Microsoft’s AutoGen integration, where multiple agents (like a user agent and a system agent) converse. However, LangChain’s multi-agent patterns are not as unified under one concept; you often find examples or custom implementations (for instance, agents that critique each other or work in a loop).

In 2025, LangChain introduced LangGraph and other features to let developers explicitly design multi-agent workflows with flows, shared memory, and parallelism. This gives a lot of power to the developer to orchestrate agents in LangChain, but requires you to plan those interactions. Strands provides more out-of-the-box constructs (like a ready-made way to turn agents into tools or create a mesh network) for common patterns, leaning on sensible defaults.

AWS also has a separate library called Agent Squad, which focuses on multi-agent orchestration (routing to specialist agents). AWS suggests using Agent Squad when you need many specialized sub-agents with strict context isolation, and using Strands when you want a single-agent-with-tools approach that leans on the LLM’s reasoning. That indicates Strands often works best with a single agent (or a contained hierarchy of agents) using tools, whereas if you wanted a large ensemble of disparate agents with their own models or knowledge, you might use a different orchestrator and possibly incorporate Strands agents within it. For most cases though, Strands is fully capable of multi-agent coordination.

Memory and context handling

Both frameworks allow for maintaining state between agent turns (conversation history, or more generally an agent’s memory). LangChain has numerous utilities for short-term memory (chat message history) and long-term memory (vector store backed memory). Strands provides sessions and state management too – you can choose where to store conversation history or other context (in-memory, persistent store) and how to truncate or retrieve it. The principles are similar, but LangChain has more pre-built variants (like summarizing memory or knowledge graphs as memory), reflecting its broader usage in conversational AI.

For a solution architect, if memory beyond a simple history is needed (like integrating a vector database for knowledge), LangChain offers pluggable classes for that; with Strands, you might implement a custom tool to query a vector database (or use MCP if a tool exists for it). Both can achieve the goal, but the approach differs (LangChain would treat it as a retriever in a chain, Strands would treat it as a tool the agent can call).
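
For example, a retrieval tool backed by an Amazon Bedrock knowledge base might look like the following (a sketch; the knowledge base ID is a placeholder, and any vector database client could be wrapped the same way):

import boto3
from strands import Agent, tool

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

@tool
def search_knowledge_base(query: str) -> str:
    """Retrieve passages relevant to the query from the company knowledge base."""
    response = bedrock_agent_runtime.retrieve(
        knowledgeBaseId="KB123EXAMPLE",  # placeholder knowledge base ID
        retrievalQuery={"text": query},
    )
    passages = [item["content"]["text"] for item in response["retrievalResults"]]
    return "\n\n".join(passages[:5])

agent = Agent(tools=[search_knowledge_base])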

Observability and evaluation

Strands focuses on production observability with its OTEL integration and guidance on metrics. It treats telemetry as a first-class concern for live agents. LangChain, although it provides some logging and callbacks (and there are third-party solutions like Langfuse to instrument LangChain agents), doesn’t have a built-in distributed tracing or metrics system at the level Strands does. In practice, teams using LangChain have often needed to add their own logging or use wrappers to record agent steps. With Strands, much of that comes included or with a simple configuration. This makes Strands appealing for enterprise and production scenarios where monitoring is a must.

On the evaluation side, both Strands and LangChain acknowledge the need to evaluate agent outputs. LangChain has an evaluation module (for grading responses), and Strands offers evaluation techniques for agents (such as collecting traces for analysis or using feedback loops). The telemetry advantage leans in favor of Strands due to its built-in OTEL support.

Use case suitability

Choosing between Strands and LangChain will depend on the project’s priorities. If you want a quick, low-boilerplate way to get an agent working – especially if you plan to use AWS infrastructure or Amazon Bedrock models – Strands is a fantastic option. It offers a quick path from “prompt plus tools” to a working agent, and it excels when you are comfortable letting a powerful model orchestrate the process (which, given today’s LLMs like GPT-4 or Anthropic’s Claude 2, is often a reasonable choice). Strands is also a strong choice if observability and reliability in production are top concerns; it was designed by AWS with production in mind (the fact that internal AWS teams replaced months of custom agent tinkering with Strands speaks to that focus).

On the other hand, if your use case requires extensive custom control over each step, or involves a lot of integrations with external systems that LangChain already has modules for (for example, you need to interface with 10 different databases and proprietary APIs, and you want a library that has samples for all), LangChain’s ecosystem might get you there faster. LangChain also has a larger community at the moment, so finding examples and support for niche scenarios could be faster. Additionally, for academic or very experimental multi-agent research, LangChain’s flexibility to craft novel agent loops or logic might be useful. However, Strands is quickly evolving, and being open source, it might soon incorporate many of the needed integrations (and you can always use Strands in conjunction with other libraries – for example, call a LangChain chain as a Strands tool).

Performance

There isn’t a clear publicly documented performance benchmark between Strands and LangChain. Both primarily spend time waiting on LLM API calls or running the model, so raw performance differences likely come down to how efficiently they manage prompts and tools. Strands being lightweight might have less overhead in the agent loop, whereas LangChain’s flexibility can sometimes introduce extra layers (for instance, parsing outputs with regex or validators). In practice, both are fast enough and any latency is dominated by the model inference and any external API calls.

Conclusion

In summary, Strands and LangChain have overlapping goals but different design centers. Strands, coming from AWS, emphasizes using the power of state-of-the-art models with minimal fuss and making sure you can deploy and monitor these agents reliably. LangChain offers a broader toolkit for constructing AI reasoning pipelines, which can be molded into agent behaviors but also requires more decision-making from the developer on how the workflow should look. Many solution architects might even use them together or for different layers of a system (you don’t have to use one exclusively). Frameworks like Strands are simplifying agent development – as LLMs improve, developers can focus more on what they want the agent to achieve and less on how to micromanage the agent’s steps. As the ecosystem evolves, we can expect both Strands and LangChain (and others) to learn from each other and contribute to making AI agents more capable, robust, and simpler to build than ever before. For more information about the Strands Agents SDK, refer to the project documentation and GitHub repository.


About the authors

Jin Tan Ruan is a Senior Generative AI Developer on the AWS Industries Prototyping and Customer Engineering (PACE) team. He specializes in building multi-agent and multimodal applications using foundation models, with a focus on real-world implementations of AGI-like capabilities. Jin brings a strong background in software development and holds nine AWS certifications, including the Machine Learning Specialty. At AWS, Jin designs and builds advanced agentic systems that combine reasoning, planning, memory, and tool use – using LLMs for autonomous decision-making and task orchestration. He is also deeply involved in fine-tuning and prompt engineering to optimize foundation models for enterprise-scale, domain-specific performance. Jin holds a Master’s in Computer Science and Software Engineering from Syracuse University. Outside of work, he enjoys exploring new places and diving into the world of horror films. Connect with Jin on LinkedIn.
