AI agents have reached a critical inflection point where their ability to generate sophisticated code exceeds the capacity to execute it safely in production environments. Organizations deploying agentic AI face a fundamental dilemma: although large language models (LLMs) can produce complex code scripts, mathematical analyses, and data visualizations, executing this AI-generated code introduces significant security vulnerabilities and operational complexity.

In this post, we introduce the Amazon Bedrock AgentCore Code Interpreter, a fully managed service that enables AI agents to securely execute code in isolated sandbox environments. We discuss how the AgentCore Code Interpreter helps solve challenges around security, scalability, and infrastructure management when deploying AI agents that need computational capabilities. We walk through the service’s key features, demonstrate how it works with practical examples, and show you how to get started with building your own agents using popular frameworks like Strands, LangChain, and LangGraph.

Security and scalability challenges with AI-generated code

Consider an example where an AI agent needs to perform analysis on multi-year sales projection data for a product to understand anomalies, trends, and seasonality. The analysis should be grounded in logic, repeatable, secure in how it handles data, and scalable over large datasets and multiple iterations, if needed. Although LLMs excel at understanding and explaining concepts, they lack the ability to directly manipulate data or perform consistent mathematical operations at scale. LLMs alone are often inadequate for complex data analysis tasks like these because of their inherent limitations in processing large datasets, performing precise calculations, and generating visualizations. This is where code interpretation and execution tools become essential: they provide the capability to execute precise calculations, handle large datasets efficiently, and create reproducible analyses through programming languages and specialized libraries.

However, implementing code interpretation capabilities comes with significant considerations. Organizations must maintain secure sandbox environments to help prevent malicious code execution, manage resource allocation, and maintain data privacy. The infrastructure requires regular updates, robust monitoring, and careful scaling strategies to handle increasing demand.

Traditional approaches to code execution in AI systems suffer from several limitations:

These barriers have prevented organizations from fully using the computational capabilities of AI agents, limiting their applications to simple, deterministic tasks rather than the complex, code-dependent workflows that could maximize business value.

Introducing the Amazon Bedrock AgentCore Code Interpreter

With the AgentCore Code Interpreter, AI agents can write and execute code securely in sandbox environments, enhancing their accuracy and expanding their ability to solve complex end-to-end tasks. This purpose-built service minimizes the security, scalability, and integration challenges that have hindered AI agent deployment by providing a fully managed, enterprise-grade code execution system specifically designed for agentic AI workloads. The AgentCore Code Interpreter is designed and built from the ground up for AI-generated code, with built-in safeguards, dynamic resource allocation, and integration with popular AI frameworks. It also offers advanced configuration support, so developers can build powerful agents for complex workflows and data analysis while meeting enterprise security requirements.

Transforming AI agent capabilities

The AgentCore Code Interpreter powers advanced use cases by addressing several critical enterprise requirements:

Purpose-built for AI agent code execution

The AgentCore Code Interpreter represents a shift in how AI agents interact with computational resources. It takes the agent-generated code, runs it in a secure environment, and returns the execution results, including output, errors, and generated visualizations. The service operates as a secure, isolated execution environment where AI agents can run code (Python, JavaScript, and TypeScript), perform complex data analysis, generate visualizations, and execute mathematical computations without compromising system security. Each execution occurs within a dedicated sandbox environment that provides complete isolation from other workloads and the broader AWS infrastructure. What distinguishes the AgentCore Code Interpreter from traditional execution environments is its optimization for AI-generated workloads. The service handles the unpredictable nature of AI-generated code through intelligent resource management, automatic error handling, and built-in security safeguards specifically designed for untrusted code execution.

Key features and capabilities of AgentCore Code Interpreter include:

How the AgentCore Code Interpreter works

To understand the functionality of the AgentCore Code Interpreter, let’s examine the orchestrated flow of a typical data analysis request from an AI agent, as illustrated in the following diagram.

The workflow consists of the following key components:

Practical real-world applications and use cases

The AgentCore Code Interpreter can be applied to real-world business problems that are difficult to solve with LLMs alone.

Use case 1: Automated financial analysis

An agent can be tasked with performing on-demand analysis of financial data. For this example, a user provides a CSV file of billing data within the following prompt and asks for analysis and visualization: “Using the billing data provided below, create a bar graph that shows the total spend by product category… After generating the graph, provide a brief interpretation of the results…” The agent takes the following actions:

  1. The agent receives the prompt and the data file containing the raw data.
  2. It invokes the AgentCore Code Interpreter, generating Python code with the pandas library to parse the data into a DataFrame. The agent then generates another code block to group the data by category and sum the costs, and asks the AgentCore Code Interpreter to execute it.
  3. The agent uses matplotlib to generate a bar chart and the AgentCore Code Interpreter saves it as an image file.
  4. The agent returns both a textual summary of the findings and the generated PNG image of the graph.
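
The kind of code an agent might generate for steps 2 and 3 can be sketched as follows. This is an illustrative sketch, not actual agent output: the column names (category, cost) and the sample rows are hypothetical, and the chart is written to an in-memory buffer rather than to a file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; no display is needed in a sandbox
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical billing data; a real prompt would supply its own CSV
CSV_TEXT = """category,cost
Compute,120.50
Storage,30.25
Compute,79.50
Database,45.00
Storage,19.75
"""


def total_spend_by_category(csv_text: str) -> pd.Series:
    """Parse the CSV and return total cost per category, largest first."""
    df = pd.read_csv(io.StringIO(csv_text))
    return df.groupby("category")["cost"].sum().sort_values(ascending=False)


def render_bar_chart(totals: pd.Series) -> bytes:
    """Draw a bar chart of the totals and return it as PNG bytes."""
    fig, ax = plt.subplots()
    ax.bar(totals.index, totals.values)
    ax.set_xlabel("Product category")
    ax.set_ylabel("Total spend")
    ax.set_title("Total spend by product category")
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")
    plt.close(fig)
    return buffer.getvalue()


totals = total_spend_by_category(CSV_TEXT)
png_bytes = render_bar_chart(totals)
print(totals.to_string())
```

Splitting the aggregation from the rendering mirrors the two separate code blocks the agent submits in the steps above.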

Use case 2: Interactive data science assistant

The AgentCore Code Interpreter’s stateful session supports a conversational and iterative workflow for data analysis. For this example, a data scientist uses an agent for exploratory data analysis. The workflow is as follows:

  1. The user provides a prompt: “Load dataset.csv and provide descriptive statistics.”
  2. The agent generates and executes pandas.read_csv('dataset.csv') followed by .describe() and returns the statistics table.
  3. The user prompts, “Plot a scatter plot of column A versus column B.”
  4. The agent, using the dataset already loaded in its session, generates code with matplotlib.pyplot.scatter() and returns the plot.
  5. The user prompts, “Run a simple linear regression and provide the R^2 value.”
  6. The agent generates code using the scikit-learn library to fit a model and calculate the R^2 metric.

This demonstrates iterative code execution capabilities, which allow agents to work through complex data science problems in a turn-by-turn manner with the user.
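
One way to picture a stateful session is as a namespace that survives between code submissions. The sketch below is a local analogy only, built on Python's exec with a shared dictionary. It does not show how the service is implemented and provides no sandboxing. The LocalSession class and its run method are hypothetical names.

```python
import contextlib
import io


class LocalSession:
    """Toy stand-in for a stateful session: variables persist across run() calls.

    Illustration only. exec() offers no isolation and must never be used
    on untrusted code outside a real sandbox.
    """

    def __init__(self) -> None:
        self._namespace: dict = {}

    def run(self, code: str) -> str:
        """Execute code in the shared namespace and return captured stdout."""
        stdout = io.StringIO()
        with contextlib.redirect_stdout(stdout):
            exec(code, self._namespace)
        return stdout.getvalue()


session = LocalSession()

# Turn 1: load data (a small in-memory list stands in for dataset.csv)
session.run("values = [2, 4, 6, 8]")

# Turn 2: later code can use `values` without reloading it
output = session.run("print(sum(values) / len(values))")
print(output)
```

Because the second submission finds values already defined, the user never has to ask the agent to reload the dataset, which is the behavior the scatter plot and regression turns rely on.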

Solution overview

To get started with the AgentCore Code Interpreter, clone the GitHub repo:

git clone https://github.com/awslabs/amazon-bedrock-agentcore-samples.git

In the following sections, we show how to create a question answering agent that validates answers through code and reasoning. We build it using the Strands SDK, but you can use a framework of your choice.

Prerequisites

Make sure you have the following prerequisites:

Configure your IAM role

Your IAM role should have appropriate permissions to use the AgentCore Code Interpreter:

{
"Version": "2012-10-17",
"Statement": [
    {
        "Effect": "Allow",
        "Action": [
            "bedrock-agentcore:CreateCodeInterpreter",
            "bedrock-agentcore:StartCodeInterpreterSession",
            "bedrock-agentcore:InvokeCodeInterpreter",
            "bedrock-agentcore:StopCodeInterpreterSession",
            "bedrock-agentcore:DeleteCodeInterpreter",
            "bedrock-agentcore:ListCodeInterpreters",
            "bedrock-agentcore:GetCodeInterpreter"
        ],
        "Resource": "*"
    },
    {
        "Effect": "Allow",
        "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents"
        ],
        "Resource": "arn:aws:logs:*:*:log-group:/aws/bedrock-agentcore/code-interpreter*"
    }
]
}

Set up and configure the AgentCore Code Interpreter

Complete the following setup and configuration steps:

  1. Install the bedrock-agentcore Python SDK:
pip install bedrock-agentcore
  2. Import the AgentCore Code Interpreter and other libraries:
from bedrock_agentcore.tools.code_interpreter_client import code_session
from strands import Agent, tool
import json
  3. Define the system prompt:
SYSTEM_PROMPT = """You are a helpful AI assistant that validates all answers through code execution.

TOOL AVAILABLE:
- execute_python: Run Python code and see output"""
  4. Define the code execution tool for the agent. Within the tool definition, we use the invoke method to execute the Python code generated by the LLM-powered agent. It automatically starts a serverless AgentCore Code Interpreter session if one doesn’t exist.
@tool
def execute_python(code: str, description: str = "") -> str:
    """Execute Python code in the sandbox."""

    # Prepend the description as a comment so it travels with the code
    if description:
        code = f"# {description}\n{code}"

    print(f"\n Generated Code: {code}")

    # Run the code in a sandbox session; the Region shown is an example value
    with code_session("us-west-2") as code_client:
        response = code_client.invoke("executeCode", {
            "code": code,
            "language": "python",
            "clearContext": False
        })

    # Return the first result event from the response stream
    for event in response["stream"]:
        return json.dumps(event["result"])
  5. Configure the agent:
agent = Agent(
    tools=[execute_python],
    system_prompt=SYSTEM_PROMPT,
    callback_handler=None  # no console callback; the response is streamed manually when invoking
)
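
The tool above returns as soon as the first event arrives on the response stream. If an execution emits several events, a variation that gathers every result may be useful. The following sketch assumes only what the listing shows, namely that the response has a stream of events carrying a result key. The helper name collect_results and the sample payloads are hypothetical.

```python
import json
from typing import Any, Iterable, Mapping


def collect_results(response: Mapping[str, Iterable[Mapping[str, Any]]]) -> str:
    """Return a JSON array of every `result` found on the response stream."""
    results = [event["result"] for event in response["stream"] if "result" in event]
    return json.dumps(results)


# Stand-in for a real response; the payload shape is made up for illustration
fake_response = {
    "stream": [
        {"result": {"output": "step 1"}},
        {"other": "ignored"},
        {"result": {"output": "step 2"}},
    ]
}
print(collect_results(fake_response))
```

Returning a JSON array keeps the tool's return type a string, so the agent can still read it as ordinary tool output.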

Invoke the agent

Test the AgentCore Code Interpreter powered agent with a simple prompt:

query = "Tell me the largest random prime number between 1 and 100, which is less than 84 and more than 9"
try:
    response_text = ""
    async for event in agent.stream_async(query):
        if "data" in event:
            chunk = event["data"]
            response_text += chunk
            print(chunk, end="")
except Exception as e:
    print(f"Error occurred: {str(e)}")
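
The loop above uses async for at the top level, which works in a notebook but not in a plain Python script. A minimal sketch of wrapping it in a coroutine and driving it with asyncio.run follows. To keep the example runnable without AWS credentials, FakeAgent is a hypothetical stand-in that mimics only the stream_async behavior used here; in practice you would pass in the Strands agent configured earlier.

```python
import asyncio
from typing import AsyncIterator


class FakeAgent:
    """Hypothetical stand-in that yields events shaped like those in the loop above."""

    async def stream_async(self, query: str) -> AsyncIterator[dict]:
        yield {"init": True}  # an event without "data" is skipped by the consumer
        for chunk in ("The answer ", "is 83."):
            yield {"data": chunk}


async def ask(agent, query: str) -> str:
    """Stream the agent's reply, print chunks as they arrive, and return the full text."""
    response_text = ""
    async for event in agent.stream_async(query):
        if "data" in event:
            response_text += event["data"]
            print(event["data"], end="")
    print()
    return response_text


answer = asyncio.run(ask(FakeAgent(), "largest prime below 84?"))
```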

We get the following result:

I'll find the largest random prime number between 1 and 100 that is less than 84 and more than 9. To do this, I'll write code to:

1. Generate all prime numbers in the specified range
2. Filter to keep only those > 9 and < 84
3. Find the largest one

Let me implement this:
 Generated Code: import random

def is_prime(n):
    """Check if a number is prime"""
    if n <= 1:
        return False
    if n <= 3:
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True

# Find all primes in the range
primes_in_range = [n for n in range(10, 84) if is_prime(n)]

print("All prime numbers between 10 and 83:")
print(primes_in_range)

# Get the largest prime in the range
largest_prime = max(primes_in_range)
print(f"\nThe largest prime number between 10 and 83 is: {largest_prime}")

# For verification, let's check that it's actually prime
print(f"Verification - is {largest_prime} prime? {is_prime(largest_prime)}")
Based on the code execution, I can tell you that the largest prime number between 1 and 100, which is less than 84 and more than 9, is **83**.

I verified this by:
1. Writing a function to check if a number is prime
2. Generating all prime numbers in the range 10-83
3. Finding the maximum value in that list

The complete list of primes in your specified range is: 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, and 83.

Since 83 is the largest among these primes, it is the answer to your question.

Pricing and availability

Amazon Bedrock AgentCore is available in multiple Regions and uses a consumption-based pricing model with no upfront commitments or minimum fees. Billing for the AgentCore Code Interpreter is calculated per second and is based on the highest watermark of CPU and memory resources consumed during that second, with a 1-second minimum charge.
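
The billing rule can be read as follows: for each second, take the peak CPU and peak memory observed, multiply each by its rate, and sum over the session. The sketch below works through that arithmetic. The rates and usage samples are made-up placeholders, not AWS prices, and the function name is hypothetical; consult the pricing page for actual figures.

```python
from typing import Sequence, Tuple

# Placeholder rates for illustration only; these are NOT actual AWS prices
CPU_RATE_PER_VCPU_SECOND = 0.00002
MEMORY_RATE_PER_GB_SECOND = 0.000002


def session_cost(per_second_peaks: Sequence[Tuple[float, float]]) -> float:
    """Sum the cost over a session, given (peak vCPU, peak GB) for each second.

    At least one entry is required, reflecting the 1-second minimum charge.
    """
    if not per_second_peaks:
        raise ValueError("a session is billed for at least one second")
    return sum(
        cpu * CPU_RATE_PER_VCPU_SECOND + memory * MEMORY_RATE_PER_GB_SECOND
        for cpu, memory in per_second_peaks
    )


# Three seconds of hypothetical usage: (peak vCPU, peak GB) per second
peaks = [(1.0, 2.0), (0.5, 2.0), (2.0, 4.0)]
cost = session_cost(peaks)
print(f"{cost:.8f}")
```

With these placeholder numbers, the CPU term is 3.5 vCPU-seconds and the memory term is 8 GB-seconds, so idle seconds with low peaks contribute correspondingly little to the total.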

Conclusion

The AgentCore Code Interpreter transforms the landscape of AI agent development by solving the critical challenge of secure, scalable code execution in production environments. This purpose-built service minimizes the complex infrastructure requirements, security vulnerabilities, and operational overhead that have historically prevented organizations from deploying sophisticated AI agents capable of complex computational tasks. The service’s architecture—featuring isolated sandbox environments, enterprise-grade security controls, and seamless framework integration—helps development teams focus on agent logic and business value rather than infrastructure complexity.

To learn more, refer to the following resources:

Try it out today or reach out to your AWS account team for a demo!


About the authors

Veda Raman is a Senior Specialist Solutions Architect for generative AI and machine learning at AWS. Veda works with customers to help them architect efficient, secure, and scalable machine learning applications. Veda specializes in generative AI services like Amazon Bedrock and Amazon SageMaker.

Rahul Sharma is a Senior Specialist Solutions Architect at AWS, helping AWS customers build and deploy scalable agentic AI solutions. Prior to joining AWS, Rahul spent more than a decade in technical consulting, engineering, and architecture, helping companies build digital products powered by data and machine learning. In his free time, Rahul enjoys exploring cuisines, traveling, reading books (biographies and humor), and binging on investigative documentaries, in no particular order.

Kishor Aher is a Principal Product Manager at AWS, leading the Agentic AI team responsible for developing first-party tools such as Browser Tool and Code Interpreter. As a founding member of Amazon Bedrock, he spearheaded the vision and successful launch of the service, driving key features including Converse API, Managed Model Customization, and Model Evaluation capabilities. Kishor regularly shares his expertise through speaking engagements at AWS events, including re:Invent and AWS Summits. Outside of work, he pursues his passion for aviation as a general aviation pilot and enjoys playing volleyball.