Translated by AI

The content below is an AI-generated translation. This is an experimental feature and may contain errors.

re:Invent 2025: Building Production-Ready AI Agents with Strands Agents SDK for TypeScript


Introduction

This project transcribes talks from overseas conferences into articles, aiming to make valuable but hard-to-find information more accessible. The presentation we are featuring this time is below!

For re:Invent 2025 transcript articles, information is summarized in this Spreadsheet. Please check it as well.

📖 re:Invent 2025: AWS re:Invent 2025 - Build production AI agents with the Strands Agents SDK for TypeScript (AIM3331)

In this video, Ryan and Nick from Amazon explain Strands' TypeScript release and its approach to building agents. Strands is a framework that can execute agent loops with just a few lines of code, employing a model-driven approach where developers let the LLM infer workflows instead of defining them in detail. The TypeScript SDK was released as a 0.1 preview, supporting model providers like Bedrock, OpenAI, and open standards such as MCP, A2A, and OTEL. New features introduced include prompting techniques using Agent SOPs (Standard Operating Procedures), modular prompting with Steering functionality, and an evaluation library. Notably, Strands agents were utilized in the development of the TypeScript SDK itself, integrating with GitHub workflows through Refiner and Implementer agents, which increased development efficiency fourfold.

https://www.youtube.com/watch?v=NzZOm-kaO94
*This article is automatically generated, aiming to preserve the content of the original video as much as possible. Please note that it may contain typos or incorrect information.

Main Content

Thumbnail 0

Announcing the Strands TypeScript Release: Build Agents with Just a Few Lines of Code

Good morning, everyone. I think I'm going to be the last thing standing between you all and lunch. My name is Ryan, and I'm a Principal Product Manager at Amazon. My colleague Nick is with me, and he'll be up later. I've been really excited to talk to you all week, but I lost my voice yesterday, so please bear with me if my voice cracks or if I have to pause to drink water.

Nick and I are both from the Strands Agents team. We're really excited to talk to you about Strands at this conference. Many of you may have seen our booth, but today we're going to talk about our latest release, TypeScript, and some of the other releases around that. I'm going to take some time to talk about this simple interface for building agents, and I'll also talk about what's behind it. What does it mean to build an agent with just a few lines of code?

Thumbnail 50

This code sample will actually work if you install the Strands and Strands Tools packages from PyPI. If you're using Bedrock as your default model provider, you can run this and get an agent loop running. So I'll go through how this works and why it works. And of course, I'll talk a little bit about the TypeScript release.

Thumbnail 70

Nick will come up and talk about how we built it. So let's move on from here. Strands, TypeScript release, a model-driven approach to agent loops, and so on. First, I think many of you are here today because we're releasing TypeScript, so let's get that out of the way first.

Thumbnail 90

We're very excited about this. It's a preview release as of yesterday, meaning 0.1 on GitHub and NPM. However, it's the same style of Strands. We've really worked hard to bring everything we've had since we released the Python SDK in May to TypeScript. It's the same simple interface, a few lines of code, running an agent loop. However, there are some limitations to this preview release until it reaches parity with Python's 1.0. So please be aware that this is about building single-agent systems as of today. We haven't added multi-agent patterns yet. Several other features might also be missing. For things like OpenTelemetry, we plan to follow up quickly in a few weeks, but today you can use basic features like Bedrock and OpenAI models.

You can use async and non-async with streaming. You get full agent state and conversation management for multi-turn agents, and of course, if you've been dabbling with Strands, you can use hooks to inspect and modify its lifecycle. Hooks are powerful features for observing or modifying behavior at different points in the lifecycle.
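To make the multi-turn conversation management concrete, here is a minimal, framework-free sketch of the idea; `ToyAgent` and its message format are invented for illustration and are not the Strands API:

```python
# Minimal sketch of multi-turn conversation state, NOT the Strands API:
# the agent appends each user/assistant turn so later calls see prior context.
class ToyAgent:
    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def __call__(self, user_prompt: str) -> str:
        self.messages.append({"role": "user", "content": user_prompt})
        # A real agent would send self.messages to a model here; we echo instead.
        reply = f"(model saw {len(self.messages)} messages)"
        self.messages.append({"role": "assistant", "content": reply})
        return reply

agent = ToyAgent("You are a helpful assistant.")
first = agent("Hello")
second = agent("And again")
print(first)   # (model saw 2 messages)
print(second)  # (model saw 4 messages)
```

The real SDK manages this history (and strategies for trimming it) for you; the point is only that each call sees the accumulated conversation.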

Thumbnail 160

What is a model-driven agent: A tool execution loop for goal resolution

So let's rewind a bit and talk about what it means to have a model-driven agent in both Python and TypeScript. Some of you might be building agents right now. For those of you, this might be old news, but for those who haven't entered this space yet, I think it's worth pausing here. That's because many people think of agents as just making an LLM call and seeing what happens when you give it internal or external tools like MCP. That's not entirely agentic. It's a very broad term, and I think the industry is still solidifying its definition.

Our definition here is that an agent executes tools in a loop to solve a goal. The design behind Strands, which I'll be talking about, is all about enabling that loop. So you can think of Strands as managing that agent loop at the top. The input you give to that loop is the prompt. That's the system prompt, and of course, it's also what the user prompts the application with. And what you get back is the final result.

Inside the loop, there's a powerful reasoning model equipped with tools, and a goal assigned through the system prompt. It performs its own reasoning to determine how to use those tools and whether the information it has obtained is sufficient to solve that goal. It might execute more tools if needed, and it completes that cycle as many times as necessary to return a satisfactory response to the user. Throughout this entire lifecycle, I'll talk about how you can control it, how to guide the model, and how to add ways to constrain the model. But in essence, that's it. By simply embracing this architecture where the model figures out its own workflow, you can get very far, very fast. I'll talk more about this.
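That loop can be sketched in a few lines of plain Python. This is a conceptual toy, not the Strands implementation; `fake_model` is a scripted stand-in for a real reasoning model:

```python
# Toy agent loop: the "model" decides which tool to call next, and the loop
# keeps feeding tool results back until the model emits a final answer.
def run_agent_loop(model, tools, prompt, max_iterations=10):
    history = [prompt]
    for _ in range(max_iterations):
        decision = model(history)              # model reasons over everything so far
        if decision["action"] == "final":
            return decision["answer"]
        tool_name = decision["action"]
        result = tools[tool_name](decision["input"])
        history.append(f"{tool_name} -> {result}")  # result goes back into context
    raise RuntimeError("agent loop did not converge")

# A scripted stand-in for a reasoning model: call the tool once, then finish.
def fake_model(history):
    if len(history) == 1:
        return {"action": "lookup_weather", "input": "Las Vegas"}
    return {"action": "final", "answer": f"Done after seeing: {history[-1]}"}

tools = {"lookup_weather": lambda city: f"sunny in {city}"}
answer = run_agent_loop(fake_model, tools, "What's the weather?")
print(answer)  # Done after seeing: lookup_weather -> sunny in Las Vegas
```

With a real reasoning model in place of `fake_model`, the same loop structure is what lets the model run "as many times as necessary" before answering.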

Thumbnail 270

Components of an Agent Loop: Reasoning Models, Tools, and Modular Design

But first, to further contextualize what's inside that agent loop. First and foremost, it's a model. That's the number one thing you need, and specifically, a reasoning model that can use tools. Nowadays, there are so many options. It hasn't always been that way. Of course, think of the major AI labs. You can use Gemini, OpenAI, Anthropic models. You can also use Nova's latest reasoning models. Many of these models are available on Bedrock. But Strands is designed to work with any model provider. For example, you can also access OpenAI's and Anthropic's first-party APIs directly. And many more providers have the same interface.

You can also bring your own custom gateway. For example, LiteLLM support is integrated into Strands with today's Python library. We'll be adding this to the TypeScript interface in the future. Many customers also bring their own gateways. It doesn't have to be LiteLLM. There's an entire abstraction mechanism called custom model providers, where you can define in Python or TypeScript how you interact with that gateway product or another model provider that comes to market. Many of the open-source contributions to Strands are model providers by the users or vendors of those models themselves, adding support for Strands to be used with those models.
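Conceptually, a custom model provider is just an adapter behind a common interface. A hedged sketch of that idea follows; the `ModelProvider` protocol and `GatewayProvider` class are invented for illustration and do not match the actual Strands abstraction:

```python
from typing import Protocol

class ModelProvider(Protocol):
    """Anything that can turn a list of messages into a completion."""
    def converse(self, messages: list[dict]) -> str: ...

class GatewayProvider:
    """Adapter for a hypothetical internal model gateway."""
    def __init__(self, endpoint: str):
        self.endpoint = endpoint

    def converse(self, messages: list[dict]) -> str:
        # A real implementation would POST messages to self.endpoint;
        # here we just show the shape of the adapter.
        return f"[{self.endpoint}] replied to {len(messages)} messages"

def ask(provider: ModelProvider, question: str) -> str:
    return provider.converse([{"role": "user", "content": question}])

print(ask(GatewayProvider("https://llm-gateway.internal"), "hi"))
```

Because the agent only depends on the interface, swapping Bedrock for a corporate gateway or a newly released vendor API becomes a matter of writing one adapter.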

Finally, you can think of that top layer as a choice of reasoning model, but it doesn't have to be just a reasoning model. For example, a very common pattern is that you might have picked Claude 3.5 Sonnet as the reasoning model for your agent loop. You can assign, for example, Stability's image generation model as a tool. This is one of the tools available in community packages provided by the Stability team. That reasoning model can use other models that are not reasoning models but are designed for other purposes, like image generation. You can combine them this way within the same agent. You can call many models within an agent.

Thumbnail 400

Thumbnail 410

When equipping agents externally, we adopt open standards. I'll talk a bit more about design principles later. But in short, you can bring any MCP server, and for agent communication, you can use A2A. All traces and trajectories can be sent to your favorite Open Telemetry provider via OTEL. Tools are very important for the agent loop, so I want to pause here and talk about what options are available.

Obviously, there's MCP. You can use any MCP server with Strands as an MCP client, supporting all different authentication types. As the MCP specification evolves, we continue to evolve our client implementation. In both Python and TypeScript, you can bring your own functions and use them as tools. If you already have software or want to incorporate libraries you want to use and expose them as tools to your agent, it's really easy to do that.

In the Python package, we provide a number of built-in tools. TypeScript only has a few of them, but we'll be expanding those in the future. But they work similarly. You can install and import them via packages, just as if you had built your own tools as functions. I want to pause and talk about the pattern of using an agent as a tool, because I think this is often overlooked as a way to build agents. You might have heard this as a supervisor pattern in the context of orchestration.

Consider this as building Strands with this powerful reasoning model. And you might want to build an agent that's really good at interacting with, say, a backend system. For example, a customer I worked with was using an MCP server for Atlassian Jira. They wanted to create an entire agent specifically for understanding how to interact with their Jira system, and they were using features like Strands hooks to parse tool calls. That's because Jira stories can be very large and can impact the context window. The entire job of that agent is to figure out what interesting things it wants to return about this story. They encapsulated all of that as this sub-agent. And then they exposed that as a tool to a higher-level orchestrator, or supervisor agent. You get modularity with this approach, and my point is that you don't always have to use deterministic tools. The agent itself can be a tool.
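The agent-as-a-tool idea can be sketched like this; `jira_subagent` and `make_tool` are invented names, and a real sub-agent would of course run its own agent loop rather than return a canned string:

```python
# Toy sketch of the supervisor pattern: a sub-agent is wrapped so it looks
# like any other tool to the orchestrator. Names are illustrative only.
def jira_subagent(story_id: str) -> str:
    # Imagine an agent loop here that fetches the Jira story via MCP,
    # trims it to fit the context window, and summarizes what matters.
    return f"summary of {story_id}: 3 blockers, 2 open questions"

def make_tool(agent_fn):
    """Expose an agent as a plain callable tool."""
    return lambda arg: agent_fn(arg)

supervisor_tools = {"summarize_story": make_tool(jira_subagent)}
print(supervisor_tools["summarize_story"]("PROJ-42"))
# summary of PROJ-42: 3 blockers, 2 open questions
```

From the supervisor's point of view there is no difference between this and a deterministic tool, which is exactly what gives the pattern its modularity.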

Thumbnail 530

Strands' Design Principles: Simplicity, Extensibility, and Open Standards

Now I'm going to take a step back and tell you a little about why we built Strands and the story behind it. As a preface, this is a snapshot of our contribution guidelines, which you can find in our repository; specifically, the principles for how to contribute to Strands. We've compiled them here in the slides, but they're all available in the GitHub repository. For those of you who aren't familiar with Amazon's processes, we're very principle-driven. These principles define how we make decisions. They are intended to be in tension with each other, and they serve as a guide when we, as a team or as an open-source community contributing to Strands, make decisions about trade-offs. I also think they tell the story of how we tried to build this software for you.

One is simplicity, and really what's important is that it's simple with the same abstraction from prototyping on your laptop all the way to running it in production. In other words, getting a simple agent loop running should be very simple. Integrating telemetry and logging later on should be just as simple. We've already talked a little bit about extensibility. Model providers are a good example of extensibility, and tools are another good example of extensibility, but this applies to everything. Hooks are essentially a mechanism to take over control at certain points in the lifecycle, before and after tool calls, before and after model calls, and so on.

And you can plug in your own hooks. We provide some as well. For example, there's a built-in hook that can help you capture retries from Bedrock models for transient failures. We're embracing this extensibility and applying it everywhere we possibly can. And of course, these are all intended to work together. Everything is intended to lead you down the right path. As we layer on more features, the defaults built into our abstractions are intended to give you a good experience, and you can customize many of these.
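As a rough illustration of the lifecycle-hook idea, here is a toy registry that fires callbacks before and after a tool call; the names and API shape are invented and are not the Strands hook interface:

```python
# Toy sketch of lifecycle hooks: callbacks registered for named lifecycle
# events, invoked before and after a tool call. Not the Strands hook API.
class HookRegistry:
    def __init__(self):
        self.hooks = {"before_tool": [], "after_tool": []}

    def register(self, event, fn):
        self.hooks[event].append(fn)

    def fire(self, event, payload):
        for fn in self.hooks[event]:
            payload = fn(payload)   # each hook may inspect or modify the payload
        return payload

registry = HookRegistry()
# Example hook: redact a sensitive word from every tool result.
registry.register("after_tool", lambda result: result.replace("secret", "***"))

def call_tool(registry, tool, arg):
    arg = registry.fire("before_tool", arg)
    return registry.fire("after_tool", tool(arg))

out = call_tool(registry, lambda q: f"found secret value for {q}", "query")
print(out)  # found *** value for query
```

The same shape generalizes to before/after model calls, retries on transient failures, and the other lifecycle points mentioned above.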

And being accessible to both humans and agents is something that has newly emerged, in a way. As a concrete example, if you go to StrandsAgents.com and add slash latest slash llms.txt to the end, you get a table of contents in markdown format. Basically, this llms.txt convention (llmstxt.org is, I think, the canonical URL for it) is a way to expose a markdown twin of your HTML documentation for coding assistants and models. So you, as a human, are reading an HTML document on the website, but we're exposing markdown for models, making it available at a standard location that models, or the systems around them, know how to find. We've already talked about common standards. As new ones emerge, we'll add them. Today, we already support the major ones.
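A small sketch of how that convention is consumed, assuming the URL layout described above; the sample markdown content is invented:

```python
import re

# The llms.txt convention: append llms.txt (here, /latest/llms.txt) to the
# docs root to get a markdown table of contents intended for models.
docs_root = "https://strandsagents.com"
llms_txt_url = f"{docs_root}/latest/llms.txt"
print(llms_txt_url)  # https://strandsagents.com/latest/llms.txt

# A fetched llms.txt is plain markdown; links can be pulled out with a regex.
sample = "# Strands Agents\n- [Quickstart](quickstart.md)\n- [Hooks](hooks.md)"
links = re.findall(r"\[([^\]]+)\]\(([^)]+)\)", sample)
print(links)  # [('Quickstart', 'quickstart.md'), ('Hooks', 'hooks.md')]
```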

Thumbnail 690

Experience at Amazon Q Team: From Concept to Production in Weeks

So let's go back to storytelling. As I mentioned, Nick and I are from the Strands team. Before that, we were part of the Amazon Q team, or what's now known as the Kiro team. Specifically, we were working on a project to enable other Amazon teams to extend Q. You can imagine that it's not just one team building these super agents at Amazon; many teams are collaborating and sharing their expertise. And this was, Nick, how long ago now? Over a year ago, almost two years. It was quite a while ago, in the very early days of agentic development.

And we were observing that many teams were building their own solutions using the Converse API and other APIs. They were just doing LLM calls in a loop around the workflows they had built, or they were using simpler off-the-shelf frameworks, but it still wasn't truly agentic. They certainly didn't have the agentic loop we're talking about today. They had to spend a lot of time on deterministic workflows. This meant that engineers with good ideas or product teams who wanted to build something would take months to get something up and running. From idea to prototype, and then even longer to get it into production.

So, because our requirement was to onboard dozens of teams and integrate agentic capabilities into a single customer experience, we started working on a different approach. And that led to, for example, these results from the Kiro team: they were able to go from concept to production in weeks. That was our driving goal. We iteratively improved this internal framework, adding the right features and finding the right developer experience to achieve these goals, and as a result, these teams and other teams across Amazon were able to move really, really fast.

Thumbnail 800

In May of this year, we released it to the public. That was the first preview release of Strands. We were fortunate to receive very generous mentions on social media from prominent people. As Swami mentioned yesterday, the Python SDK has been downloaded over 5 million times on PyPI. Obviously, with TypeScript, we're just getting started, but we hope it will be just as helpful to all of you. And I've met many of you. I've talked so much about Strands and the agents you're building that it's part of the reason I lost my voice. And from that initial release in May until now, I'm really excited to see the signals that all of you are starting to build much faster with Strands available. And you're seeing the same results that we were able to build for our internal customers.

Thumbnail 850

From Workflow-Driven to Model-Driven: Shifting Developer Work to LLMs

So, let's return to the model-driven approach in this architecture. In summary, the point here is that there's very little for you as a developer to think about. You think about the goal you want the agent to achieve, the system prompt. And I'll talk about techniques for more fine-grained control later. You think about the tools you want to equip that agent with. These tools themselves can be agents, and this is how you get modularity and break down complexity. And then you integrate this into your application, send a prompt, and get a result. That's the agent loop.

Thumbnail 890

And I wanted to differentiate this from the very broad choices in the market today: either building it yourself or using frameworks available from open-source providers. This is a shift between workflow-driven and model-driven, as I've named it.

The difference here is who is defining the steps required to break that goal into a plan, make tool calls, analyze the results of those tool calls, determine if more are needed, and iterate that loop until a result is returned to the customer. Who is doing that work? What we experienced on the Q team was that every development team was doing that work, describing these workflows using abstractions provided by these open-source frameworks, or as bespoke systems they built themselves. They were doing all the work, and what that leads to is a fairly brittle system that takes a long time to build. Developers had to anticipate new scenarios. For example, a new category of customer questions in a chat experience. To support that use case, they had to create and maintain more workflows, and if something broke, they had to troubleshoot these different steps.

Thumbnail 980

The model-driven approach, to be fair, is not just Strands' approach. Many frameworks this year, alongside Strands, have emerged to support it. It's an approach where you push all the work to the LLM and then spend your energy on guiding that LLM toward different capabilities. So, if this isn't clear, let's look at a practical example. Think of early ChatGPT. You'd go into a chat experience, and it was fairly powerful. Suppose I was trying to debug why I couldn't connect to a remote EC2 instance. I'd go into ChatGPT and ask how to do it, and I'd get a pretty good experience: a series of steps in natural language, basically a tutorial. It might have been better than reading different documentation or different blogs. I'd get a much better response, but it wasn't at all personalized. This is all old news; I think you all understand it now. This was the pre-agent era. You'd ask a question, steps would come back, and they might have been helpful or not.

Thumbnail 1030

So, when it became agentic, and here I'm defining agentic as using tools with agents, when you let the agent figure out its own work, for example, in a Python framework, you could also go down the path of describing the steps of an LLM call and breaking the agent down into specific calls. And for the first step, let's predict checking network connectivity, and then making a direct call to a tool that handles VPC APIs or security groups, and iterating through that workflow until you get a response. This works well. There's nothing wrong with an agent that works this way. The challenge is that you had to do a lot of upfront work to describe those steps. And you have to parse the tool results to determine if you need to proceed to the next one. Perhaps you find yourself in a hybrid mode where you let the model decide whether to proceed, but you're still describing what the next step is.

Thumbnail 1100

If you decide, oh, I want to support container connectivity or on-premise compute solutions with this experience I'm building here, it's not easy. You not only have to add new tools, but you have to add new workflow steps and integrate them into the flow you've built. This is the essence of a workflow-driven agent. Conceptually, a model-driven agent is much simpler. Give it a goal, give it tools. That's it. We saw the first example with the International Space Station. There was also the example of the weather in Las Vegas. These are very simple examples of agents, but they work well. And they work well because you're simply setting them up with an orchestration framework like Strands.
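The contrast can be sketched in code. In the workflow-driven version the developer fixes the step order; in the model-driven version a (here scripted) model picks the next tool, and the developer only supplies tools. All names are illustrative:

```python
# Contrast sketch (illustrative only): workflow-driven vs model-driven.
def workflow_driven(tools, target):
    result = tools["check_network"](target)          # step order fixed by the developer
    result = tools["check_security_groups"](result)  # a new scenario means new steps
    return result

def model_driven(pick_next, tools, target, max_steps=5):
    state = target
    for _ in range(max_steps):
        choice = pick_next(state)   # the model decides; the developer only supplies tools
        if choice is None:
            return state
        state = tools[choice](state)
    return state

tools = {
    "check_network": lambda s: s + " | network ok",
    "check_security_groups": lambda s: s + " | sg ok",
}

# Scripted stand-in for a model: run whichever check hasn't happened yet.
def pick_next(state):
    if "network" not in state:
        return "check_network"
    if "sg" not in state:
        return "check_security_groups"
    return None

print(workflow_driven(tools, "i-123"))              # i-123 | network ok | sg ok
print(model_driven(pick_next, tools, "i-123"))      # i-123 | network ok | sg ok
```

Both produce the same answer here, but only the model-driven version keeps working unchanged when you drop a new tool (say, a container-connectivity check) into `tools`.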

Now, this sounds too good to be true, and to some extent, it is. You can get prototypes running really fast, but the number one question I get in meetings with customers is, okay, but I want to give it better instructions. I want more predictability from the workflow. I want it to be faster, more cost-effective in terms of token usage. There might be dangerous outcomes that the model could take. I absolutely want to prevent those dangerous outcomes. How do I guide it? How do I constrain it?

So, what I'm saying here is think of this as a spectrum, and think about where you're shifting the work, and how big that chunk of work is. If you're thinking in terms of workflow-driven agents, you do a lot of upfront work to define the workflow, you iterate on that workflow, and control is intertwined with the workflow. Guidance and steering, prompting, tool usage, parsing tool results, all of this is intertwined, and it's all monolithic. The model-driven approach starts by forgetting all of that and giving the model free rein to figure out its own workflow, and then layering on other controls when the situation demands it. You can do that at a more granular level for specific situations. So, let's break that down.

Thumbnail 1220

Agent Control Spectrum: From Autonomy to Steering and Constraints

On the left side of the spectrum is pure autonomy. You give the model a goal, you give it tools, and it does the job. The tool in this situation is actually the system prompt. Try to make it a really good one. My advice is always to make it goal-oriented. Models love to achieve goals. If you tell them what success looks like, they'll probably do a pretty good job of finding success.

Research assistants are a good example of this, right? The model has all the knowledge it needs. You can give it external sources, tell it what format you want, and it will do a pretty good job. And you can layer things like Strands' structured output. This is a mechanism that allows you to say, for example, I want to enforce a desired Markdown format, or I want to output JSON so it can be integrated into a front-end application. This can be layered on the left end of the spectrum without having to think about workflows or where its output comes from. Ultimately, my abstraction framework, Strands, is simply forcing structured output and working with the model to regenerate until it matches my schema. It's all handled for me, and now I have a model-driven approach that returns the desired output.
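The regenerate-until-it-matches idea can be sketched like this; `structured_output` and `flaky_model` are toy stand-ins, not the Strands structured output API:

```python
import json

# Sketch of structured output with retry: validate the model's reply against
# a simple schema and ask again with feedback until it parses. Toy code.
def structured_output(model, prompt, required_keys, max_retries=3):
    feedback = ""
    for _ in range(max_retries):
        raw = model(prompt + feedback)
        try:
            data = json.loads(raw)
            if all(k in data for k in required_keys):
                return data
            feedback = f" (Reply must include keys: {required_keys})"
        except json.JSONDecodeError:
            feedback = " (Reply must be valid JSON)"
    raise ValueError("model never produced valid structured output")

# Scripted model: fails once, then complies after seeing the feedback.
attempts = []
def flaky_model(prompt):
    attempts.append(prompt)
    return "not json" if len(attempts) == 1 else '{"title": "x", "score": 5}'

result = structured_output(flaky_model, "Summarize:", ["title", "score"])
print(result)  # {'title': 'x', 'score': 5}
```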

Now, what if you need a little more guidance? That's where steering and SOPs come in. You might have seen that last week we launched a new repository for Strands called Agent SOPs. If you haven't, please check it out; I'll put a link at the end of the session. This is really a prompting technique, specifically for system prompting, that we developed internally at Amazon. Beyond the Strands team, a very large community has been working on this technique, and we've decided to offer it through Strands. It's very useful for this use case. Nick will go into more detail, but this is basically an approach where you give your prompt more structure and present specific steps you want the agent to execute within the prompt. And you can use keywords like "this must be done" and "this should be done," and the model does a fantastic job of both following your guidance and adapting, right?

And that's a big point, because it might sound like we're going back to a workflow-driven approach, but what if a remote API call fails? If you're just doing LLM calls, the workflow has to deal with it, but the model can find a way to retry or find another data source it can call to achieve the steps you've described in the prompt. Then, on the other side, you start thinking about how to apply constraints to the model. That's where you're really trying to protect sensitive outcomes. Hooks are an example of this, and they're included in Strands itself. So, you can inspect after a tool call to make sure the result has passed, for example, PII guardrails, or you can parse specific information. These can all be executed as part of Strands hooks, and essentially, you can step out of the agent loop, do some work, and then bring it back into the agent loop.

Another example of this, external to Strands, is the policy engine feature provided as part of AgentCore's gateway product. This is for situations like a hosted MCP server, where you can have MCP endpoints for remote targets. These targets can be APIs or Lambda functions. We can now assert deterministic policies on these gateways. In other words, you can protect data sources before a tool is called, and by externalizing that logic from the agent, you can treat the agent as untrusted. And the gateway provides deterministic control. For example, can an agent call a specific API for a refund before the customer is deemed eligible, and do they have a token indicating eligibility, right? Don't let the model figure out how to exploit the system. In fact, you can block the model from making a tool call until it provides the necessary proof.
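The deterministic-policy idea can be sketched as a check that runs outside the agent, before the tool executes; `gateway_call` and the policy shape here are invented for illustration and are not the AgentCore gateway API:

```python
# Sketch of a gateway-style deterministic policy: the refund tool cannot run
# unless the request carries an eligibility token, regardless of what the
# model asks for. The agent itself is treated as untrusted.
class PolicyDenied(Exception):
    pass

def gateway_call(tool_name, args, policies):
    for policy in policies.get(tool_name, []):
        if not policy(args):
            raise PolicyDenied(f"policy blocked call to {tool_name}")
    return f"{tool_name} executed for {args['customer']}"

policies = {"issue_refund": [lambda args: args.get("eligibility_token") is not None]}

try:
    gateway_call("issue_refund", {"customer": "c-1"}, policies)
except PolicyDenied as e:
    print(e)  # policy blocked call to issue_refund

ok = gateway_call("issue_refund",
                  {"customer": "c-1", "eligibility_token": "tok-9"}, policies)
print(ok)  # issue_refund executed for c-1
```

Because the check lives in the gateway rather than the prompt, no amount of model creativity can route around it.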

Thumbnail 1460

Thumbnail 1470

So, think of this along a spectrum. This is essentially the long story short. There are controls along the way. If you start building agents at StrandsAgents.com, our website will guide you through that. And we'll move on to talk about some examples that we've launched in the last few weeks. I mentioned SOPs. Nick will elaborate on these, but just to reiterate, these are natural language instructions. You can incorporate them, in this case, via Python imports, or via MCP servers that you can use today in TypeScript. And the format looks like this. It's not one file: the top half is an example of an SOP, and the bottom half shows how to load an SOP with the Python SDK.

Pay special attention to these steps. They are obviously abbreviated, but you can have headings like Step 1 is a setup step, Step 2 is this, and within that, you have these keywords, these RFC 2119 keywords. These are just standard keywords, and of course, models have encountered them in their training data. The SOP mechanism is really about leveraging that knowledge the model has, and models follow these keywords extremely well.

This is something we've been experimenting with internally at Amazon for a long time, and you just have to follow it. Give it steps, give it constraint keywords within those steps, and the model will do a good job.
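A hedged sketch of what such a system prompt might look like; the steps and wording are invented, and the real Agent SOP format may differ:

```python
# Invented example of an SOP-style system prompt: numbered steps with
# RFC 2119 requirement keywords (MUST / SHOULD / MAY).
sop = """\
## Step 1: Setup
You MUST verify the target account ID before taking any action.

## Step 2: Diagnosis
You SHOULD check network connectivity first.
You MAY consult CloudWatch logs if connectivity looks healthy.

## Step 3: Report
You MUST summarize findings in under five bullet points."""

system_prompt = "Follow this procedure:\n" + sop
print(system_prompt.count("MUST"))    # 2
print(system_prompt.count("SHOULD"))  # 1
```

The keywords carry weight precisely because models have seen RFC 2119 usage throughout their training data.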

Thumbnail 1520

What we're also thinking about, assuming that goes well and gets us down a good path, is that it can end up being a large prompt. So what we're starting to experiment with, and what was released as an experimental feature in the Python SDK this week, is what I think of as modular prompting. We call this Steering, and that's the name of the feature. This is a way to break down important bits of instruction and inject them into those lifecycle hooks I keep talking about. This mechanism is truly one of the extensible engines of Strands, where you can choose to use an LLM as a judge or a deterministic Python library. You can bring in Cedar, if you need to, and use those outside the agent to determine the trajectory the agent is taking.

So, you can look at the trace and essentially pass the entire agent state. For example, for a prompt where an LLM is the judge, it verifies whether the tone and voice of this response from the model are appropriate for these brand guidelines. And it gives responses like yes, no, and feedback. Strands will then essentially pause in the agent loop, interpret those instructions, and if the agent has complied with those instructions, it simply returns control of the agent. And if it hasn't complied, when returning control, it gives feedback to the agent, and the agent absorbs that feedback, retries, and comes back through the hook.

This is the idea behind Steering, and I think it will be a pretty interesting direction for us. Hopefully, it leads to fewer instructions given to the model upfront, so we can work with larger context windows, reducing the chance of the model receiving excessive steering data upfront, and improving token efficiency over time. And we're also experimenting with connecting this to episodic memory systems, so that feedback loop happens only once in a particular agent's lifecycle, and it gets better at following those instructions over time without needing external hooks.
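The steering feedback loop can be sketched as follows; the judge, agent, and control flow here are toy stand-ins, not the experimental Steering API:

```python
# Toy sketch of steering: after the agent produces a response, a judge checks
# it; on "no", feedback is injected and the agent retries with that feedback.
def steered_run(agent, judge, prompt, max_rounds=3):
    feedback = None
    response = ""
    for _ in range(max_rounds):
        response = agent(prompt, feedback)
        verdict, feedback = judge(response)
        if verdict == "yes":
            return response
    return response  # give up after max_rounds and return the last attempt

def toy_agent(prompt, feedback):
    text = "BUY NOW!!! limited offer"
    if feedback:  # the agent absorbs judge feedback and adjusts its tone
        text = "Here is a calm summary of the offer."
    return text

def tone_judge(response):
    if "!!!" in response:
        return "no", "Tone violates brand guidelines; avoid shouting."
    return "yes", None

print(steered_run(toy_agent, tone_judge, "Draft the promo copy"))
# Here is a calm summary of the offer.
```

The judge only fires at the hook point, so the instruction doesn't have to occupy the system prompt on every turn.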

Thumbnail 1650

Another launch we made this week is the evaluation library, which is a bit tangential to my story about controlling agents, but I think it's useful because at some point in the lifecycle, past the prototyping stage, you need to evaluate your agents. Strands has had this gap for a while: it didn't provide the libraries you need to quickly build evaluators around an agent. We're also adding functionality to help you generate synthetic datasets relevant to your use case using Strands-based agents and build evaluators around those datasets.

You can also use custom evaluators. We have several built-in ones, and this is designed to work with your preferred online evaluation systems. That includes AgentCore's evaluation system that launched this week, or any other systems you're using. And you can combine the story of building on your laptop, something native to the SDK experience, with online systems for large-scale evaluation.
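To make the evaluator idea concrete, here is a minimal offline evaluation loop in plain Python. The real Strands evaluation library has its own APIs; everything below (`Case`, `exact_match`, `evaluate`, the stand-in agent) is an illustrative assumption, not the library's interface.

```python
# Minimal sketch of an offline evaluator: run an agent over a small
# dataset, score each output with a pluggable scorer, and average.
from dataclasses import dataclass

@dataclass
class Case:
    prompt: str
    expected: str

def exact_match(output: str, expected: str) -> float:
    """Built-in-style scorer: 1.0 on an exact (whitespace-trimmed) match."""
    return 1.0 if output.strip() == expected.strip() else 0.0

def evaluate(agent, dataset: list[Case], scorer=exact_match) -> float:
    """Run the agent over each case and return the mean score."""
    scores = [scorer(agent(case.prompt), case.expected) for case in dataset]
    return sum(scores) / len(scores)

def fake_agent(prompt: str) -> str:
    """Stand-in 'agent' that answers one question correctly."""
    return "4" if "2+2" in prompt else "unknown"

dataset = [Case("What is 2+2?", "4"), Case("Capital of France?", "Paris")]
score = evaluate(fake_agent, dataset)
print(score)  # 0.5 — one of the two cases matched
```

A custom evaluator here is just another scorer function, which mirrors the pluggable-evaluator idea; the same scores could then be shipped to an online system such as AgentCore's for large-scale evaluation.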

Challenges in Building the TypeScript SDK: Writing Better Code with Agent SOPs

So, that's my part. Next, I'll talk about Strands TypeScript. We launched it this week, and it's available as a preview, but we took a slightly non-traditional approach in some ways. I'm excited to have Nick talk about that. First, we built it using Strands, but it wasn't vibe coding. It was actually a pretty nuanced approach, and it was also a collaborative experience for the Strands team and the agents team. So, I'd like to invite Nick to the stage to talk about how we built it and how these features helped us.

Thumbnail 1760

Thanks, Ryan. Hello, everyone. Thank you for coming today. My name is Nick. I'm a Senior Software Engineer at AWS, and I was the technical lead for building and developing the TypeScript SDK. As Ryan said, as part of this presentation, I want to talk about how we actually built this new feature, this new language for Strands. When we decided to support a new language for Strands, we wanted to be intentional about how we built that language. We looked back at our experience building the Python SDK, and we really wanted to focus on the developer experience of writing that code, not just the easy developer experience of writing an agent in a few lines of code, as Ryan described. We spent a lot of time developing and writing code for Python, so we wanted to look back at that experience and see how we could improve it.

Thumbnail 1810

Strands for Python was originally written by a dedicated group of Amazon engineers. They first built the Python SDK as a prototype. They used a lot of these early Gen AI tools to bootstrap development, write code faster, and get an agent loop up and running.

We spent a lot of time trying to get all the code written using these early AI coding tools, but the problem was that a lot of it was vibe coding. The code that came out of these tools wasn't very high quality. We as engineers spent a lot of time reviewing, understanding, guiding, and steering these agents to write code that we were comfortable shipping, that we could take into production, and that you could use. I recently went back and analyzed the early release of Strands for Python, and it was about 15,000 lines of code.

Thumbnail 1880

As we transitioned to TypeScript, we were going to build a codebase of a similar scale. So we wanted to be intentional about how we could actually help developers write all of this code. Specifically, we wanted to figure out how to write better and faster code using agents. And we've broken this down into two parts because we came up with interesting solutions for how to achieve each.

Thumbnail 1890

To write better code, we looked at Agent SOPs. Ryan introduced this earlier, but at a high level, it's a specification for writing agent instructions. It's literally a standard operating procedure for agents to follow to achieve certain tasks. The great thing we found using Agent SOPs is that they're excellent for steering agents towards somewhat repeatable behavior.

SOPs were originally conceived by James Hood, one of our principal engineers. He was also frustrated with these early AI coding tools: he spent a lot of time trying to get good code out of them, and a lot of time debugging to get good results. Out of that frustration, he came up with the idea of Agent SOPs and shared it with the builder community within Amazon. It was very, very successful. Last I checked, over 5,000 of them had been created internally and were being used everywhere. People within Amazon love this idea. As Ryan said, we wanted to release this as part of Strands because we wanted to share it; it has been very successful in helping engineers guide and steer agents.

Thumbnail 1970

Here's an example of an SOP. We looked at it with Ryan earlier, but I want to dive a little deeper into how it works. There's an overview, there are parameters, and then there's a list of steps the agent follows to complete the task. As I said earlier, it's really easy to understand why this gives agents repeatable behavior. When an agent is following these steps to achieve some goal, it's really obvious what the agent is doing.
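The overview/parameters/steps shape described above might look roughly like the following. This is a hypothetical SOP written for illustration; the exact section names and placeholder syntax in a real Agent SOP may differ.

```markdown
# SOP: Add a regression test for a bug fix

## Overview
Guide the agent through reproducing a reported bug and locking in
the fix with a regression test.

## Parameters
- issue_url: link to the bug report
- repo_path: local checkout of the repository

## Steps
1. Read the issue at {issue_url} and summarize expected vs. actual behavior.
2. Locate the responsible code in {repo_path} and explain the root cause.
3. Write a failing unit test that reproduces the bug.
4. Apply the minimal fix and confirm the test now passes.
5. Report the changed files and ask the developer to review before committing.
```

Because each step is explicit, a developer who sees the agent go wrong at step 3 knows exactly which instruction to tighten.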

As a developer trying to write an agent to accomplish these tasks, this SOP means it's debuggable for me. If the agent is going through these steps and does something unexpected at step 3, as a developer, I know that I need to go into step 3 and provide more instructions. We had several teams at Amazon writing system prompts for these agents, and developers were worried or afraid that if they messed with the system prompt, or the wording, or the way it was written, the agent would do something unexpected.

But with SOPs, it's clear and understandable. It becomes much easier for developers to go update this and have confidence that their changes aren't going to break the system. Later, I'll show you a video of how we actually did this in developing the agent for TypeScript.

Thumbnail 2050

As I mentioned earlier, this idea of Agent SOPs has been very successful within the Amazon builder community. Among the things we've released are prompt-driven development and code assist. These were two of the most popular SOPs because they really found a way to guide agents in an impressive manner. Prompt-driven development is an SOP where an agent guides you through a question-and-answer process to refine an idea into an implementable state.

The agent asks you questions to clarify the problem. If it doesn't understand some aspect of what you're asking, it uses tools to investigate, and if it's within a codebase, it explores the codebase. And it keeps track of its progress, so you can see what the agent is thinking and what it has done. The really good thing about this is that, as we discovered with early Gen AI coding tools, agents want to take the shortest possible path to solve a problem.

And the shortest possible path isn't always the path I want to take. Sometimes, I want to guide it to write code in a way my team is familiar with, or I want it to use a function I've written that the agent might not be aware of. So, by having this interactive question-and-answer process with the agent, I can clarify the problem.

This makes it very clear to the agent what it needs to do, and it aligns with what I want it to do. The output of this SOP is an implementable task. It's a Markdown file about how to implement a feature in a codebase.

We've developed another agent SOP called Code Assist. This takes that task and actually implements it. When writing code, it follows a test-driven development approach, writing unit tests first, and then the application code. We've found that by having the agent write code this way, it writes much better code. After it finishes writing the feature, I can go in, review all the code that was written, and then provide feedback on how to update or revise it to better align with what I want. After a few iterations, if I say it's complete, the agent can commit this code.
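The test-driven flow Code Assist follows, writing the unit test before the application code, can be shown with a tiny self-contained example. The function and test names here are invented for illustration.

```python
# Test-first illustration: the test is written before the implementation,
# then the implementation is filled in until the test passes.

def test_slugify():
    # Written first, while slugify() does not yet exist or is a stub.
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Trim  Me  ") == "trim-me"

# Implementation written second, to satisfy the test above.
def slugify(text: str) -> str:
    """Lowercase the text and join words with hyphens."""
    return "-".join(text.lower().split())

test_slugify()  # passes once the implementation is in place
print("tests passed")
```

An agent following this order produces code whose intended behavior is pinned down before a single line of application code exists, which is exactly why the reviewed output tends to be higher quality.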

Thumbnail 2170

GitHub as a Team Member: Refiner and Implementer Agents in Practice

So, using these two Agent SOPs together, we essentially solved the problem of how to get agents to write better code. We solved the second part of the problem, how to get agents to write code faster, in another interesting way. My team's developers typically develop software like this: on one side, there's the team of developers, and they communicate with individual developers through GitHub. An individual developer might create pull requests or issues to track the progress of the code they're writing. They create and update pull requests, addressing the feedback they get from the team, in their local IDE. They're probably running Cursor, a CLI tool, or Claude Code, and doing that communication there.

The developer is the intermediary in this process, relaying the team's feedback to the agent so the change can be approved and merged. But we spend a lot of time on that: staging the local repository, understanding and responding to feedback, writing all the test cases for the code. So instead, we wanted to flip this around and make GitHub a member of the team.

Thumbnail 2230

We leveraged GitHub's actions feature, GitHub Workflows. These are triggered by specific events on pull requests or issues, and they trigger Strands agents. As Ryan said, we're writing Strands code with Strands. GitHub Workflows trigger Python Strands agents to contribute code to the TypeScript repository. Python code is now writing TypeScript code for us. By using this system, we're essentially adding a new member to our team through GitHub.

Agents act as another contributor on our team, and I no longer have to be the intermediary. I don't have to understand the team's feedback and pass it to the agent. Instead, the agent is doing it itself. It's reading GitHub issues and GitHub pull requests and making those adjustments, so I don't have to. And this is how we solved the problem of writing code faster. I'm no longer the intermediary. The agent is doing it for me.
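The wiring described above could look roughly like the GitHub Actions workflow below. This is an illustrative sketch only: the workflow name, the `/strands` trigger convention, and the `run_refiner.py` entry point are assumptions, not the team's actual files.

```yaml
# Hypothetical GitHub Actions workflow: an issue comment starting with
# "/strands" triggers a job that runs a Python Strands agent against
# the repository. Names and scripts here are illustrative.
name: strands-refiner
on:
  issue_comment:
    types: [created]
jobs:
  refine:
    if: startsWith(github.event.comment.body, '/strands')
    runs-on: ubuntu-latest
    permissions:
      issues: write      # post clarifying questions as comments
      contents: read     # check out and explore the codebase
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Run refiner agent
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: python run_refiner.py --issue "${{ github.event.issue.number }}"
```

The `issue_comment` trigger and `permissions` block are standard Actions features; the interesting part is simply that the job's payload is an agent rather than a linter or test suite.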

Thumbnail 2290

So, using this new paradigm of agents in GitHub, we needed to recreate Prompt-Driven Development and Code Assist, the SOPs I mentioned earlier. What we came up with are these two agents we call the Refiner and the Implementer. These are slight variations of the Prompt-Driven Development and Code Assist SOPs, adjusted for GitHub. The way the Refiner works is, when given an issue describing a general feature to be implemented in the codebase, it reads that issue, checks out the repository, explores the repository, and understands the implications of implementing that feature in the codebase.

Then it creates a list of questions and posts them as comments on the issue, so I can review them, update them, and give instructions on what to do next. This question-and-answer process via issue comments repeats until the agent determines that the task is ready for implementation, and then it can pass that issue to the Implementer agent. The Implementer agent, like Code Assist, takes an implementable task, which is an issue, and creates a branch on GitHub. It checks out the codebase, implements that code, that feature, using test-driven development, commits to the branch, and creates a pull request.

For that pull request, just like any other member of the team, I can leave feedback on the pull request, and the agent reads that and updates the code for me. And then I just iterate. The real amazing efficiency here is that I don't have to sit down and spend 30 minutes writing and debugging code. I can let the agent do that, and I can go do something else that agents can't do. And this is where we're getting real efficiency in writing code faster with agents.
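The two-stage flow just described, refine an issue through rounds of Q&A until it is implementable, then hand it to an implementer that opens a pull request, is essentially a small state machine. The sketch below uses plain stand-in functions, not Strands APIs; all names are invented for illustration.

```python
# Plain-Python sketch of the Refiner/Implementer hand-off. The refiner
# asks questions until the issue has enough answers; the implementer
# turns a ready issue into a pull-request stub.

def refiner(issue: dict) -> dict:
    """Append clarifying questions until the issue has enough answers,
    then mark it ready for implementation."""
    if len(issue["answers"]) < 2:
        issue["questions"].append(f"Question {len(issue['questions']) + 1}?")
    else:
        issue["ready"] = True
    return issue

def implementer(issue: dict) -> dict:
    """Turn a ready issue into a branch and an open pull request."""
    assert issue["ready"], "refinement must finish before implementation"
    return {"branch": f"feature/issue-{issue['id']}", "status": "open"}

issue = {"id": 7, "questions": [], "answers": [], "ready": False}

# Each loop iteration mirrors one "/strands" invocation in the demo:
# the agent posts questions, the developer answers, repeat until ready.
while not issue["ready"]:
    issue = refiner(issue)
    if not issue["ready"]:
        issue["answers"].append("developer answer")

pr = implementer(issue)
print(pr["branch"])  # feature/issue-7
```

The key property is that the developer only ever acts between states, answering questions or reviewing the pull request, while the agents do the work inside each state.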

Thumbnail 2390

So, I want to show you some examples of us actually using this to write code in the TypeScript repository. Here, you see an issue outlining a task that I want to implement in our TypeScript codebase. As a developer, I know how I want to proceed with this. But if I just gave this to an agent, it would take the shortest path to implement this feature, which is essentially what we call vibe coding.

Thumbnail 2420

Thumbnail 2430

To avoid this, I enter the slash strands command to kick off a GitHub action and trigger the refiner agent. You'll immediately see a label added to the issue indicating that this agent is running. The agent then takes this issue as instruction, is placed in the codebase, and explores and understands how it should implement that issue. It leaves comments here, asking clarifying questions about how that feature should be implemented.

Thumbnail 2440

Thumbnail 2450

Thumbnail 2460

Now, while this is running, I can go do other work and come back later to answer the clarifying questions. So I'm just going to go through and answer all the questions, one, two, three, four, five. And once I'm done, I can type slash strands again to trigger the refiner agent. And I repeat this until the task becomes implementable. This takes a few minutes. And once it's done, the agent gets triggered again. And the agent gets dropped into the codebase again. It's able to read my updated comments, and it eventually determines that this feature is implementable. And it leaves a comment for that. This is going to pop up in a second. There it is.

Thumbnail 2480

Thumbnail 2500

The agent responded to my comments, and it also updated the issue description into an implementable task. If I refresh the page, you can see the agent actually edited the description here. By using this refiner agent, I've essentially clarified the task so that the agent can begin implementation. Now, in this next video, I'm going to trigger the implementer agent. It's very simple: I type slash strands implement, and again, via a GitHub action, a new agent, the implementer, starts the implementation process.

Thumbnail 2520

Thumbnail 2530

Again, it adds a label, creates a branch, and drops the agent in there. And it starts writing code in our codebase following this test-driven development methodology. This will take about 30 minutes. And again, this is where the efficiency comes in. I can do other work while the agent is doing that. I don't have to sit there coding and debugging. I'm just reviewing and improving the quality of the code coming into the codebase.

Thumbnail 2550

A comment was posted here, and I want to point out something interesting. This relates to steering and refining Agent SOPs: the agent actually couldn't create a pull request itself. When designing these agents, we found that in some cases the agent didn't have the permissions to do so, and my GitHub token as a developer was required. So, we added an additional instruction to the SOP to post a link for me to do it myself. This is how we've been debugging these SOPs. Here I clicked that link, which opened the pull request. And this is the code the agent created for me. I didn't write any of this.

Thumbnail 2580

Thumbnail 2590

Thumbnail 2600

So, I'm going to skip ahead a bit here. Now, I've instructed the agent to update the description. Again, I'm not doing any of this. I'm telling the agent to do it. You'll see the comment right below. And I took a moment to leave some feedback on the pull request. It's about how I want the agent to update the code. There were a couple of things I didn't like. I wanted to change the naming of the discriminated union. And then I trigger the agent again.

Thumbnail 2610

Thumbnail 2620

Thumbnail 2630

Thumbnail 2640

Thumbnail 2650

And a few minutes later, it writes the code, updates the pull request, and leaves a comment. We've refined this SOP a bit, so it doesn't just leave a comment: it replies to the pull request comments I've made and gives insightful answers on how it will implement them. If it's confused about how to implement a feature, it will actually ask me for clarification before proceeding. This, again, is the idea of refinement at work. As you can see here, it responds to each of my comments, saying it has implemented this feature and that one. Then it scrolls up, and we can actually look at the code to confirm the feature has been implemented. Using this system, the team is really excited and motivated, and they're finding a lot of efficiencies. It's not just me using it; the whole team is using it, and many members are getting really good results.

Thumbnail 2660

Efficiency Gained through Collaboration with Agents: Agents as the Largest Code Contributors

When I showed this system to Aaron, a principal engineer on our team, this was the feedback I got from him: "I only spent about an hour in total, across three short 20-minute sessions, and finished code that would have taken me half a day to write." In other words, this saved a principal engineer about four hours: a feature that would have taken half a day was done in about an hour, roughly quadrupling his efficiency. It's not always this efficient and time-saving, but we're finding more and more efficiencies in how we use this system.

As I said, the real benefit here is asynchronous. I can kick off an agent, or kick off multiple agents to implement multiple features, and go do something else in the meantime. Something that agents can't yet do. And we've found that usually low to medium complexity tasks are the best for agents to work on. As a developer, I can focus on high complexity things and break them down into medium and low complexity tasks so I can delegate them to agents.

Here's a graph that I really love. We've actually been tracking the contributions agents have made to the codebase. And we've found that agents are our largest code contributors.

Thumbnail 2720

My username is up in the top right. I've got about 14,000 lines and 43 commits. The agent is a little above me and is our largest contributor. All of our engineers are contributing code using this system. But I also want to emphasize that as engineers, we're still writing code. The agents aren't writing all of it, because there are still complex tasks that agents can't necessarily do. But we're finding efficient and really good ways to leverage these agents, and it's helping us speed up our TypeScript development.

Thumbnail 2760

Here are a few final things I want to tell you. We, as engineers, have learned to spend less time writing code and offload that to agents. If I were responsible for a feature in a codebase, I'd have to check out the repository, context-switch to understand the feature that needs to be implemented, review the codebase, write code, and debug it. After debugging, I'd have to write tests for it. Instead of spending time on those things, I can offload that to an agent.

My time is much more valuable spent reviewing that code, refining it, improving its quality. And providing high-level guidance about the direction and design in which these agents should implement the code. We're finding ways to parallelize our time. Offloading the coding and debugging process to agents, and I can go do things agents can't do. By using this system, we've essentially learned to treat these agents as members of the team.

Remember the GitHub workflow diagram I showed you earlier, the one where we contribute to GitHub? The agent is another member of the team. As for TypeScript, these agents are currently the largest code contributors on our team, and I think that's really exciting. We'll continue to develop this system, so you can track our progress in the TypeScript SDK repository. This is how we built TypeScript. I'm really happy to explain how we built it, and I'm really looking forward to all of you trying it out. So I'll hand it back to Ryan for some closing remarks.

Thumbnail 2850

Summary: Strands' Flexibility and Future Developments

Thanks, Nick. Yes, I hope you learned a lot today about what Strands is and how it can help you. Our team has obviously used it to gain efficiency with codebase agents. We've worked with customers building chat experiences, data processing pipelines, and research assistance. Strands is very flexible for all these use cases, and with TypeScript, you can get even closer to applications running in browsers or client applications.

One thing I didn't cover today is that this week we've been working with the amazing folks at CopilotKit. They've implemented AG-UI integration for Strands, and that's also available on our website. The QR code goes to StrandsAgents.com. Also, all the repositories we talked about today (the SOPs, the evals, and of course the Python and TypeScript SDKs) can be found at GitHub.com/Strands-Agents.

Nick and I will be here for about another 10 minutes, so feel free to ask any questions. We also have some Strands stickers available. If any of you are returning to the Venetian Expo Hall today, there's a Strands booth; you should be able to find it in the big AWS square in the center of the Expo Hall. There's a bright red-orange excavator moving around with an agent built into it. Strands is running on local compute, managing sensors, and connected to a cloud-based environment to perform industrial manufacturing and excavation use cases.

If you're nearby, please come see the booth. You can get more stickers and swag there, and you can talk to our wonderful staff about anything you need. Otherwise, yes, please come up to Nick and me. And thank you all for spending your time with us today. Thank you so much. It was great.


  • This article is automatically generated using Amazon Bedrock, while maintaining as much of the original video's information as possible. Please note that there may be typos or incorrect information.
