Translated by AI
re:Invent 2025: Latest AWS Lambda Features - Managed Instances and Durable Functions
Introduction
This project transcribes overseas conference sessions into articles, aiming to make valuable information that would otherwise stay hidden more accessible. The presentation we cover this time is:
Transcription articles for re:Invent 2025 are compiled in this Spreadsheet. Please check it.
📖re:Invent 2025: AWS re:Invent 2025 - Building the future with AWS Serverless (CNS211)
In this video, Usman Khalid and Janak Agarwal, leaders in AWS Serverless, explain and demonstrate the latest innovations in Lambda. Using the Serverless MCP Server with an AI coding assistant, they build a Well-Architected-Framework-compliant CRUD API from a natural language prompt in about 5 minutes. With Lambda Managed Instances, they select the latest instance types such as Graviton 4, reach 28.5% utilization through built-in auto-scaling, and show significant cost optimization compared to self-managed EC2 operation. Lambda Durable Functions support long-running workflows that can execute for up to one year, with no billing during idle periods, streamlining asynchronous processes such as LLM integration. Furthermore, tenant isolation now enables per-customer data separation for SaaS applications. The strategy aims to balance an enhanced developer experience with scalability.
- This article is automatically generated based on the content of the existing presentation, aiming to retain as much original information as possible. Please note that there may be typos or incorrect information.
Main Content
Session Start and Agenda Introduction: A Strategic Turning Point for Lambda
I'm going to try to start a little bit early, so I want to feel the energy of the room a little bit. Janak and I are prepared to show you some amazing slides and some great demos. We're actually going to build something together here. We don't usually do that in a breakout session, but it's going to be pretty cool.
So how many of you, just by show of hands, how many of you are developers? Wow, a lot of developers. That's good, because I think you're going to enjoy the building part of it. How many of you have heard about two really big transformational launches for Lambda at this re:Invent? Okay, a couple of you have raised your hands. So that means it's going to be beneficial for a lot of you developers. How many of you are engineering leaders? Okay, a lot of engineering leaders. Fantastic. So I think today is going to be a little bit of a mix of both.
I'm going to introduce myself in a moment. I'm going to talk to you a little bit about the Lambda strategy, or really the serverless strategy overall. What are we trying to do? We'll actually share that. I've been talking to a lot of customers this week, so I'll share some of those anecdotes. There's a little bit of a surprise at every re:Invent. I think this is my ninth re:Invent, so let's get going. I want to share my story with you, and then we're going to build some cool stuff together.
So we're going to talk about building the future with AWS Serverless, but I've actually tinkered with the title a little bit. I was going to talk about the future of serverless, but one of the things I want to share with you all is what is our strategy, what is our belief as to how people are going to build in the future, and why serverless is so important here. I'm going to let my partner introduce himself when he comes up for the first demo, but I am Usman Khalid. Folks, I am in charge of serverless compute.
I've been at AWS for about 12 and a half years, and I started my career in a little-known service called Auto Scaling and Auto Scaling Groups. This was the precursor to, and still underpins, Lambda, ECS, EKS. I've been in the serverless space since 2018, 2019, with services like EventBridge, Step Functions. Those services are also under my purview at AWS. So I've been with this team for a good number of years now, and still going strong. And my passion has always been developers. I was a developer myself. If you Google my name, you'll find I built a breakdancing puzzle game that not too many people bought, but hey, it was a great learning experience. Just in case you're wondering, I am not a breakdancer.
The reason I tell that story is because I'm basically in charge of shaping the strategy for serverless. And Janak and I have really been working on this in good partnership. So that's why we want to share with you. Over this next hour, we're going to share what we've actually done, and how it ties into the overall strategy and where we're taking serverless, because serverless, we celebrated the 10-year anniversary of Lambda last year. But it's really at an inflection point right now. And all of us, all of you in technology, a lot of developers, a lot of engineering leaders, you're seeing this inflection point in technology as well. The kind of changes that the internet brought in the early 2000s. I was a little bit too young for that, but these gray hairs tell a different story. But with AI, and hey, I'm not going to talk about AI ad nauseam, and I'm still learning how to use it myself. And no, I don't think it's going to change all of our jobs. But I do think it's already accelerating us. So I'm going to share some of those anecdotes.
So we'll talk about the strategy and the agenda. We have this build challenge. We're going to build a couple of things. Janak's going to kick that off. And then next, I'm going to talk about some other innovations that we launched a week before re:Invent, because you might have missed it. And then finally, we'll recap and share where we're going next. So that's what we're going to cover today. Sound good? Yes, nods? Awesome.
The Importance of Development Speed in an Era of Change and the Role of Serverless
Now, I love this quote. All of you engineering leaders out there, and engineers yourselves, you know that change is always there. But my favorite quote is that all change is about change. I found out that P.J. O'Rourke, the American journalist, originally said this. And hey, I've already shared with you, we're going through a change, right? And the big insight for me, talking to customers at re:Invent, specifically about serverless and its future, is that their old platform strategies, their old ways of doing things, are no longer working.
What I hear time and time again from developers, from dev leaders, from platform leaders, is that there is a bigger struggle than ever before between engineering teams and platform teams, because engineers just want to try something, and the cost of trying something has plummeted. And I'm not talking about vibe coding, I'm talking about real production engineering. This is also what we're doing with Lambda and serverless. We, and this is part of what we're going to show you, we're going to show you how we've been able to accelerate using AI as a tool, as an accelerator.
When things move this fast, your deployments, your operations, your patching effort can get in the way, and that quite frankly turns the entire SDLC on its head. And this is not just AWS; multiple customers are seeing the same thing.
And so that's why I've had more customer conversations this year than in the last couple of years, because people are coming up and saying, "Hey, we have a Kubernetes-based platform, but what more can we do with serverless? Our developers want to go faster and faster, and they don't want to carry all of the responsibility for managing infrastructure." Really, what I'm trying to say is, we're in a period of evolution. You all feel it. I don't have to convince you of it. We don't know what the endgame looks like. I certainly don't, and I don't claim to, but we're going through a period of evolution.
One of the key parts of our strategy at serverless is that we've always focused on developers. Maybe not focused enough on the developer experience, but we have focused on it, and that has been a key pillar of our strategy up until now. It was all about speed and evolution and creating evolutionary architectures. And I'll show you what I mean by architecture. If we live in an age where things are changing rapidly, and we do, if your systems, your processes, your people can evolve, those are going to be the most successful. The companies that actually evolve and migrate are the ones that are going to be most successful, so shouldn't your architecture be that way too?
I didn't ask how many system architects are in here, but I'm going to guess a lot of you engineers are also system designers. You all know that there is no single right architecture, right? With serverless, there are only right trade-offs. And yes, we're going to build highly evolutionary architectures, but even this early in the conversation, I'm not here to sell you on something. I'm going to be as balanced as possible. Obviously, I'm very passionate about this space, but there are trade-offs. When you have highly evolutionary systems, it's very hard to manage change. It's very hard to observe what's broken, because things are more decoupled, and they're moving very fast.
One of the things that is on the minds of some of your platform team owners is control. As developers want to go faster, faster, faster, how do I apply governance? How do I keep everything compliant? And yes, there are trade-offs that we have to manage. But the last thing I'm going to tell you when I have this conversation is, look, evolution is not new to technology. Just because generative AI has captured the minds of so many people and has been very disruptive for us as an engineering community, speed has always been important in business. And the faster you go, the better chance you have of winning.
The Essence of Serverless: Evolutionary Architecture and Reducing Developer Responsibility
So let's talk about serverless. I don't know if this is the biggest secret, but Janak believes it's the biggest secret in serverless: serverless has always run on servers. I literally run a fleet of hundreds of thousands of servers, including bare metal servers, which are particularly fun to patch and scale. But what we've done is create a façade for developers. Developers don't have to worry about managing servers, managing runtimes, scaling, load balancing, or request routing. A lot of these things, we take care of directly.
We have hundreds of thousands of servers in our fleet. From US East 1 to as far away as Asia, and I think we have an Africa region too, right? Right, all around the globe. But to you, to the developer, to the engineering leader who has built a serverless architecture, this is what it looks like. And in this particular diagram, I'm just playing with the idea that I want to build an agent or an application that can guide users in times of national emergency. So yes, I'm talking to a lot of MCPs here. I'm talking to a lot of different services that are available. I think FEMA is in here too.
What I mean by "evolutionary" here is that each feature, each aspect of my idea, is really one horizontal line passing through Lambda. Obviously, I'm using AI, I'm using Bedrock to host my models. Let's imagine in this particular architecture, I want to add a new feature. I didn't put it on the slide on purpose, but let's say I want to add a notification feature where customers can sign up for notifications. And I want to be able to send them SMS or email notifications. If they sign up, that's just a branch. I don't have to update a bunch of microservices. It's just a branch.
One of the key things is, we know tools are evolving rapidly, even over just the last six months, but one of the things I've seen my engineering teams learn is that smaller changes, smaller contexts, and smaller files work much better: you iterate much faster, and the AI is much more accurate. Building these things is literally a one-shot. I mean, vibe coding is, quite frankly, I think a marketing term at this point. How about that? All of these are single-responsibility Lambda functions, super easy to manage, super easy to update, super fast. Janak's going to come up later and show you some of these.
And look, at the end of the day, when you build that architecture, yes, there were a lot of pieces. You probably had to use Infrastructure as Code to actually make and manage the system. But once you set it up, I like to talk about the "ilities."
Maybe some of you have platform teams, where other people manage and scale servers for you, focused on reliability and durability. But that's not the whole picture. Even with a platform team, they don't know your software, so a lot of the security and scalability concerns still filter into the code that developers write. And if you're choosing the underlying infrastructure technologies, then you're responsible for those things. If you start thinking about your idea going to global scale, you know how hard that becomes.
And look, at the core of it, the reason why customers love building on serverless is because we manage all of the "ility" parts that I showed you on the previous slide. This is what serverless is all about at the end of the day. It's not about whether there are servers. It's about having no servers to manage, no infrastructure to manage, and being able to express your logic in the fastest way possible.
Speed is key. So I've already told you, speed always wins, and that is our number one goal. Our number one goal has always been: how can we get to market the fastest? How can we realize customers' ideas in the fastest way? I'll share an interesting anecdote. Many years ago, I think it was 2013 or 2014, about a year after I started working on Auto Scaling, my boss asked me, "Hey, what's the mission of Auto Scaling?" This was before Lambda, before ECS, before Kubernetes. I said, "It's the fastest way to take an idea and scale it, so that the best ideas win." Obviously, we've come a long way from just auto-scaling VMs, but that statement, at least to me, has remained true. I've spent all my time in this space helping people move faster and faster, because this is important, in everything from why customers adopt the cloud to how you build cloud native to move faster.
I really want to share a simple anecdote, a very recent one from just the last couple of months. CyberArk's platform engineering team basically built their entire platform on serverless. They were able to do the automation work they needed and save about four months out of the 12 months it used to take to build a new service. That quote itself is probably about six months old; given the state of the tools, I think that number is probably even lower now. And again, once something is written, I think that's the endgame. I also have a Tesla, and unfortunately I paid for Full Self-Driving; years later, it still hasn't arrived. But at the end of the day, it's about liability. The last "ility" is liability, and with serverless, more of that responsibility is on our side. At the end of the day, my engineers and I are responsible for the scale, patching, and security of your applications, not you. So yes, there is still shared responsibility, but a lot of it is on our side.
And look, it's not new technology anymore. Just as we're about to get into the rest of the talk, I think what we've done in the last 12 months and what we've launched at re:Invent has truly transformed this 10-year-old technology. But serverless is already everywhere. Obviously, I'm not going to go through all of these. I just wanted to highlight how many major companies are running large-scale applications on Lambda today.
Janak Takes the Stage: Achieving Production-Quality Code Generation with Serverless MCP Server
Alright, let's go. Enough about the context of why we're here. I think many of you are familiar with this feature, so let's get right into it. And I'll introduce my partner, Janak Agarwal. He's going to introduce himself and talk about actually building things. Thank you. Can you hear me? Can you hear me now? Alright, thank you again, Usman.
Hello everyone, I'm Janak Agarwal. I was a developer for about eight years, so my perspective on serverless is shaped by both being a developer who had services that needed to work perfectly in production, and now being in a role where I'm trying to build tools and services that developers can trust to run critical workloads. So I still like to think of myself as a good developer, but probably not. But I am a product manager, and I lead PM for Lambda.
Lambda has been around for 10 years now. And any product or technology that has been around for 10+ years, there are always some notions that form. Preconceived notions, sometimes even biases, about what that technology can and cannot do, what it's good for, what it's not good for. But there are always inflection points. And I believe, I truly believe, that serverless is at one such inflection point right now. So what we're going to do next is we're going to have some fun. We're going to spend the next 30 minutes or so building an application. And along the way, I'm going to show you some of the features that we've launched that now allow you to bring workloads into serverless that you previously couldn't.
So here's what we're going to build. It's a note-taking application. It'll have create, read, update, delete capabilities.
We're going to scale the application, and then we're going to build some new features that our customers want. Note encryption and decryption, and sentiment analysis. Finally, for those of you who like to write really long notes, and research tells us that our attention spans are getting shorter, we're going to summarize your notes with AI.
So let's move to phase one. We're going to build out the underlying CRUD API right now. And I'm not going to show you my typing speed. What we're going to leverage here is what Usman was talking about: vibe coding. And the key to successful vibe coding is a technology we released earlier this year, the Serverless MCP Server. What it allows your favorite AI coding assistant to do is translate natural language prompts into better, well-architected code that is production-ready and built to standards, super fast.
So let's get started. We're going to go through three phases. The first one is installing the MCP server. My personal preference is to also install the Docs MCP server. I've found that it really helps when I'm using technologies that have been announced recently, which AI assistants often don't know about yet; the Docs MCP server alleviates that. For those of you who know, you know. But I like to use it.
In the second phase, we're actually going to write the code for the CRUD API. It's going to be a full serverless architecture. So we're going to use API Gateway for ingress, Lambda functions for our CRUD operations. We're going to use a DynamoDB table for our serverless database, and we're also going to use CloudWatch structured logs. Observability is really important in serverless architectures. And since I still like to think of myself as a developer, I like types. So we're going to use TypeScript for this.
And then finally, the build and deploy phase. What you're going to see here is that the MCP server, by default, uses SAM as a tool for building and deploying. SAM stands for Serverless Application Model. It's purpose-built for serverless building and deploying. It makes building and deploying super simple, and it can also simplify the local testing phase.
Automatic Incorporation of Best Practices by MCP Server
So what I've done is I've installed the Serverless MCP server. And over here is an image showing the tools. At the time I captured this image, which was about a week before re:Invent, there were 25 tools. These tools are currently available for your AI coding assistant. Some of the key tools are getting guidance on what workloads are good for Lambda, what are not good for Lambda, how to build and deploy web apps, how to build and deploy event-driven architectures, including Kinesis, Kafka, etc. It also knows how to get CloudWatch metrics. You need to know what metrics to get from where, and it will help you fine-tune that.
Next, I'm going to write the code. I throw in the prompt that we saw on the previous slide, and in about five to ten seconds, my AI assistant, Kiro, tells me that the code files are complete and the next step is to build and deploy. Here I've zoomed in on some images to show you what was done internally. You can see the project structure that Kiro, the AI coding assistant, created with the help of the MCP server. You can see the template.yaml file; this is out-of-the-box Infrastructure as Code support. The package.json simplifies application dependency management. The TypeScript files are obviously the business logic, and the tsconfig defines the build steps for transpiling TypeScript to JavaScript. And error handling is also automatically incorporated.
Without the MCP server, none of this was included by default. I don't have the before images, but in this later image you can see that proper input validation is being performed. It retries three times for DynamoDB writes; that's good practice. Some additional best practices are integrated here too. You can see there's a global error handler, consistent HTTP status codes, and structured logging with CloudWatch, all of this present automatically from the very first version of the generated code. This means the code generation process is now much more compliant with the Well-Architected Framework.
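To make those practices concrete, here is a minimal TypeScript sketch of the kind of handler described above. This is not the MCP server's actual output; the table name, environment variable, and payload fields are assumptions for the note-taking demo.

```typescript
// Illustrative sketch: a create-note handler showing the best practices named
// above (input validation, bounded DynamoDB retries, structured CloudWatch
// logs, consistent HTTP status codes, one global error handler).
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";

// maxAttempts: 3 gives the "retries three times for DynamoDB writes" behavior.
const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({ maxAttempts: 3 }));
const TABLE = process.env.NOTES_TABLE ?? "notes"; // assumed env var name

export const handler = async (
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
  try {
    const body = JSON.parse(event.body ?? "{}");
    // Validate input before touching the database.
    if (typeof body.id !== "string" || typeof body.text !== "string") {
      return respond(400, { message: "id and text are required strings" });
    }
    await ddb.send(new PutCommand({ TableName: TABLE, Item: body }));
    // Structured log line for CloudWatch.
    console.log(JSON.stringify({ level: "INFO", action: "createNote", id: body.id }));
    return respond(201, { id: body.id });
  } catch (err) {
    // Global error handler: one place for consistent 5xx responses.
    console.error(JSON.stringify({ level: "ERROR", message: String(err) }));
    return respond(500, { message: "internal error" });
  }
};

// Consistent response shape and status codes across all routes.
const respond = (statusCode: number, body: unknown): APIGatewayProxyResult => ({
  statusCode,
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(body),
});
```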
The next step is to build and deploy. I provided the build and deploy commands to the assistant. SAM takes over and performs the deployment using the YAML template, the TypeScript configuration file, and the package file, applying best practices along the way. I'll zoom in on this image a little later. You'll see the CloudFormation change set, which shows the delta of all the work to be deployed to AWS. It uploads the code to S3 and, soon, hopefully, it will complete. Here's the change set. And if you go to the console, you'll see the functions starting to come up.
The console should load momentarily. There you go. Here are all five functions we wrote, ready to handle traffic. Here's a zoomed-in image. During the build, SAM detected that our API had no authentication. It asked for confirmation if this was okay, and since it was a demo, I selected yes. Here's a snapshot of the change set you're guided through. It's adding all these files, rules, and permissions for the database and functions. The build and deploy steps are also made much simpler by SAM.
Think about it: I don't know if I've been talking for more than five to seven minutes, but in about that time, we have a fully functional backend, a cloud API fully deployed using natural language, with the Well-Architected Framework incorporated. When I talk to customers, they say that generative AI has indeed accelerated code generation, but the actual shipping of software isn't getting faster. That's because more processes are baked into the shipping cycle: there's Infrastructure as Code, there's code review, and so on, and none of this is truly sped up by generative AI. That's where serverless has been innovating across the entire stack.
We apply best practices when generating code, during the build step, and during the deploy step, as you just saw. The key point I want you to note here is that the MCP server helps generate code with incorporated best practices. I work with many developers, and their productivity is high, and they're actually shipping software faster. That's because Infrastructure as Code is incorporated. What they like most is the consistent quality of the generated code. Many customers complain that some people just write code and send it for code review without understanding what it is. By generating code of consistent quality, such problems can be minimized.
Hands-Free Scaling: Addressing Needlepoint Traffic
Let's move on. Say our application gets featured in the news, and as a result we suddenly get a huge surge of concurrent access. How do customers address this scenario? They tell us they over-provision capacity: provisioning for peak, with the understanding that at some point, as application developers, they'll come back and optimize costs by applying appropriate scaling policies. But this peak-time provisioning leads to higher costs and a lot of manual maintenance, and tomorrow never comes. Scaling policies always need continuous tuning, but instead of focusing on optimization, we find ways to build features instead.
How do we handle this in serverless? In serverless, we offer hands-free scaling. Our scaling speed is just about the fastest among all compute options available to customers: we provision 1,000 new execution environments every 10 seconds. Think about it: if a function's execution time is 100 milliseconds, you're actually adding 10,000 requests per second of capacity every 10 seconds, out of the box, without lifting a finger. Let's run a load test. We designed a very basic custom load test. About a minute in, you can see that the traffic increased by literally 700 to 800 times in 25 to 30 seconds. Not a single error, not a single throttle. Lambda absorbed it instantly, and we didn't have to lift a finger.
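To make the arithmetic behind that claim explicit: each execution environment handles one request at a time, so at 100 ms per invocation one environment sustains 10 requests per second, and 1,000 new environments every 10 seconds therefore add

$$1000 \ \text{environments} \times \frac{1 \ \text{request}}{0.1\,\text{s}} = 10{,}000 \ \text{requests/s of capacity per 10-second interval.}$$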
This is the power that serverless offers. This is the kind of workload where serverless truly shines, and it makes a lot of users and end users happy. And all of this comes with no idle costs. The key point is that our scaling speed is really fast, the fastest among all computes. Let's call this type of traffic "needlepoint traffic." Imagine building an application where hundreds of thousands of spectators in a stadium scan a QR code every time a goal is scored: it handles that seamlessly. Or a flash sale when a popular shoe company releases new shoes: it handles that seamlessly too.
Testing is very simple because we are literally only testing the application logic. We are not testing scaling at all; it just works. Lambda provides that out of the box. You've seen us build multiple features within the application, four or five APIs, and all of them scale independently at the same speed, without affecting each other. In other words, the noisy neighbor problem is virtually eliminated. And all of this can be achieved without managing infrastructure. So far, so good. Let's move on.
Introduction of Lambda Managed Instances: Blending EC2 Flexibility with Lambda Simplicity
We're good people, so we listen to our customers. And based on customer feedback, we're building new features: note encryption, decryption, and sentiment analysis. If you think carefully as a developer, the workload profile here is changing. It's no longer just a CRUD API; it's a more CPU-intensive workload. Since we listened to our customers, this feature is also gaining popularity. That is to say, the aspect of scaling to zero is no longer as important. There's always traffic to serve, and there are always users to serve. In other words, there's always some steady-state traffic. And here, loosely defining steady-state, it means a peak-to-normal traffic ratio of about 2. How do we handle this in serverless? Well, that was difficult.
When I talk to customers, they say things like this: in this phase of the application, they really want to drive optimization. They want to optimize costs, optimize performance, perhaps by leveraging the latest compute, memory, and network-intensive instances, etc. And they want to achieve all of that with the developer experience they currently use, love, and are familiar with, with all integrations and full serverless operations. In other words, they don't want to lose focus from their core business logic; they want more choices while continuing to leverage the practices they use today.
So, what were they doing before this week? They were moving away from serverless, away from Lambda, and focusing on redesigning their solutions. That was a very inefficient use of engineering time: the focus on business logic diminished and maintenance costs kept increasing. So we wanted to design a better way to support these scenarios in Lambda's serverless offering.
I'm delighted to introduce Lambda Managed Instances. The mental model behind this feature is that we want to provide all the simplicity, developer experience, integrations, and tooling of Lambda, combined with the choice and flexibility of EC2, meaning the options for compute, network, and memory that you get there. So with Lambda Managed Instances, you have access to the latest instance families like Graviton 4, and you'll likely have access to Graviton 5 as soon as it's released. You get the latest generation of memory-, compute-, and network-optimized instances.
All of this remains fully managed. As Usman said earlier, serverless doesn't mean no servers; it means we manage the servers. Here, we're giving you choices, but we still manage the servers, so we handle scaling, patching, request routing, everything. You can continue to use the same broad event source integrations that you used, and still use, with Lambda. And with Lambda Managed Instances, we've also added a new feature called multi-concurrency. This is the ability to process multiple requests from the same execution environment, which has been a long-standing request from our customers. And when you combine this feature with EC2's price incentives like Savings Plans, Reserved Instances, etc., your costs are truly, truly optimized.
And I'm about to show you how.
Implementing Lambda Managed Instances and Demonstrating Cost Optimization
To use Lambda Managed Instances, you simply create a capacity provider. When you create a capacity provider, you have the option to specify your preferred choices. I want to emphasize the word "option" here, because these are truly optional settings. You can configure instance types, scaling policies, maximum and minimum values, etc., but it's all optional. You can also let go and let Lambda and AWS take care of it. Once that's done, you create your functions the same way you do today. You just configure them with the capacity provider. Then, Lambda handles scaling, patching, execution, and provisioning instances. It selects the right instance for your workload and drives a continuous optimization loop for utilization.
Now, adding servers to serverless is a difficult thing to get right. So we did a lot of research to make the experience as simple as possible. Let's see how we did it. I'm currently in the Lambda console. I'm going to head to capacity providers, which appears as a new feature.
Create a capacity provider, give it a name, specify your VPC, subnets, security groups, and an operator role that has access to operate EC2. Then we go into the advanced settings where the real action happens. Here you can choose architectures like Graviton or x86. You can include or exclude specific instance types, or you can let Lambda choose. You can apply scaling policies, such as maximums to cap costs, or minimums to always have instances pre-warmed to handle traffic. You can also add tags for tracking purposes.
And when we come back here, you'll see the capacity provider is active. At this point, no EC2 instances have been launched yet, because we've only created a building block called a capacity provider. There's no charge for creating this capacity provider. The next thing we wanted to get right was the same developer experience as Lambda today. Let's see how we did that. Here's the updated function creation flow. It's exactly the same, but with one new parameter. That's the capacity provider configuration.
And here, I want to highlight two additional features. One is the multi-concurrency support I just talked about. And you also have the ability to customize your memory to CPU ratio. Just like EC2 instances, you can now choose your memory to CPU ratio in Lambda to match compute-intensive, memory-intensive, or general-purpose instances. So you can imagine that this now allows you to run a new class of workloads in Lambda that you couldn't before.
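Before the demo, here is the same configuration expressed as plain data. This is purely an illustrative sketch: Lambda Managed Instances is configured through the console flow described above, and every field name below is an assumption for readability, not a published API shape.

```typescript
// Hypothetical data sketch restating the settings walked through above.
// All names (capacityProvider fields, functionConfig fields) are assumptions.
const capacityProvider = {
  name: "notes-api-cp",
  vpcConfig: {
    subnets: ["subnet-a", "subnet-b", "subnet-c"], // three AZs, for AZ balance
    securityGroups: ["sg-1"],
  },
  operatorRole: "arn:aws:iam::111122223333:role/lmi-operator", // role allowed to operate EC2
  instanceSelection: { architecture: "arm64" }, // e.g. Graviton; optional, Lambda can choose
  scaling: { min: 0, max: 25 }, // max caps cost; min keeps instances pre-warmed
};

const functionConfig = {
  capacityProvider: capacityProvider.name,
  memoryMb: 4096,            // 4 GB, as in the demo
  memoryToVcpuRatio: "4:1",  // the new memory-to-CPU knob described above
  multiConcurrency: 64,      // concurrent requests per execution environment
};
```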
So next, I throw this create-config command to my assistant. It creates the function, and if I go back and open the configuration tab, lo and behold, it's configured with the capacity provider. For demo purposes, I'm going to change the function's memory setting to 4GB. I'll quickly go back to the other setting and set multi-concurrency to 64 concurrent requests per execution environment, and change my memory-to-vCPU ratio to 4:1. I hit save, and when I go back to the capacity provider, you see that the function is active.
And at this point, my EC2 instances are created. Here's my EC2 console. And if you look closely, when I first provided the subnets when creating the capacity provider, I provided them across three availability zones. So my EC2 instances are now also spread across three availability zones. This means my instances, and by extension my application, are now finally AZ balanced. And at this point, once the function becomes active, the EC2 instances are spun up, and the execution environments are pre-warmed with multi-concurrency support enabled, ready to handle traffic.
So let's start a load test with a synthetic workload. About 13 minutes in (we beefed up our load testing tool for this), you can see the traffic is much more steady state. It's still increasing, but by our definition it's steady state. And you see a scale-up from three instances to seven. We achieved about 25% utilization, and the memory and vCPU utilization of the underlying EC2 instances is still very healthy, between 15 and 35%.
And remember, the higher the utilization, the more cost optimization you get. So the key point here is that utilization is built in. About an hour into the load test, traffic is still steady state, increasing but steady. We have exactly 932 throttles, but throttles are fine; they can be handled in your code with retries or queueing. Zero errors: the error rate is still zero. We've scaled up to 21 instances now, and if you observe closely, both scale-up and scale-down are being triggered, tracking the CPU utilization of the capacity provider. We're still at about 25% utilization, 22.5% in this case, to be precise.
We let the load test run for about 90 minutes, and here are the key results I want to highlight. When I talk to customers, many of them say they incur technical debt when they provision for peak as their workload profile shifts. And once they incur that debt, they don't apply scaling policies correctly, or they don't take the time to optimize, so they end up with low utilization. At about 6% utilization, my synthetic workload, the load test, would have cost $8.50. If I manually improve that to 9%, it becomes $5.67. With Lambda Managed Instances, or LMI, auto-scaling is built in; you have to actively choose to turn it off, so it's optimized from the start, and even at the lower end you immediately reach about 25% utilization.
This was literally a 60-to-90-minute test, and at the 130-minute mark, as you saw earlier, utilization had reached 28.5%. At that utilization, the EC2-only cost would have been $2.05. On top of this, Lambda Managed Instances apply about a 15% management fee on the EC2 instances. Think about what you get for that 15%: auto-scaling, provisioning, patching, continuous optimization as your workload profile changes. You literally don't have to redesign your solution away from Lambda, so you save all that time. You can keep the same CI/CD pipelines, the same observability toolsets, and all the same integrations.
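For a fixed amount of work, cost scales roughly inversely with utilization, and the figures quoted above are consistent with that:

$$C(u) \approx C(u_0)\cdot\frac{u_0}{u}, \qquad \$8.50 \times \frac{6\%}{9\%} \approx \$5.67.$$

Applying the roughly 15% management fee to the $2.05 EC2-only figure gives about $2.05 × 1.15 ≈ $2.36 all-in; an illustrative calculation from the quoted numbers, not a quoted price.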
So let's talk about some key points here. With Lambda Managed Instances, the mental model is to give you the simplicity of Lambda and serverless, plus the choice and flexibility that EC2 offers. And maintenance is simple. For example, if four of the five functions that make up your application still benefit from scaling to zero, you can leave them as they are, and move the one function that has reached a steady state or a different workload profile, say a CPU-intensive one, to Lambda Managed Instances. We continuously optimize utilization to deliver cost benefits. There's zero infrastructure management here. And with the new ability to set memory-to-CPU ratios, you can bring in many more workloads that were difficult to run on serverless before.
Usman Returns: Three Innovations Removing Serverless Barriers
And Managed Instances, they just work. All those ilities, as Usman was talking about earlier, reliability, patching, scalability, availability, all of those continue to be the responsibility of AWS and Lambda. And all of that comes to you with the same developer experience that you get with Lambda today, without redesigning anything. Next, we're going to build workflow-based architectures. I'm going to bring Usman up to show you how to build that and also talk a little bit about our strategy. So thank you very much.
Alright, thank you, Janak. I really think LMI deserves some applause. That was really, really cool. I have a couple of stories. He's going to be mad at me because I didn't do this in rehearsal. The engineers were actually really mad about his demo. Did you see what he was doing in that demo? He was basically creating spike traffic of 1,000 TPS within one second. For those of you who operate and scale distributed systems, you know that is pretty much the worst-case workload. So the engineers were saying, you talk about synthetic workloads, this is the ultimate synthetic workload. But again, you were able to see how the system scaled and how it performed. It was a fun couple of weeks leading up to getting that done.
And second, I wanted to tie it into what I told you all at the beginning. We talked about trade-offs. We talked about observability. We talked about control. We talked about evolutionary architectures and how you have to use Infrastructure as Code.
So let's talk about where we are right now in terms of trade-offs, given what Janak just talked about, and then I'll show you a couple of other things that we just launched. This was launched on Tuesday, what I'm about to show you. So for those of you who remember that diagram where I showed you the FEMA emergency response application, what was the hard part? Your application is now highly evolutionary, single responsibility. No monoliths. You're using Infrastructure as Code. And the power of AI that Janak showed you, by actually incorporating the best serverless practices and generating SAM, removes that trade-off.
One of the hard parts, for me personally as a developer, I told you all, I used to be a video game programmer, hardcore C++ developer. It was always hard to write YAML. It's still always hard. I don't know why I can't do it, but I don't have to now. In fact, we get really great results from our MCP Server with best practices built in. So that is one problem that has been removed from developers and infrastructure developers.
Second problem, and again, I'm going to be very candid and frank here: people were saying, hey, Lambda is too expensive when it scales. If my idea gets big, I'm going to hate it, or my boss is going to hate it, because the costs are going to be too high. And Janak showed you with Lambda Managed Instances how we can achieve incredible utilization. Folks, I told you I used to operate Auto Scaling. I've created hundreds of Auto Scaling groups in my tenure at AWS, and I don't think I ever actually ran code on any of them, because I was just testing the service. Here, you can literally go from infrastructure, to code, to highly utilized compute, in minutes, and that's the most incredible thing about Lambda Managed Instances.
So two fundamental things about serverless were, hey, it's expensive at scale, or if my idea gets big, I'm going to have to re-architect, or I have to deal with Infrastructure as Code, and that's complex. We've actually addressed those things really well this year. So let's talk about the third thing, which was, hey, Lambda doesn't run long-running workflows or long-running jobs, and if I have this long-running problem, I'm going to hit a 15-minute timeout and I'm going to really hate my life again.
Lambda Durable Functions: Enabling Long-Running Workflows with Built-in Reliability
So let's talk about workflows. And folks, I've been involved in workflow services for a long time. Auto Scaling was one of the biggest users of Simple Workflow, and still is, a little bit behind the scenes. I'm also the engineering leader for Simple Workflow. I've been talking to customers about workflows for 12 years, and developers didn't get it, unless you were working at Amazon, obviously, because internally we really understand it. We know that when you want to build reliable distributed systems, you need a workflow. And what's brought workflows back to the forefront, where everyone wants to talk about them, is obviously AI agents and AI-based workflows. So there's a lot of interest in this.
But look, this is the system we're building. Why do we want to do this? Because orchestration is important. In applications, in the new types of applications that customers are building. In this particular case, we're talking about an extension of what Janak started. I want to be able to summarize notes from my note-taking application. These are some of the steps. I need to get the notes from the storage space. I think it's DynamoDB in my example. And then, this is the thing about LLMs. LLMs are asynchronous. You have to wait for a response. And if you want to scale LLMs, everything around it needs to be more asynchronous, whereas everything else is synchronous. You need to generate a summary and save the result. These are the steps that literally map to a workflow.
And look, if I were to write code today, this code could run anywhere. It could run on EC2, on any compute, this code can run on Lambda as it is now. But then you'd be responsible for the reliability of all these steps. If you're not using a workflow system, you're responsible for figuring out when to handle retries, when to roll back if something goes wrong. As you can see, there's a manual checkpoint somewhere. Unfortunately, there are no line numbers, but it's doing a lot of sleeps and waits. And while you're waiting, especially if you're waiting in Lambda, you're paying for compute.
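The slide's listing isn't reproduced in the article, but the pattern being criticized looks roughly like the sketch below: a hedged TypeScript reconstruction in which getNote, startSummaryJob, and the other helpers are hypothetical stand-ins for the demo's steps.

```typescript
// Illustrative sketch of the fragile do-it-yourself version described above:
// manual retries, a hand-rolled checkpoint, and sleeps you pay for while idle.
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

export async function summarizeNote(noteId: string): Promise<void> {
  // Manual retry loop around the storage read.
  let note: string | undefined;
  for (let attempt = 1; attempt <= 3; attempt++) {
    try { note = await getNote(noteId); break; }
    catch (err) { if (attempt === 3) throw err; await sleep(1000 * attempt); }
  }

  const jobId = await startSummaryJob(note!); // kick off the asynchronous LLM call
  await saveCheckpoint(noteId, { jobId });    // hand-rolled checkpoint

  // Poll-and-sleep: the function is billed the whole time it waits here.
  let summary: string | undefined;
  while (!summary) {
    await sleep(5000);
    summary = await pollSummaryJob(jobId);
  }
  await saveResult(noteId, summary);
}

// Hypothetical stubs so the sketch type-checks; real code would call DynamoDB
// and an LLM endpoint such as Bedrock.
declare function getNote(id: string): Promise<string>;
declare function startSummaryJob(text: string): Promise<string>;
declare function pollSummaryJob(jobId: string): Promise<string | undefined>;
declare function saveCheckpoint(id: string, state: object): Promise<void>;
declare function saveResult(id: string, summary: string): Promise<void>;
```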
Now, what we heard from developers was basically this: I don't want to write code like this, even if it's simple. AI writes this code for me really easily, but I don't want to write it myself and figure out how to make it reliable. Of course, I want to write code and use my tools and my IDE. Pause and resume is a really powerful capability, because if you look at the most powerful uses of AI today, with AI code generation, who is the human in the loop? It's you, the developer. So being able to pause and resume workflows is super powerful. And finally, I want to use my favorite programming language.
So, we listened to you. And now we introduce Lambda Durable Functions. No, Matt already introduced them. What you do with Lambda Durable Functions is simply write code. You write simple sequential code. Well, all code is sequential.
Reliability is built in: reliability, retries, and workflow semantics. We currently support Node and Python, and more languages are coming. As I said earlier, the team worked super hard to get this ready for re:Invent. More language support is definitely coming, but Python and Node are the two most popular languages in Lambda, so they cover the most common cases.
Yes, you can pause and resume long-running operations, and while you're waiting, you're not charged for Lambda at all. And finally, this is the beauty of it: it's still Lambda, after all. Some of you may have heard about durable execution or workflow engines. What's unique here is that this is a compute service, Lambda, with reliable workflow capabilities built in. There's nothing else to run. You simply write a Lambda function, and I'll show you what that looks like.
So what do durable functions actually do, and how do they work behind the scenes? They come with a very simple SDK. If you choose to, and I'll show you how, when you select a durable function in the Lambda console, the SDK is automatically loaded as part of your runtime. It has checkpointing capabilities, and you decide when to take a checkpoint. By the way, this entire system is built on the same underlying system that powers Step Functions. For those of you familiar with Step Functions, a lot of this will immediately make sense.
You can take checkpoints. In Step Functions, every state is checkpointed; here, you write the code and decide when to checkpoint, and then there's replay. The idea behind replay is that if you're waiting, or resuming a function, or something fails and you retry from the failure, replay is what gives you reliability, but you don't have to re-execute anything that's already been checkpointed. You simply get those results back, so you never waste compute. To tie this back to long-running workflows: we thought long and hard about Lambda Managed Instances too. Customers said, since we're already paying for entire EC2 instances, shouldn't we be able to run Lambda functions for hours? But doing that naively would just mean building inherently unreliable software. So we thought, can we do something better? We collaborated with the Step Functions team, and the teams worked together to build durable functions inside Lambda.
Alright, this is what enables durable execution. It's easy to get started. You select a function name and provide an optional durable configuration; that's how you turn it on. There are default values for execution timeout and retention period, so you're basically just saying you want this function to be durable. And I'll actually show you what a running function looks like.
I've already created one of these functions. They have a new tab in the console called Durable Executions. Again, for those of you familiar with Step Functions, you're used to workflow observability. We've incorporated a lot of that. So I'm going to launch a durable function. Don't worry, I'll explain the code shortly. And you can see that a new execution has started. I'm going to quickly click on it and examine it. You can see that it describes the steps where the system and workflow have already retrieved the notes. It's starting to process them. Next, it sends them to the LLM for summarization. And you can actually see the progress. So in my demo, there are obviously no failures, but if something were retried, you could quickly identify where the asynchronous tasks weren't working correctly. And there you go, it's all done.
These workflows can run for up to a year. Of course, the most typical use cases are short transactions, sometimes less than a second, sometimes a few minutes, but you can absolutely build human-in-the-loop systems that can last for a long time. So let's look at what the actual code looked like. I'll zoom in. First, I want to emphasize that for many developers familiar with Lambda, you know that Lambda has a context object, but when you create a durable function, you get a durable context object. And this durable context object has the ability to wait, to checkpoint, meaning to execute a step, and also has structured logging to see where you are in the workflow execution.
The durability part, again, is built in, so I'm actually going to show you the step of getting the note from DynamoDB in this particular case. The code you write is basically a step, and this comes from the context object. In the DynamoDB call, it executes the get item command there, and since there's structured logging, you can actually see what just happened in the console or in any observability tool you're sending logs from your Lambda function to.
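The article doesn't reproduce the full listing, so here is a hedged reconstruction of the handler from the description above. The durable-context surface (ctx.step, ctx.wait, ctx.logger) is an assumption inferred from the talk, not the published SDK signature; the LLM helpers are hypothetical stubs.

```typescript
// Hypothetical reconstruction of the durable note-summarization handler.
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";

const ddb = new DynamoDBClient({});

export const handler = async (event: { noteId: string }, ctx: any) => {
  // Each step is checkpointed: on replay, completed steps return their saved
  // results instead of re-executing, which is what makes this reliable.
  const note = await ctx.step("get-note", () =>
    ddb.send(new GetItemCommand({
      TableName: "notes",
      Key: { id: { S: event.noteId } },
    }))
  );
  ctx.logger.info("note retrieved", { noteId: event.noteId }); // structured log

  const jobId = await ctx.step("start-summary", () => startSummaryJob(note));

  // While waiting, the execution environment is shut down and nothing is
  // billed; the workflow resumes at the step after the wait.
  await ctx.wait({ seconds: 60 });

  const summary = await ctx.step("fetch-summary", () => pollSummaryJob(jobId));
  await ctx.step("save-result", () => saveResult(event.noteId, summary));
};

// Hypothetical stubs for the LLM call and the write-back.
declare function startSummaryJob(note: unknown): Promise<string>;
declare function pollSummaryJob(jobId: string): Promise<string>;
declare function saveResult(id: string, summary: string): Promise<void>;
```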
One of the coolest things about Step Functions, and workflows in general, is idempotency. Idempotency is very important when doing transactions. It lets you ensure there is only one instance of a given unique workflow, preventing a second instance of that workflow from starting. This is achieved through the name of the durable execution.
When you start a durable function, if you give it a name or an ID at the start, you maintain that guarantee, so that two executions are never working on the same thing. This is a super powerful capability for a lot of applications.
And finally, I want to emphasize wait. Waiting is as simple as calling context.wait, and while you're waiting, we shut down the execution environment, so you're no longer paying. In my demo I'm using very simple code, basically a plain wait, but you can also use a callback with a condition to wait until a certain condition becomes true, at which point the execution environment starts back up. All of this means that during a wait you don't pay for compute or any other resources, and once the wait completes, the workflow resumes exactly at the step after the wait.
Now, in the previous example I showed you some steps, and I rushed through the code. But for the developers here, and the engineering managers who review developer architectures, you can imagine what it would take to build something like this in a traditional architecture. You'd use queues, with compute at each step between the queues, because you can't bundle things properly. Deployment becomes more complex, debugging becomes more complex, and if something goes wrong, all the fundamental challenges of replayability, rehydration, and distributed event-driven architectures come into play. With simple orchestration, simple deployment, and a simple SDLC, you can instead build really reliable workflows. And again, there's really nothing comparable to this technology. There are a lot of workflow technologies out there, but this is a compute technology with workflows built in.
And then, long-running Lambda functions are here, now running for up to a year. Anyway, I've covered these points already, so let's move on.
Enhancing Developer Experience and Adding Tenant Isolation Features
So, I've talked about what we're doing with the MCP server and gen-AI development. We've done a lot. By the way, how many of you have installed the actual developer tools for VS Code? A few of you, great. For developers, I highly recommend trying our developer tools from the AWS Toolkit; most of you probably use VS Code. For example, one thing you might be missing is that you can remotely debug Lambda functions: you can set breakpoints in a running Lambda function using our tools. So we're really focusing on the developer experience. Janak has already talked about MCP and how it makes the developer experience super easy, especially with IaC and our best practices built in. I also talked about LMI, which lets you run long-running, steady-state, large workloads with incredible discounts and EC2 instance choices. And I talked about durable functions, which let you run reliable workflows, because reliability and durability are very important for long-running workloads; it's not a matter of if infrastructure problems will occur, but when. But we're not done yet. I want to talk about one more thing we launched about a week before re:Invent. This is especially relevant for security-sensitive SaaS applications.
Here's the scenario. Let's say you have three customers, three tenants, in your SaaS application, and it runs on Lambda. Your Lambda function might be using global variables, for example, or there might be something in a temp file, since you might be using our temp storage. You want to restrict that. In other words, you want to say that invocations of this Lambda function should only talk to a DynamoDB table and only retrieve the rows belonging to the customer calling the function. Say the blue tenant invokes the function first. We created an execution environment for them, but something might be left over, say a global variable. Then the yellow tenant calls, and the problem is that they might also leave side effects behind in the execution environment, and if you wrote code like this, you might not be able to isolate those side effects.
And finally, of course, there's the green tenant; again, their requests might land on the same execution environment, with its leftover side effects. This isn't unique to Lambda, right? It applies to any compute you run, EC2 instances, containers. One of the hardest challenges is how to isolate customers from each other without creating separate infrastructure for each customer. That's pretty expensive. You can do it; the way we traditionally did it in Lambda, before this feature, was to create a Lambda function per customer. It sounds ridiculous, but there are customers and use cases where you have to do that. You don't have to anymore. About a week ago, we launched tenant isolation. What this does is let you pass a tenant ID with the invocation; you're still writing the same Lambda function.
And what we're doing is creating a separate execution environment for you that is not shared with any tenant. The idea behind this is that if you have sensitive software, if you have AI-generated software, or if you're in an environment where you really want to isolate your customers from your infrastructure, you can now do that, and it's super simple. There's no additional infrastructure to manage, and you don't even have to manage additional Lambda functions.
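As a sketch of the calling side, assuming a per-invocation tenant identifier as described above. The TenantId field is an assumed parameter name based on the talk, not a verified SDK field; InvokeCommand itself is the standard AWS SDK for JavaScript v3 call.

```typescript
// Hypothetical sketch: routing an invocation to a tenant-dedicated execution
// environment by passing a tenant ID alongside the payload.
import { LambdaClient, InvokeCommand } from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});

export async function invokeForTenant(tenantId: string, payload: object) {
  const input: any = {
    FunctionName: "notes-api",
    Payload: Buffer.from(JSON.stringify(payload)),
    // Assumed parameter based on the talk's description: request an execution
    // environment that is never shared across tenants, so global variables
    // and /tmp contents from one tenant can't leak to another.
    TenantId: tenantId,
  };
  return lambda.send(new InvokeCommand(input));
}
```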
Future Roadmap: Developer-Centric Continuous Innovation
Alright, we're almost done, but since we have a little time, I'd like to invite Janak back onto the stage and take your questions. Before that, I want to talk about our strategy, because we're not done yet. I always like to leak a few things: we completed some big things in time for re:Invent, and over the next six months we're working on other big things that didn't make it in time, so keep your ears and eyes open for more. All of them will be in the same direction. Lambda has always been for developers. Lambda has always been about speed, and we're really embracing that mindset.
What we're trying to do is remove obstacles: objections from platform teams like "costs might get too high," "it's really hard to implement IaC correctly," and fundamental questions like "can we really manage costs well with Lambda?" All of these are being addressed, and we're laser-focused on developers and how they can move faster using our services. That's what has driven the last 18 months. I've talked about remote debugging and some things I haven't shown you, but we're deeply invested in developer tools, to ensure you don't have to be a serverless developer to benefit from serverless development; you just have to be a developer.
Alright, to recap: we have a roadmap centered on developer experience, and the foundational pieces are in place. Next up is observability. Customers have asked us for OpenTelemetry support, and we want to provide native OpenTelemetry support. We've already shipped many launches in the past few years to make structured logs and Lambda OpenTelemetry-compatible, but we want to provide full OpenTelemetry support for our customers. This is an important part; observability is one of the key trade-offs I mentioned with evolutionary architectures, so we want to keep improving it.
More runtimes. You might have missed it, but we also launched Rust support in Lambda. That was about a week ago, I think, but more languages, more frameworks, more runtimes are coming. Look, LMI and Durable Functions have allowed us to unlock a whole new class of applications that weren't possible with Lambda before, but we're not done yet. There's still a lot we want to do. Because we believe that customers shouldn't have to choose to manage many of the "ilities" I talked about earlier, just because they have business needs that don't fit.
And look, integrations are truly fundamental. The way Lambda can achieve that speed is because so many things are all built in, from EventBridge to SNS, API Gateway, ALB, SQS, Kafka. You don't have to worry about how to make this technology work with your code. It's all built in, and it's our responsibility to make them work. We'll continue to do more on both the developer tools side and the integration side, so that your favorite CI/CD tools and your favorite observability tools will work out of the box with Lambda.
And look, you might not know this, but very recently, this was also just before re:Invent, and Janak and the product team were quite busy, but we also published our roadmap. If you scan this QR code, you can access the roadmap. Please give us your feedback. We look forward to hearing from you. So I'll invite Janak up. Janak, we have a few more minutes. Thank you very much to everyone who stayed until the end on Thursday. If you have any questions, I'll be happy to take them. Thank you. Thank you.