

re:Invent 2025: How Netflix Manages 2 Exabytes of Data with Amazon S3 Storage Lens


Introduction

By transcribing various overseas lectures into Japanese articles, we aim to make valuable but hard-to-find information more accessible. The presentation covered in this installment of the project is the following!

For re:Invent 2025 transcription articles, information is summarized in this Spreadsheet. Please check it as well.

📖re:Invent 2025: AWS re:Invent 2025 - How Netflix uses Amazon S3 Storage Lens to track exabytes of data (STG214)

This video introduces how Netflix utilizes Amazon S3 Storage Lens to manage over 2 exabytes of data. Netflix ingested Storage Lens data into Apache Iceberg tables and visualized growth trends, detecting over 500 petabytes of unintended storage growth. They also combine S3 Inventory with Server Access Logs to calculate costs at the prefix level and identify unaccessed data. AWS has newly announced the Expanded Prefixes Metrics Report, which supports an unlimited number of prefixes and depths up to 50, 72 new performance metrics, and direct export to S3 Tables. The demo also shows how to query Storage Lens metrics via an MCP server using natural language.

https://www.youtube.com/watch?v=Q2YoHfhFuI8
Please note that this article was automatically generated while maintaining the content of the existing lecture as much as possible. There may be typos or incorrect information.

Main Content

Thumbnail 0

Session Start and the Need for S3 Insights

So let's get started. Hello and welcome to session STG 214. The session on how Netflix uses Amazon S3 Storage Lens to manage exabyte scale data. My name is Roshan Thekkekunnel, I am a Product Manager on the Amazon S3 team. And joining me today for the session are my colleagues Christie Lee, who is a Principal Solutions Architect for Amazon S3, and our friends from Netflix, Austen Keene, a Software Engineer at Netflix, and Bi Ling Wu, who is a Data Engineer on the Data Science team at Netflix.

Thumbnail 60

First, a little bit of logistics. We have a lot of content to cover today, so we won't be taking questions during the session. However, my colleagues and I will be available in the hallway after the session to answer any questions that you may have. So let's quickly look at the agenda. We've broken down the session into five parts. First, I'm going to kick it off by providing an overview of the Amazon S3 Insights portfolio and then specifically diving deep into S3 Storage Lens, which is the focus of today's session. Then I'm going to invite Austen and Bi to come up and share some of the use cases at Netflix and how they leverage the S3 Insight services to optimize their storage deployments.

Thumbnail 120

And then I'm going to talk about some of the announcements that we made yesterday for S3 Storage Lens that will further supercharge your optimization workflows. And then finally, Christie is going to give a demo and show these new features in action while solving for real performance and cost optimization use cases. Before we dive into the first section, quick show of hands. How many of you are aware of S3 Storage Lens or any of the S3 Insight services? Awesome. So about 25 to 30% of the room has raised their hands. That's fantastic. I hope by the end of this session, all of you will learn about S3 Storage Insights, S3 Storage Lens, and how you can leverage these services to optimize your storage.

Thumbnail 150

First, I want to establish why insights are needed. When you store data, where you store the data and how your applications access that data matters for a variety of reasons. Your workload cost profile, your workload performance profile, and the security posture of your data set. So it is important to understand how your data is stored, where it is stored, and how your data is accessed. As the saying goes, what you cannot see, you cannot change. And in this case, you cannot optimize.

Thumbnail 210

S3 Insights Portfolio Overview

To address this, S3 has built a plethora of services that provide visibility into both storage and activity metrics. From aggregated metrics at the organizational level to very granular insights at the object and request levels. So let's look at the insights portfolio. First, we have S3 Storage Lens. This is an observability tool that provides daily insights into both storage and activity metrics all in one interface. It is the recommended first step to start understanding how your storage is deployed and how it is being accessed.

Thumbnail 250

We make it super easy for you. As soon as you create your first bucket, you get a default Storage Lens dashboard that provides you with over 60 free metrics to help you understand how your storage is deployed. Now, if you need more granular data beyond the organizational and bucket level, we have services like S3 Metadata and S3 Inventory. These can drill down to object-level reporting. So you get a list of objects and their associated metadata. And with S3 Inventory, you can get that daily or weekly in either CSV or Parquet format.

Thumbnail 300

Now, if you need more real-time data, last re:Invent, we launched S3 Metadata. This provides you with near real-time information of all your objects and metadata. And S3 Metadata is delivered in managed S3 Tables. So no additional processing is required, and you can directly query the data with your preferred analytical tool and discover your data. Similarly, if you're looking for detailed request-level information, you can enable S3 server access logs. This gives you a detailed record of every single request made against your bucket.

And finally, we have Amazon CloudWatch. This is our monitoring service. This provides you with visibility into your storage metrics alongside all your other AWS service metrics and your own application metrics. And here you can do pretty much whatever you need to do in your operational workflow, including near real-time analysis, creating dashboards, and setting alarms.

Thumbnail 360

Thumbnail 380

Key Use Cases and Features of S3 Storage Lens

Now, let's take a closer look at S3 Storage Lens and how our customers are using it. The primary use case for S3 Storage Lens is cost management. Questions like, which are my biggest buckets? Which are my biggest prefixes? How is my storage growing over time? Are there any anomalies, any sudden storage growth that I'm seeing in a bucket or prefix? Storage Lens can help you answer these questions. The second most common pattern we see is around access. Which of my buckets or prefixes are hot, and which ones are cold? Which ones are receiving a lot of requests, and which ones are not receiving as many requests, and therefore might need to be moved to a colder tier? Storage Lens can help you answer these questions.

Thumbnail 400

Thumbnail 430

Third, it's all about performance monitoring. Questions like, which of my buckets are experiencing 503 errors, so throttling errors, or 403 authentication errors? You can identify these buckets and prefixes. These are potentially slowing down your application and are areas that you want to pay attention to. And finally, we hear from customers about security auditing with Storage Lens. Questions like, which of my buckets have unencrypted data? Which of my buckets do not have versioning enabled? Which of my buckets do not have the right replication rules? Storage Lens can help you answer these questions through its data protection capabilities.

Thumbnail 450

So let me walk you through some of the capabilities of Storage Lens. When you log into Storage Lens from your S3 console, the first thing you'll see is this overview page. Here you'll see a snapshot of all the data across your organization. So that's data across multiple accounts, across multiple regions, across multiple buckets. You'll see a summary snapshot here. And you can filter by category of metrics. You have performance metrics, cost optimization metrics, and all these metrics are broken out by category in this view.

Thumbnail 490

Thumbnail 510

And from here, you can start drilling down into Storage Lens. You can drill down to account-level metrics, region-level metrics, and storage class-level metrics. And what you're seeing here is bucket-level metrics. So these are the top 10 buckets in my account. The exact same metrics are broken out across these 10 buckets. And finally, Storage Lens has an advanced tier. When you turn that on, you get additional metrics and advanced capabilities like prefix-level metric aggregation. What you're seeing in this example are the top 10 prefixes by size across all my buckets with the same storage metrics and activity metrics broken out across these 10 prefixes.

Thumbnail 560

Netflix's Storage Management Challenges and Growth Tracking

Now, instead of me talking about how customers use Storage Lens, we have folks from Netflix here, which has one of the largest storage deployments in the world. So let's hear from Austen and Bi how they optimize their storage using Storage Lens and other insights services. Austen, over to you. Hi everyone, my name is Austen, I'm a software engineer at Netflix. For those of you who don't have internet access, Netflix is a streaming company. We provide everything from F1 to reality TV to Stranger Things.

Thumbnail 570

So let's talk about why tracking data is difficult in the first place. We have over 2 Exabytes of data stored in S3, spread across three main use cases, big data, media, and ML. These use cases have different access patterns. Sometimes they're accessing an abstraction layer built on top of S3. Sometimes they're large multi-tenant buckets where different teams own different prefix paths. So providing visibility into usage and cost across an organization is incredibly complex.

Thumbnail 600

So what do we do? We take the Storage Lens information, so that data, and we ingest it into an Apache Iceberg table. We then aggregate all of that into high-level growth trends, and we provide visibility to our application owners. This is an example of a dashboard that we have. This is a high-level usage dashboard that we use to understand the overall health of S3 at Netflix. We also create more tailored dashboards depending on the use case. So we'll have a media dashboard, a big data dashboard, where application owners can at a high level understand what's going on.
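As a rough sketch of what that kind of ingestion could look like (this is not Netflix's actual pipeline; the Spark catalog, table name, and export path layout are assumptions), a daily Storage Lens Parquet export can be appended to an Iceberg table with Spark:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes a Spark session already configured with an Iceberg catalog named "demo".
spark = SparkSession.builder.appName("storage-lens-ingest").getOrCreate()

# Hypothetical destination for the daily Storage Lens metrics export (Parquet format);
# the actual export path layout depends on your dashboard and destination prefix.
export_path = "s3://my-storage-lens-export/StorageLens/123456789012/my-dashboard/V_1/reports/"

# Read the latest export and stamp it with an ingestion date for partitioning.
metrics = spark.read.parquet(export_path).withColumn("ingest_date", F.current_date())

# Append into an Iceberg table (created beforehand) so growth trends can be queried over time.
metrics.writeTo("demo.s3_insights.storage_lens_metrics").append()
```

From a table like this, the growth-trend and per-team dashboards described above become ordinary aggregation queries.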

Thumbnail 640

But not everybody likes receiving DMs from me on Slack about their S3 usage, so we also provide better tooling. This is where growth tracking comes in. What we do is we provide automated alerts to application owners. When their storage footprint goes above a certain threshold, they'll receive an automated message that will then direct them to a more detailed dashboard. So as you can see here, this user received a Slack alert. They went above a certain percentage in their growth rate over a period of time, and they can go to this dashboard to understand what's going on.
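A minimal sketch of that kind of threshold alert (the numbers, webhook URL, and dashboard link are placeholders, not Netflix's internal tooling):

```python
import requests

# Hypothetical daily storage totals per application, e.g. derived from the ingested
# Storage Lens data: {application: (bytes_yesterday, bytes_today)}.
daily_bytes = {
    "media-ingest": (900 * 1024**4, 1200 * 1024**4),
    "container-logs": (50 * 1024**4, 51 * 1024**4),
}

GROWTH_THRESHOLD = 0.20  # alert when storage grows more than 20% day over day
SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder webhook URL
DASHBOARD_URL = "https://dashboards.example.com/s3-growth"      # placeholder detail dashboard

for app, (before, after) in daily_bytes.items():
    growth = (after - before) / before
    if growth > GROWTH_THRESHOLD:
        message = (
            f":warning: S3 storage for *{app}* grew {growth:.0%} in one day "
            f"({after / 1024**5:.2f} PiB total). Details: {DASHBOARD_URL}"
        )
        requests.post(SLACK_WEBHOOK, json={"text": message}, timeout=10)
```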

As I mentioned earlier, sometimes we have very large multi-tenant buckets. Different teams own different prefix paths, so being able to get down to the low-level data and really dig into what's going on is very important. With just this mechanism alone, we've detected over 500 petabytes of unintended storage growth. This might be a media bucket where a studio partner just dumped assets and forgot about them. This might be a misconfigured table where data is just piling up costs, and nobody is using it.

Thumbnail 700

Just recently, we had 6 petabytes of growth in a log bucket for our containers, and the team that owned that bucket actually had no idea of its size until we reached out. So just having a simple mechanism to let people know what's going on can really start to save you costs. Another thing we do with our S3 Storage Lens data is data placement optimization. You may have heard that data has inertia. So once you place a large amount of data somewhere, it's very expensive and time-consuming to move it elsewhere.

Optimizing Data Placement and Leveraging Platform Insights

If you're an application owner and you have data in S3, you're going to point your application at S3, and that's going to be it. And for the vast majority of use cases, that's completely fine. But if you have low latency or high IO requirements, you can run into issues. We experienced that when we first stood up our ML training platform with ML training jobs. We found that these jobs had very long cold starts. And that correlated to idle GPUs. What was going on was S3 was throttling as it was trying to hydrate data. We were trying to save a little bit of money on storage costs, and we ended up spending a lot more money on idle GPUs. So what we did is we looked at our S3 Storage Lens request and throttling metrics to identify which applications were having issues, and we placed them on more performant storage like EBS or FSx Lustre.

Thumbnail 770

This eliminated the cold start times, we ended up spending a little bit more money on storage, but we ended up saving costs by optimizing our GPU utilization. Another thing that we do is we take all of this information, and we turn it into platform insights. So we take all of our S3 Storage Lens data, and we provide it to our platforms through internal APIs. As I mentioned earlier, we build a lot of abstraction layers on top of S3, so we run into this problem of how do we tell a user that their S3 bucket is growing when they don't even know they have a bucket in the first place.

So all of this helps with that. What you're seeing here are two different dashboards that our users interact with as part of their applications. We take our S3 Storage Lens data, and we directly put it into the UI so that users don't have to go looking for this information. We know that application owners best understand their data, and we don't want to tell them how to use S3, or if they should even use S3 in the first place.

Thumbnail 820

This works very well. What we really want to do is take this data, make it available to them, and allow them to make the best decisions. Now for those of you who haven't been listening to me so far, now's the time to pay attention. If you're not currently using S3 Storage Lens, I'm going to tell you how I think you should onboard. The very first thing you should do is track trends. Take the data, put it in a dashboard, and review it periodically, just like any other on-call review.

Just having a group that reviews and follows up on this information is enough to start making a difference. But if you're a little bit bigger, you're going to want to invest in automated growth alerts. Take this information, send it to the relevant application owners so they don't have to go looking for it, and you don't need a central authority managing all that information. Lastly, if you're even bigger, you should invest in platform-level insights. If you build a lot of abstractions and platforms on top of S3, you're going to want to take the information that you have and provide it to everybody in a place that's easy to access. This is a quick overview of what we do with S3 Storage Lens. Now I'll hand it over to my colleague Bi, who's going to talk more about what we do with S3 telemetry.

Thumbnail 880

Building Cost Visualization and Prefix-Level Analysis with Bi

Thank you, Austen. Hello everyone, my name is Bi. I'm a Data Engineer at Netflix. While Austen's use case was about the broad growth and metric tracking of the S3 ecosystem, my goal focuses specifically on cost and efficiency. There are three key areas where deep insights make a concrete difference, and Netflix chose to focus on these. First, we aim to visualize the cost of each prefix for our top 250 buckets across the entire ecosystem.

This allows us to pinpoint exactly where our spending is going and identify opportunities for optimization. Second, we aim to provide advanced analytics on Iceberg table storage and prefix usage. This means analyzing how our data is structured and using that information to make smarter decisions about storage. Lastly, we house millions of Iceberg tables, with all their data residing in S3. So we aim to provide clear visibility into the costs associated with those tables themselves. By doing so, we ultimately have fewer surprises and can manage our resources more effectively.

Thumbnail 950

Thumbnail 960

So in this section, we'll actually dive deep into how we build these cost metrics for all prefixes, down to an unlimited prefix depth, as our underlying dataset. I'll take a step back here. To enable efficient analysis of storage utilization, we need to track storage metrics at every prefix step. On the right, you'll see a sample file directory representing what we're working with and what we're trying to achieve. Bucket A contains prefixes A through D, with an arbitrary number of objects in between. Again, the goal is to be able to understand the size and cost at each step for an unlimited number of prefix steps.

Thumbnail 1000

Now, you might be asking yourself, isn't this exactly what S3 Storage Lens is for, and why are you reinventing the wheel? Well, you're not wrong. When I started this effort, my team and I evaluated several S3 services that might fit our needs. Back in 2023, we faced some limitations that led us not to choose S3 Storage Lens. First, S3 Storage Lens did not include prefixes that represented less than 1% of the total bucket size. This was an issue because some of our buckets are really large and quite multi-tenant. For example, a prefix could represent a large portion of an Iceberg table, but still not meet the 1% requirement for the overall bucket.

Second, S3 Storage Lens had a maximum prefix depth of 10. While this was good, it still wasn't enough for us. Finally, as many of you in this room know, cost is not solely determined by size but also by intelligent tiering and storage classes. At the time, S3 Storage Lens lacked this dimension and granularity. Due to these three gaps, we decided to use S3 Inventory and S3 server access logs.

Thumbnail 1060

Thumbnail 1080

So let me walk you through how we combine these two datasets to derive cost. Here again is our file directory. As I mentioned earlier, this is a file directory we might deal with, and I've simplified it to what you would get in an S3 Inventory report. You get size, prefix path, intelligent tiering, and a storage class consolidated into one field which is super useful for us. In this slide, I'm showing you an example of how we created a prefix rollup from the inventory report. Similar to S3 Storage Lens, each row represents the storage class and the total size. We have a total recursive cost, which is the cost of that prefix and everything under it. So at prefix step zero, this would be the size of the entire bucket. And we also have an exclusive cost, which non-recursively represents only the objects at that level. Using this prefix rollup, we can then apply pricing to intelligent tiering, storage class, and size to get the total cost of a bucket at each prefix, down to an unlimited depth.
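To make the rollup idea concrete, here is a simplified sketch (not Netflix's implementation; the keys, column names, and per-GB prices are illustrative assumptions) that computes recursive and exclusive sizes per prefix from inventory rows and applies per-storage-class pricing:

```python
import pandas as pd

# Simplified S3 Inventory rows: object key, size in bytes, and a single field for
# storage class / Intelligent-Tiering tier, as described above. Values are illustrative.
inventory = pd.DataFrame([
    {"key": "prefix_a/prefix_b/obj1.parquet", "size": 40 * 1024**2, "storage_class": "STANDARD"},
    {"key": "prefix_a/prefix_b/obj2.parquet", "size": 10 * 1024**2, "storage_class": "INT_ARCHIVE"},
    {"key": "prefix_a/prefix_c/obj3.parquet", "size": 25 * 1024**2, "storage_class": "STANDARD"},
    {"key": "prefix_d/obj4.parquet",          "size":  5 * 1024**2, "storage_class": "GLACIER"},
])

# Hypothetical monthly $/GB prices per storage class, for illustration only.
PRICE_PER_GB = {"STANDARD": 0.023, "INT_ARCHIVE": 0.004, "GLACIER": 0.0036}

def prefixes_of(key: str):
    """Yield every ancestor prefix of an object key: 'a/b/c.txt' -> 'a/', 'a/b/'."""
    parts = key.split("/")[:-1]
    for depth in range(1, len(parts) + 1):
        yield "/".join(parts[:depth]) + "/"

# Recursive rollup: every object contributes to every ancestor prefix.
rows = []
for rec in inventory.itertuples():
    cost = rec.size / 1024**3 * PRICE_PER_GB[rec.storage_class]
    for prefix in prefixes_of(rec.key):
        rows.append({"prefix": prefix, "recursive_bytes": rec.size, "recursive_cost": cost})
rollup = pd.DataFrame(rows).groupby("prefix", as_index=False).sum()

# Exclusive rollup: only the objects sitting directly under each prefix.
inventory["parent"] = inventory["key"].str.rsplit("/", n=1).str[0] + "/"
exclusive = inventory.groupby("parent", as_index=False)["size"].sum()

print(rollup.sort_values("recursive_cost", ascending=False))
print(exclusive)
```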

Thumbnail 1120

Cost Reduction with Server Access Logs and Iceberg Table Analysis

Alright, so now we have our prefix costs. But we want more. We want to save some real money. We want to reduce our AWS bill. I hear you. A very easy way to save money is simply to delete files. But how do you do that without deleting important files? This is where server access logs come in. In this diagram, I'm highlighting how we combine server access logs with the prefix rollup we built earlier to understand which files are being used and which ones are not.

Thumbnail 1170

Based on this example, we can see that prefix C has been accessed recently, and we can recommend keeping prefix C. However, we can see that prefix B has not been accessed since 2010. This is approximately 15 years ago, and we can recommend deleting this file. While we understand this is only 30 megabytes, in a large environment like Netflix, this can be a huge opportunity for cost savings.
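A simplified sketch of that last-access check (placeholder prefixes, dates, and thresholds) joined against the rollup might look like this:

```python
from datetime import datetime, timedelta, timezone
import pandas as pd

# Hypothetical per-prefix last-access timestamps derived from S3 server access logs.
access = pd.DataFrame([
    {"prefix": "prefix_c/", "last_access": datetime(2025, 11, 20, tzinfo=timezone.utc)},
    {"prefix": "prefix_b/", "last_access": datetime(2010, 3, 1, tzinfo=timezone.utc)},
])

# Tunable levers: what counts as "recent" access, and what to recommend for stale data.
RECENT_WINDOW = timedelta(days=180)              # e.g. widen to a year for rarely-read data
STALE_ACTION = "move to a colder storage class"  # or "delete", depending on the data owner

now = datetime.now(timezone.utc)
for rec in access.itertuples():
    if now - rec.last_access > RECENT_WINDOW:
        print(f"{rec.prefix}: last accessed {rec.last_access:%Y-%m-%d}, recommend {STALE_ACTION}")
    else:
        print(f"{rec.prefix}: accessed recently, keep as-is")
```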

Now, if you were to take this back to your organization, I have a few recommendations. You need to understand the type of data in your bucket and consider all these recommendations carefully. Some files are not always accessed, it could be every six months or once a year. There are a few levers that you can tune to make this work well. One example would be to expand your time range of what you consider acceptable access or recent access. Another way would be to change the recommendations you provide. So instead of suggesting deleting a file, you could ask the data owner to move it from a frequent tier to cold storage.

Thumbnail 1230

That's also an option. One way we've implemented these access-related recommendations is through Iceberg table storage. So, let me take a step back here and talk about prefix rollups again. As I mentioned earlier, this is the underlying dataset. Having information at this granularity enables many use cases.

While we've primarily been exploring prefix costs, this can easily extend to Iceberg table costs. For example, we have an Iceberg partition inventory, and we can see that table A has partitions under prefix B and prefix C. We can easily link this back to our rollups. And presto, now we know not only what storage tier each partition is in but also the size and cost of each partition. And if you sum up the partitions, that's the cost of your table.
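A small sketch of that linkage (illustrative table, partition, and prefix names) using the rollup from earlier:

```python
import pandas as pd

# Hypothetical mapping from Iceberg tables to the prefixes holding their partitions.
partitions = pd.DataFrame([
    {"table": "table_a", "partition": "ds=2025-11-01", "prefix": "prefix_b/"},
    {"table": "table_a", "partition": "ds=2025-11-02", "prefix": "prefix_c/"},
])

# Prefix rollup from the earlier sketch: size and cost per prefix (illustrative values).
rollup = pd.DataFrame([
    {"prefix": "prefix_b/", "recursive_bytes": 50 * 1024**2, "recursive_cost": 0.9},
    {"prefix": "prefix_c/", "recursive_bytes": 25 * 1024**2, "recursive_cost": 0.6},
])

# Per-partition cost: link each partition back to its prefix in the rollup.
per_partition = partitions.merge(rollup, on="prefix", how="left")

# Per-table cost: the sum of its partitions, as described above.
per_table = per_partition.groupby("table", as_index=False)[["recursive_bytes", "recursive_cost"]].sum()
print(per_table)
```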

Thumbnail 1280

Again, by layering on access logs, we can extend our access-based retention recommendations to specific Iceberg partitions. Reiterating what I've talked about so far, here are some key takeaways about these analytical tools. S3 Inventory, as Roshan already pointed out, is a periodic snapshot. And it's great for detailed object-level analysis. Server access logs capture every request to these S3 objects and cater to a wide range of use cases. S3 Storage Lens is for the bigger picture. It's for tracking organization-wide growth.

Thumbnail 1310

Ultimately, these tools complement each other, providing a comprehensive view of your S3 environment, from the smallest object to the largest trends. This suite of storage insights is an incredibly powerful tool for ensuring the health of your organization at a megabyte to exabyte scale. Special thanks to the team at Netflix, and I'll pass the baton back to Roshan to cover some exciting new features. Thank you, Bi. Thank you, Austen.

Thumbnail 1340

Thumbnail 1360

New S3 Storage Lens Features: Expanded Prefix Analytics and Performance Metrics

Alright, so hopefully you got a sense of how Netflix is continuously optimizing their S3 storage. And hopefully, you've also gotten some ideas of how you can optimize your storage deployments. Let's now pivot to some new announcements about new feature releases in S3 Storage Lens that we announced yesterday. Personally, it's very interesting for me to hear from folks like Bi and Austen how customers are using our services to optimize their storage. And my team and I have been working very closely with Netflix and other customers to understand their end-to-end workflows and to further streamline their optimization workflows.

Thumbnail 1380

Thumbnail 1400

So we've been focusing on three key areas. The first is around visibility. Exactly what Bi was trying to explain here, but customers like Netflix need metrics for all prefixes in a bucket, not just the top prefixes. So that's one area that we've been focused on. The second is around performance insights. Customers want to understand how they can improve the performance of their application when accessing data in S3. This is very similar to the machine learning and data analytics use cases that Austen explained. You don't want your compute to be waiting to get data from S3, that costs money.

Thumbnail 1420

Thumbnail 1440

And finally, it's about ease of analytics. Storage Lens has a lot of metrics, and it has a lot of dimensions. You can analyze it by organization, by account, by region, by bucket, all the way down to prefixes. And it's about how we can make this analysis even simpler. So we are super excited about the three launches we made yesterday. The first one is a new capability that we've added to Storage Lens. This is called expanded prefix analytics. Essentially, you can now get metrics for all prefixes in your bucket with Storage Lens.

Thumbnail 1460

Thumbnail 1480

The second is we've added new performance metrics. Specifically, these metrics are around how your application is accessing data in S3. It provides visibility into that interaction. So it helps you identify inefficient requests and eliminate those bottlenecks. And the third is around ease of analytics, and we've launched the ability to export metrics directly into S3 Tables. These are automatically created managed S3 Tables where you can start querying your data immediately with your preferred analytics tool.

Thumbnail 1500

Let me walk you through each of these features and their use cases. So first up, expanded prefix analytics. We've seen the use cases where Netflix wants to understand at a prefix level, the storage consumed by each prefix, the activity, and the cost of requests per prefix. Similarly, customers ask questions like, which are my prefixes that have had no activity for the last 100 days? Questions like these require precise knowledge of each prefix and their respective metrics.

Thumbnail 1540

Yesterday, we launched a new capability called Expanded Prefixes Metrics Report. This is available at no additional cost in the advanced tier of S3 Storage Lens. Let's compare the two reports. With the default metrics report, you have a 1 petabyte storage size limit, and the maximum prefix depth is 10. This gives you about 100 prefixes per bucket today. With the Expanded Prefixes Metrics Report, you get no storage size limits. You get a maximum prefix depth of 50, and you get billions of prefixes per bucket. There are no limits on the number of prefixes. And this is available to you at no additional cost.

Thumbnail 1590

72 New Performance Metrics and S3 Tables Export Feature

Next, about performance insights, we've added 72 new performance metrics to the advanced tier of S3 Storage Lens. Again, this is at no additional cost. This brings the total number of metrics in S3 Storage Lens to 198. As I mentioned, these metrics are specifically designed to highlight how your application is interacting with data in S3. And I've broken down these metrics into three big categories that I'll walk you through.

Thumbnail 1620

Thumbnail 1630

So first up, we have a category of metrics that help you optimize your application's performance. An example of that is new request and object size distribution. This allows you to see in S3 Storage Lens how objects are distributed by size for each prefix. That ranges from 0 kilobytes to greater than 4 gigabytes. Similarly, you can see the distribution of requests coming into each prefix. Requests range from 0 kilobytes all the way up to greater than 4 gigabytes, in the same size categories.

Now, why is this important? If you're seeing a lot of small requests or a lot of large requests, that's not good. That means your application is not getting the best performance. It could be because you have a lot of small objects within a prefix or bucket. An easy fix there is to consider consolidating your small objects into larger objects. You get better throughput with larger object sizes. It could also be that you have appropriately sized objects within your bucket or prefix, but you're seeing small or large requests. In that case, you might investigate further to see if there's any room for improvement with the application on the client side.

Thumbnail 1710

Another example is a new category of metrics called Concurrent PUT 503 Errors. As the name suggests, this is a category of 503 errors, so throttling errors, but these are generated when multiple applications are writing to the same object. This can be self-healing with the right back-off mechanisms, or if it's a multiple writer scenario, you want to build a consensus mechanism to avoid these types of errors. Again, these are categories of errors that are slowing down your application and can be easily avoided.
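On the client side, one generic way to get that back-off behavior with boto3 is the botocore retry configuration; this is a sketch with placeholder bucket and key names, not a prescription from the session:

```python
import boto3
from botocore.config import Config

# Adaptive retry mode adds exponential backoff and client-side rate limiting,
# which helps absorb 503 Slow Down responses instead of retrying immediately.
retry_config = Config(retries={"max_attempts": 10, "mode": "adaptive"})

s3 = boto3.client("s3", config=retry_config)

# Placeholder bucket/key. With many concurrent writers to the same object,
# a single-writer or consensus pattern is still the better long-term fix.
s3.put_object(Bucket="my-example-bucket", Key="shared/state.json", Body=b"{}")
```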

Thumbnail 1740

Thumbnail 1750

The second category of metrics directly addresses latency and cost, and we call these Network Origin Metrics. In S3 Storage Lens, you can now see the origin of requests coming into your prefixes. Is it from an application running in the same region as the bucket, so the home region, or is it a cross-region request from another region? This is something you want to pay attention to. You want to avoid cross-region requests as much as possible, because that leads to increased latency and increased costs. With this capability, you can now place your application and your data in the same region.

Thumbnail 1790

Thumbnail 1800

Finally, the third category of performance metrics are those that can potentially enhance your application's performance. This is achieved by identifying whether your application is frequently accessing a small subset of objects. If you have 10% or 5% of objects within a bucket or prefix that are accessed daily, that can be a potential candidate to move to a caching layer or a high-performance storage layer, and that can enhance your application's performance. So that in a nutshell are the new metrics in Storage Lens. Again, this is available at no additional cost in the advanced tier.

Thumbnail 1830

Thumbnail 1850

Thumbnail 1860

Thumbnail 1870

And finally, for ease of analytics, we've added a lot of new metrics and support for billions of prefixes. How do we make the analysis simpler? Today, there are three primary ways for you to consume Storage Lens metrics. First, through the S3 console. The charts that I showed you earlier are available as predefined charts. The second is to publish these metrics to Amazon CloudWatch to view these metrics alongside your other service metrics. And the third option was to export them to an S3 general purpose bucket in CSV or Parquet format, which you can then process and start querying your data.

Thumbnail 1880

Now, we've added a fourth option. This is to export these metrics directly to managed S3 Tables. These are automatically created. You can set a retention period to expire data from these tables, and you can immediately start querying them with your preferred analytics tool, including MCP servers, which enable natural language-based queries. So I've talked a lot about the new features. Let me invite Christie Lee to show you how these features work in action. Thank you.

Thumbnail 1920

Christie's Practical Demo: From Dashboard Creation to Performance Analysis

Thank you, Roshan. Alright, so I'm going to take you through a couple of demos today. I'm really excited about it. Storage Lens has been around since 2020, and since launch, we've been continuously improving it. We've just heard a lot more requests from customers, so I'm really excited about it. However, earlier, when I polled the room, it looked like about 20% to 30% of you were familiar with Storage Lens. So we're going to have to start at the basics. How do we get started?

Thumbnail 1940

Thumbnail 1950

Thumbnail 1960

This screen should be familiar to all of you. How do we create a dashboard? In the AWS console, you can open up Amazon S3, and you'll click on create Storage Lens dashboard. Of course, you can do this through CLI, API, SDK, however you like, but we're going to show it to you through the console screen today. A lot of customers, because they want to try it out first, might want to start with a few buckets of metrics, but you can, of course, scale this out to your entire organization.
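For reference, the same configuration can be created programmatically. Here is a minimal sketch using the S3 Control API via boto3; the account ID and dashboard name are placeholders, and the advanced-tier options shown reflect the long-standing configuration shape, so verify the exact field names against the current API reference:

```python
import boto3

s3control = boto3.client("s3control", region_name="us-east-2")

ACCOUNT_ID = "123456789012"    # placeholder account ID
DASHBOARD_ID = "my-dashboard"  # placeholder dashboard name

s3control.put_storage_lens_configuration(
    AccountId=ACCOUNT_ID,
    ConfigId=DASHBOARD_ID,
    StorageLensConfiguration={
        "Id": DASHBOARD_ID,
        "IsEnabled": True,
        "AccountLevel": {
            "ActivityMetrics": {"IsEnabled": True},  # advanced-tier activity metrics
            "BucketLevel": {
                "ActivityMetrics": {"IsEnabled": True},
                "PrefixLevel": {                      # prefix aggregation (advanced tier)
                    "StorageMetrics": {
                        "IsEnabled": True,
                        "SelectionCriteria": {"MaxDepth": 10, "MinStorageBytesPercentage": 1.0},
                    }
                },
            },
        },
    },
)
```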

Thumbnail 1970

Thumbnail 1980

Thumbnail 1990

Thumbnail 2000

You can customize the metrics that you set. You can choose to start with the free tier and just try it out first, see what's available to you, or you can go directly to the advanced metrics and look at all the metrics that are available, all 198 metrics that make up Storage Lens. You can choose which ones you want to use. For this demo, we're going to choose all of them. But these are configurable, so if you change your mind later, you can come back and change the settings.

Thumbnail 2020

Thumbnail 2030

In advanced metrics, customers also love to see prefix aggregation. This is also customizable, from a 1% minimum size threshold up to a prefix depth of 10. As part of this configuration, we'll export to CloudWatch. And you'll notice here that you have two different exports. One is a metrics export, which is what we've always offered as part of Storage Lens. What's new is the ability to export directly to S3 Tables. This is a managed Iceberg service for S3.

Thumbnail 2040

Thumbnail 2050

Thumbnail 2060

And of course, you can run both. You can also export to a general purpose bucket in the traditional way, have it stored as CSV or Parquet, or you can send it directly to S3 Tables. One or the other, or both, whatever you choose. If you're happy with the settings, then we'll submit it. Alright, so now that we've set up a dashboard.

Thumbnail 2070

Thumbnail 2080

Now, the metrics for Storage Lens update daily, but when you first create a dashboard, it takes a little bit of time for those metrics to generate. So we're going to switch over to a dashboard that we've pre-created. We've given it time to generate those metrics, and we're going to look at a couple of different views. In these views, we're going to look at a couple of sample metrics. We won't have time to look at all of them, but I wish we could. We're going to start first with latency improvements. This has an impact on both performance and cost management. As many of you know, when you move from one region to another, there's a cost involved in data transfer.

If you can localize your application to the region where it's trying to access the objects it needs, then you can save on those costs. From a performance perspective for latency, it means you can access those objects much faster if you can get them closer to where you're accessing them. So we're going to look at a view of how to identify cross-region access.

Next, we're going to look at one of the most popular views, especially for cost management, which is tell me which prefixes are most accessed. Is it a very busy, very hot dataset, or is it truly cold? These might be ideal datasets for you to consider setting up life cycle rules and looking at moving to more cost-optimized storage classes. And then we're going to look at one of the new metrics for identifying errors.
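As an aside, acting on a prefix identified as cold often comes down to a lifecycle rule like the following sketch (placeholder bucket, prefix, and transition window):

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under a cold prefix to a colder storage class after 90 days,
# and clean up incomplete multipart uploads along the way.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",                      # placeholder bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-cold-prefix",
                "Filter": {"Prefix": "logs/2023/"},  # placeholder cold prefix
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```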

We have a lot of applications, and over time, customers have been demanding more and more from S3 in terms of performance. This ranges across machine learning, AI use cases, analytics use cases. But being able to pinpoint where you're having errors at a prefix level is really insightful. Because it helps you make decisions about should you consider alternative storage classes, should you consider caching that data? We want to give your application the best performance possible for cost optimization. Especially since we don't want idle GPUs.

Thumbnail 2190

And last but not least, I have a little story. I was working with a customer a couple of months ago, and they were having a lot of errors. And I wish I had this view back then, which is showing you the object distribution. It turns out that their application was misconfigured, and it was doing a lot of 1 kilobyte small reads and writes over and over and over again. And once we found it, we could solve it, but these views would have identified it right away.

Thumbnail 2220

Thumbnail 2230

Thumbnail 2240

Alright, so we're going to go into the dashboard. The first thing I'm going to show you is how to quickly identify where your cross-region access is coming from. We're going to show you this in two different views. And we're going to show you not only cross-region but also compared to in-region access. In this first view, we can see cross-region traffic happening on two different days.

Thumbnail 2250

Thumbnail 2260

Thumbnail 2270

Next, we're going to scroll down to the data transfer metrics. Here we can normalize, relative to all data access, which ones are cross-region and which ones are in-region. Now, some of you might be saying, well, I don't plan on localizing this data, maybe it's replication traffic, and I expect to have cross-region data transfers. No problem. We have a filter where you can filter out replication bytes.

Thumbnail 2290

Thumbnail 2300

In the next view, we're going to look at which prefixes are most accessed. So tell me where my hottest datasets are. To do that, we're going to switch over to the prefixes tab. Once it loads, you'll quickly get a graph of requests that are happening in your bucket. Typically, customers who are operating at the scale of hundreds of thousands of TPS are also a little bit worried about potential errors. Are there errors happening? Are there retries happening because of these high request counts?

Thumbnail 2310

Thumbnail 2330

So we're going to try to filter by Concurrent PUT 503 errors. We can see that traffic is happening there, and that's a little bit concerning. But you might also want to normalize this. Tell me what percentage of the overall requests these error codes make up. In this particular prefix, it's about 0.03% as highlighted. We can do better, but it's not that bad. But there's still room for improvement.

Thumbnail 2340

Thumbnail 2350

Thumbnail 2360

Next, using the filter view, we're going to zoom in on a particular bucket where we know that these errors and high requests are happening. And the view I'm going to highlight here is object size distribution. In this graph, we can see object counts, object reads, and object writes across different size distributions.

Here we can easily see how many of the requests are for really small objects. For example, 1 kilobyte, 2 kilobyte, 4 kilobyte small objects, all the way to the other end of the scale where you're accessing gigabyte-sized objects. This is an easy way for us to see it. Of course, you can derive this from server access logs and other insight solutions, but from here, it's a quick and easy way to understand what's happening at your prefix.

Thumbnail 2400

Alright, so that gave you a quick taste of what you can do with the dashboard views that are currently available.

S3 Tables Export and Natural Language Metrics Analysis Demo

So let's move on to what you can do with the S3 Tables export for Storage Lens metrics. A quick side note: export to S3 general purpose buckets has been available for a while now, and this includes saving your data in CSV or Parquet format, which you can then ingest and use however you like. So one of the anticipated questions from customers would be, why would I choose to export to S3 Tables? One of the benefits is that you can connect to existing analytical services. This is especially helpful if you're already familiar with or working with Iceberg. But another benefit of S3 Tables is that it provides managed compaction and snapshot management that you would normally have to handle yourself, but with S3 Tables, you don't have to worry about it.

The metrics are broken out into five different tables. The first one is bucket properties metrics, which includes the default settings for the buckets that you have configured. Next are the default storage and default activity metrics. Remember the 1% prefix threshold and the depth 10 aggregation that you set in your dashboard? That is included in these default storage and activity metrics. Now, the view that Netflix is probably most excited about is the expanded prefix storage and activity metrics. This provides you with up to 50 prefix depth and unlimited prefixes, so you get a full view of what's happening across your bucket, your account, and your entire organization.

Thumbnail 2510

Thumbnail 2530

For this demo, we're going to use Athena as our preferred analytical tool, but of course, you can use any tool that works with S3 Tables. We're going to run a couple of different views just to show you what you can do with the metrics. Tables are automatically configured as part of the S3 Tables integration, but you can preview metrics that are available in both default metrics and expanded metrics. I've pre-set a couple of SQL statements, so we're going to enter those and run them.

Thumbnail 2550

Thumbnail 2560

In our first example, we're going to look at default activity metrics. Let's see if we can find where my 503 errors are happening in my bucket. This is for finding out where you might have hot datasets or hot prefixes that you might want to consider moving to a cache or to more performant storage. From here, we can quickly see that there are about 800 results. Remember, these are aggregated results for some prefixes, not all prefixes, and certainly not anything beyond depth 10.
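A query along those lines, issued through Athena with boto3, might look roughly like this; the database, table, and column names of the Storage Lens S3 Tables export are assumptions to check against your own environment:

```python
import boto3

athena = boto3.client("athena", region_name="us-east-2")

# The database, table, metric, and column names below are assumptions about the
# Storage Lens S3 Tables export; check them against the tables in your own account.
QUERY = """
SELECT bucket_name, record_value AS prefix, metric_value AS error_503_count
FROM default_activity_metrics
WHERE metric_name = '503ErrorCount'
ORDER BY metric_value DESC
LIMIT 20
"""

athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "storage_lens_metrics"},         # placeholder database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # placeholder results bucket
)
```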

Thumbnail 2580

Now, if we switch this over to expanded prefix activity metrics table, we can see all our prefixes. So you get a full view of all the prefixes, no matter the size, no matter the depth, all the prefixes that you've configured in Storage Lens, all the prefixes that you want to be able to visualize. Super helpful, you get this granular visibility easily and quickly.

Thumbnail 2610

Thumbnail 2620

In our next example, we're going to look at the cost management aspect. This is usually about how do I find my coldest prefixes, how do I find my biggest prefixes, where can I make the most cost savings across my organization. So we'll do a quick search in activity metrics and we'll sort by request counts. But so far, we've only looked at activity metrics. We're not able to contextualize this because we also want to be able to sort by the size of the prefix. Some of these prefixes are going to be quite small, some are going to be much larger.

Thumbnail 2640

Thumbnail 2660

So what you can do here is do a little bit of a table join between storage metrics and activity metrics. So you'll get your biggest prefixes by size and the number of requests that you're seeing against them. We'll highlight this to make sure that the join is happening on the prefix. It'll take a couple of seconds to run, but once it's complete, you'll be able to see where your biggest prefixes are that are truly quiet.
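The join being described might look roughly like the following sketch; again, the table and column names are assumptions to verify against your own export:

```python
import boto3

athena = boto3.client("athena", region_name="us-east-2")

# Join storage metrics with activity metrics on the prefix to surface large but quiet
# prefixes. Table and column names are assumptions, not the documented export schema.
COLD_PREFIX_QUERY = """
SELECT s.bucket_name,
       s.prefix,
       s.storage_bytes,
       COALESCE(a.request_count, 0) AS request_count
FROM expanded_prefix_storage_metrics AS s
LEFT JOIN expanded_prefix_activity_metrics AS a
       ON s.bucket_name = a.bucket_name AND s.prefix = a.prefix
ORDER BY s.storage_bytes DESC, request_count ASC
LIMIT 30
"""

athena.start_query_execution(
    QueryString=COLD_PREFIX_QUERY,
    QueryExecutionContext={"Database": "storage_lens_metrics"},         # placeholder database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # placeholder results bucket
)
```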

Thumbnail 2680

And there we go. We get a slightly different view. We're showing all prefixes, so the count of results is the same, but this time, you get a list of your top 10, top 20, top 30, to see which prefixes are truly cold, and where you'll get the most value from trying to optimize. A lot of times, customers will want to work with their application owners here and say, hey, this is your prefix, this is your bucket. Would you like to review it again?

Thumbnail 2710

So this empowers you with the data to go and do that. So for our final demo, we're going to look at how you can interact with Storage Lens metrics using natural language. For this demo, we're going to set up an AI code assistant. Of course, you can use Amazon Q or anything else, any AI code assistant you like. For my demo, I'm using Kiro. There are other popular ones, but we're also going to connect to a Model Context Protocol server, so an MCP server, to allow our AI code assistant to interact with external data sources.

Thumbnail 2760

Thumbnail 2770

Thumbnail 2780

And we're going to be interacting with S3 Tables, so that means because Storage Lens metrics are stored in S3 Tables, you can interact with them that way. So let's get started. The first thing I'm going to do is show you how to set up Kiro. Kiro launched a couple of weeks ago. For those of you who are familiar with VS Code, the IDE is inspired by VS Code. So it's super easy and fast to set up and get started.

Thumbnail 2790

Thumbnail 2800

Thumbnail 2810

In this vanilla Kiro installation, I'm going to start by prompting it with a question: what S3 Tables do I have in US East 2? It's trying to figure out what I'm asking. It's asking about tables, it's also asking about buckets. So it doesn't really know what to do because I haven't set up an MCP server. So let's fix that. The first thing I'm going to do is put in our S3 Tables MCP server configuration. All you need to do is make sure that you give it the right credentials to be able to access your AWS account.
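For illustration, registering such a server usually comes down to an MCP configuration file. The sketch below writes one from Python; the server package name, launch command, and config file location are assumptions to check against the AWS MCP servers documentation and your assistant's docs:

```python
import json
from pathlib import Path

# Hypothetical MCP server registration for S3 Tables. The package name, launch command,
# and config file location vary; verify them against the awslabs MCP servers and Kiro docs.
mcp_config = {
    "mcpServers": {
        "s3-tables": {
            "command": "uvx",
            "args": ["awslabs.s3-tables-mcp-server@latest"],
            "env": {"AWS_PROFILE": "default", "AWS_REGION": "us-east-2"},
        }
    }
}

config_path = Path.home() / ".kiro" / "settings" / "mcp.json"  # placeholder location
config_path.parent.mkdir(parents=True, exist_ok=True)
config_path.write_text(json.dumps(mcp_config, indent=2))
```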

Thumbnail 2840

Thumbnail 2850

Once that's set up, you can save it. If I expand on the left, you'll see that there are a number of queries that it understands it can perform in terms of APIs to interact with S3 Tables. And we're going to leverage that as part of our prompt. So I'm going to ask it the exact same prompt again to show you that it's working: what S3 Tables do I have in US East 2? Great. It knows it can use the MCP tool that we just configured. It understands that I want to talk about US East 2, and it's waiting for me to give it permission. Go ahead. You can configure it so that it doesn't prompt you, but for this demo, I'm having it prompt me.

Thumbnail 2870

Thumbnail 2880

Thumbnail 2890

Great, we found a couple of namespaces. A couple of these are Storage Lens related, not all of them, but a couple are. Next, I'm going to prompt it to go look at a particular namespace where I have Storage Lens configured for all my buckets. And I'm going to ask it essentially the same question that we were running in Athena queries. Tell me my top prefixes based on size with the least number of requests. And I'm not going to tell it where exactly to look other than the namespace, and I'm going to make it try to figure it out.

Thumbnail 2910

Thumbnail 2930

The nice thing about this is there's minimal human intervention. I've just given it a prompt to say, tell me my least utilized prefixes, and the system is going to do a little bit of exploration. It doesn't actually know anything about the schema, or what's inside the tables, so it's going to try to do a couple of select statements to find out what's available there. What kind of metrics can I derive to answer the question that the user is asking? It'll look at storage metrics where I can get usage trends, but at the same time, it'll also look at activity metrics where I can get API requests, which I need.

Thumbnail 2940

Thumbnail 2960

And we already know that to connect both prefix based on size and requests, we're going to need a join. But I haven't told my coding agent that it needs to do that. I'm going to make it figure it out itself. So sometimes it'll go back and forth a little bit, but all I'm doing at this point is clicking, yes, run this command, yes, run this command, and the system is trying to construct the right SQL statement so that it can find out the answer to my question about the largest prefixes with the least number of requests.

Thumbnail 2970

Thumbnail 3000

Perfect. It took a couple of select statements, but it eventually got there, and it was able to output the buckets and prefixes that are larger in size but have very few requests. Super cool. All this without necessarily knowing anything about the schema, without necessarily being a SQL expert, using natural language to query Storage Lens metrics makes it super easy to get started. So with that, I thank all of you for joining us today. I want to leave you with a couple of key takeaways. If you haven't checked out Storage Lens, please do.

It's part of an entire storage insights portfolio alongside our logging solutions. We also have comprehensive usage views for Storage Lens. We've expanded the number of metrics available, so please go check it out. The free metrics are a great way to start, but you can also try out advanced metrics. For customers who have larger environments or want granular visibility, you can opt into expanded prefix metrics to get up to 50 prefix depth and an unlimited number of prefixes.

You can get additional cost and performance optimizations from 72 new performance metrics that are included in the advanced tier. And finally, if you want to try out natural language processing or leverage your available analytical tools, please check out the S3 Tables export. Hopefully, this gave you a good starting point. Thank you all, and I hope this session was informative.


  • This article is automatically generated using Amazon Bedrock, maintaining the information from the original video as much as possible. Please note that there may be typos or incorrect content.
