Translated by AI

The content below is an AI-generated translation. This is an experimental feature and may contain errors.

re:Invent 2025: Introducing S3 Access Points for FSx for NetApp ONTAP and AWS Service Integration


Introduction

By transcribing overseas sessions into Japanese articles, we aim to make valuable but hard-to-find information more accessible. Based on that idea, the presentation we will be covering this time is the following!

For re:Invent 2025 transcription articles, information is summarized in this Spreadsheet. Please check it as well.

📖re:Invent 2025: AWS re:Invent 2025 - Introducing Amazon S3 Access Points for FSx for NetApp ONTAP (STG217)

This video announces S3 Access Points for Amazon FSx for NetApp ONTAP. Luke Miller and Jacob Strauss highlight the current situation where 80% of hundreds of exabytes of unstructured data stored in file systems worldwide are underutilized, and explain how this new feature enables ONTAP data to be accessed via S3 API. Without data movement or copying, it can integrate with AWS services like Amazon SageMaker, AWS Glue, Amazon Athena, and QuickSight, while maintaining traditional access via NFS and SMB. The demo uses credit card data from a financial institution as an example, demonstrating everything from creating access points to querying with Athena and AI-powered analysis with QuickSight Q, showing how value can be extracted even from secondary data for disaster recovery.

https://www.youtube.com/watch?v=NVZV0gfV-jA

  • This article is automatically generated, maintaining the original lecture content as much as possible. Please note that there may be typos or inaccuracies.

Main Content

Thumbnail 0

re:Invent Session Begins: Data as the Foundation of Modern Innovation

Alright, welcome everyone. We've made it to day four of re:Invent. Thank you for being here. This is STG217, an introduction to Amazon S3 Access Points for FSx for NetApp ONTAP. My name is Luke Miller. I'm a Product Manager on the Amazon FSx service. And joining me today is Jacob Strauss, a Principal Engineer for File Storage at AWS.

Thumbnail 40

We've got a packed agenda for you today. We're going to start off with an overview of FSx for NetApp ONTAP. Then we're going to talk about S3 Access Points for FSx, what that means for your ONTAP data, specifically how you can get more out of your data. We're going to walk through some use cases, talk about how this works at a high level, and then we're going to walk through a demonstration showing you how to get started, how to access your data, and some sample integrations with AWS analytics services.

Thumbnail 70

Thumbnail 90

But before we jump into the main topic, I want to take a big step back. At AWS, we understand that innovation more than ever depends on data and your ability to make effective use of that data. That's why we often say, "data is the foundation of modern innovation." And when you look at enterprises today, you find that there are hundreds of exabytes of data stored in file systems around the world. To put that in perspective, that's more data than the entire internet contained just a few years ago, and that's sitting in file systems today.

Thumbnail 110

Thumbnail 130

80 percent of that data is unstructured data. We're talking about rich contextual information that truly represents your business knowledge, the organizational memory, documents, videos, images. And finally, all this data costs money to store, has management overhead, but only a small portion of that data is being utilized today. That means most of your organization's data might be sitting idle, not contributing to innovation or decision-making.

Think about what that means for your organization. Decades of organizational knowledge, customer insights, research findings, and business intelligence trapped in file systems, disconnected from your analytics platforms, your AI models, and your business applications. So the question is, how do you unlock it? How do you bridge the gap between where the data exists and where it needs to be to drive innovation? That's what we're here to solve today, and that solution starts with Amazon FSx for NetApp ONTAP.

Thumbnail 190

Overview of Amazon FSx for NetApp ONTAP and Existing Use Cases

Amazon FSx for NetApp ONTAP is the first and only storage service to offer fully managed NetApp ONTAP file systems in the cloud. It provides all the familiar features, performance, and APIs of NetApp ONTAP file systems that your team knows and trusts, combined with the simplicity, agility, and scalability offered by AWS. This means you can retain all the enterprise features your organization relies on, such as snapshots, clones, ransomware protection, multi-protocol support, and storage efficiency for cost reduction, while also leveraging cloud-native benefits like elastic scaling, pay-as-you-go pricing, and global availability.

Thumbnail 250

Customers are already realizing more value from their data on FSx. With FSx, you can eliminate the undifferentiated heavy lifting of self-managed file storage and instead focus on innovation. For example, you no longer have to spend valuable time, expertise, and capital on hardware procurement, software updates, patch management, troubleshooting, and the various other operational overheads required to manage storage.

Thumbnail 280

FSx for NetApp ONTAP supports mission-critical applications across a wide range of industries and use cases, from enterprise IT applications, user shares, and line-of-business applications to backup and disaster recovery. Let's look at a few.

FSx for NetApp ONTAP serves as the foundation for corporate file services, home directories for thousands of employees, departmental shares for collaboration, and workspaces that need to scale dynamically with business needs. We also see diverse line-of-business applications. Financial services firms run trading platforms and risk management systems. Semiconductor companies handle large workloads for chip design. Healthcare and life sciences organizations manage genomics data, medical imaging data, and clinical research datasets, all requiring the advanced data management capabilities that ONTAP provides. And of course, data protection use cases, backup repositories, and secondary data copies for disaster recovery targets, leveraging ONTAP's built-in replication and snapshot technologies.

Thumbnail 390

Customers can run these critical applications on FSx for NetApp ONTAP today, using existing native AWS security, compute, monitoring, and data management services such as AWS Directory Services, CloudWatch, and AWS Backup. In addition to these existing file-based applications and services, customers want to do more with their data. Over the years, AWS has built a wide range of AI, machine learning, and analytics services that integrate with S3, and customers want to be able to use these services with their ONTAP data.

Thumbnail 450

For example, customers want to build, train, and deploy machine learning models with Amazon SageMaker. They want to process and catalog data with AWS Glue. They want to query data on the fly with Amazon Athena, or visualize data with QuickSight to drive better business outcomes. And finally, they want to build generative AI-powered applications with Amazon Bedrock and interact with their data using QuickSight's generative AI-powered business intelligence capabilities.

Announcing S3 Access Points for FSx for NetApp ONTAP: Making ONTAP Data Accessible via S3 API

Therefore, we are excited to announce the launch of S3 Access Points for FSx for NetApp ONTAP today. This allows you to access your FSx for NetApp ONTAP data as if it were in an S3 bucket. By using this new feature, customers can access their ONTAP data using the S3 API and a wide range of S3-integrated AWS services. S3 Access Points are S3 endpoints that customers use to securely manage access to shared datasets in S3 buckets and FSx for OpenZFS file systems, and we have now extended this to support FSx for NetApp ONTAP file systems as well.

Thumbnail 510

This means that customers storing data in ONTAP file systems, whether on-premises or in AWS, can access that data as if it were in S3. This new feature provides the easiest way to get more out of your ONTAP data. With S3 Access Points for FSx for NetApp ONTAP, the rich enterprise file data we just discussed—from departmental shares to clinical research datasets to financial transaction data—all instantly becomes S3 compatible. This means it becomes accessible via Amazon S3 and the S3 API, allowing seamless connection with a wide range of AWS analytics, AI, and compute services.

Thumbnail 560

Thumbnail 570

What's more, there's no need to move any data out of your ONTAP file system, and at the same time, that data remains accessible via file protocols like SMB and NFS. Now, let's take a closer look at how S3 Access Points for FSx maximize the potential of your file data. First, let's start with what I believe could be one of the most transformative use cases: integrating decades of enterprise file data with AWS's latest AI-powered research and business intelligence tools.

Think about your organization for a moment. How much organizational knowledge is dormant in your file systems right now? Research reports, clinical trial data, engineering documents, legal contracts.

Engineering documents and legal contracts represent petabytes of data with the potential to revolutionize how your teams work, but this data is disconnected from modern AI applications. We hear from customers that they want to build AI-powered research assistants using Amazon Bedrock and QuickSight, but their unique data is stored in file systems, not S3. Until today, that meant building complex ETL pipelines, duplicating data, managing synchronization, all while trying to maintain security and governance. With S3 Access Points for FSx, this becomes simple and easy.

Thumbnail 650

Thumbnail 670

When you attach an S3 Access Point to an FSx volume, QuickSight can index the file data. Bedrock knowledge bases can be used for RAG workflows, and Q Business can answer questions about decades of organizational knowledge. The data never moves. It stays within the file system and remains accessible via NFS and SMB for traditional applications, while simultaneously being available via the S3 API for AI services.

Thumbnail 710

Thumbnail 720

Thumbnail 730

Specific Use Cases: Disaster Recovery, Analytics, Application Modernization

So, let's look at an example. As I mentioned earlier, we often see customers using FSx for NetApp ONTAP as a secondary site for disaster recovery. Customers manage a primary ONTAP environment on-premises for their primary workloads and deploy FSx for NetApp ONTAP on AWS as a secondary copy. Today, that data sits idle and unused, replicated from the primary site to FSx using SnapMirror, ONTAP's built-in data replication technology. Now, with S3 Access Points for FSx, you can put this data to work: attach an Access Point to the destination volume and integrate it with QuickSight.

Thumbnail 740

Now, let's look at analytics. As I explained earlier, you have valuable data—genomic sequencing, seismic exploration, manufacturing telemetry—stored in file systems, organized around how existing file-based applications operate. But until today, if you wanted to run analytics with Athena, you had to copy that data to S3, manage synchronization, and pay for duplicate storage. Now, with S3 Access Points for FSx, you can point Athena directly at your FSx data through an Access Point. Glue can crawl and catalog the file data as if it were in S3, and EMR can run Spark jobs against that data.

Analytics services can't tell the difference. They're using the S3 API, but the data remains in FSx and is accessible via NFS and SMB for file-based applications. For example, imagine a manufacturing organization that wants to build an end-to-end solution for quality control analytics on manufacturing data. Years of process data are ingested and stored in on-premises ONTAP. Now, they can replicate that data to FSx via SnapMirror, to an FSx for NetApp ONTAP file system, and perform real-time analytics with Athena and Redshift without maintaining a separate S3 data lake.

Thumbnail 830

Now, let's change gears a bit. I want to talk about new opportunities for application modernization. Today, your ONTAP data serves existing applications that have been running for years or even decades, and these rely on file protocols. SAP systems, electronic health record platforms, trading applications—all expect to mount file systems via NFS or SMB. You can't just flip a switch and rewrite them overnight. But at the same time, teams want to build new cloud-native applications with serverless architectures using Lambda that automatically scale, pay as you go, and developers love to use.

With S3 Access Points for FSx for NetApp ONTAP, you can serve data to both existing file-based applications and new object cloud-native applications. There's no need to change existing applications or move data out of your ONTAP file system. Let me give you a real example. Imagine a media company with video files on FSx.

Editors access and edit these files via SMB on Windows workstations, but when new content arrives, they want to automatically generate thumbnails, extract metadata, and perform compliance checks. With S3 Access Points, you simply connect an Access Point to your FSx volume and configure Lambda to trigger on a schedule or polling-based mechanism. When a new video is uploaded via SMB, Lambda automatically processes it using the S3 API. You can have functions to extract metadata, generate thumbnails, and perform AI-driven content analysis, all serverless, accessing the same file data that the editors are working on.
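The polling flow described above can be sketched as a simple listing diff: each scheduled run compares the current object listing (retrieved via the access point alias) against the previous one and processes only the new keys. This is a minimal local sketch; the key names and the `find_new_keys` helper are illustrative, not part of any AWS API.

```python
# Hedged sketch: a scheduled poller that detects newly uploaded videos by
# diffing two ListObjectsV2-style key listings. All names are illustrative.

def find_new_keys(previous_keys, current_keys):
    """Return keys present in the current listing but not the previous one."""
    return sorted(set(current_keys) - set(previous_keys))

# Example: two successive polls of the same prefix via the access point alias
previous = ["videos/intro.mp4", "videos/launch.mp4"]
current = ["videos/intro.mp4", "videos/launch.mp4", "videos/keynote-day4.mp4"]

for key in find_new_keys(previous, current):
    # In a real Lambda, this is where you would GetObject through the access
    # point alias and run thumbnailing, metadata extraction, or AI analysis.
    print("process:", key)
```

In a real deployment the two listings would come from successive `ListObjectsV2` calls against the access point alias, with the previous state stored somewhere durable (for example DynamoDB).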

Thumbnail 970

This is how you extend and modernize without disruption. It doesn't force a big bang migration or re-architecture. You gradually add cloud-native capabilities with the same data, in parallel with existing workloads. Next, I'll hand it over to Jacob. He'll explain how this works.

Thumbnail 980

Technical Explanation: How S3 Access Points Work and Authentication/Performance Characteristics

Yes, thank you, Luke. Let's start by setting the stage for what your environment looks like today, before adding S3 Access Points to your FSx file system. The diagram here is a very simplified view of an actual system: you have an FSx file system residing in a particular AWS region, clients running on EC2 instances or other compute platforms, and users interacting with those applications over traditional file system protocols, NFS for Linux applications and SMB for Windows applications. This is the situation today.

Thumbnail 1030

So let's talk about the new elements we're adding here. By creating an S3 Access Point for this file system, you can do some new things that weren't possible until today. I'll explain this setup process in a bit more detail later, and we'll also do a demo, but first, I want to walk through what these steps are. When you create an Access Point, as a result of that setup process, you get an alias for the name of that Access Point. This allows you to use this name anywhere you would have used an S3 bucket name today.

This means you can make direct calls via the S3 SDK. Whether it's the CLI or an SDK embedded in your application, you can use that Access Point alias instead of the bucket name anywhere you're asked for a bucket name. This allows you to make put, get, list, and head calls directly, and they are forwarded to the FSx file system via the Access Point. These can be executed within the same AWS region. These commands can also be executed from EC2 or other compute platforms, but they don't necessarily have to be. They can be executed over the internet. They can be executed on machines you manage or other applications.
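The "alias instead of bucket name" idea above can be shown concretely: wherever an SDK call or a service configuration expects a bucket name or an `s3://` URI, you substitute the access point alias. The alias and key below are made up for illustration; the `s3_uri` helper is not an AWS API.

```python
# Hedged sketch: an S3 Access Point alias is used anywhere a bucket name
# is expected. The alias below is a hypothetical example.
ALIAS = "my-ap-metadata-s3alias"  # hypothetical access point alias

def s3_uri(alias, key):
    """Build the s3:// URI you would hand to an S3-integrated service."""
    return f"s3://{alias}/{key}"

# With an S3 SDK you would pass the alias as the Bucket parameter, e.g.:
#   s3.get_object(Bucket=ALIAS, Key="transactions/2025/11/30/txn.csv")
get_params = {"Bucket": ALIAS, "Key": "transactions/2025/11/30/txn.csv"}

print(s3_uri(ALIAS, "customers/customers.csv"))
```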

This is a capability that was not possible until today, because you were previously limited to accessing FSx data from within your VPC. So now, if you want to bring data in from other locations or export data to them, this gives applications a way around that restriction. And in addition to calling the Access Point directly from your own applications and clients, you can also pass the Access Point alias to other AWS services integrated with S3, such as Athena, EMR, or Glue. We'll look at a few examples of these later.

Now, I want to quickly talk about two other topics before diving deeper into them. One is how authentication and authorization work in this environment, and the other is about performance characteristics. When thinking about how authentication works, think of it as having two layers. One is the S3 authentication model, and the other is the traditional file system model. And both will be executed. Both will work as expected.

For example, let's say an application is sending a GET or PUT request to my S3 Access Point. The first thing that happens is the Access Point itself checks the policy defined for that specific Access Point. It determines whether these specific users are allowed to perform GETs on these prefixes, or whether these specific users are allowed to list or delete objects, and it decides whether that specific request is authorized. If authorized, the Access Point forwards the operation to the FSx file system, where another decision is made based on whether the Unix or Windows user and group settings are permitted for the specific file the operation is trying to access.

Both of these occur. And the way to think about them is that they both occur in a layered fashion.

Moving on to performance, we are not creating multiple copies of the data. In fact, there is only one copy of the data, and it resides within the FSx file system. What this means is that if you ask questions like, what is the overall throughput level when accessing the file system via the Access Point, or what is the total number of IOPS, the answer is that it's the same as the underlying file system. The throughput and IOPS you provision for the file system are shared, whether you access the data through the file system interface or via the S3 Access Point.

Another implication of having only one copy of the data stored in FSx itself is that you don't have to worry about what happens if you make an update from one side and when it becomes visible from the other side. In reality, there is only one copy of the data. When you create a file or put an object via the S3 API, it immediately becomes visible to applications using other modes, other interfaces.

Thumbnail 1290

I also want to talk about how this setup is used in some of the most common scenarios. One of them is when you have a large existing dataset. For example, if you've built a library of data over years or decades in your on-premises data center, often you'll set up an FSx file system running on AWS and configure replication between them, usually for disaster recovery purposes. And now, when you add S3 Access Points on top of the file system running on FSx, you gain many new capabilities that didn't exist before. The data that was previously just there in case you needed to spin up applications in a new environment in the event of a disaster can now be leveraged to gain value from that additional copy of the data. Because it becomes available as a read-only dataset or a read-only library for all these services integrated via the S3 API. For example, you can perform analytics on your disaster recovery data copy, gaining a lot of new value from this data that wasn't possible before.

Thumbnail 1370

Thumbnail 1380

Thumbnail 1400

Thumbnail 1410

Demonstration Part 1: Creating and Configuring S3 Access Points

So now, I'd like to begin a demonstration of some of these sequences. To make this a bit more tangible, I'll walk through what those steps look like, and then I'll hand it over to Luke to explain how they work. The demo will proceed in three different parts. The first part is about how to get started, how to create an Access Point, how to configure it, and what decisions you need to make to get things working. Then, we'll look at examples of accessing data using both the file interface and the S3 interface. We'll show how you can mix and match or switch between reading and writing files and S3 put and get on the same underlying dataset, and how that actually works. Finally, we'll look at some examples of these service integrations. How you can leverage integrations with AWS analytics services like Athena to gain value from your data and query and utilize it in ways that weren't possible before today.

Thumbnail 1450

Thumbnail 1500

So, I'll hand it over to Luke to move this forward. Thanks, Jacob. Let's switch to the demo. For this demonstration, let me provide a bit of context. We're going to look at a hypothetical scenario. We are a financial institution. We have credit card data, transaction data, and we'll play two roles. One is a storage administrator who sets up the Access Point, and the other is a data scientist and user who uses analytics services to perform analysis on a sample dataset. Let's start with a familiar place. This is the AWS Management Console, specifically the FSx console. In this demo, we have an existing FSx for ONTAP file system.

Thumbnail 1510

As you can see, this is a 10 terabyte file system with 384 megabytes per second of throughput. Again, this is provisioned storage and provisioned throughput, as Jacob mentioned. Within an ONTAP file system, you can have one or more volumes. Think of an ONTAP volume as a logical container for your data. When you create an Access Point, you attach it to a volume. In fact, you can create up to 10,000 Access Points on a single volume. 10,000 is the maximum quota for S3 Access Points in an account and region, and you can place all of them on a single volume. You can also distribute them across multiple volumes within a file system, or across multiple file systems.

Thumbnail 1570

Thumbnail 1580

So let's navigate to the volume called Volume One and go to its details page. We can see it's a 100 gigabyte volume with a Unix security style, so Unix permissions. The first thing I want you to notice here is a new tab called S3. From this new tab, you can see all the S3 Access Points attached to this specific volume, this container of data, and you can see there are already two here. We'll come back to these later.

Thumbnail 1600

Step 1 is to create an Access Point. First, we'll give it a name. We'll call this my EC2 user S3, or V1 S3 AP. Here's a good opportunity to dive deeper into the authorization story that Jacob mentioned earlier. Again, you can think of this as two levels of authorization. You're going to get native AWS IAM authorization in the same way that access to an S3 bucket is authorized. There's an IAM principal caller. It has associated policies and permissions. There's a resource, which in this case is the Access Point. It has a policy, and access is either granted or denied. These permissions are evaluated by S3.

But beyond that, we want to make sure that the caller actually has permissions to the underlying file data. Again, this is secure access. The way that works is that when you create an Access Point, you associate a file system identity with it. In a Unix environment, that could be a Unix user. In a Windows environment, that could be a Windows user. The way that identity, that user, is used is that all access through that Access Point will be authorized in the file system as if it were that user.

Let me give you an example. Let's say you have an FSx for ONTAP file system integrated with Active Directory. Within that Active Directory, there's a user named Alice. Alice has her home directory within this volume and has full read/write permissions to her home directory, but not to Bob's home directory. You can create an Access Point and associate Alice's identity with that Access Point. This means that access via the Access Point will be authorized as if Alice were reading and writing data. For example, if you use that Access Point to perform a get object operation on an object within Bob's home directory, access will be denied. Because Alice does not have file-level permissions to it.

Thumbnail 1760

So there are two levels of authorization when accessing data via S3: IAM authorization and file system level permissions. For this demonstration, we used a Unix security style volume. We'll create a Unix user. We'll use the EC2 user, which is the default local Unix user for Amazon Linux instances. In addition to file system identity, there are several Access Point mechanisms and properties that allow customers to secure access. Next is network configuration. Jacob spoke about this briefly earlier. You can choose whether to make the Access Point accessible from the internet or restrict access to within your VPC.

For this example, I'm going to choose the internet because I'm going to demonstrate uploading and downloading data from my local laptop over the internet.

Just like S3 bucket policies, you can set an Access Point policy, and it works the same way. A bucket without a bucket policy authorizes access in the same way as an Access Point without an Access Point policy.
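For reference, an access point policy uses the same grammar as a bucket policy, with the access point ARN (and its `/object/*` suffix for object-level actions) as the resource. The account ID, role name, and access point name below are placeholders, and this is only a minimal read-only sketch:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:role/analytics-role" },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:us-east-1:111122223333:accesspoint/my-ap",
        "arn:aws:s3:us-east-1:111122223333:accesspoint/my-ap/object/*"
      ]
    }
  ]
}
```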

Thumbnail 1820

Finally, some of you may be a bit uneasy about hearing that data can be accessed over the internet. I want to clarify the difference between access over the internet and public anonymous access. S3 Access Points for FSx have S3 Block Public Access enabled by default, and this cannot be changed. This means there is no anonymous access via Access Points.

Thumbnail 1860

So, let's create the Access Point. Returning to the list view, you can see a third Access Point for this volume. And it's in the creating state. Creation takes about 10 to 20 seconds.

Thumbnail 1880

Thumbnail 1900

Thumbnail 1910

While it's being created, let's look at the Access Point details. Again, these are Amazon S3 Access Points, exactly the same as the Access Points you attach to buckets. You can see that you have the Amazon Resource Name, or ARN, of the S3 Access Point. There's also a link to the S3 console, from which you can also view the details of this Access Point in the S3 console. Here you can add tags or specify an Access Point policy. There's also a link back to the FSx console. You can also view the file system identity and permissions you configured.

Thumbnail 1920

Let's go back to the volume. As you can see, the Access Point we just created is now available. One more thing Jacob mentioned earlier that I want to re-emphasize is the alias. Access Points have an automatically generated alias. You can use the alias anywhere you would use a bucket name for data access. So, if you're performing a put object, you use the alias instead of the bucket name. If you're integrating or connecting with an S3-integrated service, you specify the alias as part of the S3 URI. I'll demonstrate both of these.

Thumbnail 1980

Demonstration Part 2: Data Access via File Interface and S3 Interface

Now that we have an Access Point, let's actually use it. What I'm going to show you next, as Jacob mentioned, is basic data access. I'm copying the Access Point I created. To briefly explain, I have two terminals. The terminal on the left is my local laptop. The terminal on the right is SSH'd into an EC2 instance in the client VPC where the volume is mounted.

Thumbnail 2010

Thumbnail 2020

First, I'll make a note of the alias for the Access Point we're using. Then, I'll SSH into the EC2 instance here and confirm that this volume is mounted here. And let's look at the contents of the volume. Again, we are a financial company. We have credit card datasets, and we're going to perform analysis on them.

Thumbnail 2050

So, let's look at this sample dataset. We have data for cards, data for customers, fraud alerts, merchants, and transaction data. The transactions are actually partitioned by year, month, and day, which we'll look at now. On the right, what we're doing here is what you can do today, just accessing the data via NFS.
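Since the transcript says the transactions are partitioned by year, month, and day, the key layout presumably looks like a date-based prefix hierarchy. The sketch below assumes a Hive-style `year=/month=/day=` layout (the talk does not confirm the exact naming), which is the convention Athena's partition detection understands.

```python
# Hedged sketch of a year/month/day partition layout for the transactions
# data. The Hive-style key names are an assumption for illustration.
from datetime import date, timedelta

def partition_prefix(d):
    return f"transactions/year={d.year}/month={d.month:02d}/day={d.day:02d}/"

start = date(2025, 11, 29)
prefixes = [partition_prefix(start + timedelta(days=i)) for i in range(2)]
print(prefixes)
```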

Thumbnail 2060

Thumbnail 2070

Thumbnail 2080

Now, from my laptop, I'll run an s3 ls command against the credit card directory prefix. And that returned all the directories within that volume: again, cards, customers, fraud alerts, and so on. Let's try a ListObjectsV2 call against the transactions prefix. And you can see a lot of objects are returned.

What I want you to notice first here is the storage class. When you access data via FSx's S3 Access Points, your ONTAP data will appear as having the FSx ONTAP storage class.

Thumbnail 2110

Let's perform some additional read-only operations. We'll run a head object command against the customers CSV file. You'll notice that, just as you can put metadata on objects in S3, you can also put metadata on objects within an FSx for NetApp ONTAP volume. Again, notice the storage class. Another important point I want to highlight is server-side encryption. You'll notice a new mode, aws:fsx. This represents the standard data-at-rest encryption you get with FSx for NetApp ONTAP. All FSx for NetApp ONTAP file system data is encrypted at rest using the AWS KMS key you specify when creating the file system, and that's the server-side encryption reported here.
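A HeadObject response for an FSx-backed object surfaces the points just mentioned: the FSx ONTAP storage class, the FSx server-side encryption mode, and any user metadata. The response dict below is fabricated for illustration, with field values taken from the demo narration.

```python
# Hedged sketch of inspecting a HeadObject-style response from an FSx
# access point. This dict is fabricated; the StorageClass and
# ServerSideEncryption values follow what the demo narration describes.
head_response = {
    "ContentLength": 10_485_760,
    "StorageClass": "FSX_ONTAP",          # data lives in the ONTAP volume
    "ServerSideEncryption": "aws:fsx",    # FSx KMS encryption at rest
    "Metadata": {"department": "risk"},   # user metadata on the object
}

def is_fsx_backed(resp):
    return resp.get("StorageClass") == "FSX_ONTAP"

print(is_fsx_backed(head_response))
```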

Thumbnail 2160

Thumbnail 2170

Thumbnail 2180

Thumbnail 2200

Let's download this object. Now that we've retrieved the object, we can see its contents. My apologies, it's a bit hard to read. What I did here was, in the left pane, I'm looking at the contents of the downloaded file, the customers CSV, and on the right, I'm displaying the contents of the same data from the NFS mount. So far, I've only shown read-only operations, but it's not just read-only. You can also write data. Let's try putting some data.

Thumbnail 2220

Thumbnail 2230

Thumbnail 2270

Here, in addition to this dataset, we're going to upload some more data. This is customer rewards data, and you can see the upload was successful. If we go back to the NFS mount, we can see the contents of the new file we just uploaded. Now, do you remember when we created the Access Point, we assigned a file system identity? This is used for authorizing file system operations and also represents ownership. If we use the stat command to check the file we just uploaded, it shows the ownership. The owner of this file is the BISIS account. This is the Unix user I associated with one of the Access Points I created, and this demonstrates that this file was uploaded via an Access Point and has that ownership.

Thumbnail 2310

Thumbnail 2340

Thumbnail 2350

Thumbnail 2360

Demonstration Part 3: Integration with Amazon Athena and QuickSight Q for Analytics Services

Now, let's do something a bit more exciting. Until now, it was just the beginning of basic data access. Let's actually connect this data to AWS analytics services. As a data scientist at this financial company, I want to perform some analysis. I want to be able to query this data on the fly using a SQL-based query editor, so I'll use Amazon Athena. In Athena, we create tables that represent the underlying data. We'll create tables for customer data, card data, and so on. When creating a table in Athena, Athena queries an S3 endpoint. Let's create one table for customers. A new customer table has been created, and its properties are displayed. The location is an S3 URI containing the alias and a specific prefix for customer data.
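The key detail in the table-creation step above is the LOCATION clause: it is an ordinary `s3://` URI, except the bucket-name position holds the access point alias. The sketch below composes that DDL pattern; the alias, database, column names, and CSV settings are all illustrative, not taken from the actual demo.

```python
# Hedged sketch of the Athena DDL pattern from the demo: the table
# LOCATION is an s3:// URI built from the access point alias.
# All names (alias, database, columns) are illustrative.
ALIAS = "my-ap-metadata-s3alias"

ddl = f"""
CREATE EXTERNAL TABLE demo.customers (
  customer_id string,
  name        string,
  segment     string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://{ALIAS}/customers/'
TBLPROPERTIES ('skip.header.line.count'='1')
""".strip()

print(ddl.splitlines()[0])
```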

Thumbnail 2370

Thumbnail 2380

Thumbnail 2390

Thumbnail 2400

We'll quickly create the remaining tables. Do you remember that transaction data is partitioned? We partitioned it by year, month, and day. Let's see what that looks like. All partitions have been detected.

Thumbnail 2420

Thumbnail 2440

We'll create the fraud alerts table. Finally, the credit card table, and the rewards table. Now, I'm doing this primarily manually using Athena, but you don't necessarily have to. You can use AWS Glue to automatically crawl and discover this data and schema. So let's go back again. Transactions are partitioned. We can view this table in Glue. Again, you can see that this is the schema representation for the transactions table. It points to an S3 Access Point alias at a specific prefix. Remember that this data resides in the ONTAP file system. The primary copy might be on-premises, and this could be a read-only secondary copy for disaster recovery.

Thumbnail 2470

Thumbnail 2480

Schema and partitions. Again, this is beyond the scope of this demo, but we could also have created a Glue crawler to do this automatically. Let's go back to the Athena query editor. We have our tables. Let's look at some data. We'll perform a basic select for the first 10 customers. This should look familiar. We just reviewed this data after downloading it via NFS.

Thumbnail 2490

Thumbnail 2510

As a data scientist, I now want to perform customer segmentation analysis. Here, we're running a slightly more advanced select that joins two tables, and you can see the results being generated. So far this is read-only, but you can also create tables from select results. This is called CTAS (Create Table As Select). You'd want to use this, for example, when doing daily, weekly, quarterly, or annual reporting, or when populating dashboards and you want those query results to come back fast.
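The join query itself isn't shown in the transcript; a segmentation query of the kind described might look like this. The table and column names are hypothetical:

```python
# Hypothetical Athena query joining the customers and transactions tables
# to rank customers by total spend (a simple segmentation signal).
SEGMENTATION_QUERY = """
SELECT c.customer_id,
       count(t.transaction_id) AS txn_count,
       sum(t.amount)           AS total_spend
FROM customers c
JOIN transactions t ON t.customer_id = c.customer_id
GROUP BY c.customer_id
ORDER BY total_spend DESC
LIMIT 10
""".strip()

print(SEGMENTATION_QUERY.startswith("SELECT"))  # → True
```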

Thumbnail 2550

Thumbnail 2570

Thumbnail 2590

So one of the things you can do is perform a select and actually create a table from those results. I'm going to actually show you that now. Here we're going to create a summary table, a customer summary table. This has a few different points. So let's run it. This performs a select on an existing table and creates a new table. It's created from the results and written to a different alias at a different prefix within the same volume. Not only is the location different, but here we're also performing it in a different format. We're writing Parquet files. Previously, we were reading CSVs. Remember, the data is stored in ONTAP.
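A CTAS statement matching the description above, writing a new summary table as Parquet under a different prefix, would look roughly like this. The alias, prefix, and columns are hypothetical:

```python
def ctas_statement(alias: str, prefix: str) -> str:
    """Hypothetical Athena CTAS: summarize per-customer spend and write the
    result as Parquet under a different prefix of the same volume."""
    return f"""
CREATE TABLE customer_summary
WITH (
  format = 'PARQUET',
  external_location = 's3://{alias}/{prefix}/'
) AS
SELECT customer_id,
       count(*)    AS txn_count,
       sum(amount) AS total_spend
FROM transactions
GROUP BY customer_id
""".strip()


sql = ctas_statement("my-ap-alias-s3alias", "summaries/customer_summary")
print("format = 'PARQUET'" in sql)  # → True
```

Note the format change the speaker calls out: the source tables read CSV, while the CTAS output is written as Parquet, with all of it still stored in ONTAP.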

Thumbnail 2600

Thumbnail 2620

Let's check. On the left, we see the customer summary for credit card data analysis, and we have the Parquet file returned from the table that was just created. Now, let's query this table and look at the results. And this is the customer summary table. This was the traditional way a data scientist would perform basic analysis.

Thumbnail 2650

Thumbnail 2660

Now, AWS recently announced Amazon QuickSight Q. QuickSight Q is an AI-powered business intelligence platform: you can integrate your enterprise data as a knowledge base and perform similar analyses with AI assistance. So let's take a look. The next part of this demonstration uses QuickSight Q. The premise is that you already have all this enterprise data, and you want to make it accessible so you can automate workflows, conduct in-depth investigations, and perform analysis.

Thumbnail 2680

Step 1 of QuickSight Q is to create an integration, to create a knowledge base. I already have one. It's called the credit card dataset BI knowledge base. It's configured in the same way you would configure a knowledge base with QuickSight or QuickSight Q against an S3 bucket. The way this works is that when QuickSight Q creates a knowledge base, it periodically synchronizes and indexes the data, making it available to the chat assistant. I have this available knowledge base, and it's synchronized.

Thumbnail 2710

Thumbnail 2720

Thumbnail 2730

Thumbnail 2740

So let's move to the chat agent. Here we have a Data Science Analyst. What we're going to do is use this to perform similar analysis to gain insights from the same dataset. Let me show you a few examples. Here are previous reports and analyses. When instructed to analyze customer spending patterns in the credit card dataset, it returned results and was able to create tables and bar charts.

Thumbnail 2760

Let's start a new conversation. So, I'll run the prompt. The chat agent starts by thinking and reasoning, then searches documents. As you can see here, it's reading the cards CSV again; this data is stored in the ONTAP file system. It will continue to analyze, crunch numbers, and search. This will take about a minute to run.

Thumbnail 2790

I want to step back here and emphasize what this enables. Customers have a massive amount of data stored in ONTAP, and that enterprise data truly represents organizational memory and business context. Now they can easily bring that data into FSx and leverage it with these tools in ways they couldn't before. Previously, trying to do this meant copying the data to S3, paying for two copies in different formats, re-architecting, and managing everything to keep it synchronized. Now it can all be done seamlessly through ONTAP data replication technology such as SnapMirror.

Thumbnail 2850

Thumbnail 2880

Here, it's reloading even more data within the knowledge base. What QuickSight is doing here is that it has a knowledge base pointing to an S3 location, which is an alias or Access Point. That Access Point is connected to my volume, and QuickSight is performing S3 API operations to read the data. It's reading merchant data, customer CSVs, customer rewards, and now it's analyzing the query. It's putting everything together. Again, this will take about one or two minutes.

Thumbnail 2900

Thumbnail 2910

Thumbnail 2920

Thumbnail 2940

And here are the results. Based on a comprehensive analysis of the credit card dataset, we identified several key customer spending patterns and behavioral insights. Customer segmentation analysis is complete. Merchant category spending analysis is also complete. We also performed an analysis of transaction characteristics on data stored in the ONTAP file system. Tables and bar charts were also created. All of this can be downloaded, exported, and used in any way to present this data to your team and drive better business outcomes. It even provides action recommendations.

Thumbnail 2990

Conclusion: The Easiest Way to Maximize Your ONTAP Data

So, to summarize. First, we created an Access Point. Using that Access Point and its alias, we performed basic data access S3 API operations, demonstrating read and write, and how it interacts with existing file-based access, NFS. Next, we showed how to integrate with Amazon Athena to query data on the fly. This is a more traditional method. And finally, we demonstrated AI-assisted analysis for similar exercises using QuickSight. Now, I'll switch back the screen.

Thumbnail 3030

To conclude, I'd like to leave you with a few key takeaways. First, S3 Access Points for FSx ONTAP are the easiest way to get more value from your ONTAP data. Your data continues to reside in your ONTAP file system and is accessible via file protocols. No re-architecture, no data movement, and no need to pull data out of your file system. And now, for the first time, that data is accessible via Amazon S3 using the S3 API, making a wide range of AI, ML, and analytics services that work with S3 available to you. Finally, thank you all for joining this session and for your time.


  • This article was automatically generated using Amazon Bedrock, maintaining the information from the original video as much as possible.
