Translated by AI

The content below is an AI-generated translation. This is an experimental feature and may contain errors.

re:Invent 2025: Bloomberg Media's Next-Gen AI Media Platform and Disposable AI Strategy


Introduction

By transcribing overseas conference talks into Japanese articles, this project aims to make high-quality information that would otherwise stay buried more accessible. The presentation featured in this installment, based on that concept, is:

Information on the re:Invent 2025 transcript articles is summarized in this Spreadsheet, so please check it as well.

📖re:Invent 2025: AWS re:Invent 2025 - AI at the speed of news: Bloomberg Media’s vision for the future (IND3331)

This video explains the next-generation media platform that Bloomberg Media is building in collaboration with AWS. Brandon Lindauer, Robert Raver, and Loic Barbou introduce an end-to-end system that analyzes and understands petabyte-scale content and automatically generates new content. They describe how to build hybrid search with Vision Language Models, vector embeddings, and knowledge graphs on Neptune Analytics, leveraging Amazon Rekognition, OpenSearch, and Amazon Bedrock. Of particular note is the concept of a "disposable AI strategy," which emphasizes platform design that allows models and services to be swapped out easily. For automated content creation using AI agents, a concrete example shows how multiple social-media-optimized clips can be generated automatically from a 30-minute interview, and the talk discusses the potential to extract new value from a 13-petabyte archive.

https://www.youtube.com/watch?v=5CkOnwmJkpQ

  • This article was automatically generated by utilizing Amazon Bedrock, aiming to maintain the original video's information as much as possible. Please note that it may contain typos or inaccuracies.

Main Content

Thumbnail 0

re:Invent Session Begins: Building an Intelligent Analysis Platform for Large-Scale Media Libraries

Well, hello everyone, and welcome to Friday morning at re:Invent. You made it. I didn't even realize I had a Friday morning session, but I'm glad you're here. We're going to talk about some really cool things today. My name is Brandon Lindauer. I'm a Specialist Solutions Architect focused on media and entertainment workloads. And we've got some great content here today. Just a quick preface before we jump in. This is a 300-level session. We're not going to show you a ton of code, but we're going to cover a ton of concepts, a lot of deep concepts, and we're going to go at a very fast pace. So you're probably going to have to watch the replay of this session a second and a third time before it all kind of starts to sink in.

And if there's anything that we don't cover deeply that you want to learn more about, there are a lot of other sessions at re:Invent that are posted online. For those of you online, hello from the past. This session is not about a single point solution. It's not about a single service. We're not talking about OpenSearch or Rekognition on their own. What we're talking about is building a complete end-to-end system, a platform that allows you to intelligently analyze, understand, and then take action on a large-scale media library.

Thumbnail 100

With my colleagues Robert Raver and Loic Barbou, we're going to cover a vision for the future of Bloomberg. So let's get into it. We're going to dive deep into analyzing and creating metadata about this media. We're going to talk about how you ingest and organize petabyte-scale content. We're going to talk about how you build a hybrid search that understands both keywords and meaning. We're going to touch a little bit on knowledge graphs and how you use AI agents to orchestrate complex content workflows and create content from there, and where Bloomberg is on this whole journey. So let's jump right into it. We've got a lot to pack in, so we're going to start right away. Loic Barbou from Bloomberg.

Bloomberg Media: Global Business News Platform and Expanding Video Content

Good morning, everyone. My name is Loic Barbou. I'm the Head of Media Technology for Bloomberg Media. How many of you have heard of Bloomberg Media? Quite a few of you, that's good. And thank you for being here this morning, the last day of the conference. As Brandon said, we have a lot to pack in, and a lot of knowledge we want to share with you. For my part, I want to share the vision of Bloomberg and what we are doing in the world of AI and media.

Thumbnail 180

Bloomberg Media is a global news and business platform that delivers data-driven insights to business leaders. We analyze the news, extract all the information from it, curate it, and distribute it to help business leaders navigate and make decisions in today's complex global economy. And we do this by producing various types of content: text-based content such as news articles, audio content through radio and podcast channels, and video through linear TV distribution, mobile, web, and social platforms. And that's both live and video-on-demand content.

Talking about the video space, it is currently our strongest engagement area, and it continues to grow. Demand is very high. To give you some context, and this is why we're going to focus on the video space today, to give you some context, approximately 60 million unique viewers watch these videos every month, reaching about 437 million households globally. And we do this through 48 major streaming platforms.

Thumbnail 280

Here are two examples of our linear distribution channels. The first one is Bloomberg TV. As you can see on the screen, this can be quite complex. It uses many components to not only integrate video but also news, delivering all these insights to users in real-time. The second is Bloomberg Originals.

This is a different type of content, long-form content that can be viewed as documentaries, but also very deep analyses of topics affecting today's world.

Thumbnail 330

Building a Hybrid Media Ecosystem and Serverless Architecture

To understand our AI journey, let me talk a little bit about the existing platform we currently deal with. The current media ecosystem, which handles workflows such as content ingestion, production, management, transformation, and distribution, has been designed and implemented at Bloomberg. And to achieve that, we use a variety of technologies. We use services within our own network, but we also use many cloud services, making it truly a hybrid global platform from a media workflow perspective.

A few years ago, we decided to invest a lot of time to figure out what we could do and how we could implement an ecosystem that provides the agility needed to respond to constantly changing business goals and new requirements. We also needed to adjust our delivery to a constantly evolving platform. And to achieve this, we built a serverless ecosystem running on AWS. This gives us the ability to leverage new components and provides a very easy way to introduce new services. The concept of dynamic workflows is also part of this, allowing us to adapt processing very easily depending on the amount of content we're dealing with and the changes in these media workflows.

In our world, there's also complexity stemming from the numerous content sources we handle. It's not just about distribution; it's crucial to understand where the content comes from. In fact, content can be ingested from live TV productions, or it could be contribution feeds, or raw content coming directly from events, or even content already stored in one of our repositories, which we refer to as historical content.

There's one thing that must be implemented in all workflows: speed. Nobody cares about old financial news. It needs to be accurate, and it needs to be trusted. So, in every decision we make in choosing how we actually process media, we make sure that we meet each of these requirements. And that's the same for everything we do with AI.

Thumbnail 540

The Future of Media AI: Archive-Driven Approach and RAG Challenges

So let's talk about AI. We've all used things like ChatGPT, and we've done some really cool stuff with it. But when we try to apply these generative AI concepts to media, things get a little bit more complicated here. In fact, a statement I made a few months ago wasn't even fully understood by some of my colleagues at Bloomberg, but it's now become clear that this is actually the only possible and viable way we can move forward. The future of media AI is archive-driven, not artificially created.

What does that mean? It means that if there's a video of me doing a backflip on stage, it has to actually exist. If you search for it, you can find it. It's genuine content, not synthetic content. Synthetic content is not what we, as a media organization, aspire to create.

However, what we are trying to achieve here is to use all the content that exists, all the content that we've ingested and stored on our platform, to leverage that data, extract insights from that content, and assemble it to produce compelling videos that meet the specifications of the distribution target. This process is actually complex, because we really have to understand all the content, process it, and build a platform that can handle it.

Thumbnail 650

Our journey, like everybody else's, started with experimentation, and we found some good stuff. You start to see some of these capabilities, and you realize this is really cool, and it feels like magic. You hand an AI model a video, and it can extract metadata from that media. Who's in the video? What are they talking about? This works well, but it's in the context of a single video, and it stops working well as videos get longer. As videos become more complex, you lose context, and the information generated can be inaccurate.

There's a concept that's supposed to address this, and that's RAG. You may have heard of it, but RAG is really a solution to say that when you have a lot of data, you need to index that data, generate embeddings, and store them. And then all your AI models and workflows will leverage that information to generate something that actually comes from your data itself, not from the outside world. The problem is, when you try to scale this, it gets really complex. Because not all content is really created equal. The type of processing you need to do on that content actually changes based on the workflow, and every time you need to make a change, you have to completely re-architect parts of those workflows.

In fact, the content you indexed might not have the right information to achieve the output you want later, so do you have to re-index everything? Doing that on a large amount of content is very expensive and very complicated. Another thing that became clear is that when you start designing these media AI-driven workflows, you have to design the workflows, you have to implement the workflows, but then you have to test the workflows, and that takes a long time. By the time you're done testing, there's a new AI model, and there's a new way to implement an AI-driven pipeline that's much better than what you did. So how do you deal with that?

Another problem we saw here is that when you start implementing these complex workflows, you have to decide which vendors, which models to deal with. And when you find that the technology those vendors use no longer provides what you need right now, and you want to switch them out for something else, it's very difficult to remove them from those workflows and ecosystems. So, it's complex.

Thumbnail 880

And what we decided to do, instead of moving forward and implementing something, was to analyze all the moving pieces we thought would need to be adjusted when we need to introduce new AI models, when we need to generate new types of content, and when we need to extract information from content in ways that don't exist yet.

We identified several areas that we thought would need to be adjusted each time we needed to accommodate changes within the platform. The first one is versioning. We introduced the concept of versioning to our models, embeddings, services, and workflow pipelines. What does this mean? It means that when queries and prompts are executed, you can actually trigger a specific version within that prompt, telling the model or embedding that you want to use this version instead of that version. So there are multiple layers, there are multiple levels of embeddings that are stored at different versions, and those can be targeted through different pipelines and workflows.
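
To make this concrete, here is a minimal sketch of the versioning idea, assuming a hypothetical registry that pins a model, a prompt template, and an embedding index per pipeline version; the registry, model IDs, and index names are illustrative, not Bloomberg's actual system:

```python
# A minimal sketch: prompts, models, and embedding indexes are all addressed
# by explicit version, so a workflow can pin or swap any of them independently.
import boto3

bedrock = boto3.client("bedrock-runtime")

# Hypothetical registry: each pipeline version pins a model, a prompt
# template, and the embedding index it reads from.
PIPELINE_VERSIONS = {
    "summarize@v1": {
        "model_id": "anthropic.claude-3-5-sonnet-20240620-v1:0",
        "prompt": "Summarize this transcript:\n{transcript}",
        "embedding_index": "transcripts-v1",
    },
    "summarize@v2": {  # candidate version, not yet promoted to production
        "model_id": "us.amazon.nova-pro-v1:0",
        "prompt": "Summarize the key points of this transcript:\n{transcript}",
        "embedding_index": "transcripts-v2",
    },
}

def run_pipeline(version_key: str, transcript: str) -> str:
    cfg = PIPELINE_VERSIONS[version_key]
    response = bedrock.converse(
        modelId=cfg["model_id"],
        messages=[{
            "role": "user",
            "content": [{"text": cfg["prompt"].format(transcript=transcript)}],
        }],
    )
    return response["output"]["message"]["content"][0]["text"]
```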

We've done the exact same thing, not just for versions, but also for quantifying the level of data that we extract as embeddings and the level of data that we generate through different steps of our workflows. So it means that there are workflows that I consider production and there are workflows that I consider non-production within the same ecosystem. And now I can have multiple flavors of a particular workflow that generates production data, that generates content and videos, but then also some that are not good enough yet, or have not been tested enough to be promoted to actual production output.

Having them coexist means that we can actually orchestrate them as part of one platform, and we can easily promote or demote certain parts of a workflow. So it provides dynamism and agility within our workflows to bring in models, remove models, adjust processing pipelines, and get full observability into the output quality, the performance, and the cost of each step that executes those workflows.

Another concept that we've introduced, and this is a very important one, is the ability to deal with multiple databases of metadata and embeddings. The easy way to do this is to generate your embeddings, extract them, store them in a database, a vector database, and then basically use that for all your workflows. What we ended up doing is we needed the ability to store multiple types of embeddings that we could create using an audio embedding model, or Vision AI through video frames, or the transcript of a video. And they are stored in multiple repositories, multiple databases. There are different versions because we're using multiple models, which makes it complex in terms of the number of places where we actually store data.

We've created the concept of federated search with one search service that has the ability to select the right database to retrieve content from. But that can get complex when you try to decide which database to use. So we also put an AI layer in there that will analyze the prompt, analyze the query that comes to search, rewrite the query, and re-orchestrate those queries to point to the right repository, the right database. And also have the ability to merge outputs together, reprocess them, and return them.
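
As a rough illustration of that routing layer, here is a sketch in which an LLM rewrites the query and selects which embedding stores to hit; the store names, prompt, and model choice are all assumptions for the example:

```python
# A minimal sketch of federated search routing: an LLM inspects the incoming
# query, rewrites it for search, and picks which embedding store(s) to query.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

STORES = {
    "transcript-embeddings-v2": "text of what was said in each video",
    "visual-embeddings-v1": "frame-level visual content of each video",
    "audio-embeddings-v1": "non-speech audio characteristics",
}

ROUTING_PROMPT = (
    "Given these stores:\n"
    + "\n".join(f"- {k}: {v}" for k, v in STORES.items())
    + "\n\nRewrite the user query for search and pick the stores to use. "
    'Reply with JSON only: {"query": "...", "stores": ["..."]}\n\nUser query: '
)

def route_query(user_query: str) -> dict:
    response = bedrock.converse(
        modelId="us.amazon.nova-pro-v1:0",  # placeholder model choice
        messages=[{"role": "user",
                   "content": [{"text": ROUTING_PROMPT + user_query}]}],
    )
    # A production version would validate the model's JSON output.
    return json.loads(response["output"]["message"]["content"][0]["text"])
```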

So there is now that layer of abstraction between all the embeddings and the versions of the embeddings, and the metadata that could also be semantic or graph databases, and then the inference workflow that uses that data to actually create and generate content. And you might be surprised by the title of this slide, but at the end of the day, this is really the critical point for us. We realized that the platform is important, and the model is not. And what I'm describing here is what we quantify as the disposable AI strategy. Before bringing in one of our services or models, we actually ask ourselves, what do we need to do, what do we have to do when we need to remove it from our ecosystem? And once you start thinking like that, it changes a little bit the decisions you make.

So, those are the concepts. And with that, I'm going to pass it off to Rob to explain how we actually achieve these goals.

Thumbnail 1240

Content Ingestion to Analysis: Integrating Task-Specific Models, VLMs, and Vector Embeddings

Thank you, Loic. Good morning, everyone. My name is Robert Raver. I'm a Principal Specialist Solutions Architect, just like Brandon, and I have the honor of working with customers like Loic to build these solutions and platforms and to help them. So I want to talk about the journey from ingestion to the end of creating content. And when you look at this, I want to break it down into five buckets: How do we ingest content? How do we analyze it? How do we understand it? How do we search it? And then how do we create?

Thumbnail 1260

Thumbnail 1280

Thumbnail 1290

Thumbnail 1300

Let's go through this high-level architecture. You'll see that it's much like what Loic was talking about: decoupling our services, making them modular, so that we can swap out new models, have versioning for prompts, etc. So on the left-hand side is where we begin with ingestion. We're going to go through this, and we're going to lightly touch on, how do we get content in? Then we're going to look at how we analyze that content. How do we get meaning? How do we extract information? Then we're going to look at how we understand it better. And then we're going to look at how we search and then how we create new content.

Thumbnail 1330

Thumbnail 1340

As we go through this journey, when you look at the agent at the top right, it's very much like a human being. When the agent needs to search, it searches just like you do. It has the same experience. So let's talk a little bit about ingestion first. When we think about ingestion, we have to consider a variety of options. How do we get file-based media in? How do we take it and get it into the cloud? How do we get live content in and begin to analyze it? What archives do we have that we need to analyze? And then if we have third-party feeds or contributions, how do we get those in?

Thumbnail 1370

Thumbnail 1400

So what we really want to look at here is how do we build a funnel? How do we get all these sources in and aggregate them into a central place that can trigger workflows or analyze them? So that we're not having a lot of unique, specialized workflows and one-offs that we have to manage. We want to be efficient and we want to analyze everything the same way. So once we get to that stage, let's talk a little bit about analysis.

Thumbnail 1420

When we look at analysis, from the prior slide, we're ingesting content into S3, and then S3 sends an event, "Hey, there's a new file, there's a new asset." So we're going to look at how we trigger that workflow. And when you think about this, analysis for media has really changed in the last few years. And when you look at this, there are three different types of analysis. There are task-specific models. These are like your prior Amazon Rekognition, etc. There are Vision Language Models, or VLMs, as a new way to analyze media. And then there are embedding models. How do we generate embeddings, and what do they mean?
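
A minimal sketch of that trigger might look like the following, assuming a Lambda function that starts a Step Functions analysis workflow for each new S3 object; the state machine ARN and event wiring are placeholders:

```python
# S3 emits an ObjectCreated event when a new asset lands; this Lambda-style
# handler starts one analysis workflow per asset.
import json
import urllib.parse
import boto3

sfn = boto3.client("stepfunctions")

ANALYSIS_STATE_MACHINE_ARN = (
    "arn:aws:states:us-east-1:123456789012:stateMachine:media-analysis"
)

def handler(event, context):
    """Triggered by an S3 ObjectCreated event for each newly ingested asset."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Start one analysis workflow per new asset.
        sfn.start_execution(
            stateMachineArn=ANALYSIS_STATE_MACHINE_ARN,
            input=json.dumps({"bucket": bucket, "key": key}),
        )
```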

Thumbnail 1450

Thumbnail 1460

When we look at task-specific models, these are trained models that are looking for very specific things. They're looking at generating labels, or they're looking at something pre-defined, like a transcript. When you look at Vision Language Models, these are a little bit more freeform. These are things where I can take an asset and ask a question, say, "Describe this," and it's not just telling me, "Hey, there's a person there," but it's telling me, in natural language, what that person is doing. They natively accept video, images, and audio. And when we look at all these modalities, we want to be able to analyze audio, video, images, and even documents the same way, to create that unified space.

The most important thing is that these models often don't require additional training to do that. So it's zero-shot inference, where you can take media, ask a question, and get an answer.
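
For illustration, a zero-shot question against a video might look like the following sketch using the Bedrock Converse API; the model ID and S3 URI are assumptions, not the exact demo shown in the session:

```python
# Zero-shot VLM inference: hand the model a video and a question, no
# task-specific training required.
import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.converse(
    modelId="us.amazon.nova-pro-v1:0",  # placeholder model choice
    messages=[{
        "role": "user",
        "content": [
            {"video": {"format": "mp4",
                       "source": {"s3Location": {
                           "uri": "s3://my-bucket/session.mp4"}}}},
            {"text": "Describe the second speaker in the video."},
        ],
    }],
)
print(response["output"]["message"]["content"][0]["text"])
```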

Thumbnail 1520

Thumbnail 1560

Thumbnail 1580

Then we look at vector embeddings. Vector embeddings are probably my favorite thing right now. And what we have here is, how do I translate an image or a video to understand it semantically, or understand it in a different way, to represent it in numbers? When I create a vector, which is a list of numbers, that allows me to quickly compare meaning. So is it just a dog, or is it a dog on a beach with a bandana? And how do I do that? When I do that, I take something like text, an image, or a video. I put it into an embedding model, and it outputs these vectors, these bunches of numbers. By looking at the numbers themselves, you're not going to get a whole lot. But what it does tell you is, when you put it into a vector space, almost like a three-dimensional space, you begin to build a 3D model of what's closely related and what's not related. Is that a Yorkshire Terrier over there? How close is it to a gorilla? You begin to measure similarities and meaning.
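
As a toy illustration of comparing meaning with vectors, cosine similarity between two embeddings looks like this; the three-dimensional vectors are made up, and real embeddings have hundreds or thousands of dimensions:

```python
# Cosine similarity: how close two embedding vectors point in the space.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

dog = [0.9, 0.1, 0.3]
dog_on_beach = [0.8, 0.2, 0.5]
gorilla = [0.1, 0.9, 0.2]

print(cosine_similarity(dog, dog_on_beach))  # high: closely related
print(cosine_similarity(dog, gorilla))       # lower: far apart in the space
```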

Thumbnail 1610

Thumbnail 1650

When we look at this from a multimodal perspective, it's going to allow me to analyze audio and compare it to an image and compare it to a video and compare it to a portion of a video. So the question is, if I have these three different ways to analyze content, and they've evolved over the last several years, which one do I pick? There's a lot of choices out there. But the short answer is all of them. You use each one of them, and you take what each one of them is good at, to build a rich understanding of the content that provides meaning. So when you begin to build a media management platform, what you want to do is you want to create a unified architecture that integrates all of these, correlates them, relates them, and surfaces the insights.

Thumbnail 1660

Thumbnail 1700

And when you look at that, and then you map that into the AWS services and our partners, there are several services and models that make that happen. We use Amazon Transcribe, and Amazon Rekognition Video for labels and for structural breakdown, for shots, for scenes. We can use Amazon Bedrock. This is where models are hosted, and we have first-party models and partner models, where you can do things like generating embeddings with Twelve Labs Marengo 3.0 or the new Amazon Nova multimodal embedding model, keeping your content within your own environment. So let's look at it really quick. To give you an idea, I want to show you a demo of what this looks like.

Thumbnail 1710

Vision Language Models and Vector Embeddings: Realizing Natural Language Video Understanding

When you look at this, this is a prior session. That person, maybe the one in the back corner, you might recognize. And when you start to play this, how do you describe it? If I asked you what's happening in this video, how would you tell me what's happening? Would you say, person, blue, hand? You probably wouldn't. You'd talk in natural language about what you're seeing, what's occurring.

Thumbnail 1750

Thumbnail 1770

Thumbnail 1780

So when we take that and we look at it with the Nova model, this is a quick Jupyter notebook that I ran. This uses the Python Boto3 SDK. And with Claude, it's really easy; by the way, when you run it with Opus, it generates this in a couple of sentences. It's pretty amazing these days. But this is how I would describe it. I asked it, the prompt was, tell me about the second speaker in the video. Describe the second speaker in the video. There was a speaker before Brandon, there was Brandon, and then there was a speaker after Brandon. Very much like here. And this is what it came back with. The second speaker is a bearded man, which is still true today, wearing a light blue shirt and blue jeans. When you look at the different methods, when you look at embeddings, it's just a bunch of numbers. It doesn't tell you this. But what it does tell you is how similar it is to a search term. When you look at labels, they don't tell you that. They don't know who the second speaker is. They don't understand context or temporal order.

And that's what we really want to do, right? We want to start using each one of these methods, take what each one of them is good at, and combine them and do it efficiently.

Thumbnail 1820

Thumbnail 1830

So if we went back in time and did analysis, the next thing is we move into search. And when you look at this middle section, this is really where everything happens, right? Ingesting data, analyzing it, unifying it, so that we can access it, specifically through search and agents. When you look at the top right, we have a conversational interface that interacts with an agent: you ask it questions, and it uses that search. Or you might use a single-turn search, but you still use that same search API, so that we can provide the same experience to everybody.

Thumbnail 1870

Traditionally, with search, people used keywords, right? Some might have used file names. For example, trying to be clever, putting underscore IND3331 in the name, and expecting people to know that's what Brandon or I were talking about. But it didn't work. So what we're really looking for is how people can find what they want in seconds, understand what it is, and be able to use it.

Thumbnail 1900

But search is hard. The way I search for something and the way you search for something might be vastly different. The way I describe what's happening and the way you do it are probably vastly different. So search really needs to adapt. Search needs to understand you and what you're asking, and it shouldn't be static.

Thumbnail 1930

So we take that data, that analysis we're talking about, and we categorize how we use it into three categories. Do we do keyword search, right? Robert Raver? If you noticed the example I gave earlier, I said, "Describe the second speaker." Did I say it was Brandon Lindauer? No, I don't know who it is. He's not that famous. I'm just kidding. I just wish I was as famous as you are. But keywords, right? If you tag him from a custom face perspective, keywords match immediately. With vector search, if you say, "Find people talking, or presenting, or on a stage," it finds similarities. We want both, right? So that's where hybrid search comes in.

Thumbnail 1980

When you look at keyword search, traditionally, it works on a data store like OpenSearch. It uses algorithms, looks at keywords within the indexed fields, looks at frequency, looks at proximity. Is it an exact match? Is it a fuzzy match? And it looks at how often it occurs to return that.
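
A minimal keyword query along those lines might look like this with the opensearch-py client; the index and field names are illustrative:

```python
# Keyword search against OpenSearch: match on indexed text fields, with
# fuzzy matching to tolerate near misses.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

response = client.search(
    index="media-assets",
    body={
        "query": {
            "multi_match": {
                "query": "Robert Raver",
                "fields": ["transcript", "labels", "title"],
                "fuzziness": "AUTO",  # tolerate near matches
            }
        }
    },
)
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```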

Thumbnail 2000

When we look at vectors and vector engines, we're thinking about how we take that, map that space, and then query against it. How similar is it, right? When we do a vector query, we're basically taking two vectors, those two numbers, comparing them, and deciding how close or how far apart they are. If I compare a Yorkshire Terrier running down the stairs to myself on stage, hopefully, they're pretty far apart, right? They're not going to be close.

Thumbnail 2040

So when we look at that and we find similar content, this is what you're actually doing. You're taking my vector, which is my search term, or a person speaking on stage, and you're sending it into this vector space, this index of vectors, and it's returning the top K, or the top few results. Similarity scoring, right? So it's returning to you, hey, here's something that's very close to what you're asking for, or, no, there's nothing close to it.
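
And a top-K vector query against OpenSearch's k-NN index might look like the following sketch; the index name, field, and embedding stub are assumptions:

```python
# Top-K vector (k-NN) query: embed the search term, then ask the index
# for its nearest neighbors.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

def embed(text: str) -> list[float]:
    # Placeholder: in practice, call the same embedding model used at
    # index time. A fixed toy vector keeps this sketch runnable.
    return [0.1] * 1024

query_vector = embed("a person speaking on stage")

response = client.search(
    index="video-embeddings",
    body={
        "size": 5,  # top K results
        "query": {
            "knn": {
                "embedding": {"vector": query_vector, "k": 5}
            }
        },
    },
)
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["asset_id"])
```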

And typically, when you think about a multimodal model, like the Nova multimodal model, Twelve Labs Marengo, or some of the open-source ones, it semantically understands what it is. But the key is that embeddings are specific to the model, and the meaning of those embeddings is also specific to that model. So one model might be trained on semantics, but the next model might be trained on my voice. Can it identify my voice from its characteristics?

Thumbnail 2100

Implementing Hybrid Search: Analyzing Search Intent and Dynamic Weighting

Now we actually go into search. We want to search this data. Here are some search terms: "Brandon on stage with two other people" versus "Robert talking about search technologies."

How do you evaluate these? How do you know if I'm on stage, or if Brandon is on stage, or if I'm talking, or if there are multiple people on stage? When you look at the modalities, if you want to determine who's there, or whether there are three people on stage, you're going to confirm it visually. You're not going to try to confirm it by listening. But if you want to hear me talking about something, you're going to look at my transcript, you're going to look at what I'm saying. If I'm waving my hands up here, you're not going to know if I'm talking about search or if I'm talking about Yorkshire Terriers. And by the way, I like Yorkshire Terriers, I have a few, if you haven't noticed.

Thumbnail 2150

Thumbnail 2180

So we want to create a hybrid, and the answer is to break the search down into search intent. OpenSearch has built-in ways to do that, using algorithms for ranking. Or you can do it yourself at the application layer. In fact, we recommend testing with both. We're going to look at how we do that at the application layer, or in a small middle layer. For example, if the search is "Robert Raver", I'm going to weight keywords much higher than vectors; for something like "inspirational movies", vectors should be weighted much higher. So we want to do that dynamically.

Thumbnail 2220

Thumbnail 2240

Thumbnail 2250

So when we look at the architecture a little bit closer, here we have our search input box. On the left side, we have our thumbs up, thumbs down, to provide feedback to help us fine-tune this over time. You type in your search term here, it goes to an API Gateway. Lambda responds to that, takes it, and says, this is my search term. What's the intent of that search term? What are they really asking for? So it goes to an LLM and says, hey, here's my search query, how should I look at this? At the same time, we're creating embeddings for the data stores or the embeddings we have and allowing us to query them. Bringing them back and allowing us to give you a final answer.
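
Putting that flow into a sketch, a Lambda-style handler could ask an LLM for per-store weights and then blend the results; every name, prompt, and weight here is illustrative rather than the actual implementation:

```python
# Hybrid search sketch: an LLM assigns per-modality weights for the query,
# then keyword and vector results are blended by those weights.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def analyze_intent(query: str) -> dict:
    """Ask an LLM to weight each store for this search term (0 to 1)."""
    prompt = (
        "For the search term below, assign a weight between 0 and 1 to each "
        "store: visual, transcript, people, audio. Reply with JSON only, "
        'e.g. {"visual": 0.9, "transcript": 0.2, "people": 0.5, "audio": 0.1}.'
        f"\nSearch term: {query}"
    )
    response = bedrock.converse(
        modelId="us.amazon.nova-pro-v1:0",  # placeholder model choice
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    # A production version would validate the model's JSON output.
    return json.loads(response["output"]["message"]["content"][0]["text"])

def run_all_searches(query: str) -> dict:
    # Placeholder: fan out to the keyword and k-NN queries sketched earlier,
    # one per store; returns {store_name: {asset_id: score}}.
    return {"visual": {}, "transcript": {}, "people": {}, "audio": {}}

def hybrid_search(query: str) -> list:
    weights = analyze_intent(query)
    combined: dict = {}
    for store, results in run_all_searches(query).items():
        for asset_id, score in results.items():
            combined[asset_id] = (
                combined.get(asset_id, 0.0) + weights.get(store, 0.0) * score
            )
    # Highest blended score first.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```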

Thumbnail 2260

Thumbnail 2290

When we look at that, here's another little Jupyter notebook I have, and I did something similar. Up here, I have my prompt. Now, I'm not saying this is the best prompt in the world, but it goes back to Loic's point: that's what we want to be able to version, improve, iterate on, and measure. It says, I have these data stores, analyze the search term, and return the weights, telling me what I should be paying attention to. Going back to "Brandon on stage with two other people," what you're actually looking at, down here at the bottom, is the LLM weight analysis. You can see that Visual is 0.9. It understands that to figure out how many people are on stage, it's going to look visually. And there might be specific people, so it gives People some weight, but from an audio perspective, it's not going to do a whole lot. So we want to make sure we're searching the right areas.

Thumbnail 2330

Now, going back to the other search term we had, "Robert talking about search," you can see how that changes. People goes up: it's looking for people by the name of Robert. Transcript goes up dramatically: it understands that you're looking for what I'm saying. And audio, in this case, means background audio, someone making loud noises, a car driving off, whatever it is. That's not what matters here; it wants to understand that someone is actually talking.

Thumbnail 2360

Thumbnail 2370

Thumbnail 2390

Once we get there, we really want to understand what we have, right? When we do that, we get back a bunch of results, but that doesn't necessarily give us the true contextual meaning of what a search result is. If I get back a hundred video results of me talking, how do I understand that? How do I understand how they're related? So that's where we look at knowledge graphs. We can use data stores specifically for these purposes to relate content.

Thumbnail 2400

Content Understanding with Knowledge Graphs: Visualizing Relationships and Evolving Stories

There are several ways to use graphs, but at the end of the day, a graph is taking these two nodes, a person and an asset, and creating these relationships. And there are several ways to do that.

Thumbnail 2410

Thumbnail 2430

You can do it chronologically, meaning what's happening over time, or by identity, domain, and so on. What we do is segment that data and use that graph, in this case Neptune Analytics, to perform searches and create meaning. When you look at the same analysis, the same data we created earlier, we're putting it into both a graph and OpenSearch and searching it in different ways. These are different data store types and different ways to extract meaning from the data, but they can use the same embeddings.
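
As an illustration, querying such a graph with openCypher on Neptune Analytics might look like this; the graph ID, node labels, and relationship names are assumptions:

```python
# Query the knowledge graph with openCypher via the Neptune Analytics
# data-plane API.
import boto3

neptune = boto3.client("neptune-graph")

query = """
MATCH (p:Person {name: 'Robert Raver'})-[:APPEARS_IN]->(a:Asset)
RETURN a.title, a.created_at
ORDER BY a.created_at DESC
LIMIT 10
"""

response = neptune.execute_query(
    graphIdentifier="g-abc1234567",  # placeholder graph ID
    queryString=query,
    language="OPEN_CYPHER",
)
print(response["payload"].read())
```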

Thumbnail 2450

Thumbnail 2470

Now you can start seeing correlations. When you look at this, you start building this graph. There are people appearing in assets, assets are created from other assets, and there are stories happening in certain places. When you start doing this, understanding emerges. If I have a hundred assets, but I've actually created new assets from four other assets, and then I've sent them to YouTube and TikTok, creating even more assets from there, I can understand that those seven assets are actually the same thing. I've just created new assets. If you have search results and seven of them are the same, you probably want to identify that, make you aware of it, and say, these are really the same assets or derived from each other; this is the unique one.
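
A sketch of that deduplication idea as a graph query, again with illustrative labels, could walk the derivation chains back to each original:

```python
# Collapse derived assets: walk DERIVED_FROM chains back to an asset that
# was not derived from anything, grouping each family of republished
# variants under its original.
import boto3

neptune = boto3.client("neptune-graph")

query = """
MATCH (v:Asset)-[:DERIVED_FROM*1..]->(root:Asset)
WHERE NOT (root)-[:DERIVED_FROM]->()
RETURN root.id AS original, collect(DISTINCT v.id) AS derived_versions
"""

response = neptune.execute_query(
    graphIdentifier="g-abc1234567",  # placeholder graph ID
    queryString=query,
    language="OPEN_CYPHER",
)
print(response["payload"].read())
```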

Thumbnail 2500

Thumbnail 2510

Thumbnail 2560

What we do is take the data, that analysis, and feed it into OpenSearch and into that knowledge graph simultaneously, and then query it in the same way. When we're looking at it, if we're feeding data into a graph and it's relationship-based, we're taking that and creating what we call virtual entities. Many people search for content, look at what's in the content, and expect to say, here are a hundred assets, this person is in them, that person is in them. But do you think that way in your head, or do you really want to say, I'm interested in Rob, and I actually want to go to Rob and understand the information I have about Rob, the assets he's appeared in, maybe things that have been recreated? You take the same data and start creating these virtual entities within this graph, those are those nodes, and you start linking them.

Thumbnail 2570

Thumbnail 2580

Thumbnail 2590

Thumbnail 2610

You start to understand my voice, perhaps my face, other characteristics, and build all of that together. When you look at that, and we come back here and look, we've created a sample graph that we want to see. This is fictitious information, and we're looking at two summits that might have happened here, tech summits, maybe one in the spring and one in the fall. But you start to understand what the keynotes were saying, what the sessions were saying, and what stories were being told.

Thumbnail 2620

Thumbnail 2630

Thumbnail 2640

Thumbnail 2650

When you start to look at this, you begin to understand not just a single piece of information, a single asset, but what topics were discussed, which assets are related to those topics, what are the connections here, who was saying what. When you start to look at this, you gain an understanding of what's in your assets, how they're connected, and how things change or relate over time. In this example, by using this graph, you can say not just that Rob, Brandon, and Loic are in my assets and might have been speaking, but also who the repeat speakers are across my events, what the evolution of topics is, and how the narrative changes. We might be talking about RAG in 2024 and GraphRAG in 2025, and you can see how the story evolves.

Once you start understanding how these stories are evolving, what do you get? You can use that to create new content, create new stories. We want to spend less time on searching and less manual effort trying to understand what we have. And we want to quickly display that context, and use it to create high-quality content, stories, and searches. With that, I'll turn it back over to Brandon.

Automated Content Creation with AI Agents: Expanding Reach and Multi-Platform Deployment

Thanks, Rob. That was a concise but deep overview of a lot of the technology behind search. But in reality, this isn't just search. Nobody's asking for search for its own sake. You don't want to pay for search; you want to pay for the searching to be done for you. What you're looking for is discovery, technology that enables you to find what you need and allows you to correlate across multiple entities, events, and so on. But finding data is only part of the story. If you can find content, great, you found it. But there's real value if you can take action on what you find.

Thumbnail 2770

Thumbnail 2780

Rob and I sat down with Loic, and we said, hey, let's talk about content creation. What does that look like for you? There's a lot you can do with the metadata we've generated and the different ways we can search and find things. I'll highlight a few, and we'll dive deep into one of them. When we start to automatically create content, think about the thousands and tens of thousands of hours, the petabytes of data Bloomberg has in its archives. Traditionally, someone had to sit there and sift through all of it manually. If you can find it automatically, suddenly what you can do, pulling together stories, creating new stories, becomes amazing.

So let's dive into expanding reach a little bit. Loic gave us an example: you have a 30-minute interview with a prominent actor. Great, you got the opportunity to sit down with them. You have all this amazing interview, all this content. Two minutes of it are going to air. Traditionally, those two minutes would air, and the rest of the 30-minute interview would be thrown into the archive, maybe to be used someday. Nowadays, there are a lot of platforms. There are a lot of ways to expand your reach. You can post to all sorts of social media, Facebookgram, Insta Snap, whatever the kids are doing these days. Each of them has a different way of reaching people, but it requires a little bit of content curation, and actually quite a bit of effort. So, if we want to expand this reach through YouTube, TikTok, Instagram, and so on, we're going to look at this automated process.

Thumbnail 2880

Thumbnail 2920

It starts with a user request, a natural language query. A producer goes in and says, "I have this 30-minute interview, I want to cut it short and make it available on these platforms. I want it available on standard channels." You can use agents, use parallel execution, assemble the results, and even let humans review them if necessary. This gives you a fully automated pipeline for content production. It's not as if you haven't heard the words "agent" and "AI" at re:Invent this year.

This is what we envision. If you want to do automated content assembly and production, a user comes in and says, "I want this 30-minute interview available on various social platforms." YouTube is great, you can post the full version there. Two minutes have already aired, but there are platforms where you want a 12-second version, platforms where you want a 30-second version, and platforms where you want a 2-minute version. You might need multiple 12-second cuts to reach different audiences or convey different topics. That's quite a lot.

We start with analysis, summarizing and understanding metadata. This is much of what Robert explained: understanding all of this content, searching it, and understanding the user's intent. Understanding what you need to find, what you need to search for, drives the selection process, the selection agent. This moves forward, finding different parts of this video, useful for 10 seconds, useful for 30 seconds. This is your editor. This is where the editor sits and says, "This is useful, this is the part I want to use."

Assembly is also part of the editor, but it does the work of putting these things together. So it's like this: okay, here's a 5-second chunk I want to display. I have the 30-second clip I want, or I have these two things I want to put together as a 30-second clip. There might be different angles. You interviewed this actor, and you shot it with two cameras. You'd ingest these as two different files, and they'd be indexed separately, but the entities, search, and everything Robert just covered would be able to say, yes, these two clips, these two parts, are related as the same thing.

And you can have the selection agent, the assembly agent, look at it and say, you know, this angle is better here, this angle is better there, based on parameters, inputs, all sorts of things you can give it. You put it together, take that output, execute it, and pass it to another agent to say, make sure this meets our criteria. Make sure this actor doesn't look bad, make sure they're not saying anything they shouldn't be saying, make sure they're not violating content guardrails, not changing what's here. I mean, I've worked in video for a long time. It's easy enough to take someone's clip out of context.

As Loic said at the beginning, trust is key for Bloomberg. It needs to be accurate, and it needs to be trusted by the general public. So it's about having a review agent say, let's make sure everything is good before we publish this. And in that area, you can also put a human-in-the-loop process: hey, I want humans to look at these before anything goes out, along with a score back from the agent, either "I'm flagging this" or a score that says everything looks good. All these things are possible.

And finally, you publish it. This can be fully automated. From starting with this interview and making it available, you can have agents publish to all portals. Now, in such cases, you don't always have to use agents. Let's not use AI where an if statement works. But there are specific situations where you say, hey, let the LLM decide if this should go to YouTube or TikTok, if these channels are better. You provide that feedback to the human in the loop.
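
To sketch how such a pipeline might hang together, here is a deliberately simplified skeleton; every agent body is a placeholder assumption standing in for the search, assembly, and guardrail logic described above:

```python
# Minimal agent-pipeline skeleton: selection, assembly, review, publish,
# with a human checkpoint implied when the review score is low.
from dataclasses import dataclass

@dataclass
class Clip:
    asset_id: str
    start_s: float
    end_s: float

def selection_agent(interview_id: str, target_s: int) -> list[Clip]:
    # Placeholder: hybrid search + LLM ranking over the interview's segments.
    return [Clip(interview_id, 62.0, 74.0)]

def assembly_agent(clips: list[Clip], target_s: int) -> dict:
    # Placeholder: order clips, pick camera angles, request B-roll if short.
    return {"timeline": clips,
            "duration_s": sum(c.end_s - c.start_s for c in clips)}

def review_agent(edit: dict) -> dict:
    # Placeholder: LLM checks guardrails (context, accuracy) and scores it.
    return {"approved": True, "score": 0.93, "flags": []}

def publish(edit: dict, platforms: list[str]) -> None:
    print(f"Publishing {edit['duration_s']:.0f}s cut to {platforms}")

def run(interview_id: str, target_s: int, platforms: list[str]) -> None:
    clips = selection_agent(interview_id, target_s)
    edit = assembly_agent(clips, target_s)
    verdict = review_agent(edit)
    if verdict["approved"] and verdict["score"] > 0.9:  # else: human review
        publish(edit, platforms)

run("actor-interview-2025", target_s=12, platforms=["tiktok", "instagram"])
```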

Thumbnail 3170

So when we get into content selection, content selection can actually be multiple agents. You can break this down and have all sorts of things running in parallel. Find the segments of that video. You can adjust them as needed. You can sync them, you can find clips. The assembly agent might communicate back and say, hey, I have 25 seconds, and the goal is 30 seconds. Can you find something to fill it? I need B-roll related to this. There are all the graph relations. Okay, let's find something related to images, cutaways, videos, audio, all the other things we're moving forward with.

These agents will communicate with each other, automatically iterate, and collaborate. And of course, you can always put humans in the loop. And one of the coolest parts of this is that because all of this is happening between agents, because all of this is running on AWS using Bedrock, you can get the output of each. You can save it with complete lineage, so you know which angles were used for this, why they were used, what the scores were, what model score indicates this, which platforms were targeted with this cut-down, and so on. You can get all this information and have complete provenance of why you did this in this way.
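
A small sketch of recording that lineage might look like this, assuming a hypothetical DynamoDB table; the table name and item shape are illustrative:

```python
# Persist each agent's decision so a finished cut can be traced back to its
# sources, scores, and targets.
import time
import boto3

dynamodb = boto3.resource("dynamodb")
lineage_table = dynamodb.Table("content-lineage")  # placeholder table

def record_lineage(cut_id: str, step: str, detail: dict) -> None:
    lineage_table.put_item(Item={
        "cut_id": cut_id,
        "timestamp": int(time.time() * 1000),
        "step": step,          # e.g. "selection", "assembly", "review"
        "detail": detail,      # which angles, why, scores, target platforms
    })

# Scores stored as strings since DynamoDB rejects Python floats.
record_lineage("cut-0042", "review", {"score": "0.93", "flags": []})
```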

Thumbnail 3270

And you can understand what the model is doing and continue to improve and adjust, because again, there's no one final way to do this, and there's always room for improvement. Regarding multi-platform assembly, I mentioned various platforms. There's 9x16, there's 16x9, there's 1x1. Some also include text overlays. If you're taking a wide shot and making it vertical, you might need to follow the action. You might also need to keep it a wide shot and add blur to the background instead of just blacking out the top and bottom.

Verticalization and various platform targeting have different methodologies and approaches, and you can use LLMs for that. You can use these tools to decide what to frame where, and what looks better. You can also decide whether to make a cutaway here. Ultimately, it's important to create something suitable for the right platform, so it's not just about making one cut-down. Instead of just making a 32-second cut-down, you're creating a 32-second cut-down and versions for multiple platforms.
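
As one concrete example of that verticalization approach, a 16x9 cut can be converted to 9x16 with a blurred background fill by shelling out to ffmpeg; the paths and dimensions here are illustrative:

```python
# Convert a 16x9 source to 9x16 by compositing the original shot over a
# blurred, scaled copy of itself instead of black bars.
import subprocess

FILTER = (
    "[0:v]split=2[bg0][fg0];"
    "[bg0]scale=1080:1920:force_original_aspect_ratio=increase,"
    "crop=1080:1920,boxblur=20[bg];"   # blurred copy fills the 9x16 frame
    "[fg0]scale=1080:-2[fg];"          # original shot, fit to width
    "[bg][fg]overlay=(W-w)/2:(H-h)/2"  # center the shot over the blur
)

subprocess.run([
    "ffmpeg", "-i", "cut_16x9.mp4",
    "-filter_complex", FILTER,
    "-c:a", "copy",
    "vertical_9x16.mp4",
], check=True)
```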

Thumbnail 3330

As I said earlier, it starts with a user request. You orchestrate with agents, you can run many of these in parallel, assemble the results, and you can include humans there. This improves speed, volume, consistency, and traceability. The fact that you can reach more audiences on more platforms faster than ever before, and do it in an automated or largely automated way, is fantastic.

Thumbnail 3360

Thumbnail 3390

Platform Value: Leveraging 13 Petabytes of Media Assets and Creating New Revenue Opportunities

Rob showed you this at the beginning of the presentation. It starts with ingestion, we analyze this data, search it, understand it, and then take action by creating content. We've explained all of this, and the great thing is that this truly opens up new revenue streams. Because you can build this on AWS, you can do it in a well-architected way. It's highly available, highly resilient, secure, scalable to build and deploy, and includes insights, controls, and monitoring.

Thumbnail 3430

I'd like to invite Loic back up here. He's going to explain a little more about what Bloomberg is doing with everything we've described. Thank you. I appreciate Rob and Brandon doing a great job explaining the specifics of how media workflows can leverage this new technology.

I want to pause here and re-emphasize the goal. I think you understand the complexity we're dealing with. The goal in doing this is really to have the ability to create a platform that can adapt to all the new business goals and changing requirements, and all the evolutions happening within AI. It's complex, it's intricate, there are many moving parts. It's impossible to redesign everything tomorrow when something better comes along. So this is really the foundation of all the work we're doing here. Yes, we can do all these things, and we can change them.

This platform will unlock several things for us and optimize how we manage and create content today. Firstly, by improving content discovery, management, and delivery, we should significantly reduce time to market. Secondly, it unlocks new delivery targets. Creating new formats targeted for platforms and doing it well is complex. Sometimes you have to manage workflows by using the lowest common denominator and pushing content that may not be optimal for those platforms, but you can still maintain a presence on those platforms. However, with this, you can create content that is truly suitable for those platforms, providing better monetization on those platforms. We've discussed this.

Everything we've discussed so far has been done by humans. We're not replacing humans; we're really helping them not spend time on things they shouldn't. I've saved the best for last. This platform will enable one thing that's impossible today. As was said in yesterday's infrastructure keynote, 90% of unstructured data is video. If it's unstructured, it's not well understood. With this technology, we can truly analyze and generate insights from our existing 13 petabytes of media, which continues to grow at a rate of 3,000 hours every day.

It's impossible for humans to analyze all of that. So we'll be able to analyze real-time news, correlate it with everything that exists in our repository, and create new types of stories or tell stories in a much better way than we can today.

Thumbnail 3630

I want to thank AWS for joining us on this journey. When I went to them and explained the concept, and it really was just a concept, they didn't close the door on me. What they did was say, well, what you're trying to achieve is something we can't do today, but if we work together and use AWS technology to shape it in line with your vision, we can accomplish something. And that's exactly what we're doing here. Thank you. And I hope to show you the evolution of what we're building here soon.

Thumbnail 3700

Thank you all again. Thank you for coming so early on a Friday morning, really early. You all are awesome. If there's anything you want to check out for learning, there are more resources here. If there's anything in this that you want to deep dive into, re:Invent sessions will be available soon. You can level up your skills with AWS Skill Builder, so please do that. And if you need anything, we're here, we're handing out free high fives, and reminding you what to write in the survey. So thank you all.


※ This article was automatically generated using Amazon Bedrock, maintaining the original video's information as much as possible.
