The content below is an AI-generated translation. This is an experimental feature and may contain errors.

re:Invent 2025: RAG vs. MCP Selection Criteria, Implementation Cost Comparison, and Hybrid Approaches

Introduction

By transcribing talks from overseas into Japanese articles, this project aims to make valuable but hard-to-find information more accessible. Here is the presentation covered in this article.

Information on all re:Invent 2025 transcript articles is compiled in this spreadsheet. Please refer to it.

re:Invent 2025: AWS re:Invent 2025 - RAG vs MCP: Making the right choice for enterprises (DEV342)

This video explains how to choose between RAG (Retrieval-Augmented Generation) and MCP (Model Context Protocol). RAG is a framework for search and knowledge bases that grounds responses in enterprise data to reduce hallucinations, while MCP standardizes how agents communicate with external tools and systems, making it suitable for automation and system integration. The presentation covers implementation examples using various AWS services, a cost comparison at 10,000 queries per day, and a decision-making framework. It recommends RAG for search-only use cases, MCP for executing actions, and a hybrid approach when both are needed, and it introduces cost-saving strategies such as prompt caching, reducing the number of tools, batch operations, and smart routing. Throughout, it emphasizes designing with cost, latency, governance, and security in mind.

https://www.youtube.com/watch?v=ic3JunCBjf4

  • This article was generated automatically while preserving the content of the original talk as much as possible. Please note that it may contain typos or inaccuracies.

Main Content

Fundamental Concepts of RAG and MCP, and Implementation Architectures in AWS

Hello everyone. Thank you so much for being here. I'm really excited to see so many of you interested in this important topic. We're going to talk about RAG and MCP, broaden our perspective on business and architecture, and look at what the right choice is. Before we start, how many of you have been using RAG recently? Okay, and what about MCP? It's about fifty-fifty. MCP is more popular these days, right? However, we need to understand how to handle different situations and how to make the best architectural choice for our customers. So first, I'll explain why this matters. Then I'll give simple definitions of RAG and MCP. I'll go over your options: the tools you can use to run RAG and MCP on AWS, and how to design your architecture. Then we'll compare them side by side. And most importantly, I'll give you a decision-making framework that can help your company.

So why is this important? Today, data is the foundation of everything in AI, including generative AI, and the field is accelerating and evolving rapidly. More and more people are using RAG, and MCP is now very popular as well. Choosing between them is a critical decision when designing architecture for your customers or for internal use, so you need to know exactly which one to use. The wrong choice comes with risks: complexity, cost, and especially security. Again, the goal here is to give you a decision-making framework.

So what are the options? RAG and MCP. RAG stands for Retrieval-Augmented Generation. It is mainly used to augment LLMs by connecting them to enterprise data. You have a knowledge base, specific documents, or a database, and you want your LLM to answer based only on that data. This reduces hallucinations and keeps answers tailored to your company's specific scenarios. RAG is typically suited to search, Q&A, and knowledge bases.
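
To make "answer based only on that data" concrete, here is a minimal sketch of the augmentation step in RAG: retrieved passages are placed into the prompt together with an instruction to answer only from them. The function name `build_grounded_prompt`, the prompt wording, and the sample policy passages are hypothetical; in a real system the passages would come from a retriever such as a vector database.

```python
def build_grounded_prompt(question: str, passages: list) -> str:
    """Assemble a prompt that asks the model to answer only from the given passages."""
    # Number each passage so answers can be traced back to their sources.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical policy passages standing in for retrieved enterprise documents.
passages = [
    "Employees may work remotely up to three days per week.",
    "Expense reports must be submitted within 30 days.",
]
print(build_grounded_prompt("How many remote days are allowed?", passages))
```

The explicit "say you don't know" instruction is one common way to discourage the model from filling gaps with invented answers.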

MCP stands for Model Context Protocol. It standardizes how agents and LLM applications communicate with external tools and systems, much as HTTPS and other communication protocols standardize communication. It's used more when LLMs need to use tools, that is, when you need to execute actions. MCP can also read data, but its main use is taking actions. It's typically suited to automation, system integration, and connecting tools to agents.
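
As a sketch of what that standardization looks like on the wire: MCP messages are JSON-RPC 2.0, and a client discovers tools with `tools/list` and invokes one with `tools/call`. The helper name `jsonrpc_request` and the `create_ticket` tool with its arguments are hypothetical, and the transport (stdio or HTTP) and the initialization handshake are omitted.

```python
import json
from itertools import count
from typing import Optional

# Monotonically increasing JSON-RPC request ids.
_request_ids = count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request of the kind an MCP client sends to a server."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Discover the server's tools, then invoke one by name with structured arguments.
# "create_ticket" and its arguments are hypothetical.
list_request = jsonrpc_request("tools/list")
call_request = jsonrpc_request(
    "tools/call",
    {"name": "create_ticket", "arguments": {"title": "VPN is down", "priority": "high"}},
)
print(list_request)
print(call_request)
```

Because every server speaks the same message shape, an agent can switch between tools from different providers without custom integration code for each one.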

This is one set of options you can use on AWS. It's just an example stack, not an architecture, and there are many more tools you can use. It really depends on the scenario you're designing for and the problem you're trying to solve. Some AWS tools help you design a RAG architecture, and others help you design or use an MCP server. You can use either one, and you don't have to use all of them. Let's look at some examples.

Here's a really basic RAG example. All the architectures I'm showing here are available in AWS reference architectures or documentation. A user accesses a RAG application, which interacts with the LLM. You also have a vector database along with all your documents, data, media, or whatever else you need. This is exactly what RAG does: it grounds answers in your knowledge bases and information.
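
The retrieval half of this flow can be sketched with an in-memory stand-in for the vector database. The assumption here is loud: `embed` is a toy bag-of-words vector, not a real embedding model, and `InMemoryVectorStore` is a hypothetical class rather than any AWS service API. The ranking logic (cosine similarity, top-k) is the same idea a real vector store applies to dense embeddings.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a sparse bag-of-words vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical minimal vector store: keeps (document, vector) pairs in a list."""

    def __init__(self):
        self._items = []

    def add(self, doc: str) -> None:
        self._items.append((doc, embed(doc)))

    def search(self, query: str, k: int = 2) -> list:
        # Rank every stored document by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


store = InMemoryVectorStore()
for doc in [
    "Remote work is allowed up to three days per week.",
    "Expense reports are due within 30 days.",
    "The VPN requires multi-factor authentication.",
]:
    store.add(doc)
print(store.search("remote work days per week", k=1))
```

The top results would then be passed to the LLM as context, which is the "augmented" part of Retrieval-Augmented Generation.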

This is also RAG, but with a slightly more complex architecture. You can combine models in different ways depending on the scenario within your company. You can use OpenSearch or LangChain, and you can also use Bedrock here. It really depends on how you want to design your system and solve the problem.

This is a reference architecture for MCP.

For MCP, you can use Fargate, Lambda, and the AWS authentication services that support MCP. This is a basic architecture, though somewhat complex, and here the system performs actions and does more than retrieval. The next architecture is more complex and uses AgentCore. Here, the AgentCore runtimes run the MCP servers and connect them with other AWS services or external services. It also includes memory and other Bedrock services.
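
Whatever hosts it (Lambda, Fargate, or AgentCore), the core job of an MCP server is to answer `tools/list` and dispatch `tools/call` requests to real functions. The sketch below shows only that dispatch logic with the standard library; it is not the official MCP SDK, it omits transport, the initialization handshake, and authentication, and the `get_order_status` tool is hypothetical. The error codes are the standard JSON-RPC ones (-32601 method not found, -32602 invalid params).

```python
import json


def get_order_status(order_id: str) -> str:
    # Hypothetical tool body; a real server might call an internal API or AWS service here.
    return f"Order {order_id}: shipped"


# Tool registry: each entry pairs a callable with the metadata advertised via tools/list.
TOOLS = {
    "get_order_status": {
        "fn": get_order_status,
        "description": "Look up the shipping status of an order.",
        "inputSchema": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    }
}


def _response(req_id, result=None, error=None) -> str:
    msg = {"jsonrpc": "2.0", "id": req_id}
    if error is not None:
        msg["error"] = error
    else:
        msg["result"] = result
    return json.dumps(msg)


def handle(raw_request: str) -> str:
    """Dispatch one JSON-RPC request to tools/list or tools/call."""
    req = json.loads(raw_request)
    req_id, method = req.get("id"), req.get("method")
    if method == "tools/list":
        tools = [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]
        return _response(req_id, {"tools": tools})
    if method == "tools/call":
        params = req.get("params", {})
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            # Unknown tool name: reported as a JSON-RPC "invalid params" error.
            return _response(req_id, error={"code": -32602, "message": "Unknown tool"})
        text = tool["fn"](**params.get("arguments", {}))
        return _response(req_id, {"content": [{"type": "text", "text": text}], "isError": False})
    # Any other method: standard JSON-RPC "method not found".
    return _response(req_id, error={"code": -32601, "message": "Method not found"})
```

Keeping tool metadata and implementation in one registry makes it easy to expose only a small, curated set of tools, which matters for cost later in the talk.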

Comparative Analysis of RAG and MCP, and a Decision-Making Framework with a Hybrid Approach

Now that we have a basic understanding of the architectures and how RAG and MCP work, let's compare them. Again, RAG is primarily for data retrieval. It's really just for search; it doesn't perform actions. The content is relatively static, such as company policies or documents. The risk is somewhat lower, and the validation and implementation time is shorter. This means the complexity is not that high, and it can be implemented a bit faster.

MCP is typically used by AI agents to interact with other tools, for example when you need to connect with Salesforce or other services to perform actions. MCP is typically used for automation, workflows, and orchestrating them. Here are some common use cases for RAG: document retrieval, policy Q&A, or a knowledge base chatbot, all built on documents you already have. Each of these use cases has different architectures and can use different services, but RAG is the recommended approach for this kind of service.

For MCP, we see automation, monitoring, integration with other systems, API orchestration, workflow triggers, and even DevOps automation. MCP gives applications and LLMs many more tools and powerful capabilities for communicating with other systems. One of the most important aspects of this comparison is cost, so I've brought a comparison here. Again, the numbers vary a lot depending on the architecture. The example assumes 10,000 queries or 10,000 workflows per day and includes vectors, embeddings, LLM inference, and MCP tools. You can see that as you add more tools, functions, and executions, the price changes significantly. That's why you need to understand cost in particular when deciding what kind of architecture to use, or whether to use a hybrid architecture.
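
The talk's cost figures depend heavily on the architecture, so rather than reproducing them, here is a parameterized sketch of how daily volume, tokens per request, and the number of model calls per request drive monthly LLM inference cost. Every price and token count below is a placeholder assumption, not AWS pricing and not a number from the talk; vector storage, embedding, and compute costs are left out.

```python
from dataclasses import dataclass

# Placeholder prices in USD per 1,000 tokens (assumptions, NOT real AWS prices).
# Substitute current Amazon Bedrock pricing for the model you actually use.
PRICE_IN_PER_1K = 0.003
PRICE_OUT_PER_1K = 0.015


@dataclass
class Workload:
    requests_per_day: int
    input_tokens: int   # prompt + retrieved context + tool definitions, per model call
    output_tokens: int  # generated tokens, per model call
    llm_calls: int = 1  # agentic/MCP flows often need several model calls per request


def monthly_llm_cost(w: Workload, days: int = 30) -> float:
    """Estimate monthly LLM inference cost for a workload under the placeholder prices."""
    per_call = (w.input_tokens / 1000) * PRICE_IN_PER_1K + (w.output_tokens / 1000) * PRICE_OUT_PER_1K
    return w.requests_per_day * days * w.llm_calls * per_call


# Hypothetical workloads at the talk's volume of 10,000 requests per day.
rag = Workload(10_000, input_tokens=3_000, output_tokens=400)
mcp = Workload(10_000, input_tokens=6_000, output_tokens=400, llm_calls=3)
print(f"RAG: ${monthly_llm_cost(rag):,.0f}/month")
print(f"MCP: ${monthly_llm_cost(mcp):,.0f}/month")
```

With these made-up inputs the MCP-style workload costs several times more than the RAG one, mainly because each request triggers multiple model calls with larger prompts. Plugging in your own measured token counts is what turns this into a useful estimate.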

Here's a decision-making framework you can use when deciding which architecture to implement. RAG alone is for providing information from documents, knowledge bases, and databases, when you want answers grounded in that context. It's perfect for search-only use cases. MCP alone is for simple, independent actions that don't require complex business context. If you don't need business context or documents, you can use MCP. Finally, you can use a hybrid architecture that combines both: RAG provides the business context, and MCP provides the tools to act on it. This is ideal for intelligent workloads that need to perform actions or communicate with other systems.
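
A minimal sketch of the hybrid pattern: a RAG step supplies business context, and an MCP tool call acts on it. Everything here is hypothetical: the policy text, `retrieve_context` (a keyword lookup standing in for vector search), the tool names, and the `call_tool` callable, which in a real system would send an MCP `tools/call` request. A simple rule stands in for the reasoning an LLM would normally do over the retrieved context.

```python
from typing import Callable, List

# Hypothetical policy documents that the RAG step would normally retrieve.
POLICIES = {
    "refund": "Refunds over $500 require manager approval.",
    "shipping": "Orders ship within 2 business days.",
}


def retrieve_context(question: str) -> List[str]:
    # Stand-in for the RAG step (vector search over policy documents).
    q = question.lower()
    return [text for key, text in POLICIES.items() if key in q]


def hybrid_refund(amount: float, order_id: str, call_tool: Callable[[str, dict], str]) -> str:
    """RAG supplies the business rule; an MCP tool call performs the action."""
    context = retrieve_context("refund policy")
    # In a real system an LLM would read `context` and decide; this rule stands in for it.
    needs_approval = amount > 500 and any("manager approval" in c for c in context)
    tool = "request_manager_approval" if needs_approval else "issue_refund"
    return call_tool(tool, {"order_id": order_id, "amount": amount})


# Fake tool caller for illustration; a real one would go through an MCP client.
fake_call = lambda name, args: f"{name}:{args['order_id']}"
print(hybrid_refund(800, "A1", fake_call))
print(hybrid_refund(100, "A2", fake_call))
```

MCP alone could issue the refund, but without the retrieved policy it would not know the approval rule; that business context is what the RAG half contributes.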

When it comes to cost, there are several strategies you can adopt. First, use prompt caching, so you don't pay full price for the same prompt tokens on every request. Second, reduce the number of tools you use with MCP. If many MCP servers are running with many tools each, the definitions of all those tools are typically sent to the model, so the cost multiplies with every action you perform. You really need to design this carefully.
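
One way to act on "reduce the number of tools" is to expose only the tools relevant to each request instead of every tool from every server. This sketch scores hypothetical tool descriptions by keyword overlap with the request; the catalog, stopword list, and the roughly-4-characters-per-token heuristic are all assumptions, and a production system might use embeddings or separate, narrowly scoped MCP servers instead.

```python
import json

# Hypothetical tool catalog aggregated from several MCP servers.
TOOL_DEFS = [
    {"name": "create_ticket", "description": "Create a support ticket in the help desk"},
    {"name": "get_order_status", "description": "Look up the shipping status of an order"},
    {"name": "issue_refund", "description": "Issue a refund for an order"},
    {"name": "list_ec2_instances", "description": "List running EC2 instances"},
]

STOPWORDS = {"a", "an", "the", "of", "for", "in", "is", "my", "what", "up"}


def keywords(text: str) -> set:
    return {w.strip("?.,!") for w in text.lower().split()} - STOPWORDS


def select_tools(request: str, tools: list, max_tools: int = 2) -> list:
    """Keep only the highest-scoring tools whose descriptions share keywords with the request."""
    req_words = keywords(request)
    scored = [(len(req_words & keywords(t["description"])), t) for t in tools]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep catalog order
    return [t for score, t in scored[:max_tools] if score > 0]


def approx_tokens(payload) -> int:
    # Rough ~4 characters per token heuristic; use the model's tokenizer for real numbers.
    return len(json.dumps(payload)) // 4


selected = select_tools("What is the shipping status of my order?", TOOL_DEFS)
print([t["name"] for t in selected])
print(approx_tokens(TOOL_DEFS), "->", approx_tokens(selected), "approx. tokens of tool definitions")
```

The saving per request is small here, but it is paid on every model call, so it compounds at 10,000 requests per day.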

Third, use batch operations: process many items at once to reduce costs. Finally, it's really important to implement smart routing, which decides whether to use MCP or RAG before calling an LLM, without using an LLM to make that decision. This also reduces costs.
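
Smart routing and batching can both be done with plain code before any model is involved. In this sketch, `route` uses a hypothetical keyword rule to send action-like requests to MCP and everything else to RAG, and `batched` groups work items so they can be processed together (Python 3.12+ ships `itertools.batched`, which yields tuples; this version yields lists for older interpreters). The keyword list is an assumption to tune for your own traffic, or to replace with a small classifier.

```python
import re
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical action verbs; tune for your own traffic or swap in a small classifier.
ACTION_PATTERN = re.compile(r"\b(create|update|delete|restart|deploy|send|cancel)\b", re.IGNORECASE)


def route(request: str) -> str:
    """Decide between RAG and MCP with a cheap rule, before any LLM call."""
    return "MCP" if ACTION_PATTERN.search(request) else "RAG"


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of up to `size` items so they can be processed together."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


print(route("What is our travel policy?"))
print(route("Restart the payment service"))
print(list(batched(["q1", "q2", "q3", "q4", "q5"], 2)))
```

A keyword router will misroute some requests, so a common design is to fall back to a more expensive classifier only when the cheap rule is unsure.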

Simply put, ask yourself: Do you need knowledge grounding? Then use RAG. Do you need tool orchestration or actions? Then use MCP. Do you need both? Use a hybrid approach. And always consider cost, latency, and governance. Looking at the hybrid scenario, here again is the cost comparison across the three architectures. MCP alone, at 10,000 workflows per day, can be really expensive. But if you apply all the cost strategies, the same type of architecture becomes much more cost-effective.
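
The three questions translate directly into a small decision function. The function name and return labels are hypothetical, and the fallback for "neither" is an addition not covered in the talk; the cost, latency, and governance checks still apply to whichever option it returns.

```python
def choose_architecture(needs_grounding: bool, needs_actions: bool) -> str:
    """Map the talk's three questions onto an architecture choice."""
    if needs_grounding and needs_actions:
        return "Hybrid (RAG + MCP)"
    if needs_actions:
        return "MCP"
    if needs_grounding:
        return "RAG"
    # Not covered in the talk: with neither need, a plain LLM call may suffice.
    return "Plain LLM call"


for grounding, actions in [(True, False), (False, True), (True, True)]:
    print(grounding, actions, "->", choose_architecture(grounding, actions))
```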

Again, this really depends on your problem and how you design your system, but it's a fair comparison between these three architectures. To summarize, here are the key takeaways. RAG is very good for knowledge bases. MCP is really popular right now because it makes LLMs and agents really powerful, letting them access almost anything. However, you need to be careful with cost and security. It gets complex, so you really need to think it through.

But again, if you need a balance of power and complexity, you can use both. AWS offers many tools, as we've seen, and you can combine both architectures like building blocks. That's all. I'd love to connect with all of you; here are my social media links, including LinkedIn and YouTube. I'm always posting content about AI and LLMs, so please connect. Thank you very much.


  • This article is automatically generated using Amazon Bedrock, maintaining the information from the original video as much as possible.