
Experimenting with a Combination of RAG Fusion and CRAG


Introduction

In this article, I will experimentally try combining the RAG methods "RAG Fusion" and "CRAG".

RAG Fusion

In a typical RAG implementation, documents are retrieved from an index for a single question, and the search results are passed directly to the LLM as context.

RAG Fusion adds elements of "generating similar questions" and "re-ranking (Reciprocal Rank Fusion)" to standard RAG. The main advantage is that by searching using similar questions, you can achieve greater diversity in the retrieved documents.

The flow from question to answer is as follows:

1. Generate multiple similar questions from a single question
2. Perform a similarity search from the index for each of the generated questions (+ the original question)
3. Re-rank the search results (calculate scores for each document) and extract the documents with high scores
4. Pass the extracted documents as context to the LLM to generate an answer

The formula used for re-ranking (calculating document scores) is as follows, where n is the number of questions, r_i(d) is the similarity rank of document d for question i, and k is a constant:

\text{RRF}(d) = \sum_{i=1}^{n} \frac{1}{k + r_{i}(d)}

As an example, let's calculate the score for Document B below.

| Document | Similarity Rank for Question A | Similarity Rank for Question B | Similarity Rank for Question C |
| --- | --- | --- | --- |
| A | 1st | 2nd | 3rd |
| B | 1st | 3rd | 2nd |
| C | 2nd | 3rd | 1st |
| D | 3rd | 1st | 2nd |

Calculate the values for each question as follows:

\frac{1}{k + \text{similarity rank}} \qquad (k = 60 \text{ in this case})
  • Value for Question A = 1 / (60 + 1) ≈ 0.01639
  • Value for Question B = 1 / (60 + 3) ≈ 0.01587
  • Value for Question C = 1 / (60 + 2) ≈ 0.01613

Next, the score can be calculated by adding each value together.

  • 0.01639 + 0.01587 + 0.01613 = 0.04839

After calculating scores for each document in this way, use the documents with the highest scores as context.
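The worked example above can be reproduced in a few lines of Python (a minimal sketch; `ranks` holds Document B's similarity ranks for the three questions, with k = 60 as above):

```python
def rrf_score(ranks: list[int], k: int = 60) -> float:
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over all questions."""
    return sum(1 / (k + r) for r in ranks)

# Document B ranked 1st for question A, 3rd for question B, 2nd for question C
score_b = rrf_score([1, 3, 2])
print(f"{score_b:.5f}")
```

The printed value matches the article's 0.04839 up to rounding of the intermediate terms.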

CRAG (Corrective Retrieval Augmented Generation)

CRAG adds three elements to standard RAG: a function that evaluates whether each retrieved document is "Relevant", "Irrelevant", or "Ambiguous"; a web search that is performed when no relevance is found; and knowledge refinement. The main advantage is that the relevance check reduces hallucinations.

Note: The following two points are omitted in this implementation:

  • The "Ambiguous" evaluation in the "evaluation function for document relevance" (in this case, evaluation is binary: "Relevant" or "Irrelevant").
  • "Knowledge refinement".

The flow from question to answer is as follows:

1. Search and retrieve documents from the index using the question text
2. Evaluate whether the retrieved documents contain content relevant to the question
3. If there is no relevance, perform a web search and add the results to the documents
4. Pass the documents (+ web search results) as context to the LLM to generate an answer

About the combination of RAG Fusion and CRAG

RAG Fusion alone may sometimes rank documents high (within the index) even if they are not actually relevant to the question. Therefore, by combining it with CRAG's relevance evaluation filter (+ Web Search), irrelevant documents can be discarded, and if information is insufficient, it can be reinforced through a web search.
The aim is to maintain the diversity of information provided by RAG Fusion while preventing hallucinations caused by irrelevant documents.

The flow from question to answer is as follows:

1. Generate multiple similar questions from a single question
2. Perform a similarity search from the index for each of the generated questions (+ the original question)
3. Re-rank the search results (calculate scores for each document) and extract the documents with high scores
4. Integrate the generated questions (+ the original question) into a single question
5. Evaluate the relevance between the integrated question text and the documents
6. If there is no relevance, perform a web search and add the results to the documents
7. Pass the documents (+ web search results) as context to the LLM to generate an answer

Implementation of RAG Fusion × CRAG

LangGraph is used for the implementation. I have explained how to use LangGraph in the following article (which also introduces the Claude 3 model and the Tavily search tool used in this project).
https://zenn.dev/yumefuku/articles/llm-agent-rag

Definition of Nodes and Conditional Edges

The names of the Nodes and Conditional Edges (function names in the code) and their processing details in this implementation are as follows (it is a one-way flow from top to bottom since there is no cycle processing).

| Node or Conditional Edge | Function Name | Processing Content |
| --- | --- | --- |
| Node | generate_query | Generates similar question texts from the original question |
| Node | retrieve | Retrieves documents from the index using the generated questions (+ original question) |
| Node | fusion | Calculates document scores and extracts the top results |
| Node | integration_query | Integrates the generated questions (+ original question) into a single question |
| Node | grade_documents | Evaluates the relevance between the integrated question and the documents |
| Conditional Edge | decide_to_generate | Determines whether a web search is necessary based on the relevance evaluation |
| Node | transform_query | Converts the integrated question into a web search query |
| Node | web_search | Executes a web search and retrieves the results |
| Node | create_message | Creates the message to be passed to the LLM |
| Node | generate | Generates the answer from the LLM |

Represented as a graph diagram, it looks like this:


Models

I am using the "Haiku", "Sonnet", and "Opus" models of Claude 3.

Within the implementation, I use the models for different purposes (per Node) as follows:

  • Haiku
    • grade_documents (evaluating the relevance between the integrated question and the documents)
  • Sonnet
    • integration_query (integrating multiple generated questions (+ the original question) into a single question)
    • transform_query (converting the integrated question into a web search query)
    • generate (generating the answer from the LLM)
  • Opus
    • generate_query (generating similar questions from the original question text)

I chose which model to use based on the following criteria: "Haiku" for large context volumes, "Opus" for small context volumes where high accuracy is required, and "Sonnet" when both context volume and accuracy are moderate.

Preparation

Installing Libraries

# Libraries for LangChain and LangGraph
$ pip install langchain
$ pip install langchain-community
$ pip install langgraph
$ pip install langchain_anthropic

# Libraries for Indexing
$ pip install unstructured
$ pip install sentence-transformers
$ pip install faiss-gpu  # use faiss-cpu in CPU-only environments

Preparing the Index

For this project, I created an index by retrieving information from Wikipedia.

https://ja.wikipedia.org/wiki/葬送のフリーレン

The following is the code used to create the index (using FAISS).

faiss_index_create.py
from langchain_community.vectorstores.faiss import FAISS
from langchain_community.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import DirectoryLoader

# Location to store documents (directory)
data_dir = "./data"

# Location to save the vectorized index (directory)
index_path = "./storage"

# Load directory
loader = DirectoryLoader(data_dir)

# Load embedding model
embedding_model = HuggingFaceEmbeddings(
    model_name="intfloat/multilingual-e5-large"
)

# Split text into chunks
split_texts = loader.load_and_split(
    text_splitter=RecursiveCharacterTextSplitter(
        chunk_size=300,
        chunk_overlap=75
    )
)

# Create index
index = FAISS.from_documents(
    documents=split_texts,
    embedding=embedding_model,
)

# Save index
index.save_local(
    folder_path=index_path
)

I have written about creating and reading indices using FAISS in the article below. Please check it out if you are interested.
https://zenn.dev/yumefuku/articles/llm-langchain-rag

Coding

The implementation code (entirety) for RAG Fusion × CRAG is as follows.

rag_fusion_crag.py
import os
import operator
from typing import List, TypedDict, Sequence, Annotated
from langchain_core.messages import BaseMessage
from langchain.prompts.chat import ChatPromptTemplate
from langchain_anthropic import ChatAnthropic
from langchain_community.retrievers import TavilySearchAPIRetriever
from langchain_community.vectorstores.faiss import FAISS
from langchain.schema import Document
from langchain_community.embeddings.huggingface import HuggingFaceEmbeddings
from langgraph.graph import StateGraph, END
from langchain_core.output_parsers import StrOutputParser

os.environ["TAVILY_API_KEY"] = ""
os.environ["ANTHROPIC_API_KEY"] = ""

class GraphState(TypedDict):
    llm_haiku: ChatAnthropic # Claude 3 "Haiku" model
    llm_sonnet: ChatAnthropic # Claude 3 "Sonnet" model
    llm_opus: ChatAnthropic # Claude 3 "Opus" model
    emb_model : HuggingFaceEmbeddings # Embedding model
    question: str # Question text
    generate_querys: List[str] # Generated (additional) question text
    generate_query_num: int # Number of questions to generate (add)
    integration_question: str # Integrated question text
    transform_question: str # Question text converted to a web search query
    messages: Annotated[Sequence[BaseMessage], operator.add] # Message history
    fusion_documents : List[List[Document]] # Documents retrieved with generated question texts
    documents: List[Document] # Documents ultimately passed to the LLM
    is_search : bool # Necessity of web search

# Generate similar questions from the original question text
def generate_query(state):
    print("\n--- __start__ ---")
    print("--- generate_query ---")
    llm = state["llm_opus"]
    question = state["question"]
    generate_query_num = state["generate_query_num"]
    system_prompt = "あなたは、1つの入力クエリに基づいて複数の検索クエリを生成するアシスタントです。"
    human_prompt = """クエリを作成する際は、元のクエリの意味を大きく変えず一行ずつ出力してください。
    入力クエリ: {question}
    {generate_query_num}つの出力クエリ: 
    """
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            ("human", human_prompt)
        ]
    )
    questions_chain = prompt | llm | StrOutputParser() | (lambda x: x.split("\n"))
    generate_querys = questions_chain.invoke(
        {
            "question": question, 
            "generate_query_num": generate_query_num
        }
    )
    generate_querys.insert(0, "0. " + question)
    print("\nオリジナルの質問 + 生成された質問==========================")
    for i, query in enumerate(generate_querys):
        print(f"\n{query}")
    print("\n===========================================================\n")
    return {"generate_querys": generate_querys}

# Retrieve documents from the index using the generated questions (+ original question)
def retrieve(state):
    print("--- retrieve ---")
    emb_model = state['emb_model']   
    generate_querys = state["generate_querys"]
    index = FAISS.load_local(
        folder_path= "./storage", 
        embeddings=emb_model,
        allow_dangerous_deserialization=True
    )
    fusion_documents = []
    for question in generate_querys:
        docs = index.similarity_search(question, k=3)
        fusion_documents.append(docs)
    return {"fusion_documents": fusion_documents}

# Calculate document scores and extract the top ones
def fusion(state):
    print("--- fusion ---")
    fusion_documents = state["fusion_documents"]
    k = 60
    documents = []
    fused_scores = {}
    for docs in fusion_documents:
        for rank, doc in enumerate(docs, start=1):
            if doc.page_content not in fused_scores:
                fused_scores[doc.page_content] = 0
                documents.append(doc)
            fused_scores[doc.page_content] += 1 / (rank + k)
    reranked_results = {doc_str: score for doc_str, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)[:3]}
    print("\n検索上位3つのスコア========================================")
    for i, score in enumerate(reranked_results.values(), start=1):
        print(f"\nドキュメント{i}: {score}")
    print("\n===========================================================\n")
    filtered_documents = []
    for doc in documents:
        if doc.page_content in reranked_results:
            filtered_documents.append(doc)
    documents = filtered_documents
    return {"documents": documents}

# Integrate multiple generated questions (+ original question) into a single question
def integration_query(state):
    print("--- integration_query ---")
    llm = state["llm_sonnet"]
    generate_querys = state["generate_querys"]
    system_prompt = """あなたは、入力された複数の質問を1つの質問に統合する質問リライターです。"""
    human_prompt = """統合した1つの質問のみを出力してください。
    複数の質問: {query}
    統合した質問: """
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            ("human", human_prompt),
        ]
    )
    integration_chain = prompt | llm | StrOutputParser()
    questions = "\n".join(generate_querys)
    integration_query = integration_chain.invoke({"query": questions})
    print(f"\n統合した質問: {integration_query}\n")
    return {"integration_question": integration_query}

# Evaluate the relevance between the integrated question and the documents
def grade_documents(state):
    print("--- grade_documents ---")
    llm = state["llm_haiku"]
    integration_question = state["integration_question"]
    documents = state["documents"]
    system_prompt = """あなたは、検索された文書とユーザーの質問との関連性を評価するアシスタントです。
文書に質問に関連するキーワードまたはセマンティックな内容が含まれている場合、あなたはそれを関連性があると評価します。
関連性があれば\"Yes\"、関連性がない場合は\"No\"とだけ答えてください。"""
    human_prompt = """
    
    文書: {context} 
    
    質問: {query}
    関連性(\"Yes\" or \"No\"): """
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            ("human", human_prompt),
        ]
    )
    filtered_docs = []
    is_search = False
    grade_chain = prompt | llm | StrOutputParser()
    print("\nドキュメントごとの関連性の評価=============================")
    for doc in documents:
        grade = grade_chain.invoke({"context":doc.page_content, "query": integration_question})
        print(f"\n関連性: {grade}")
        if "Yes" in grade:
            filtered_docs.append(doc)
        else:
            is_search = True
    print("\n===========================================================\n")
    return {"documents": filtered_docs, "is_search": is_search}

# Determine the necessity of web search based on relevance evaluation
def decide_to_generate(state):
    print("--- decide_to_generate ---")
    is_search = state['is_search']
    if is_search:
        return "transform_query"
    else:
        return "create_message"

# Convert the integrated question into a web search query
def transform_query(state):
    print("--- transform_query ---")
    llm = state["llm_sonnet"]
    integration_question = state["integration_question"]
    system_prompt = """あなたは、入力された質問をWeb検索に最適化されたクエリに変換するリライターです。"""
    human_prompt = """質問を見て、根本的な意味/意図を推論してWeb検索クエリのみ出力してください。
    質問: {query}
    Web検索クエリ: """
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            ("human", human_prompt),
        ]
    )
    transform_chain = prompt | llm | StrOutputParser()
    transform_query = transform_chain.invoke({"query": integration_question})
    print(f"\nWeb検索用クエリ: {transform_query}\n")
    return {"transform_question": transform_query}

# Execute web search and retrieve results
def web_search(state):
    print("--- web_search ---")
    transform_question = state["transform_question"]
    documents = state["documents"]
    retriever = TavilySearchAPIRetriever(k=3) 
    docs = retriever.invoke(transform_question)
    documents.extend(docs)
    return {"documents": documents}

# Create a message to be passed to the LLM
def create_message(state):
    print("--- create_message ---")
    documents = state["documents"]
    question = state["question"]
    system_message = "あなたは常に日本語で回答します。"
    human_message ="""次の「=」で区切られたコンテキストを参照して質問に答えてください。

    {context}

    Question: {query}
    """
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_message),
            ("human", human_message),
        ]
    )
    partition = "\n" + "=" * 20 + "\n"
    documents_context = partition.join([doc.page_content for doc in documents])
    messages = prompt.format_messages(context=documents_context, query=question)
    return {"messages": messages}

# Generate an answer from the LLM
def generate(state):
    print("--- generate ---")
    llm = state["llm_sonnet"]
    messages = state["messages"]
    response = llm.invoke(messages)
    print("--- end ---\n")
    return {"messages": [response]}

# Construct the graph and compile it into an executable format
def get_compile_graph():
    graph = StateGraph(GraphState)
    graph.set_entry_point("generate_query")
    graph.add_node("generate_query", generate_query)
    graph.add_edge("generate_query", "retrieve")
    graph.add_node("retrieve", retrieve)
    graph.add_edge("retrieve", "fusion")
    graph.add_node("fusion", fusion)
    graph.add_edge("fusion", "integration_query")
    graph.add_node("integration_query", integration_query)
    graph.add_edge("integration_query", "grade_documents")
    graph.add_node("grade_documents", grade_documents)
    graph.add_conditional_edges(
        "grade_documents",
        decide_to_generate,
        {
            "transform_query": "transform_query",
            "create_message": "create_message"
        },
    )
    graph.add_node("transform_query", transform_query)
    graph.add_edge("transform_query", "web_search")
    graph.add_node("web_search", web_search)
    graph.add_edge("web_search", "create_message")
    graph.add_node("create_message", create_message)
    graph.add_edge("create_message", "generate")
    graph.add_node("generate", generate)
    graph.add_edge("generate", END)

    compile_graph = graph.compile()
    
    return compile_graph

if __name__ == "__main__":
    llm_haiku = ChatAnthropic(model_name="claude-3-haiku-20240307")
    llm_sonnet = ChatAnthropic(model_name="claude-3-sonnet-20240229")
    llm_opus = ChatAnthropic(model_name="claude-3-opus-20240229")

    emb_model = HuggingFaceEmbeddings(model_name="intfloat/multilingual-e5-large")

    compile_graph = get_compile_graph()
    
    # Execute the graph and output the result (answer from the LLM)
    output = compile_graph.invoke(
        {
            "llm_haiku": llm_haiku,
            "llm_sonnet": llm_sonnet,
            "llm_opus": llm_opus,
            "emb_model": emb_model, 
            "question": "葬送のフリーレンの勇者パーティーについて教えてください", 
            "generate_query_num": 2
        }
    )
    print("output:")
    print(output["messages"][-1].content)

Among the above, let's review the processing for the "Nodes" and "Conditional Edges".

Generate similar questions from the original question text

rag_fusion_crag.py
# Generate similar questions from the original question text
def generate_query(state):
    print("\n--- __start__ ---")
    print("--- generate_query ---")
    llm = state["llm_opus"]
    question = state["question"]
    generate_query_num = state["generate_query_num"]
    system_prompt = "あなたは、1つの入力クエリに基づいて複数の検索クエリを生成するアシスタントです。"
    human_prompt = """クエリを作成する際は、元のクエリの意味を大きく変えず一行ずつ出力してください。
    入力クエリ: {question}
    {generate_query_num}つの出力クエリ: 
    """
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            ("human", human_prompt)
        ]
    )
    questions_chain = prompt | llm | StrOutputParser() | (lambda x: x.split("\n"))
    generate_querys = questions_chain.invoke(
        {
            "question": question, 
            "generate_query_num": generate_query_num
        }
    )
    generate_querys.insert(0, "0. " + question)
    print("\nオリジナルの質問 + 生成された質問==========================")
    for i, query in enumerate(generate_querys):
        print(f"\n{query}")
    print("\n===========================================================\n")
    return {"generate_querys": generate_querys}

This node generates as many similar question texts as the number passed in generate_query_num (2 in this case) during graph execution.

The LCEL chain is configured as follows:

rag_fusion_crag.py
questions_chain = prompt | llm | StrOutputParser() | (lambda x: x.split("\n"))

The prompt | llm | StrOutputParser() part returns the response from the LLM as a string, and (lambda x: x.split("\n")) splits it by newlines to store each line in a list. When executed, it returns a list containing each generated question.
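A minimal sketch of that post-processing step (the raw output string here is a hypothetical stand-in for an actual LLM response):

```python
# Hypothetical raw LLM output: one generated query per line
raw_output = "1. 葬送のフリーレンのあらすじを教えてください\n2. 葬送のフリーレンの登場人物を紹介してください"

# The (lambda x: x.split("\n")) step turns it into a list of queries
generate_querys = raw_output.split("\n")

# generate_query() then prepends the original question at index 0
generate_querys.insert(0, "0. 葬送のフリーレンの勇者パーティーについて教えてください")

print(len(generate_querys))  # 3: the original question + 2 generated queries
```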

Regarding the prompt, I translated it into Japanese and made some minor adjustments based on the following references. While I used Japanese here (overall) for clarity, it might be better to write it in English when considering cost and accuracy.

https://github.com/Raudaschl/rag-fusion
https://zenn.dev/khisa/articles/ab79ad0a92a117
https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_crag.ipynb?ref=blog.langchain.dev

Retrieve documents from the index using the generated questions (+ original question)

rag_fusion_crag.py
# Retrieve documents from the index using the generated questions (+ original question)
def retrieve(state):
    print("--- retrieve ---")
    emb_model = state['emb_model']   
    generate_querys = state["generate_querys"]
    index = FAISS.load_local(
        folder_path= "./storage", 
        embeddings=emb_model,
        allow_dangerous_deserialization=True
    )
    fusion_documents = []
    for question in generate_querys:
        docs = index.similarity_search(question, k=3)
        fusion_documents.append(docs)
    return {"fusion_documents": fusion_documents}

This node searches the index for each of the original and generated questions (3 in total) and retrieves documents (stored in a list in descending order of similarity).

Calculate document scores and extract the top ones

rag_fusion_crag.py
def fusion(state):
    print("--- fusion ---")
    fusion_documents = state["fusion_documents"]
    k = 60
    documents = []
    fused_scores = {}
    for docs in fusion_documents:
        for rank, doc in enumerate(docs, start=1):
            if doc.page_content not in fused_scores:
                fused_scores[doc.page_content] = 0
                documents.append(doc)
            fused_scores[doc.page_content] += 1 / (rank + k)
    reranked_results = {doc_str: score for doc_str, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)[:3]}
    print("\n検索上位3つのスコア========================================")
    for i, score in enumerate(reranked_results.values(), start=1):
        print(f"\nドキュメント{i}: {score}")
    print("\n===========================================================\n")
    filtered_documents = []
    for doc in documents:
        if doc.page_content in reranked_results:
            filtered_documents.append(doc)
    documents = filtered_documents
    return {"documents": documents}

This node calculates the re-ranking (Reciprocal Rank Fusion) and extracts the top 3 documents based on their scores.

The flow of the process above is as follows:

rag_fusion_crag.py
for docs in fusion_documents:
    for rank, doc in enumerate(docs, start=1):
        if doc.page_content not in fused_scores:
            fused_scores[doc.page_content] = 0
            documents.append(doc)
        fused_scores[doc.page_content] += 1 / (rank + k)

It creates a dictionary (fused_scores) where the document text is the key, and the value is the sum of the scores calculated based on the rank in each question. The documents variable is a list of unique documents (from which the top 3 scoring documents will eventually be extracted).

rag_fusion_crag.py
reranked_results = {doc_str: score for doc_str, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)[:3]}

It creates a new dictionary (reranked_results) by sorting fused_scores by score and extracting the top 3.

rag_fusion_crag.py
filtered_documents = []
for doc in documents:
    if doc.page_content in reranked_results:
        filtered_documents.append(doc)
documents = filtered_documents

It filters the documents list using reranked_results to result in a list of only the top 3 scoring documents.
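The score accumulation and top-3 extraction can be exercised end to end with plain strings standing in for Document objects (a toy sketch; the real node keys on doc.page_content):

```python
k = 60
# Toy data: each inner list is one question's results, most similar first
fusion_documents = [
    ["doc_a", "doc_b", "doc_c"],  # results for the original question
    ["doc_b", "doc_a", "doc_d"],  # results for generated question 1
    ["doc_a", "doc_d", "doc_b"],  # results for generated question 2
]

# Accumulate the RRF score per unique document
fused_scores = {}
for docs in fusion_documents:
    for rank, doc in enumerate(docs, start=1):
        fused_scores.setdefault(doc, 0)
        fused_scores[doc] += 1 / (rank + k)

# Keep the top 3 by fused score, as the node does
top3 = sorted(fused_scores, key=fused_scores.get, reverse=True)[:3]
print(top3)  # ['doc_a', 'doc_b', 'doc_d']
```

doc_a appears near the top for all three questions, so it accumulates the highest score, while doc_c is retrieved only once and is dropped.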

Integrate multiple generated questions (+ the original question) into a single question

rag_fusion_crag.py
def integration_query(state):
    print("--- integration_query ---")
    llm = state["llm_sonnet"]
    generate_querys = state["generate_querys"]
    system_prompt = """You are a question rewriter that integrates multiple input questions into one question."""
    human_prompt = """Please output only one integrated question.
    Multiple questions: {query}
    Integrated question: """
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            ("human", human_prompt),
        ]
    )
    integration_chain = prompt | llm | StrOutputParser()
    questions = "\n".join(generate_querys)
    integration_query = integration_chain.invoke({"query": questions})
    print(f"\nIntegrated question: {integration_query}\n")
    return {"integration_question": integration_query}

To evaluate the relevance to the documents, multiple questions are integrated into one.
This node was created to connect RAG Fusion and CRAG. There might be other methods, such as evaluating the relevance of each question individually without integrating them.

Evaluate the relevance between the integrated question and the documents

rag_fusion_crag.py
def grade_documents(state):
    print("--- grade_documents ---")
    llm = state["llm_haiku"]
    integration_question = state["integration_question"]
    documents = state["documents"]
    system_prompt = """You are an assistant that evaluates the relevance between retrieved documents and user questions.
If a document contains keywords or semantic content relevant to the question, you evaluate it as relevant.
Please answer only with \"Yes\" if there is relevance, or \"No\" if there is no relevance."""
    human_prompt = """
    
    Document: {context} 
    
    Question: {query}
    Relevance (\"Yes\" or \"No\"): """
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            ("human", human_prompt),
        ]
    )
    filtered_docs = []
    is_search = False
    grade_chain = prompt | llm | StrOutputParser()
    print("\nRelevance evaluation for each document=============================")
    for doc in documents:
        grade = grade_chain.invoke({"context":doc.page_content, "query": integration_question})
        print(f"\nRelevance: {grade}")
        if "Yes" in grade:
            filtered_docs.append(doc)
        else:
            is_search = True
    print("\n===========================================================\n")
    return {"documents": filtered_docs, "is_search": is_search}

The relevance between the integrated question and the documents is evaluated in a binary choice of "Yes" or "No".
If there is even one document evaluated as "No" (no relevance to the question), that document is not used as context, and "is_search" is set to "True" so that information can be supplemented later through a web search.
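The per-document decision can be isolated into a small helper (a sketch with hard-coded grades standing in for the Haiku model's output):

```python
def filter_by_grade(graded_docs):
    """Keep documents graded "Yes"; flag a web search if any is graded "No"."""
    filtered, is_search = [], False
    for doc, grade in graded_docs:
        if "Yes" in grade:
            filtered.append(doc)
        else:
            is_search = True
    return filtered, is_search

docs, need_search = filter_by_grade([("doc1", "Yes"), ("doc2", "Yes"), ("doc3", "No")])
print(docs, need_search)  # ['doc1', 'doc2'] True
```

A single "No" is enough to trigger the web-search branch, but the "Yes" documents are still kept as context.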

Determine the necessity of a web search based on the relevance evaluation

rag_fusion_crag.py
def decide_to_generate(state):
    print("--- decide_to_generate ---")
    is_search = state['is_search']
    if is_search:
        return "transform_query"
    else:
        return "create_message"

Based on "is_search", it decides whether to proceed with preparing for a web search or move directly to creating a message to be passed to the LLM.

Convert the integrated question into a web search query

rag_fusion_crag.py
# Convert the integrated question into a web search query
def transform_query(state):
    print("--- transform_query ---")
    llm = state["llm_sonnet"]
    integration_question = state["integration_question"]
    system_prompt = """You are a rewriter that converts input questions into queries optimized for web search."""
    human_prompt = """Look at the question, infer the fundamental meaning/intent, and output only the web search query.
    Question: {query}
    Web search query: """
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            ("human", human_prompt),
        ]
    )
    transform_chain = prompt | llm | StrOutputParser()
    transform_query = transform_chain.invoke({"query": integration_question})
    print(f"\nWeb search query: {transform_query}\n")
    return {"transform_question": transform_query}

This is the node called when "is_search" is "True". Before proceeding to the web search, it converts the integrated question text into a query optimized for web search.

Execute web search and retrieve results

rag_fusion_crag.py
# Execute web search and retrieve results
def web_search(state):
    print("--- web_search ---")
    transform_question = state["transform_question"]
    documents = state["documents"]
    retriever = TavilySearchAPIRetriever(k=3) 
    docs = retriever.invoke(transform_question)
    documents.extend(docs)
    return {"documents": documents}

After executing a web search using the web search query created in transform_query, it adds the retrieved documents to the "documents" list (Tavily is used for the web search).

Create a message to be passed to the LLM

rag_fusion_crag.py
# Create a message to be passed to the LLM
def create_message(state):
    print("--- create_message ---")
    documents = state["documents"]
    question = state["question"]
    system_message = "You always answer in Japanese."
    human_message ="""Refer to the context separated by "=" below and answer the question.

    {context}

    Question: {query}
    """
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_message),
            ("human", human_message),
        ]
    )
    partition = "\n" + "=" * 20 + "\n"
    documents_context = partition.join([doc.page_content for doc in documents])
    messages = prompt.format_messages(context=documents_context, query=question)
    return {"messages": messages}

This node creates a message using the documents evaluated as "Yes" in the relevance evaluation plus the documents retrieved via web search (web results are not retrieved if all initial documents were evaluated as "Yes").
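The "="-separated context assembly can be checked in isolation (the placeholder texts stand in for real document contents):

```python
# Documents are joined with a line of 20 "=" characters, as in create_message()
partition = "\n" + "=" * 20 + "\n"
documents_context = partition.join(["最初の文書の本文", "2つ目の文書の本文"])
print(documents_context)
```

The separator matches the instruction in the human message, which tells the LLM that the context chunks are delimited by "=".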

Generate an answer from the LLM

rag_fusion_crag.py
# Generate an answer from the LLM
def generate(state):
    print("--- generate ---")
    llm = state["llm_sonnet"]
    messages = state["messages"]
    response = llm.invoke(messages)
    print("--- end ---\n")
    return {"messages": [response]}

This node passes the created message to the LLM to generate the final answer.

Execution Results (Index Search Results + Web Search Results)

Question: Please tell me about the hero's party in Frieren: Beyond Journey's End
Number of similar questions to generate: 2


--- __start__ ---
--- generate_query ---

Original Question + Generated Questions==========================

0. Please tell me about the hero's party in Frieren: Beyond Journey's End

1. Please tell me in detail about the story of Frieren: Beyond Journey's End

2. Please introduce the main characters appearing in Frieren: Beyond Journey's End

===========================================================

--- retrieve ---
--- fusion ---

Top 3 Search Scores========================================

Document 1: 0.04918032786885246

Document 2: 0.03200204813108039

Document 3: 0.016129032258064516

===========================================================

--- integration_query ---

Integrated Question: Please tell me in detail about the overview of the story of Frieren: Beyond Journey's End, the main characters and their roles, and the activities of the hero's party.

--- grade_documents ---

Relevance Evaluation for Each Document=============================

Relevance: Yes

Relevance: Yes

Relevance: No

===========================================================

--- decide_to_generate ---
--- transform_query ---

Web Search Query: Frieren novel synopsis characters hero party

--- web_search ---
--- create_message ---
--- generate ---
--- end ---

output:
The hero's party in "Frieren: Beyond Journey's End" consists of the following four members:

- Himmel (Hero)
- Frieren (Mage, Elf)
- Eisen (Warrior)
- Heiter (Priest)

They are comrades who achieved the great feat of defeating the Demon King. However, due to the difference in lifespan between humans and elves, Frieren, as a long-lived elf, gradually has to see off her human companions.

The story begins when the hero's party returns in triumph to the Royal Capital after defeating the Demon King, and they temporarily disband after viewing the "Era Meteors" that occur once every 50 years. Later, as Frieren travels while reminiscing about her companions, she notices new mysteries regarding their past, leading her to form a party once again and set out on an adventurous journey.

The intermediate processes of "generating similar questions," "calculating scores," "integrating questions," "judging relevance," and "creating web search queries" seem to have worked well.
As for the output (though it is hard to judge objectively), it introduces the story surrounding the hero's party, perhaps because one of the generated similar questions asked about the "story" (I do wonder what the "new mystery" is...).
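As a sanity check on the "calculating scores" step: the top score in this run (0.04918…) is exactly what the RRF formula from the beginning of the article gives a document ranked 1st by all three queries (the original plus two similar questions) with k = 60. A minimal sketch, not the article's actual code:

```python
# Reciprocal Rank Fusion score: sum of 1 / (k + rank) over each query's ranking.
def rrf_score(ranks, k=60):
    return sum(1.0 / (k + r) for r in ranks)

# A document ranked 1st by all three queries (original + 2 similar questions):
print(rrf_score([1, 1, 1]))  # ≈ 0.04918, Document 1's score in the log above

# The Document B example from earlier in the article (ranks 1, 3, 2):
print(rrf_score([1, 3, 2]))  # ≈ 0.04839
```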

Execution Results (Index Search Results Only)

Question: Please tell me about Himmel from Frieren: Beyond Journey's End
Number of similar questions to generate: 2

--- __start__ ---
--- generate_query ---

Original Question + Generated Questions==========================

0. Please tell me about Himmel from Frieren: Beyond Journey's End

1. About the character profile of Himmel appearing in Frieren: Beyond Journey's End

2. Tell me the characteristics of the character Himmel from the anime "Frieren: Beyond Journey's End"

===========================================================

--- retrieve ---
--- fusion ---

Top 3 Search Scores========================================

Document 1: 0.03278688524590164

Document 2: 0.032266458495966696

Document 3: 0.03225806451612903

===========================================================

--- integration_query ---

Integrated Question: Please tell me about the character profile and characteristics of Himmel appearing in the anime "Frieren: Beyond Journey's End".

--- grade_documents ---

Relevance Evaluation for Each Document=============================

Relevance: Yes

Relevance: Yes

Relevance: Yes

===========================================================

--- decide_to_generate ---
--- create_message ---
--- generate ---
--- end ---

output:
From the context, the following can be understood:

- 50 years later, the elderly Himmel and Frieren, along with party members Heiter and Eisen, reunited and watched the meteor shower together for the last time.

- Shortly after, Himmel passed away.

- At Himmel's funeral, Frieren realized she knew almost nothing about Himmel and hadn't tried to know him, and she shed tears of sadness.

- Frieren wanted to know more about Himmel and set out on a journey to understand humans and collect magic.

In other words, after Himmel's death, Frieren regretted that she knew almost nothing about him and felt a strong desire to know more. Himmel was respected by his companions, including Frieren, as a hero, but Frieren grieved that she didn't know his personality well enough. Therefore, it is depicted that Frieren set out on a new journey to learn about Himmel, deepen her understanding of humans, and collect more magic.

In the example above, the answer is generated by referring only to documents from the index (local materials).

Execution Results (Web Search Results Only)

Question: Please tell me about the protagonist of the anime "Girls Band Cry"
Number of similar questions to generate: 2

Note: Since the anime "Girls Band Cry" is a recent one, Claude 3 (likely) hasn't learned it yet.

--- __start__ ---
--- generate_query ---

Original Question + Generated Questions==========================

0. Please tell me about the protagonist of the anime "Girls Band Cry"

1. Please tell me about the personality and characteristics of the protagonist of Girls Band Cry

2. I want to know about the name and role of the protagonist of the anime work "Girls Band Cry"

===========================================================

--- retrieve ---
--- fusion ---

Top 3 Search Scores========================================

Document 1: 0.048915917503966164

Document 2: 0.048651507139079855

Document 3: 0.015873015873015872

===========================================================

--- integration_query ---

Integrated Question: Please tell me about the name, role, personality, and characteristic aspects of the protagonist of the anime "Girls Band Cry".

--- grade_documents ---

Relevance Evaluation for Each Document=============================

Relevance: No

Relevance: No

Relevance: No

===========================================================

--- decide_to_generate ---
--- transform_query ---

Web Search Query: Girls Band Cry protagonist name role personality characteristics

--- web_search ---
--- create_message ---
--- generate ---
--- end ---

output:
Detailed information about the protagonist of the anime "Girls Band Cry" is not provided, but the following can be understood from the context:

- "Girls Band Cry" is a television anime work produced by Toei Animation.
- Broadcasting started on April 6, 2024.
- The story features at least characters named Nina and Momoka, who are involved in band activities.
- There seems to be a disagreement between Nina and Momoka regarding their feelings toward the band.

It is not specifically clear which of these two is the protagonist, but the center of the story is likely these girls doing band activities, and among them, Nina and Momoka are depicted as the main characters. The exact name of the protagonist cannot be identified from the presented context.

The protagonist was not identified, but the description appears to contain no errors (checked against the Wikipedia page below).
https://ja.wikipedia.org/wiki/ガールズバンドクライ

Conclusion

This time, I used Claude 3 "Sonnet" for the final answer. If you have a budget to spare, I recommend using the more accurate "Opus" (I used "Sonnet" to keep costs down...).
While it's difficult to clearly confirm the benefits of combining RAG Fusion and CRAG from the execution results, the processes of "generating similar questions," "calculating scores," "integrating questions," "judging relevance," and "creating web search queries" seem to be working well.
The code is a bit long, but since it just connects RAG Fusion and CRAG (with only one branch and no cycles), I think the implementation is quite simple. If you are interested, please give it a try.

I hope to see you again soon.

References

https://towardsdatascience.com/forget-rag-the-future-is-rag-fusion-1147298d8ad1
https://github.com/Raudaschl/rag-fusion
https://zenn.dev/khisa/articles/ab79ad0a92a117
https://qiita.com/isanakamishiro2/items/552372c730f47f1ec53c
https://arxiv.org/pdf/2401.15884
https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_crag.ipynb?ref=blog.langchain.dev
https://qiita.com/isanakamishiro2/items/f4387443b86723eecf36
