
Trying Out RAG From Scratch (1/6): Query Transformations

Published 2025/03/07

LangChain

RAG

tech

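Overview

The official LangChain repository hosts walkthrough code for each component of RAG. (Note: the material was published about a year ago.)

This article works through the part that covers Query Transformations within the overall RAG pipeline.

Query Transformations are a set of approaches focused on rewriting or modifying the user's question for retrieval. The cookbook covers the following five techniques.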

Multi Query
RAG Fusion
Decomposition
StepBack
HyDE

This summary was written while running the notebook below.
https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb

Execution environment

Install the required libraries

.ipynb file

! pip install langchain_community tiktoken langchain-openai langchainhub chromadb langchain python-dotenv beautifulsoup4

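Set the OpenAI API credentials and related settings in environment variables. (The Azure OpenAI API was used here.)

Loading the environment variables

.ipynb file
from IPython import display  # helper for displaying results more readably
import os
from dotenv import load_dotenv
load_dotenv()  # load the environment variables from .env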
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'

.env.template

# Set the Azure OpenAI Service API details
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_API_KEY=

# Set the Azure deployment name
DEPLOYMENT_NAME=
# Set the API version
API_VERSION=

# Set the deployment name of the Azure embedding model
EMBE_DEPLOYMENT_NAME=
# Set the API version for the embedding model
EMBE_API_VERSION=

# Set the LangSmith API key (the code also runs without it)
LANGCHAIN_API_KEY=

gpt-4o-mini was used as the LLM and text-embedding-3-large as the embedding model.
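Before running the notebook, it can help to confirm that the variables from .env.template are actually set. The following is a minimal sketch and not part of the cookbook: the helper name `missing_env_vars` is hypothetical, and the list of required names simply mirrors the template above (LANGCHAIN_API_KEY is left out because it is optional).

```python
import os

# Variable names taken from .env.template; LANGCHAIN_API_KEY is optional and omitted.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "DEPLOYMENT_NAME",
    "API_VERSION",
    "EMBE_DEPLOYMENT_NAME",
    "EMBE_API_VERSION",
]


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_VARS)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Calling it right after `load_dotenv()` surfaces a missing deployment name or API version before the first API call fails.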

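(Baseline for comparison) Answer generation with a simple RAG

As a point of comparison, answers are first generated with a simple RAG setup that applies no query transformation.
The following post on agents is loaded and used to build a vector database (Chroma DB here).
https://lilianweng.github.io/posts/2023-06-23-agent/
The database is then queried with "LLMを搭載した自律エージェントシステムの主な構成要素は？" ("What are the main components of an LLM-powered autonomous agent system?").

RAG code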

import bs4
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain_core.prompts import PromptTemplate

#### INDEXING ####

# Load Documents
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
splits = text_splitter.split_documents(docs)

# Embed
embeddings = AzureOpenAIEmbeddings(
    azure_deployment=os.environ.get("EMBE_DEPLOYMENT_NAME"),
    openai_api_version=os.environ.get("EMBE_API_VERSION"),
)
vectorstore = Chroma.from_documents(documents=splits, 
                                    embedding=embeddings)

retriever = vectorstore.as_retriever()

#### RETRIEVAL and GENERATION ####

# Prompt
# prompt = hub.pull("rlm/rag-prompt")
# The cookbook pulls the prompt from the prompt hub as shown above.
# Here, a Japanese translation of the same content is used instead.
prompt = PromptTemplate(template="""
あなたは質問応答のアシスタントです。質問に答えるために、検索された文脈の以下の部分を使用してください。
答えがわからない場合は、わからないと答えましょう。
最大3つのセンテンスを使い、簡潔に答えましょう。
質問: {question} 
コンテキスト: {context}
答え:
""", input_variables=["context", "question"])

# LLM
llm = AzureChatOpenAI(
    openai_api_version=os.getenv("API_VERSION"),
    azure_deployment=os.getenv("DEPLOYMENT_NAME"),
    temperature=0
)

# Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Question
rag_chain.invoke("LLMを搭載した自律エージェントシステムの主な構成要素は？")

The generated result was as follows.

LLMを搭載した自律エージェントシステムの主な構成要素は、計画とメモリです。
計画では、エージェントが大きなタスクを小さなサブゴールに分解し、自己反省を通じて過去の行動から学びます。
メモリは、有限のコンテキスト長に制約されており、過去の情報や指示を効果的に扱うことが課題となっています。

Multi Query

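Multi Query generates several similar questions from the user's input question, retrieves relevant documents for each of them, and merges the retrieval results to generate the final answer.
Roughly speaking, the technique aims for the effect where rephrasing a question gets the intent across and yields an answer.

The same vector store as in the simple RAG baseline is used.
First, generate_queries is created to produce five similar questions from the user's question.

Code for generating similar questions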

from langchain.prompts import ChatPromptTemplate

# Multi Query: Different Perspectives
template = """
あなたはAI言語モデルアシスタントです。
あなたの仕事は、ベクトル・データベースから関連文書を検索するために、与えられたユーザーの質問に対して5つの異なるバージョンを生成することです。
ユーザの質問に対する複数の視点を生成することで、あなたのゴールは、ユーザが距離ベースの類似検索の制限のいくつかを克服するのを助けることです。
改行で区切られたこれらの代替の質問を提供してください。
元の質問 :{question}
"""
prompt_perspectives = ChatPromptTemplate.from_template(template)

from langchain_core.output_parsers import StrOutputParser
from langchain_openai import AzureChatOpenAI
llm = AzureChatOpenAI(
    openai_api_version=os.getenv("API_VERSION"),
    azure_deployment=os.getenv("DEPLOYMENT_NAME"),
    temperature=0
)
generate_queries = (
    prompt_perspectives 
    | llm 
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
)
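# Check the generated similar questions
# (question must be defined before the chain is invoked)
question = "LLMを搭載した自律エージェントシステムの主な構成要素は？"
generate_queries.invoke({"question":question})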

Passing the same user question as before, "LLMを搭載した自律エージェントシステムの主な構成要素は？", as question produces five similar questions such as the following.

['- 自律エージェントシステムにおけるLLMの主要な構成要素は何ですか？',
 '- LLMを利用した自律エージェントシステムの基本的な要素はどのようなものですか？',
 '- 自律エージェントシステムにおけるLLMの重要なコンポーネントは何ですか？',
 '- LLMを搭載した自律エージェントシステムの核心的な構成要素は何でしょうか？',
 '- 自律エージェントシステムにおけるLLMの役割とその構成要素について教えてください。']
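The generated lines above keep their leading "- " markers, and splitting on "\n" can also leave empty strings when the model inserts blank lines. As a hedged sketch (the helper `clean_queries` is hypothetical and not part of the cookbook), the output could be normalized with the standard library before it reaches the retriever:

```python
import re

# Leading bullet ("-", "*", "•") or numbering ("1.", "2)") followed by whitespace.
_PREFIX = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def clean_queries(text):
    """Split LLM output into lines, dropping blank lines and list markers."""
    queries = []
    for line in text.split("\n"):
        stripped = _PREFIX.sub("", line).strip()
        if stripped:
            queries.append(stripped)
    return queries


print(clean_queries("- first query\n\n2. second query\n third query "))
```

Under the assumption that a plain function can be piped after `StrOutputParser()`, `clean_queries` could stand in for the `lambda x: x.split("\n")` step so that empty strings are never sent to the vector store.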

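Next, retrieval_chain is created: a chain that uses these similar questions to retrieve the relevant documents for each one.

Code that retrieves relevant documents with the similar questions, and its output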

from langchain.load import dumps, loads

def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Get unique documents
    unique_docs = list(set(flattened_docs))
    # Return
    return [loads(doc) for doc in unique_docs]

# Retrieve
retrieval_chain = generate_queries | retriever.map() | get_unique_union
docs = retrieval_chain.invoke({"question":question})
print(docs)

A total of 11 relevant documents were retrieved. (Only three are shown.)

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 13. The generative agent architecture. (Image source: Park et al. 2023)\nThis fun simulation results in emergent social behavior, such as information diffusion, relationship memory (e.g. two agents continuing the conversation topic) and coordination of social events (e.g. host a party and invite many others).\nProof-of-Concept Examples#\nAutoGPT has drawn a lot of attention into the possibility of setting up autonomous agents with LLM as the main controller. It has quite a lot of reliability issues given the natural language interface, but nevertheless a cool proof-of-concept demo. A lot of code in AutoGPT is about format parsing.\nHere is the system message used by AutoGPT, where {{...}} are user inputs:\nYou are {{ai-name}}, {{user-provided AI bot description}}.\nYour decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.\n\nGOALS:\n\n1. {{user-provided goal 1}}\n2. {{user-provided goal 2}}\n3. ...\n4. ...\n5. ...'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='The ReAct prompt template incorporates explicit steps for LLM to think, roughly formatted as:\nThought: ...\nAction: ...\nObservation: ...\n... (Repeated many times)'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.\n\n\nChallenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.\n\n\nReliability of natural language interface: Current agent system relies on natural language as an interface between LLMs and external components such as memory and tools. However, the reliability of model outputs is questionable, as LLMs may make formatting errors and occasionally exhibit rebellious behavior (e.g. refuse to follow an instruction). Consequently, much of the agent demo code focuses on parsing model output.\n\n\nCitation#\nCited as:\n\nWeng, Lilian. (Jun 2023). “LLM-powered Autonomous Agents”. Lil’Log. https://lilianweng.github.io/posts/2023-06-23-agent/.'),
…(omitted)]
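One detail of get_unique_union is that `list(set(...))` does not preserve the order in which documents were retrieved. When that order matters, a dict-based variant keeps the first occurrence of each item. The sketch below uses plain strings as stand-ins for the serialized documents, and the function name `unique_union_ordered` is hypothetical.

```python
def unique_union_ordered(results):
    """Flatten a list of ranked lists and drop duplicates, keeping first-seen order."""
    flattened = [item for sublist in results for item in sublist]
    # dict preserves insertion order (Python 3.7+), so the first occurrence wins.
    return list(dict.fromkeys(flattened))


# Placeholder strings standing in for the output of dumps(doc).
results = [["doc_a", "doc_b"], ["doc_b", "doc_c"], ["doc_a", "doc_d"]]
print(unique_union_ordered(results))
```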

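Using retrieval_chain, the full flow from the original user question through similar-question generation, document retrieval, and answer generation is as follows.

Code that generates similar questions from the original user question, retrieves relevant documents, and generates the answer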

from operator import itemgetter
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """以下の文脈のみに基づいて質問に答えてください:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    {"context": retrieval_chain, 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"question":question})

The generated result was as follows.

LLMを搭載した自律エージェントシステムの主な構成要素は以下の通りです：
1. **計画 (Planning)** - 複雑なタスクを把握し、事前に計画を立てる。
2. **サブゴールと分解 (Subgoal and decomposition)** - 大きなタスクを小さく管理しやすいサブゴールに分解する。
3. **反省と洗練 (Reflection and refinement)** - 過去の行動を自己批判し、学びを得て、将来のステップを改善する。
4. **メモリ (Memory)** - 過去の情報を保持し、エージェントの行動に活用する。

これらの要素が組み合わさることで、エージェントは効率的に複雑なタスクを処理できるようになります。

RAG-Fusion

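RAG-Fusion is the same as Multi Query up to the point of generating similar questions from the user's question and retrieving relevant documents for each. Before the final answer is generated, it adds a reranking step that reorders the documents by relevance.

The same vector store as in the simple RAG baseline is used.
The generate_queries chain that produces similar questions is also almost the same as in Multi Query.
(Note: the number of generated questions drops from 5 to 4, and the prompt is simpler.)

Creating generate_queries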

from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# RAG-Fusion: Related
template = """
あなたは、1つの入力クエリに基づいて複数の検索クエリを生成する便利なアシスタントです。\n
に関連する複数の検索クエリを生成します： {question} \n
出力（4つのクエリ）：
"""
prompt_rag_fusion = ChatPromptTemplate.from_template(template)

generate_queries = (
    prompt_rag_fusion 
    | llm
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
)

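Next, retrieval_chain_rag_fusion is created: a chain that retrieves the relevant documents for each similar question and then reranks them.

Code that retrieves relevant documents and applies RRF (reranking), and its output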

from langchain.load import dumps, loads

def reciprocal_rank_fusion(results: list[list], k=60):
    """Take multiple ranked lists of documents and an optional parameter k used in the RRF formula."""

    # Dictionary that holds the fused score of each document
    fused_scores = {}

    # Iterate over each ranked list of documents
    for docs in results:
        # Iterate over each document together with its rank (position in the list)
        for rank, doc in enumerate(docs):
            # Serialize the document to a string to use as the key (assumes it is JSON-serializable)
            doc_str = dumps(doc)
            # Add the document with a score of 0 if it is not in fused_scores yet
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # Update the score with the RRF formula: 1 / (rank + k)
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort the documents by fused score in descending order to get the final reranking
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]

    # Return the reranked results as a list of (document, fused score) tuples
    return reranked_results

retrieval_chain_rag_fusion = generate_queries | retriever.map() | reciprocal_rank_fusion
docs = retrieval_chain_rag_fusion.invoke({"question": question})
print(docs)

[(Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\n\n\nMemory'),
  0.06666666666666667),
 (Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.\n\n\nChallenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.\n\n\nReliability of natural language interface: Current agent system relies on natural language as an interface between LLMs and external components such as memory and tools. However, the reliability of model outputs is questionable, as LLMs may make formatting errors and occasionally exhibit rebellious behavior (e.g. refuse to follow an instruction). Consequently, much of the agent demo code focuses on parsing model output.\n\n\nCitation#\nCited as:\n\nWeng, Lilian. (Jun 2023). “LLM-powered Autonomous Agents”. Lil’Log. https://lilianweng.github.io/posts/2023-06-23-agent/.'),
  0.06557377049180328),
 (Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.'),
  0.06426011264720942),
 (Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 13. The generative agent architecture. (Image source: Park et al. 2023)\nThis fun simulation results in emergent social behavior, such as information diffusion, relationship memory (e.g. two agents continuing the conversation topic) and coordination of social events (e.g. host a party and invite many others).\nProof-of-Concept Examples#\nAutoGPT has drawn a lot of attention into the possibility of setting up autonomous agents with LLM as the main controller. It has quite a lot of reliability issues given the natural language interface, but nevertheless a cool proof-of-concept demo. A lot of code in AutoGPT is about format parsing.\nHere is the system message used by AutoGPT, where {{...}} are user inputs:\nYou are {{ai-name}}, {{user-provided AI bot description}}.\nYour decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.\n\nGOALS:\n\n1. {{user-provided goal 1}}\n2. {{user-provided goal 2}}\n3. ...\n4. ...\n5. ...'),
  0.06374807987711213)]

The retrieved documents were output sorted in descending order of the score computed with the RRF formula.
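To make the scores easier to read, here is a small worked sketch of the same formula on plain strings. The document IDs and the four rankings are hypothetical (the actual per-query rankings were not logged), but they show how values like those in the output above can arise: a document ranked first (rank 0) in all four lists gets 4 × 1/60 ≈ 0.0667, and one ranked second in all four gets 4 × 1/61 ≈ 0.0656. Note that rank starts at 0 here because it comes from `enumerate`.

```python
def rrf_scores(ranked_lists, k=60):
    """Return {item: fused score}; each list contributes 1 / (rank + k), rank starting at 0."""
    scores = {}
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking):
            scores[item] = scores.get(item, 0.0) + 1.0 / (rank + k)
    return scores


# Four hypothetical rankings (one per generated query) over placeholder document IDs.
rankings = [
    ["overview", "challenges", "planning", "autogpt"],
    ["overview", "challenges", "planning", "autogpt"],
    ["overview", "challenges", "planning", "autogpt"],
    ["overview", "challenges", "autogpt", "planning"],
]

ordered = sorted(rrf_scores(rankings).items(), key=lambda kv: kv[1], reverse=True)
for doc_id, score in ordered:
    print(doc_id, score)
```

Because every list contributes at most 1/k per document, the fused score mostly reflects how many queries returned a document and how consistently it ranked near the top.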

The full flow through answer generation using retrieval_chain_rag_fusion is as follows.

RAG Fusion: full flow through answer generation

from langchain_core.runnables import RunnablePassthrough

# RAG
template = """以下の文脈のみに基づいて質問に答えてください:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    {"context": retrieval_chain_rag_fusion, 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)

response = final_rag_chain.invoke({"question":question})

The generated result was as follows.

LLMを搭載した自律エージェントシステムの主な構成要素は以下の通りです：

1. **計画 (Planning)** - 複雑なタスクを管理可能なサブゴールに分解し、効率的に処理する。
2. **サブゴールと分解 (Subgoal and decomposition)** - 大きなタスクを小さなタスクに分解することで、複雑なタスクを効率的に扱う。
3. **反省と洗練 (Reflection and refinement)** - 過去の行動を自己批判し、学びを得て、将来のステップを改善することで、最終結果の質を向上させる。
4. **メモリ (Memory)** - 過去の情報を保持し、タスクの遂行に役立てる。

これらの要素が組み合わさることで、LLMは強力な一般的問題解決者として機能します。

Decomposition

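Decomposition breaks the user's input question into sub-questions, retrieves relevant documents for each, and generates the final answer by referring to the pairs of sub-questions and relevant documents.

Code that decomposes the question into sub-questions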

from langchain.prompts import ChatPromptTemplate

# Decomposition
template = """あなたは、入力された質問に関連する複数のサブ質問を生成する親切なアシスタントです。\n
入力された問題を、単独で答えられるような小問題／小問題のセットに分解することが目標です。\n
次の質問に関連する複数の検索クエリを生成します:{question}\n
出力（3つのクエリ）："""
prompt_decomposition = ChatPromptTemplate.from_template(template)

from langchain_core.output_parsers import StrOutputParser

# Chain
generate_queries_decomposition = ( prompt_decomposition | llm | StrOutputParser() | (lambda x: x.split("\n")))

# Run
question = "LLMを搭載した自律エージェントシステムの主な構成要素は？"
questions = generate_queries_decomposition.invoke({"question":question})
print(questions)

The code above decomposes the user question "LLMを搭載した自律エージェントシステムの主な構成要素は？" into the following sub-questions.

['1. LLMを搭載した自律エージェントシステムの基本的な構成要素は何ですか？',
 '2. 自律エージェントシステムにおけるLLMの役割はどのようなものですか？',
 '3. 自律エージェントシステムの設計において考慮すべき技術的要素は何ですか？']

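The cookbook presents two patterns that use the pairs of sub-questions and relevant documents: (1) generating answers recursively, and (2) answering each sub-question individually and merging the answers at the end.

1. Recursive answer generation

This pattern appears to have been inspired by a paper.

Code for recursive answer generation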

# Prompt
template = """回答すべき質問は以下の通りです：

\n --- \n {question} \n --- \n

利用可能な背景となる質問 + 回答ペアは以下の通りです：

\n --- \n {q_a_pairs} \n --- \n

質問に関連する追加のコンテキストは以下の通りです：

\n --- \n {context} \n --- \n

上記のコンテキストと背景となる質問 + 回答ペアを使用して、質問に答えてください： \n {question}
"""
decomposition_prompt = ChatPromptTemplate.from_template(template)

from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

def format_qa_pair(question, answer):
    """Format Q and A pair"""
    
    formatted_string = ""
    formatted_string += f"Question: {question}\nAnswer: {answer}\n\n"
    return formatted_string.strip()

q_a_pairs = ""
for q in questions:
    
    rag_chain = (
    {"context": itemgetter("question") | retriever, 
     "question": itemgetter("question"),
     "q_a_pairs": itemgetter("q_a_pairs")} 
    | decomposition_prompt
    | llm
    | StrOutputParser())

    answer = rag_chain.invoke({"question":q,"q_a_pairs":q_a_pairs})
    q_a_pair = format_qa_pair(q,answer)
    q_a_pairs = q_a_pairs + "\n---\n"+  q_a_pair
print(answer)

The generated result is as follows.

自律エージェントシステムの設計において考慮すべき技術的要素は以下の通りです：

1. **計画とタスク分解**:
   - エージェントは大きなタスクを小さなサブゴールに分解し、効率的に処理するための計画を立てる必要があります。これにより、複雑な問題を段階的に解決することが可能になります。

2. **メモリ管理**:
   - 過去の情報を保持し、タスクの遂行に役立てるためのメモリ機能が重要です。長期的なメモリ管理は、エージェントが過去の経験から学び、将来の行動を改善するために不可欠です。

3. **自然言語インターフェース**:
   - LLMと外部コンポーネント（メモリやツールなど）との間のインターフェースとして自然言語を使用することが求められます。このインターフェースの信頼性は、エージェントのパフォーマンスに大きく影響します。

4. **自己反省と学習**:
   - エージェントは過去の行動を自己批判し、反省することで、間違いから学び、将来の行動を改善する能力を持つべきです。これにより、エージェントのパフォーマンスが向上します。

5. **長期計画と柔軟性**:
   - 長期的な計画を立てる能力と、予期しないエラーに対処する柔軟性が必要です。エージェントは、試行錯誤を通じて学ぶ能力を持つことで、よりロバストなシステムとなります。

6. **信頼性とエラー処理**:
   - 自然言語インターフェースの信頼性を確保し、エラーが発生した際の処理方法を設計することが重要です。エージェントは、フォーマットエラーや指示に従わない場合に適切に対処できる必要があります。

これらの要素を考慮することで、自律エージェントシステムはより効果的に機能し、複雑なタスクを遂行する能力を高めることができます。

Next, LangSmith is used to see how the answer for each sub-question and its relevant documents was generated.

LangSmith trace results

「1. LLMを搭載した自律エージェントシステムの基本的な構成要素は何ですか？」

「2. 自律エージェントシステムにおけるLLMの役割はどのようなものですか？」

「3. 自律エージェントシステムの設計において考慮すべき技術的要素は何ですか？」

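Note: the screenshot is cut off, but the Output is the final output shown above.

At Q1 the field for background Q&A pairs is empty, but from Q2 onward the answers are generated with the earlier Q&A pairs included in the prompt.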
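The accumulation seen in the trace can be sketched without an LLM. In the sketch below, `answer_fn` is a hypothetical stand-in for the RAG chain, and the stub only reports how many earlier pairs it received. It follows the same pattern as the loop above: each call sees every Q&A pair produced so far, and the answer to the last sub-question is returned.

```python
def answer_recursively(sub_questions, answer_fn):
    """Answer sub-questions in order, passing all earlier Q&A pairs to each call.

    `answer_fn(question, q_a_pairs)` is a hypothetical stand-in for the RAG chain.
    """
    q_a_pairs = ""
    answer = ""
    for q in sub_questions:
        answer = answer_fn(q, q_a_pairs)
        q_a_pairs += "\n---\n" + f"Question: {q}\nAnswer: {answer}"
    return answer, q_a_pairs


def stub_answer(question, q_a_pairs):
    """Stub that only reports how many earlier pairs it was given."""
    seen = q_a_pairs.count("Question:")
    return f"answer to {question!r} using {seen} earlier pair(s)"


final_answer, history = answer_recursively(["Q1", "Q2", "Q3"], stub_answer)
print(final_answer)
```

One consequence of this design is that the final output answers the last sub-question rather than the original question, which matches the result shown above, where the answer addresses the third sub-question.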

2. Answering each sub-question individually and merging at the end

Code that merges the individually generated answers into the final answer

# Answer each sub-question individually 

from langchain import hub
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# RAG prompt
prompt_rag = PromptTemplate(template="""
あなたは質問応答のアシスタントです。質問に答えるために、検索された文脈の以下の部分を使用してください。
答えがわからない場合は、わからないと答えましょう。
最大3つのセンテンスを使い、簡潔に答えましょう。
質問: {question} 
コンテキスト: {context}
答え:
""", input_variables=["context", "question"])

def retrieve_and_rag(question,prompt_rag,sub_question_generator_chain):
    """Run RAG (Retrieval-Augmented Generation) on each sub-question"""
    
    # Generate the sub-questions via decomposition
    sub_questions = sub_question_generator_chain.invoke({"question":question})
    
    # List that holds the results of the RAG chain
    rag_results = []
    
    for sub_question in sub_questions:
        
        # Retrieve the documents relevant to each sub-question
        # (retriever.invoke replaces the deprecated get_relevant_documents)
        retrieved_docs = retriever.invoke(sub_question)
        
        # Run the RAG chain with the retrieved documents and the sub-question
        answer = (prompt_rag | llm | StrOutputParser()).invoke({"context": retrieved_docs, 
                                                                "question": sub_question})
        rag_results.append(answer)
    
    return rag_results,sub_questions

# Run retrieval and RAG for each sub-question (called directly here, not wrapped in a RunnableLambda)
answers, questions = retrieve_and_rag(question, prompt_rag, generate_queries_decomposition)

The following questions (sub-questions) and answers (sub-answers) were generated.

`questions`	`answers`
1. LLMを搭載した自律エージェントシステムの基本的な構成要素は何ですか？	`LLMを搭載した自律エージェントシステムの基本的な構成要素には、計画、サブゴールの分解、自己反省と改善が含まれます。エージェントは大きなタスクを小さな管理可能なサブゴールに分解し、過去の行動から学び、結果の質を向上させることができます。これにより、複雑なタスクを効率的に処理することが可能になります。`
2. 自律エージェントシステムにおけるLLMの役割はどのようなものですか？	自律エージェントシステムにおけるLLMの役割は、エージェントの「脳」として機能し、計画、サブゴールの分解、自己反省などを通じて複雑なタスクを効率的に処理することです。LLMは大きなタスクを小さな管理可能なサブゴールに分解し、過去の行動から学び、結果の質を向上させる能力を持っています。これにより、LLMは強力な一般的問題解決者としての可能性を持っています。
3. 自律エージェントシステムの設計において考慮すべき技術的要素は何ですか？	自律エージェントシステムの設計において考慮すべき技術的要素には、計画とサブゴールの分解、自己反省と改善の能力、そして自然言語インターフェースの信頼性が含まれます。特に、複雑なタスクを効率的に処理するために、エージェントは大きなタスクを小さなサブゴールに分解する必要があります。また、過去の行動から学び、将来のステップを改善するための自己批判も重要です。

These Q&A pairs are used to synthesize the final answer.

Code

def format_qa_pairs(questions, answers):
    """Format the Q and A pairs"""
    
    formatted_string = ""
    for i, (question, answer) in enumerate(zip(questions, answers), start=1):
        formatted_string += f"質問 {i}: {question}\n回答 {i}: {answer}\n\n"
    return formatted_string.strip()

context = format_qa_pairs(questions, answers)

# Prompt
template = """以下はQ+Aペアのセットです：

{context}

これらを使用して、質問に対する答えを合成してください: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)

answer = final_rag_chain.invoke({"context":context,"question":question})
print(answer)

The following final answer was generated.

LLMを搭載した自律エージェントシステムの主な構成要素には、計画、サブゴールの分解、自己反省と改善が含まれます。
エージェントは大きなタスクを小さな管理可能なサブゴールに分解し、過去の行動から学ぶことで結果の質を向上させる能力を持っています。
これにより、複雑なタスクを効率的に処理することが可能となります。
また、自然言語インターフェースの信頼性も重要な要素です。
これらの構成要素が相互に作用することで、エージェントは強力な問題解決者として機能します。

Step Back

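Step Back is an approach that converts the user's question into a more general (abstracted) question, retrieves the relevant documents for both the original and the generalized question, and generates the final answer.

It is inspired by Step-Back Prompting, proposed in the following paper.
TAKE A STEP BACK: EVOKING REASONING VIA ABSTRACTION IN LARGE LANGUAGE MODELS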
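The data flow described above can be sketched with plain functions. This is an illustrative sketch, not the cookbook's implementation: `step_back_fn` and `retrieve_fn` are hypothetical stand-ins for the LLM rewrite step and the vector-store retriever, and the toy index exists only so that the example runs without external services.

```python
def build_step_back_prompt(question, step_back_fn, retrieve_fn):
    """Retrieve context for the original and the step-back question and fill one prompt.

    `step_back_fn` and `retrieve_fn` are hypothetical stand-ins for the LLM rewrite
    step and the vector-store retriever.
    """
    step_back_question = step_back_fn(question)
    normal_context = retrieve_fn(question)
    step_back_context = retrieve_fn(step_back_question)
    return (
        f"Context for the original question:\n{normal_context}\n\n"
        f"Context for the step-back question:\n{step_back_context}\n\n"
        f"Original question: {question}\nAnswer:"
    )


# Toy stand-ins so the sketch runs without an LLM or a vector store.
toy_index = {
    "Where was Jan Sindel born?": "doc about birthplace",
    "What is Jan Sindel's personal history?": "doc about biography",
}
prompt_text = build_step_back_prompt(
    "Where was Jan Sindel born?",
    step_back_fn=lambda q: "What is Jan Sindel's personal history?",
    retrieve_fn=lambda q: toy_index.get(q, ""),
)
print(prompt_text)
```

The point of retrieving twice is that the broader question can surface background passages that the narrow question misses, while the original question keeps the answer on target.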

Code that steps back (abstracts) the user's question

# Few-shot examples
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
examples = [
    {
        "input": "The Policeのメンバーは合法的な逮捕ができるか？",
        "output": "The Policeのメンバーは何ができるか？",
    },
    {
        "input": "Jan Sindelはどの国で生まれたか？",
        "output": "Jan Sindelの個人的な歴史は何か？",
    },
]
# Convert these into example messages
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """あなたは世界知識の専門家です。あなたの仕事は、質問をもう少し一般的な形に言い換えることで、より答えやすくすることです。以下はそのいくつかの例です:""",
        ),
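        # Few-shot examples
        few_shot_prompt,
        # New question
        ("user", "{question}"),
    ]
)
# Build the step-back chain from the prompt above, reusing the llm and StrOutputParser defined earlier
generate_queries_step_back = prompt | llm | StrOutputParser()
generate_queries_step_back.invoke({"question": question})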

As in the previous examples, the user question is "LLMを搭載した自律エージェントシステムの主な構成要素は？". The question after Step Back is as follows.

自律エージェントシステムの主要な構成要素は何か？

Code for final answer generation with Step Back

# Prompt
response_prompt_template = """あなたは世界知識の専門家です。これから質問をします。あなたの回答は、以下の文脈と矛盾しないように包括的であるべきです。それらが関連している場合に限り文脈を考慮し、そうでない場合は無視してください。

# {normal_context}
# {step_back_context}

# 元の質問: {question}
# 回答:"""
response_prompt = ChatPromptTemplate.from_template(response_prompt_template)

chain = (
    {
        # Retrieve context using the original question
        "normal_context": RunnableLambda(lambda x: x["question"]) | retriever,
        # Retrieve context using the step-back question
        "step_back_context": generate_queries_step_back | retriever,
        # Pass the question through unchanged
        "question": lambda x: x["question"],
    }
    | response_prompt
    | llm
    | StrOutputParser()
)

answer = chain.invoke({"question": question})

The following final answer was generated.

LLMを搭載した自律エージェントシステムの主な構成要素は以下の通りです：

**計画 (Planning)**: エージェントは複雑なタスクを小さな管理可能なサブゴールに分解し、効率的にタスクを処理します。

**サブゴールと分解 (Subgoal and Decomposition)**: 大きなタスクを小さな部分に分けることで、複雑なタスクを効率的に扱うことが可能になります。

**反省と洗練 (Reflection and Refinement)**: 過去の行動を自己批判し、反省することで、エージェントはミスから学び、将来のステップを改善します。

**記憶 (Memory)**: エージェントは過去の観察やイベントを記憶し、それを基に行動を決定します。

**自然言語インターフェース (Natural Language Interface)**: LLMと外部コンポーネント（メモリやツールなど）との間のインターフェースとして自然言語を使用しますが、その信頼性には課題があります。

これらの要素が組み合わさることで、LLMを中心とした自律エージェントが機能します。

HyDE

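HyDE (Hypothetical Document Embeddings) has the LLM first generate a hypothetical answer to the user's question, then retrieves the documents relevant to that hypothetical answer and generates the final answer from them.

The query transformation approaches so far essentially compare the similarity of a "question" with an "answer (a document likely to contain it)". The key point of HyDE is that it instead compares a "(hypothetical) answer" with an "answer (a document likely to contain it)". Conceptually, the hypothetical answer sits closer in vector space to the target documents than the original question does.

The technique was proposed in the following paper.
Precise Zero-Shot Dense Retrieval without Relevance Labels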
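The intuition that an answer-shaped text lands closer to the target document than the question does can be illustrated with a toy similarity measure. The sketch below uses bag-of-words cosine similarity instead of a real embedding model, and all three strings are made-up examples, so it only illustrates the idea and says nothing about how text-embedding-3-large behaves.

```python
import math
from collections import Counter


def bow_cosine(a, b):
    """Cosine similarity between bag-of-words count vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


# Made-up strings: a question, a hypothetical answer, and a stored document chunk.
question = "what are the main components of an agent system"
hypothetical_answer = "the main components of an agent system are planning memory and tool use"
document = "the agent system has several key components planning memory and tool use"

print("question vs document:", bow_cosine(question, document))
print("hypothetical answer vs document:", bow_cosine(hypothetical_answer, document))
```

In this toy setup the hypothetical answer shares the content words (planning, memory, tool use) with the document, so it scores higher than the question even though it was written without seeing the document.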

Code for generating the HyDE document

from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# HyDE document generation
template = """質問に答えるための科学論文の一節を書いてください
質問: {question}
一節:"""
prompt_hyde = ChatPromptTemplate.from_template(template)

generate_docs_for_retrieval = (
    prompt_hyde | llm | StrOutputParser() 
)

# Run
question = "LLMを搭載した自律エージェントシステムの主な構成要素は？"
answer = generate_docs_for_retrieval.invoke({"question":question})

The generated result is as follows.

自律エージェントシステムにおける大規模言語モデル（LLM）の主な構成要素は、以下のように分類される。まず第一に、知識ベースが挙げられる。これは、エージェントが参照する情報源やデータセットであり、LLMが効果的に応答を生成するための基盤を提供する。次に、自然言語処理（NLP）モジュールが重要である。これにより、エージェントはユーザーからの入力を理解し、適切な出力を生成する能力を持つ。さらに、意思決定アルゴリズムが必要であり、これはエージェントが環境に応じて行動を選択するためのロジックを提供する。最後に、インターフェース層が存在し、ユーザーとのインタラクションを円滑に行うための手段を提供する。これらの構成要素が相互に連携することで、LLMを搭載した自律エージェントシステムは、複雑なタスクを遂行し、ユーザーのニーズに応じた応答を生成する能力を持つ。

Documents close to this hypothetical answer are then retrieved.

Code for retrieving the relevant documents, and the retrieved documents

# Retrieve
retrieval_chain = generate_docs_for_retrieval | retriever 
retireved_docs = retrieval_chain.invoke({"question":question})
retireved_docs

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\n\n\nMemory'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.\n\n\nChallenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.\n\n\nReliability of natural language interface: Current agent system relies on natural language as an interface between LLMs and external components such as memory and tools. However, the reliability of model outputs is questionable, as LLMs may make formatting errors and occasionally exhibit rebellious behavior (e.g. refuse to follow an instruction). Consequently, much of the agent demo code focuses on parsing model output.\n\n\nCitation#\nCited as:\n\nWeng, Lilian. (Jun 2023). “LLM-powered Autonomous Agents”. Lil’Log. https://lilianweng.github.io/posts/2023-06-23-agent/.'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 13. The generative agent architecture. (Image source: Park et al. 2023)\nThis fun simulation results in emergent social behavior, such as information diffusion, relationship memory (e.g. two agents continuing the conversation topic) and coordination of social events (e.g. host a party and invite many others).\nProof-of-Concept Examples#\nAutoGPT has drawn a lot of attention into the possibility of setting up autonomous agents with LLM as the main controller. It has quite a lot of reliability issues given the natural language interface, but nevertheless a cool proof-of-concept demo. A lot of code in AutoGPT is about format parsing.\nHere is the system message used by AutoGPT, where {{...}} are user inputs:\nYou are {{ai-name}}, {{user-provided AI bot description}}.\nYour decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.\n\nGOALS:\n\n1. {{user-provided goal 1}}\n2. {{user-provided goal 2}}\n3. ...\n4. ...\n5. ...')]

Code that generates the final answer from the documents retrieved via HyDE

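# RAG: answer the question using only the retrieved documents as context
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# Prompt -> LLM -> plain-string output
final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)

# Pass the retrieved documents and the original question to the chain
answer = final_rag_chain.invoke({"context":retireved_docs,"question":question})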
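
As far as I can tell, `retireved_docs` is a list of `Document` objects, so the `{context}` slot is filled with their string representation, metadata included. The following is a minimal sketch of joining only the page text beforehand. It assumes objects that expose a `page_content` attribute; the `Doc` stand-in class and the `format_docs` helper are hypothetical names used for illustration and do not come from the notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (only page_content is used)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs, separator="\n\n"):
    """Join the text of each document into a single context string."""
    return separator.join(doc.page_content for doc in docs)


docs = [
    Doc("Component One: Planning", {"source": "example"}),
    Doc("Component Two: Memory", {"source": "example"}),
]
context = format_docs(docs)
print(context)
```

Under the same assumption, the chain could then be invoked with `{"context": format_docs(retireved_docs), "question": question}`, which keeps metadata such as the source URL out of the prompt.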

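The final answer was as follows.

The main components of an LLM-powered autonomous agent system are as follows:

1. **Planning** - Break complex tasks down into manageable subgoals and handle them efficiently.
2. **Subgoal and decomposition** - Handle complex tasks efficiently by breaking large tasks into smaller ones.
3. **Reflection and refinement** - Self-criticize past actions, learn from them, and improve future steps, thereby improving the quality of the final result.
4. **Memory** - Retain past information and use it for learning.

Together, these components allow the LLM to function as the agent's "brain".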

Summary

Finally, let's compare the outputs of each query transformation approach.

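Method	Output
(For comparison) Simple RAG	The main components of an LLM-powered autonomous agent system are planning and memory. In planning, the agent breaks large tasks down into smaller subgoals and learns from past actions through self-reflection. Memory is constrained by the finite context length, so handling past information and instructions effectively is a challenge.
Multi Query	The main components of an LLM-powered autonomous agent system are as follows: 1. Planning - Understand the complex task and plan ahead. 2. Subgoal and decomposition - Break large tasks down into smaller, manageable subgoals. 3. Reflection and refinement - Self-criticize past actions, learn from them, and improve future steps. 4. Memory - Retain past information and use it in the agent's actions. Together, these components enable the agent to handle complex tasks efficiently.
RAG-Fusion	The main components of an LLM-powered autonomous agent system are as follows: 1. Planning - Break complex tasks down into manageable subgoals and handle them efficiently. 2. Subgoal and decomposition - Handle complex tasks efficiently by breaking large tasks into smaller ones. 3. Reflection and refinement - Self-criticize past actions, learn from them, and improve future steps, thereby improving the quality of the final result. 4. Memory - Retain past information and use it to carry out tasks. Together, these components enable the LLM to function as a powerful general problem solver.
Decomposition 1. Recursive answer generation	The technical elements to consider when designing an autonomous agent system are as follows: 1. Planning and task decomposition: - The agent needs the ability to break large tasks down into smaller subgoals and handle them efficiently. This allows complex problems to be split into manageable steps. 2. Memory management: - A memory capability that retains past information and uses it to carry out tasks is important. Long-term memory management is essential for the agent to learn from past experience and make better decisions. 3. Natural language interface: - Because the LLM communicates with external components (such as memory and tools) in natural language, the design of the interface is important. Measures are needed to ensure reliable output. 4. Self-reflection and learning: - The agent needs the ability to self-criticize and reflect on past actions, learn from mistakes, and improve future actions. This allows the agent to produce better results. 5. Reliability and robustness: - The reliability of the natural language interface and the ability to adjust plans in response to unexpected errors are required. By being able to learn from trial and error, the agent can achieve more human-like flexibility. 6. Constraint management: - The system must be designed to function effectively given the finite context length and other information constraints. This includes making information retrieval and processing more efficient. Taking these elements into account allows an autonomous agent system to function more effectively and improves its ability to carry out complex tasks.
Decomposition 2. Individual answer generation	The main components of an LLM-powered autonomous agent system include planning, subgoal decomposition, and self-reflection and improvement. The agent breaks large tasks down into smaller, manageable subgoals and can improve the quality of its results by learning from past actions. This gives it the ability to handle complex tasks efficiently and makes it possible to improve its overall problem-solving ability.
Step Back	The main components of an LLM-powered autonomous agent system are as follows: 1. Planning: The agent breaks complex tasks down into smaller, manageable subgoals and handles them efficiently. 2. Subgoal and Decomposition: Splitting large tasks into smaller parts makes it possible to handle complex tasks efficiently. 3. Reflection and Refinement: By self-criticizing and reflecting on past actions, the agent learns from mistakes and improves future steps. 4. Memory: The agent remembers past observations and events and decides its actions based on them. 5. Natural Language Interface: Natural language is used as the interface between the LLM and external components (such as memory and tools), but its reliability remains a challenge. Together, these components make an LLM-centered autonomous agent work.
HyDE	The main components of an LLM-powered autonomous agent system are as follows: 1. Planning - Break complex tasks down into manageable subgoals and handle them efficiently. 2. Subgoal and decomposition - Handle complex tasks efficiently by breaking large tasks into smaller ones. 3. Reflection and refinement - Self-criticize past actions, learn from them, and improve future steps, thereby improving the quality of the final result. 4. Memory - Retain past information and use it for learning. Together, these components allow the LLM to function as the agent's "brain".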

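In this experiment, "Decomposition 1. Recursive answer generation" produced the most detailed answer. Passing the previous Q&A pairs along with each sub-question effectively acts as few-shot CoT (Chain-of-Thought), which I suspect is why the detailed explanations were largely preserved all the way to the final answer.

Note: this was tested on only a single example, so other use cases may show different trends.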
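
The hand-off of earlier Q&A pairs in recursive decomposition can be sketched roughly as follows. This is a simplified illustration rather than the notebook's exact code: `answer_fn` is a hypothetical callable standing in for the retrieval and LLM step, and `fake_answer` is a stub used only to show how the history grows with each sub-question.

```python
def format_qa_pair(question, answer):
    """Render one question/answer pair as plain text."""
    return f"Question: {question}\nAnswer: {answer}"


def answer_recursively(sub_questions, answer_fn):
    """Answer sub-questions in order, showing earlier Q&A pairs to later ones.

    answer_fn(question, qa_history) is a hypothetical stand-in for the
    retrieval + LLM call.
    """
    pairs = []
    answer = ""
    for sub_question in sub_questions:
        qa_history = "\n---\n".join(pairs)  # everything answered so far
        answer = answer_fn(sub_question, qa_history)
        pairs.append(format_qa_pair(sub_question, answer))
    return answer, "\n---\n".join(pairs)


def fake_answer(question, qa_history):
    """Stub in place of an LLM: reports how many earlier pairs it was shown."""
    seen = qa_history.count("Question:")
    return f"answer to '{question}' (saw {seen} earlier pairs)"


final_answer, history = answer_recursively(
    ["What is planning?", "What is memory?", "How do they combine?"],
    fake_answer,
)
print(final_answer)
```

Because the answer to the last sub-question is generated with every earlier pair in view, it serves as the final answer, which fits the observation that this variant carried the most detail through to the end.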
