
An Investigation of LlamaIndex's ReActAgent

Published on 2024/08/21

Overview

https://docs.llamaindex.ai/en/stable/examples/agent/react_agent/

To learn about agents, I read through the code of LlamaIndex's ReActAgent.

Reading the tutorial code

We will read through the following code.

from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from llama_index.core.tools import BaseTool, FunctionTool
def multiply(a: int, b: int) -> int:
    """Multiply two integers and returns the result integer"""
    return a * b

multiply_tool = FunctionTool.from_defaults(fn=multiply)
def add(a: int, b: int) -> int:
    """Add two integers and returns the result integer"""
    return a + b

add_tool = FunctionTool.from_defaults(fn=add)
llm = OpenAI(model="gpt-4")
agent = ReActAgent.from_tools([multiply_tool, add_tool], llm=llm, verbose=True)
response = agent.chat("What is 2+2*4")
print(response)


---Output---

> Running step ce786c65-b633-47c6-b4b7-9082ef432f8b. Step input: What is 2+2*4
Thought: The user is asking for the result of a mathematical operation. According to the order of operations (BIDMAS/BODMAS), multiplication should be performed before addition. So, I need to first multiply 2 and 4, and then add the result to 2. I'll use the 'multiply' tool first.
Action: multiply
Action Input: {'a': 2, 'b': 4}
Observation: 8
> Running step 20e6085f-6239-4dda-b858-116d83afea28. Step input: None
Thought: The multiplication result is 8. Now, I need to add this result to 2. I'll use the 'add' tool for this.
Action: add
Action Input: {'a': 2, 'b': 8}
Observation: 10
> Running step 7c2f11a9-ab76-4887-80cf-a649f6b39d51. Step input: None
Thought: I can answer without using any more tools. I'll use the user's language to answer.
Answer: 10
10


Processing details

FunctionTool

multiply_tool = FunctionTool.from_defaults(fn=multiply)

This defines a function as a tool.

https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/tools/#functiontool
The definition has to tell the LLM what the function does and which arguments it takes.
If it is defined well, the LLM agent runs the tool whenever it needs to.
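
To make the idea concrete, here is a minimal sketch of how a wrapper could derive a tool's name and description from a plain Python function, using only the standard library. `describe_tool` is a hypothetical helper, not part of LlamaIndex's API. It assumes the description is built from the function signature plus the docstring, which is consistent with the `Tool Description` text visible in the system prompt dumped later in this article.

```python
import inspect


def describe_tool(fn):
    """Build a name and description for a function (hypothetical helper)."""
    signature = inspect.signature(fn)
    docstring = inspect.getdoc(fn) or ""
    return {
        "name": fn.__name__,
        # Signature on the first line, docstring on the following lines
        "description": f"{fn.__name__}{signature}\n{docstring}",
    }


def multiply(a: int, b: int) -> int:
    """Multiply two integers and returns the result integer"""
    return a * b


info = describe_tool(multiply)
print(info["name"])
print(info["description"])
```

Under this assumption the type hints and the docstring are the only information the LLM receives about the tool, so both are worth writing carefully.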

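The key design question seems to be what should be defined as a tool.
Perhaps the things an LLM cannot do on its own, or is bad at?
There are probably best practices for argument formats as well. (To be investigated.)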

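Examples:

  • Calculations
    • because LLMs are weak at arithmetic
  • External API calls
    • because an LLM cannot communicate with the outside world by itself
    • security needs careful consideration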
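
As one illustration of the security point, the sketch below guards a hypothetical external-API tool with a host allowlist, so that the agent cannot be steered into calling arbitrary endpoints. `ALLOWED_HOSTS`, `is_allowed_url`, and `fetch_tool` are made-up names for this example, and the actual HTTP request is deliberately omitted.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: only these hosts may be contacted by the tool.
ALLOWED_HOSTS = {"api.example.com"}


def is_allowed_url(url: str) -> bool:
    """Return True only for HTTPS URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS


def fetch_tool(url: str) -> str:
    """Hypothetical tool body: refuse any URL outside the allowlist."""
    if not is_allowed_url(url):
        return f"Error: {url} is not an allowed endpoint"
    # A real tool would perform the HTTP request here (omitted in this sketch).
    return f"OK: would fetch {url}"


print(fetch_tool("https://api.example.com/v1/items"))
print(fetch_tool("https://evil.example.org/"))
```

Returning the refusal as a string rather than raising lets the agent see the error as an observation and pick another action.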

https://llamahub.ai/
Ready-made tools are available here, so it should be a useful reference.

ReActAgent

llm = OpenAI(model="gpt-4")
agent = ReActAgent.from_tools([multiply_tool, add_tool], llm=llm, verbose=True)

This defines an agent that holds an LLM plus tools.
For now, I understand ReActAgent as roughly "something that uses the llm and tools to work its way to an answer by trial and error".
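
That understanding can be sketched as a small loop. This is not LlamaIndex's implementation. It is a toy version under two assumptions: the "LLM" is a scripted stand-in (`fake_llm`) that returns hard-coded replies, and the reply text follows the Thought / Action / Action Input / Answer layout seen in the tutorial output.

```python
import json


def multiply(a: int, b: int) -> int:
    return a * b


def add(a: int, b: int) -> int:
    return a + b


TOOLS = {"multiply": multiply, "add": add}

# Scripted replies standing in for a real LLM (assumption for this demo).
SCRIPTED_REPLIES = [
    'Thought: Multiply first.\nAction: multiply\nAction Input: {"a": 2, "b": 4}',
    'Thought: Now add.\nAction: add\nAction Input: {"a": 2, "b": 8}',
    "Thought: I can answer without using any more tools.\nAnswer: 10",
]


def fake_llm(messages):
    """Pick the next scripted reply based on how many turns were taken."""
    turns = sum(1 for role, _ in messages if role == "assistant")
    return SCRIPTED_REPLIES[turns]


def react_loop(question, max_steps=5):
    messages = [("user", question)]
    for _ in range(max_steps):
        reply = fake_llm(messages)
        messages.append(("assistant", reply))
        if "Answer:" in reply:
            # The model decided it can answer: stop looping
            return reply.split("Answer:", 1)[1].strip()
        # Otherwise extract the tool name and JSON arguments from the text
        action = reply.split("Action:", 1)[1].split("\n", 1)[0].strip()
        kwargs = json.loads(reply.split("Action Input:", 1)[1].strip())
        observation = TOOLS[action](**kwargs)
        # Feed the tool result back as the next user message
        messages.append(("user", f"Observation: {observation}"))
    raise RuntimeError("no answer within max_steps")


print(react_loop("What is 2+2*4"))  # prints 10
```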

https://github.com/run-llama/llama_index/blob/17f23014953e07eb8f8e7690d4cca7fb26c2109c/llama-index-core/llama_index/core/agent/react/base.py#L91-L110


Running the question

response = agent.chat("What is 2+2*4")
print(response)

Let's look at what happens when a question is submitted.

https://github.com/run-llama/llama_index/blob/17f23014953e07eb8f8e7690d4cca7fb26c2109c/llama-index-core/llama_index/core/agent/runner/base.py#L631-L638
Execution starts here.

https://github.com/run-llama/llama_index/blob/17f23014953e07eb8f8e7690d4cca7fb26c2109c/llama-index-core/llama_index/core/agent/runner/base.py#L576-L587
This loops until an answer is produced.

https://github.com/run-llama/llama_index/blob/17f23014953e07eb8f8e7690d4cca7fb26c2109c/llama-index-core/llama_index/core/agent/runner/base.py#L385-L426
This is the body of a single loop iteration.

        if self.verbose:
            print(f"> Running step {step.step_id}. Step input: {step.input}")

This is where the step-start log line is printed.
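
The runner loop and this log line can be pictured with the simplified sketch below. `Step`, `run_task`, and `make_worker` are hypothetical stand-ins, not the library's classes. The sketch assumes that the first step carries the user's question and that follow-up steps are queued without new input, which would explain the `Step input: None` lines in the tutorial output.

```python
import uuid
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:  # hypothetical stand-in for a task step
    input: Optional[str]
    step_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def run_task(question, worker, verbose=True):
    """Run steps from a queue until the worker reports the last one."""
    queue = deque([Step(input=question)])
    log = []
    while queue:
        step = queue.popleft()
        line = f"> Running step {step.step_id}. Step input: {step.input}"
        log.append(line)
        if verbose:
            print(line)
        output, is_last = worker(step)
        if is_last:
            return output, log
        # Follow-up steps carry no new user input
        queue.append(Step(input=None))
    raise RuntimeError("step queue drained without a final answer")


def make_worker(outputs):
    """Hypothetical worker that finishes on its last canned output."""
    remaining = deque(outputs)

    def worker(step):
        output = remaining.popleft()
        return output, not remaining

    return worker


answer, log = run_task("What is 2+2*4", make_worker(["8", "10", "10"]))
```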

https://github.com/run-llama/llama_index/blob/17f23014953e07eb8f8e7690d4cca7fb26c2109c/llama-index-core/llama_index/core/agent/react/step.py#L544-L581
This is the main processing performed by the agent_worker.

        # TODO: see if we want to do step-based inputs
        tools = self.get_tools(task.input)
        input_chat = self._react_chat_formatter.format(
            tools,
            chat_history=task.memory.get(input=task.input)
            + task.extra_state["new_memory"].get_all(),
            current_reasoning=task.extra_state["current_reasoning"],
        )

        # send prompt
        chat_response = self._llm.chat(input_chat)
        # given react prompt outputs, call tools or return response
        reasoning_steps, is_done = self._process_actions(
            task, tools, output=chat_response
        )

This part looks important.
Let's inspect the variables with pdb.

---Iteration 1---

ipdb>  print(task.input)
What is 2+2*4


ipdb>  print(tools)
[<llama_index.core.tools.function_tool.FunctionTool object at 0x137d9b2c0>, <llama_index.core.tools.function_tool.FunctionTool object at 0x1451a7650>]

tools is the list that was passed to ReActAgent.from_tools.

ipdb>  print(task.memory.get(input=task.input))
[]
ipdb>  print(task.extra_state["new_memory"].get_all())
[ChatMessage(role=<MessageRole.USER: 'user'>, content='What is 2+2*4', additional_kwargs={})]

Only the initial question is present.

ipdb>  print(task.extra_state["current_reasoning"])
[]


ipdb>  print(input_chat)
[ChatMessage(role=<MessageRole.SYSTEM: 'system'>, content='You are designed to help with a variety of tasks, from answering questions to providing summaries to other types of analyses.\n\n## Tools\n\nYou have access to a wide variety of tools. You are responsible for using the tools in any sequence you deem appropriate to complete the task at hand.\nThis may require breaking the task into subtasks and using different tools to complete each subtask.\n\nYou have access to the following tools:\n> Tool Name: multiply\nTool Description: multiply(a: int, b: int) -> int\nMultiply two integers and returns the result integer\nTool Args: {"type": "object", "properties": {"a": {"title": "A", "type": "integer"}, "b": {"title": "B", "type": "integer"}}, "required": ["a", "b"]}\n\n> Tool Name: add\nTool Description: add(a: int, b: int) -> int\nAdd two integers and returns the result integer\nTool Args: {"type": "object", "properties": {"a": {"title": "A", "type": "integer"}, "b": {"title": "B", "type": "integer"}}, "required": ["a", "b"]}\n\n\n\n## Output Format\n\nPlease answer in the same language as the question and use the following format:\n\n```\nThought: The current language of the user is: (user\'s language). I need to use a tool to help me answer the question.\nAction: tool name (one of multiply, add) if using a tool.\nAction Input: the input to the tool, in a JSON format representing the kwargs (e.g. {"input": "hello world", "num_beams": 5})\n```\n\nPlease ALWAYS start with a Thought.\n\nNEVER surround your response with markdown code markers. You may use code markers within your response if you need to.\n\nPlease use a valid JSON format for the Action Input. Do NOT do this {\'input\': \'hello world\', \'num_beams\': 5}.\n\nIf this format is used, the user will respond in the following format:\n\n```\nObservation: tool response\n```\n\nYou should keep repeating the above format till you have enough information to answer the question without using any more tools. 
At that point, you MUST respond in the one of the following two formats:\n\n```\nThought: I can answer without using any more tools. I\'ll use the user\'s language to answer\nAnswer: [your answer here (In the same language as the user\'s question)]\n```\n\n```\nThought: I cannot answer the question with the provided tools.\nAnswer: [your answer here (In the same language as the user\'s question)]\n```\n\n## Current Conversation\n\nBelow is the current conversation consisting of interleaving human and assistant messages.\n', additional_kwargs={}), ChatMessage(role=<MessageRole.USER: 'user'>, content='What is 2+2*4', additional_kwargs={})]

It contains two elements: a system message and a user message.

ipdb>  print(input_chat[0].content)
You are designed to help with a variety of tasks, from answering questions to providing summaries to other types of analyses.

## Tools

You have access to a wide variety of tools. You are responsible for using the tools in any sequence you deem appropriate to complete the task at hand.
This may require breaking the task into subtasks and using different tools to complete each subtask.

You have access to the following tools:
> Tool Name: multiply
Tool Description: multiply(a: int, b: int) -> int
Multiply two integers and returns the result integer
Tool Args: {"type": "object", "properties": {"a": {"title": "A", "type": "integer"}, "b": {"title": "B", "type": "integer"}}, "required": ["a", "b"]}

> Tool Name: add
Tool Description: add(a: int, b: int) -> int
Add two integers and returns the result integer
Tool Args: {"type": "object", "properties": {"a": {"title": "A", "type": "integer"}, "b": {"title": "B", "type": "integer"}}, "required": ["a", "b"]}



## Output Format

Please answer in the same language as the question and use the following format:

```
Thought: The current language of the user is: (user's language). I need to use a tool to help me answer the question.
Action: tool name (one of multiply, add) if using a tool.
Action Input: the input to the tool, in a JSON format representing the kwargs (e.g. {"input": "hello world", "num_beams": 5})
```

Please ALWAYS start with a Thought.

NEVER surround your response with markdown code markers. You may use code markers within your response if you need to.

Please use a valid JSON format for the Action Input. Do NOT do this {'input': 'hello world', 'num_beams': 5}.

If this format is used, the user will respond in the following format:

```
Observation: tool response
```

You should keep repeating the above format till you have enough information to answer the question without using any more tools. At that point, you MUST respond in the one of the following two formats:

```
Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: [your answer here (In the same language as the user's question)]
```

```
Thought: I cannot answer the question with the provided tools.
Answer: [your answer here (In the same language as the user's question)]
```

## Current Conversation

Below is the current conversation consisting of interleaving human and assistant messages.

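Above is the system message, printed in a readable form.

It contains a very large set of instructions that takes the tools into account.
It looks like only a reasonably capable LLM could follow it.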
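
The `Tool Args` lines in that prompt are JSON descriptions of each function's parameters. The sketch below shows one way such a description could be derived from type hints with the standard library. `args_schema` and `JSON_TYPES` are hypothetical names, the type mapping only covers a few primitive types, and the real library may well build its schema differently.

```python
import inspect
import json

# Minimal mapping from Python types to JSON type names (assumption).
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def args_schema(fn):
    """Build a small JSON-Schema-like dict from a function's type hints.

    Raises KeyError for parameters without a supported annotation.
    """
    params = inspect.signature(fn).parameters
    properties = {
        name: {"title": name.capitalize(), "type": JSON_TYPES[p.annotation]}
        for name, p in params.items()
    }
    # Parameters without a default value are treated as required
    required = [
        name for name, p in params.items()
        if p.default is inspect.Parameter.empty
    ]
    return {"type": "object", "properties": properties, "required": required}


def add(a: int, b: int) -> int:
    """Add two integers and returns the result integer"""
    return a + b


print("Tool Args:", json.dumps(args_schema(add)))
```

For `add`, the printed JSON has the same shape as the `Tool Args` line shown in the dumped prompt.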

ipdb>  print(input_chat[1].content)
What is 2+2*4

Above is the user message, printed in a readable form.
It contains only the question.

ipdb>  print(chat_response)
assistant: Thought: The user is asking for the result of a mathematical operation. According to the order of operations (BIDMAS/BODMAS), multiplication should be performed before addition. Therefore, I need to first multiply 2 and 4, and then add the result to 2. I will use the 'multiply' tool first.

Action: multiply
Action Input: {"a": 2, "b": 4}

This is the first response.
After reasoning, the model decided that multiply needs to be run.
Values for the arguments are specified as well.

        # given react prompt outputs, call tools or return response
        reasoning_steps, is_done = self._process_actions(
            task, tools, output=chat_response
        )

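Here the response above is parsed, and if a tool needs to be invoked, the function is executed.
If no tool is needed, is_done becomes True, which signals that processing has finished.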

https://github.com/run-llama/llama_index/blob/a63caa94e99f0f272885797df4da4b51ab2ae31d/llama-index-core/llama_index/core/agent/react/output_parser.py#L16-L28
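Incidentally, digging further into the code, this is the part that identifies the Action and its arguments.
Note that this works on the response text and is not function_call.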
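
A text-based extraction of this kind can be sketched with a regular expression. The pattern below is my own simplified version written for this article's output format, not a copy of the library's parser, and `parse_action` is a hypothetical name.

```python
import json
import re

# Simplified pattern for "Thought / Action / Action Input" replies (assumption).
ACTION_PATTERN = re.compile(
    r"Thought:\s*(?P<thought>.*?)\n+"
    r"Action:\s*(?P<action>[^\n]+)\n+"
    r"Action Input:\s*(?P<input>\{.*\})",
    re.DOTALL,
)


def parse_action(text: str):
    """Return (thought, action, kwargs), or None when there is no tool call."""
    match = ACTION_PATTERN.search(text)
    if match is None:
        return None
    return (
        match.group("thought").strip(),
        match.group("action").strip(),
        json.loads(match.group("input")),  # requires valid JSON
    )


reply = (
    "Thought: I will use the 'multiply' tool first.\n\n"
    "Action: multiply\n"
    'Action Input: {"a": 2, "b": 4}'
)
print(parse_action(reply))
print(parse_action("Thought: I can answer now.\nAnswer: 10"))
```

Because the arguments go through `json.loads`, a single-quoted dict would fail to parse in this sketch, which is in line with the system prompt insisting on valid JSON for the Action Input.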

ipdb>  print(reasoning_steps)
[ActionReasoningStep(thought="The user is asking for the result of a mathematical operation. According to the order of operations (BIDMAS/BODMAS), multiplication should be performed before addition. Therefore, I need to first multiply 2 and 4, and then add the result to 2. I will use the 'multiply' tool first.", action='multiply', action_input={'a': 2, 'b': 4}), ObservationReasoningStep(observation='8', return_direct=False)]
ipdb>  print(is_done)
False

reasoning_steps also contains the result of running the tool.
Processing is not finished yet, so is_done is False.

---Iteration 2---

ipdb>  print(len(input_chat))
4

Two messages have been added.

ipdb>  print(input_chat[2])
assistant: Thought: The user is asking for the result of a mathematical operation. According to the order of operations (BIDMAS/BODMAS), multiplication should be performed before addition. Therefore, I need to first multiply 2 and 4, and then add the result to 2. I will use the 'multiply' tool first.
Action: multiply
Action Input: {'a': 2, 'b': 4}

This is the tool call (multiplication) that the agent decided on.

ipdb>  print(input_chat[3])
user: Observation: 8

And this is its result.
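
The two extra messages can be understood as the previous iteration's reasoning steps replayed as chat turns. The sketch below assumes a simple mapping: an action step becomes an assistant message and an observation step becomes a user message. `ActionStep`, `ObservationStep`, and `steps_to_messages` are hypothetical stand-ins for the library's classes.

```python
from dataclasses import dataclass


@dataclass
class ActionStep:  # stand-in for ActionReasoningStep
    thought: str
    action: str
    action_input: dict


@dataclass
class ObservationStep:  # stand-in for ObservationReasoningStep
    observation: str


def steps_to_messages(steps):
    """Turn reasoning steps into (role, content) pairs for the next prompt."""
    messages = []
    for step in steps:
        if isinstance(step, ObservationStep):
            # Tool results are fed back as if the user had said them
            messages.append(("user", f"Observation: {step.observation}"))
        else:
            content = (
                f"Thought: {step.thought}\n"
                f"Action: {step.action}\n"
                f"Action Input: {step.action_input}"
            )
            messages.append(("assistant", content))
    return messages


steps = [
    ActionStep("I will use the 'multiply' tool first.", "multiply", {"a": 2, "b": 4}),
    ObservationStep("8"),
]
for role, content in steps_to_messages(steps):
    print(f"{role}: {content}")
```

Formatting the dict directly yields the single-quoted `{'a': 2, 'b': 4}` form, which matches how the Action Input appears in `input_chat[2]` even though the model originally replied with double-quoted JSON.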

ipdb>  print(chat_response)
assistant: Thought: The multiplication of 2 and 4 is 8. Now, I need to add this result to 2. I will use the 'add' tool for this.
Action: add
Action Input: {"a": 2, "b": 8}

Next, the model decides that an addition is needed.

---Iteration 3---

ipdb>  print(len(input_chat))
6

Two more messages have been added.

ipdb>  print(input_chat[4])
assistant: Thought: The multiplication of 2 and 4 is 8. Now, I need to add this result to 2. I will use the 'add' tool for this.
Action: add
Action Input: {'a': 2, 'b': 8}

This is the tool call (addition) that the agent decided on.

ipdb>  print(input_chat[5])
user: Observation: 10

And this is its result.

ipdb>  print(chat_response)
assistant: Thought: I can answer without using any more tools. The result of the operation 2+2*4 is 10, following the order of operations.
Answer: 10

Here the answer is produced.

ipdb>  print(reasoning_steps)
[ResponseReasoningStep(thought='I can answer without using any more tools. The result of the operation 2+2*4 is 10, following the order of operations.', response='10', is_streaming=False)]

We can see that it reached the final answer by using the tool results.

ipdb>  print(is_done)
True

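Done.

Impressions

ReActAgent seems to be a key mechanism behind LLM agents, so I am glad I understand it better now.

There are systems such as llama-agents that dispatch instructions to multiple agents, each suited to its job, and derive an answer from their work.
The whole does not hold together unless each agent carries out its task, so this part should matter a lot.

Also, when building products with LLMs, I expect that defining the role of each individual agent will be where design sense is tested.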
