LLamaIndexのAgents Module Guidesを試す

kun432

以前LlamaIndexは、各機能を一通り試したのだが、エージェントについては当時は時期尚早と思ってあまり深く触らなかった。最近エージェント熱が高まっていることもあって、以下で改めて試している。

LlamaIndexで使えるエージェント用のモジュールはたくさんあるので、こちらで分けて個別に見ていこうと思う。

OpenAI向けエージェント
- OpenAI Agent
- OpenAI Agent with Query Engine Tools
- Retrieval Augmented Agent
- OpenAI Agent Cookbook
- Query Planning
- Context Retrieval Agent
- Recursive Retriever Agents
- Multi Document Agents
- Agent Builder
- Parallel Function Calling
- Agent with Planning
OpenAI Assistant エージェント
- OpenAI Assistant
- OpenAI Assistant Retrieval Benchmark
- Assistant Query Cookbook
他のFunction Calling エージェント
- Mistral Agent
ReActエージェント
- ReAct Agent
- ReAct Agent
- ReAct Agent with Query Engine Tools
LlamaHubで利用できるエージェント
- LLMCompiler Agent (Cookbook)
- Chain-of-Abstraction Agent (Cookbook)
- Language Agent Tree Search Agent (Cookbook)
- Instrospective Agent (Cookbook)
カスタムエージェント
- Custom Agent
- Query Pipeline Agent
低レベルエージェントAPI
- Agent Runner
- Agent Runner RAG
- Agent with Planning
- Controllable Agent Runner

kun432

事前準備

Colaboratoryで試していこうと思う。以下は事前準備。

パッケージインストール。トレーシング用にArize Phoenixのインテグレーションも追加。足りないものは別途追加する。

!pip install llama-index llama-index-callbacks-arize-phoenix

!pip freeze | egrep "llama-|arize"

arize-phoenix==4.5.0
llama-cloud==0.0.6
llama-index==0.10.51
llama-index-agent-openai==0.2.7
llama-index-callbacks-arize-phoenix==0.1.5
llama-index-cli==0.1.12
llama-index-core==0.10.51
llama-index-embeddings-openai==0.1.10
llama-index-indices-managed-llama-cloud==0.2.1
llama-index-legacy==0.9.48
llama-index-llms-openai==0.1.24
llama-index-multi-modal-llms-openai==0.1.6
llama-index-program-openai==0.1.6
llama-index-question-gen-openai==0.1.3
llama-index-readers-file==0.1.25
llama-index-readers-llama-parse==0.1.4
llama-parse==0.4.4
openinference-instrumentation-llama-index==2.0.0

OpenAI APIキーの読み込み

from google.colab import userdata
import os

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

Arize Phoenixのトレーシングを有効化。表示されたURLにアクセスすればトレーシングが確認できる。

import phoenix as px
import llama_index.core

px.launch_app()
llama_index.core.set_global_handler("arize_phoenix")

notebookなのでイベントループのネストを有効化。

import nest_asyncio

nest_asyncio.apply()

kun432

OpenAIモデル向けエージェント

OpenAI Agent

OpenAIのFunction Calling APIを使うエージェント。OpenAI用のエージェントモジュールは最初から存在してて、それを使えばかなりシンプルに書けるのだけど、それを使わずにまずはカスタムなOpenAI APIを使うエージェントをベタに実装する。

以下が必要になる。

OpenAI API（llama_indexのllmクラスを使う）
会話履歴を保持するメモリ
エージェントが使うツールの定義

まずツールの定義。ここでは整数の足し算、掛け算を行う関数をツールとして定義する。

from llama_index.core.tools import FunctionTool


def add(a: int, b: int) -> int:
    """2つの整数を足し算して、結果を整数で返す。"""
    return a + b


def multiply(a: int, b: int) -> int:
    """2つの整数を掛け算して、結果を整数で返す。"""
    return a * b


add_tool = FunctionTool.from_defaults(fn=add)
multiply_tool = FunctionTool.from_defaults(fn=multiply)

次にエージェントのクラスを作成する。

import json
from typing import Sequence, List

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from llama_index.core.tools import BaseTool
from openai.types.chat import ChatCompletionMessageToolCall


class MyOpenAIAgent:
    def __init__(
        self,
        tools: Sequence[BaseTool] = [],
        llm: OpenAI = OpenAI(temperature=0, model="gpt-3.5-turbo-0125"),
        chat_history: List[ChatMessage] = [],
    ) -> None:
        self._llm = llm
        self._tools = {tool.metadata.name: tool for tool in tools}
        self._chat_history = chat_history

    def reset(self) -> None:
        self._chat_history = []

    def chat(self, message: str) -> str:
        chat_history = self._chat_history
        chat_history.append(ChatMessage(role="user", content=message))
        tools = [
            tool.metadata.to_openai_tool() for _, tool in self._tools.items()
        ]

        ai_message = self._llm.chat(chat_history, tools=tools).message
        additional_kwargs = ai_message.additional_kwargs
        chat_history.append(ai_message)

        tool_calls = additional_kwargs.get("tool_calls", None)
        # parallel function callingのサポート
        if tool_calls is not None:
            for tool_call in tool_calls:
                function_message = self._call_function(tool_call)
                chat_history.append(function_message)
                ai_message = self._llm.chat(chat_history).message
                chat_history.append(ai_message)

        return ai_message.content

    def _call_function(
        self, tool_call: ChatCompletionMessageToolCall
    ) -> ChatMessage:
        id_ = tool_call.id
        function_call = tool_call.function
        tool = self._tools[function_call.name]
        tool_args = json.loads(function_call.arguments)
        output = tool(**tool_args)
        print(f"**tool: {function_call.name}**")
        print("INPUT:")
        print(tool_args)
        print("OUTPUT:")
        print(output)
        print("**")
        return ChatMessage(
            name=function_call.name,
            content=str(output),
            role="tool",
            additional_kwargs={
                "tool_call_id": id_,
                "name": function_call.name,
            },
        )

初期化時（__init__）にツール、LLM、会話履歴を受け取る。それぞれ指定がなければデフォルトもしくは初期化された状態でエージェントのインスタンスが作成される。
chatメソッドでクエリを受け取り、LLMに投げてレスポンスを受け取る
- ツールの定義があれば、あわせてLLMに投げられる
- LLMのレスポンスにtool_callsが含まれていれば、レスポンスのパラメータをツールに渡して実行（_call_function）、結果をLLMに投げて、レスポンスを受け取る
- これらのやり取りを会話履歴に保存する
resetメソッドは会話履歴をクリアする

関数実行時に少しデバッグ的な出力をいれてある。

では作成したエージェントクラスからインスタンスを作成。この時、ツールを指定する。

agent = MyOpenAIAgent(tools=[multiply_tool, add_tool])

エージェントにクエリを投げてみる。

agent.chat("こんにちは！")

こんにちは！どのようにお手伝いしましょうか？

agent.chat("2123 かける 215123 はいくつ？")

**tool: multiply**
INPUT:
{'a': 2123, 'b': 215123}
OUTPUT:
456706129
**
2123 かける 215123 は 456,706,129 です。

Function Callingを使って、ツールの選択と実行が行われて、その結果を踏まえた回答が行われていることがわかる。

会話履歴もエージェント内で保持している。

agent.chat("ごめん、答えを聞き逃しちゃった。なんだっけ？")

2123 かける 215123 は 456,706,129 です。

ツールを使わずに会話履歴から回答できていることがわかる。

トレースはこんな感じ。

これらをより簡単かつ便利に使えるようにしてあるのがOpenAIAgentモジュール。上で書いたものとの違いは以下。

BaseChatEngineとBaseQueryEngineインターフェイスを実装しているため、LlamaIndexの他のモジュールとシームレスに使用可能。
会話ターンごとに複数の関数呼び出しをサポート
ストリーミング出力をサポート
非同期エンドポイントをサポート
コールバックとトレースをサポート

OpenAIAgentモジュールを使うと以下のようにシンプルに書ける。

from llama_index.agent.openai import OpenAIAgent
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-0125")

agent = OpenAIAgent.from_tools(
    [multiply_tool, add_tool], llm=llm, verbose=True
)

ではクエリを投げてみる。chat、achat、stream_chat、astream_chatが使える。aがついているのは非同期、steamはストリーミング出力になる。

chatの場合。

response = agent.chat("こんにちは！")
print(str(response))

Added user message to memory: こんにちは！
こんにちは！どのようにお手伝いしましょうか？

1行目は初期化時のverboseオプションによるもの。会話履歴に保存されていることがわかる。

ツールを使う質問をしてみる。

response = agent.chat("(121 * 3) + 42 はいくつ？")
print(str(response))

Added user message to memory: (121 * 3) + 42 はいくつ？
=== Calling Function ===
Calling function: multiply with args: {"a": 121, "b": 3}
Got output: 363
========================

=== Calling Function ===
Calling function: add with args: {"a": 363, "b": 42}
Got output: 405
========================

(121 * 3) + 42 は、405 です。

ツール実行が行われてその結果から回答が生成されているのがわかる。上記のツール実行時に出力はverboseによるものだが、レスポンス内のsourceでも確認することもできる。

response.sources

[
  ToolOutput(content='363', tool_name='multiply', raw_input={'args': (), 'kwargs': {'a': 121, 'b': 3}}, raw_output=363, is_error=False),
  ToolOutput(content='405', tool_name='add', raw_input={'args': (), 'kwargs': {'a': 363, 'b': 42}}, raw_output=405, is_error=False)
]

会話履歴も保持されている。

response = agent.chat("ごめん、答えを聞き逃しちゃった。なんだっけ？")
print(str(response))

Added user message to memory: ごめん、答えを聞き逃しちゃった。なんだっけ？
答えは405です！

achatの場合

response = await agent.achat("次に、121 * 3 は?")
print(str(response))

Added user message to memory: 次に、121 * 3 は?
=== Calling Function ===
Calling function: multiply with args: {"a":121,"b":3}
Got output: 363
========================

121 * 3 は、363 です。

streamの場合

response = agent.stream_chat(
    "121 * 2 は?"
)

response_gen = response.response_gen

for token in response_gen:
    print(token, end="\n")  # 通常は`end=""`で。ストリーミング確認しやすいように改行を入れた。

Added user message to memory: 121 * 2 は?
=== Calling Function ===
Calling function: multiply with args: {"a":121,"b":2}
Got output: 242
========================

121
 *
 
2
 
は
、
242
 
です
。

astreamの場合

response = await agent.astream_chat(
    "121 + 8 は?"
)

response_gen = response.response_gen

async for token in response.async_response_gen():
    print(token, end="\n")  # 通常は`end=""`で。ストリーミング確認しやすいように改行を入れた。

Added user message to memory: 121 + 8 は?
=== Calling Function ===
Calling function: add with args: {"a":121,"b":8}
Got output: 129
========================

121
 +
 
8
 
は
、
129
 
です
。

エージェントモジュールはトレーシングについても設定されているため、一連の処理がまとまって見れる。（自分で作成する場合はこのあたりも意識する必要がある）

そのうちやる

kun432

OpenAI Assistant APIを使うエージェント（ベータ）

OpenAI Assitant APIを使うエージェント

OpenAI Assistant

そのうちやる

kun432

OpenAI Assistant Retrieval Benchmark

そのうちやる

kun432

Assistant Query Cookbook

そのうちやる

kun432

その他のFunction Calling対応モデルを使うエージェント

OpenAI以外のFuncton Calling対応モデルを使うエージェント。例としてMistral Agentが挙げられているが、Mistral専用のモジュールというわけではなくて、FunctionCallingAgentWorkerというどうも汎用的なFunctionCalling対応モデル向けエージェントモジュールがある様子。

karakuri-lm-8x7b-instruct-v0.1＋OllamaでRAGエージェントを作ってみたけど、このときはReActAgent+QueryEngineToolを使用した。ReActよりもこちらのほうが直接的に使えるのではないかという気がする。

気が向いたら試す。

kun432

ReActを使ったエージェント

ReActを使ったエージェントモジュール。
https://arxiv.org/abs/2210.03629

ReActAgent

上でやったOpenAIAgentと同じサンプルをReActAgentに置き換えただけのサンプル。

from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from llama_index.core.tools import FunctionTool


def multiply(a: int, b: int) -> int:
    """2つの整数を掛け算して、結果を整数で返す。"""
    return a * b


def add(a: int, b: int) -> int:
    """2つの整数を足し算して、結果を整数で返す。"""
    return a + b


multiply_tool = FunctionTool.from_defaults(fn=multiply)
add_tool = FunctionTool.from_defaults(fn=add)

llm = OpenAI(model="gpt-3.5-turbo-0125")

agent = ReActAgent.from_tools(
    [multiply_tool, add_tool], llm=llm, verbose=True
)

クエリを投げてみる。chat、achat、stream_chat、astream_chatなどはエージェント共通のメソッドっぽい。

response = agent.chat("121 * 3 + 42 は？計算はステップバイステップで行うこと。計算は推論せずに必ずツールの実行結果を示すこと。")

verboseの出力。ReActの思考過程が表示されている。

Thought: The current language of the user is: Japanese. I need to use a tool to help me answer the question.
Action: multiply
Action Input: {'a': 121, 'b': 3}
Observation: 363
Thought: The current language of the user is: Japanese. I need to use a tool to help me answer the question.
Action: add
Action Input: {'a': 363, 'b': 42}
Observation: 405
Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: 121 * 3 + 42 は、363 + 42 = 405 です。

生成された回答

print(str(response))

121 * 3 + 42 は、363 + 42 = 405 です。

ReActAgentのプロンプトはget_promptsで取得できる。

prompt_dict = agent.get_prompts()
for k, v in prompt_dict.items():
    print(f"**プロンプト: {k}\n\n---\n{v.template}")
    print("\n---")

**プロンプト: agent_worker:system_prompt

---
You are designed to help with a variety of tasks, from answering questions to providing summaries to other types of analyses.

## Tools

You have access to a wide variety of tools. You are responsible for using the tools in any sequence you deem appropriate to complete the task at hand.
This may require breaking the task into subtasks and using different tools to complete each subtask.

You have access to the following tools:
{tool_desc}


## Output Format

Please answer in the same language as the question and use the following format:

```
Thought: The current language of the user is: (user's language). I need to use a tool to help me answer the question.
Action: tool name (one of {tool_names}) if using a tool.
Action Input: the input to the tool, in a JSON format representing the kwargs (e.g. {{"input": "hello world", "num_beams": 5}})
```

Please ALWAYS start with a Thought.

Please use a valid JSON format for the Action Input. Do NOT do this {{'input': 'hello world', 'num_beams': 5}}.

If this format is used, the user will respond in the following format:

```
Observation: tool response
```

You should keep repeating the above format till you have enough information to answer the question without using any more tools. At that point, you MUST respond in the one of the following two formats:

```
Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: [your answer here (In the same language as the user's question)]
```

```
Thought: I cannot answer the question with the provided tools.
Answer: [your answer here (In the same language as the user's question)]
```

## Current Conversation

Below is the current conversation consisting of interleaving human and assistant messages.


---

プロンプトをカスタマイズして日本語にしてみる。ほぼ機械翻訳させただけ。

from llama_index.core import PromptTemplate

react_system_header_str = """\

あなたは、質問への回答から要約の提供、その他の分析に至るまで、さまざまなタスクを支援するように設計されている。

## ツール
あなたは、多種多様なツールを使用することができる。あなたは、目の前のタスクを完了するために適切と思われる順序でツールを使用する責任がある。
そのためには、タスクをサブタスクに分割し、それぞれのサブタスクを完了するために異なるツールを使用する必要があるかもしれない。

あなたは以下のツールを利用できる:
{tool_desc}

## 出力フォーマット
質問に答えるために、以下の書式を使うこと。

```
Thought: その質問に答えるためには、私はツールを使う必要がある。
Action: ツール名 ({tool_names} のうちから1つ)を指定、もしツールを使う場合。
Action Input: ツールへの入力を、キーワード引数を表すJSONフォーマットで示される (例: {{"input": "ハローワールド", "num_beams": 5}})
```

必ずThoughtから開始してください。

Action Inputは有効なJSONフォーマットを使用してください。次のようなもものは使用しないでください: {{'input': 'hello world', 'num_beams': 5}}.

このフォーマットが使用された場合、ユーザーは以下のフォーマットで返答する：

```
Observation: ツールの出力
```

ツールを使わずに質問に答えられるだけの情報が得られるまで、上記の形式を繰り返すこと。その時点で、次の2つの形式のいずれかで回答しなければならない：


```
Thought: これ以上ツールを使わなくても私は回答が可能
Answer: [あなたの回答]
```

```
Thought: 提供されたツールだけでは私は回答ができない
Answer: ごめんなさい、私はその質問に答えることが出来ません。
```

## 追加ルール
- 回答には、なぜその回答に至ったかを説明する一連の箇条書きを含めなければならない（MUST）。これには、以前の会話履歴の側面を含めることができる。
- 各ツールの関数シグネチャに従わなければならない。関数が引数を要求している場合、引数なしで渡してはならない。

## 現在の会話
以下は、人間とアシスタントのメッセージを織り交ぜた現在の会話である。

"""

react_system_prompt = PromptTemplate(react_system_header_str)

なお、Thought、Action、Action Input、Answerなどのやり取り中に使用されるフォーマット部分を変更するとうまくいかなかった。プロンプトが足りないのか、これらが明示的に使用されているのかはわからない。

エージェントのプロンプトを更新する。

agent.update_prompts({"agent_worker:system_prompt": react_system_prompt})

ではクエリを投げてみる。最初にreset()で会話履歴をクリアしている。

agent.reset()
response = agent.chat("121 * 3 + 42 は？計算はステップバイステップで行うこと。計算は推論せずに必ずツールの実行結果を示すこと。")

Thought: この計算を行うために、まずは掛け算を行い、その後に足し算を行う必要があります。
Action: multiply
Action Input: {'a': 121, 'b': 3}
Observation: 363
Thought: 今度は足し算を行います。
Action: add
Action Input: {'a': 363, 'b': 42}
Observation: 405
Thought: これ以上ツールを使わなくても回答が可能です。
Answer: 121 * 3 + 42 = 405

print(response)

121 * 3 + 42 = 405

ReActAgent をQueryEngineToolsと組み合わせて使う

上でも紹介したけど、karakuri-lm-8x7b-instruct-v0.1＋OllamaでLlamaIndexのReActAgent+QueryEngineToolを使った例。

kun432

LlamaHubを使ったエージェント

LlamaHubにあるLLama Packsパッケージを使った、少し複雑なエージェント。

LLMCompiler Agent

そのうちやる

kun432

Chain-of-Abstraction Agent

Agent with Planning

そのうちやる

kun432

Controllable Agent Runner

そのうちやる

human-in-the-loop はたぶんここ

kun432

Workflowできたしもういっかという感になった
https://zenn.dev/kun432/scraps/64dfc4957f98a9
実装見てみたいとかはあるかもなので、気が向いたら個別に試して見るかも。

このスクラップは3ヶ月前にクローズされました