Trying out Cohere's Function Calling

kun432

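When I last checked, I thought Cohere's counterpart to Function Calling was the feature called "Connectors".

https://docs.cohere.com/docs/connectors

My understanding may have been wrong, but at a glance it looked like it was implemented differently from OpenAI's Function Calling.

It turns out that Cohere now supports OpenAI-like Function Calling. ~~I don't know since when.~~

https://docs.cohere.com/docs/tool-use

Looking at LiteLLM, a PR adding Function Calling support for Command-R has already been merged, and judging from that, support apparently arrived about three weeks ago.

So let's take a quick look at how it behaves.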

kun432

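I'll follow the official notebook, on Colaboratory.

Note that the SDK is in the middle of a migration from v4 to v5. The notebook appears to be written for v4, so I'm not sure it will work on v5, but I'll try it on v5 while consulting the migration guide.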

Install the package.

!pip install cohere

!pip freeze | grep cohere

5.2.2 was installed.

cohere==5.2.2

Now initialize the client.

import cohere
from google.colab import userdata
import os

os.environ["CO_API_KEY"] = userdata.get('CO_API_KEY')
co = cohere.Client()

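This sample works as follows.

Prepare the following sales management database:
- Sales data by date (total sales amount, total units sold)
- Product data by category (ID, product name, price, stock)

Then prepare functions that reference this data, and use Function Calling to run those functions and generate an answer.

First, create some dummy data.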

sales_database = {
    '2023-09-28': {
        'total_sales_amount': 5000,
        'total_units_sold': 100,
    },
    '2023-09-29': {
        'total_sales_amount': 10000,
        'total_units_sold': 250,
    },
    '2023-09-30': {
        'total_sales_amount': 8000,
        'total_units_sold': 200,
    }
}

product_catalog = {
    '家電': [
        {'product_id': 'E1001', 'name': 'スマートフォン', 'price': 500, 'stock_level': 20},
        {'product_id': 'E1002', 'name': 'ノートパソコン', 'price': 1000, 'stock_level': 15},
        {'product_id': 'E1003', 'name': 'タブレット', 'price': 300, 'stock_level': 25},
    ],
    '衣類': [
        {'product_id': 'C1001', 'name': 'Tシャツ', 'price': 20, 'stock_level': 100},
        {'product_id': 'C1002', 'name': 'ジーンズ', 'price': 50, 'stock_level': 80},
        {'product_id': 'C1003', 'name': 'ジャケット', 'price': 100, 'stock_level': 40},
    ]
}

Create functions that access it.

def query_daily_sales_report(day: str) -> dict:
    """
    指定された日の売上レポートを取得する関数
    """
    report = sales_database.get(day, {})
    if report:
        return {
            'date': day,
            'summary': f"総売上高: {report['total_sales_amount']}, 総販売数: {report['total_units_sold']}"
        }
    else:
        return {'date': day, 'summary': '該当日の販売データなし'}


def query_product_catalog(category: str) -> dict:
    """
    指定されたカテゴリの商品を検索する関数
    """
    products = product_catalog.get(category, [])
    return {
        'category': category,
        'products': products
    }

Try running them.

print(query_daily_sales_report("2023-09-30"))

{
    'date': '2023-09-30',
    'summary': '総売上高: 8000, 総販売数: 200'
}

print(query_product_catalog("家電"))

{
    'category': '家電',
    'products': [
        {
            'product_id': 'E1001',
            'name': 'スマートフォン',
            'price': 500,
            'stock_level': 20
        },
        {
            'product_id': 'E1002',
            'name': 'ノートパソコン',
            'price': 1000,
            'stock_level': 15
        },
        {
            'product_id': 'E1003',
            'name': 'タブレット',
            'price': 300,
            'stock_level': 25
        }
    ]
}

They work as expected.

Next, the function definitions for function calling.

functions_map = {
    "query_daily_sales_report": query_daily_sales_report,
    "query_product_catalog": query_product_catalog
}

tools = [
    {
        "name": "query_daily_sales_report",
        "description": "データベースに接続し、指定された日の販売量と販売情報を取得します。",
        "parameter_definitions": {
            "day": {
                "description": "ここで指定された日の売上データを取得する。日付は YYYY-MM-DD 形式で指定する",
                "type": "str",
                "required": True
            }
        }
    },
    {
        "name": "query_product_catalog",
        "description": "商品カタログに接続し、カテゴリー、価格、在庫レベルなど、販売されているすべての商品に関する情報を取得する。",
        "parameter_definitions": {
            "category": {
                "description": "ここで指定されたカテゴリに属する全商品の商品情報データを取得する",
                "type": "str",
                "required": True
            }
        }
    }
]
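Since `tools` and `functions_map` are maintained by hand, the declared parameter names can drift from what the Python functions actually accept. The following is a minimal sketch of a consistency check, assuming the same `parameter_definitions` layout as above; `check_tools` is a hypothetical helper of mine, not part of the Cohere SDK, and the functions are reduced to stubs so that the block runs on its own.

```python
import inspect


# Stub versions of the two functions above (bodies shortened for this sketch).
def query_daily_sales_report(day: str) -> dict:
    return {"date": day}


def query_product_catalog(category: str) -> dict:
    return {"category": category}


functions_map = {
    "query_daily_sales_report": query_daily_sales_report,
    "query_product_catalog": query_product_catalog,
}

tools = [
    {
        "name": "query_daily_sales_report",
        "description": "Daily sales report",
        "parameter_definitions": {
            "day": {"description": "YYYY-MM-DD", "type": "str", "required": True}
        },
    },
    {
        "name": "query_product_catalog",
        "description": "Product catalog",
        "parameter_definitions": {
            "category": {"description": "Category name", "type": "str", "required": True}
        },
    },
]


def check_tools(tools, functions_map):
    """Hypothetical helper: list mismatches between tool definitions and functions."""
    problems = []
    for tool in tools:
        func = functions_map.get(tool["name"])
        if func is None:
            problems.append(f"{tool['name']}: no function registered")
            continue
        declared = set(tool["parameter_definitions"])
        actual = set(inspect.signature(func).parameters)
        if declared != actual:
            problems.append(
                f"{tool['name']}: declared {sorted(declared)}, function takes {sorted(actual)}"
            )
    return problems


print(check_tools(tools, functions_map))
```

An empty list means every tool name maps to a function whose parameter names match the definition.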

Set up the prompt. In Cohere, the system prompt is called the "preamble".

# タスクに関する指示と、出力に望まれるスタイルをpreambleで指定
preamble = """
## 指示とコンテキスト

あなたは、人々の質問やその他のリクエストにインタラクティブに答える手助けをします。
あなたは、あらゆる種類のトピックに関する非常に幅広い要求を尋ねられるでしょう。
幅広い検索エンジンや類似のツールが用意されており、それらを使って答えを調べます。

ユーザーのニーズにできる限り応えることに集中する必要があります。

## 回答のスタイル

ユーザーから別の回答スタイルを要求されない限り、適切な文法とスペルを使い、完全な文章で回答する必要があります。
"""

# ユーザークエリ
message = """
2023年9月29日の売上概要と、「家電」カテゴリーに属する製品の詳細（価格や在庫レベルなど）を教えてください。
"""

Send the request to the model.

response = co.chat(
    message=message,
    tools=tools,
    preamble=preamble,
    model="command-r"
)

print("モデルが推奨するtool callsは以下:")
print("\n".join(str(tool_call) for tool_call in response.tool_calls))

The model generates the parameters to pass to the functions.

モデルが推奨するtool callsは以下:
name='query_daily_sales_report' parameters={'day': '2023-09-29'}
name='query_product_catalog' parameters={'category': '家電'}

Build the objects used to execute the tools and return their results to the model. The notebook's code did not quite work here, so I modified it.

import json

tool_results = []

# モデルによって生成されたtool callを順に取り出し
for tool_call in response.tool_calls:
    # モデルが返したパラメータを使ってツールを呼び出す
    print(f"= ツール {tool_call.name} を次のパラメータで実行: {tool_call.parameters}")
    output = functions_map[tool_call.name](**tool_call.parameters)
    # ツールの実行結果をリストとして保存
    outputs = [output]
    print(f"== ツール実行結果: {outputs}")
    # ツールの実行結果を以下の形式で保存
    tool_results.append({
        "call": tool_call.dict(),
        "outputs": outputs
    })

print("次のステップでモデルに返されるツール実行結果:")
print(json.dumps(tool_results, indent=4, ensure_ascii=False))

= ツール query_daily_sales_report を次のパラメータで実行: {'day': '2023-09-29'}
== ツール実行結果: [{'date': '2023-09-29', 'summary': '総売上高: 10000, 総販売数: 250'}]
= ツール query_product_catalog を次のパラメータで実行: {'category': '家電'}
== ツール実行結果: [{'category': '家電', 'products': [{'product_id': 'E1001', 'name': 'スマートフォン', 'price': 500, 'stock_level': 20}, {'product_id': 'E1002', 'name': 'ノートパソコン', 'price': 1000, 'stock_level': 15}, {'product_id': 'E1003', 'name': 'タブレット', 'price': 300, 'stock_level': 25}]}]
次のステップでモデルに返されるツール実行結果:
[
    {
        "call": {
            "name": "query_daily_sales_report",
            "parameters": {
                "day": "2023-09-29"
            }
        },
        "outputs": [
            {
                "date": "2023-09-29",
                "summary": "総売上高: 10000, 総販売数: 250"
            }
        ]
    },
    {
        "call": {
            "name": "query_product_catalog",
            "parameters": {
                "category": "家電"
            }
        },
        "outputs": [
            {
                "category": "家電",
                "products": [
                    {
                        "product_id": "E1001",
                        "name": "スマートフォン",
                        "price": 500,
                        "stock_level": 20
                    },
                    {
                        "product_id": "E1002",
                        "name": "ノートパソコン",
                        "price": 1000,
                        "stock_level": 15
                    },
                    {
                        "product_id": "E1003",
                        "name": "タブレット",
                        "price": 300,
                        "stock_level": 25
                    }
                ]
            }
        ]
    }
]

Send the tool results and generate the final answer.

response = co.chat(
   message=message,
   tools=tools,
   tool_results=tool_results,
   preamble=preamble,
   model="command-r",
   temperature=0.3
)


print("最終回答:")
print(response.text)

最終回答:
9月29日の総売上高は10000、総販売数は250でした。

家電製品の詳細については、以下の通りです。
- スマートフォン：価格500、在庫20
- ノートパソコン：価格1000、在庫15
- タブレット：価格300、在庫25

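What's interesting about Cohere is that when tools are used like this, the response apparently always comes with citations showing what the answer was based on.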

print("最終回答の出典:")
for cite in response.citations:
  print(cite)

最終回答の出典:
start=0 end=25 text='9月29日の総売上高は10000、総販売数は250' document_ids=['query_daily_sales_report:0:0']
start=55 end=62 text='スマートフォン' document_ids=['query_product_catalog:1:0']
start=63 end=73 text='価格500、在庫20' document_ids=['query_product_catalog:1:0']
start=76 end=83 text='ノートパソコン' document_ids=['query_product_catalog:1:0']
start=84 end=95 text='価格1000、在庫15' document_ids=['query_product_catalog:1:0']
start=98 end=103 text='タブレット' document_ids=['query_product_catalog:1:0']
start=104 end=114 text='価格300、在庫25' document_ids=['query_product_catalog:1:0']
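Judging from the output above, `start` and `end` look like character offsets into `response.text`, and each `document_ids` entry looks like `tool name:tool result index:output index`. Both readings are my inference from this one output, not something I confirmed in the documentation. A minimal sketch to check them, using a hypothetical `Citation` dataclass as a stand-in for the SDK object:

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    """Hypothetical stand-in for the SDK's citation object (same field names)."""
    start: int
    end: int
    text: str
    document_ids: list = field(default_factory=list)


answer = "9月29日の総売上高は10000、総販売数は250でした。"
citations = [
    Citation(0, 25, "9月29日の総売上高は10000、総販売数は250", ["query_daily_sales_report:0:0"]),
]


def citation_spans_match(text, citations):
    """True when every citation's offsets slice out exactly its quoted text."""
    return all(text[c.start:c.end] == c.text for c in citations)


def split_document_id(document_id):
    """Split 'name:i:j' from the right; the meaning of i and j is an assumption."""
    tool_name, result_index, output_index = document_id.rsplit(":", 2)
    return tool_name, int(result_index), int(output_index)


print(citation_spans_match(answer, citations))
print(split_document_id("query_product_catalog:1:0"))
```

If the second reading holds, the middle number could be used to index `tool_results` directly instead of relying on the order in which citations appear.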

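These can be used to decorate the final answer with footnote-style markers. The helper function shown in the notebook did not work as-is, so I modified it slightly. I also adjusted the output a bit so that it works with Zenn's markdown.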

def insert_citations_in_order(text, citations):
    """
    脚注の整形表示を行うヘルパー関数
    """
    offset = 0
    document_id_to_number = {}
    citation_number = 0
    modified_citations = []

    # 一意のdocument_idに基づいて番号を割り振って、脚注を処理
    for citation in citations:
        citation_numbers = []
        for document_id in sorted(citation.document_ids):
            if document_id not in document_id_to_number:
                citation_number += 1  # 新しいdocument_idに対するインクリメント
                document_id_to_number[document_id] = citation_number
            citation_numbers.append(document_id_to_number[document_id])

        # オフセット付きで開始/終了を調整
        start, end = citation.start + offset, citation.end + offset
        placeholder = ''.join([f'[^{number}]' for number in citation_numbers])
        # 脚注対象のテキストを太字にして、プレースホルダーを追加
        modification = f'**{text[start:end]}**{placeholder}'
        # Replace the cited text with its bolded version + placeholder
        text = text[:start] + modification + text[end:]
        # 以降の書き換えのためにオフセットを更新
        offset += len(modification) - (end - start)

    # Build the footnote list in footnote-number order; each document_id appears once.
    # Note: tool_results[number - 1] assumes footnote numbers follow the order of tool_results.
    number_to_document_id = {number: doc_id for doc_id, number in document_id_to_number.items()}
    citation_list = '\n'.join([f'[^{number}]: 出典: {tool_results[number - 1]["outputs"]} \n    ツール実行内容: {dict(tool_results[number - 1]["call"])}' for number in sorted(number_to_document_id)])
    text_with_citations = f'{text}\n\n{citation_list}'

    return text_with_citations

Pass the final answer text and the citations to format them.

print(insert_citations_in_order(response.text, response.citations))

**9月29日の総売上高は10000、総販売数は250**[^1]でした。

家電製品の詳細については、以下の通りです。
- **スマートフォン**[^2]：**価格500、在庫20**[^2]
- **ノートパソコン**[^2]：**価格1000、在庫15**[^2]
- **タブレット**[^2]：**価格300、在庫25**[^2]

[^1]: 出典: [{'date': '2023-09-29', 'summary': '総売上高: 10000, 総販売数: 250'}] 
    ツール実行内容: {'name': 'query_daily_sales_report', 'parameters': {'day': '2023-09-29'}}
[^2]: 出典: [{'category': '家電', 'products': [{'product_id': 'E1001', 'name': 'スマートフォン', 'price': 500, 'stock_level': 20}, {'product_id': 'E1002', 'name': 'ノートパソコン', 'price': 1000, 'stock_level': 15}, {'product_id': 'E1003', 'name': 'タブレット', 'price': 300, 'stock_level': 25}]}] 
    ツール実行内容: {'name': 'query_product_catalog', 'parameters': {'category': '家電'}}

Here is the above rendered as markdown.

9月29日の総売上高は10000、総販売数は250^[1]でした。

家電製品の詳細については、以下の通りです。

スマートフォン^[2]：価格500、在庫20^[2:1]

ノートパソコン^[2:2]：価格1000、在庫15^[2:3]

タブレット^[2:4]：価格300、在庫25^[2:5]

脚注

出典: [{'date': '2023-09-29', 'summary': '総売上高: 10000, 総販売数: 250'}]
ツール実行内容: {'name': 'query_daily_sales_report', 'parameters': {'day': '2023-09-29'}} ↩︎
出典: [{'category': '家電', 'products': [{'product_id': 'E1001', 'name': 'スマートフォン', 'price': 500, 'stock_level': 20}, {'product_id': 'E1002', 'name': 'ノートパソコン', 'price': 1000, 'stock_level': 15}, {'product_id': 'E1003', 'name': 'タブレット', 'price': 300, 'stock_level': 25}]}]
ツール実行内容: {'name': 'query_product_catalog', 'parameters': {'category': '家電'}} ↩︎ ↩︎ ↩︎ ↩︎ ↩︎ ↩︎

kun432

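Citations and footnotes work correctly too. Whether to show them is a judgment call, but it's good to be able to see the supporting evidence.

For more on citations and footnotes, see also the following.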

kun432

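Multiple function calls can be handled in a single turn, and since LiteLLM already supports this, writing the code against the OpenAI Python client should make it possible to target both OpenAI and Cohere models.

Nice that the model becomes swappable.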
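To make the difference in shape concrete, here is a minimal sketch that converts the Cohere-style `parameter_definitions` used above into the OpenAI-style `tools` entry (a JSON Schema under `function.parameters`). `cohere_tool_to_openai` and `TYPE_MAP` are hypothetical names of mine; LiteLLM may well do an equivalent conversion internally, which I have not checked.

```python
# Assumed mapping from the Python-style type names used above to JSON Schema types.
TYPE_MAP = {"str": "string", "int": "integer", "float": "number", "bool": "boolean"}


def cohere_tool_to_openai(tool):
    """Hypothetical helper: Cohere-style tool dict -> OpenAI-style tool dict."""
    properties = {}
    required = []
    for name, spec in tool.get("parameter_definitions", {}).items():
        properties[name] = {
            "type": TYPE_MAP.get(spec["type"], "string"),  # unknown types fall back to string
            "description": spec.get("description", ""),
        }
        if spec.get("required"):
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


cohere_tool = {
    "name": "query_daily_sales_report",
    "description": "Daily sales report",
    "parameter_definitions": {
        "day": {"description": "YYYY-MM-DD", "type": "str", "required": True}
    },
}

openai_tool = cohere_tool_to_openai(cohere_tool)
print(openai_tool)
```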

kun432

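An example of a multi-step tool call, where several functions have to be executed in sequence.

In this example:

- LangChain's ReAct agent is used
- The following tools are provided
  - Web search via Tavily (requires a Tavily API key)
  - PythonREPL
  - A function that randomly applies an operation to multiple integers
  - Vector index search built from Paul Graham's essay "The Best Essay"

Note that at this point there is no implementation example of multi-step tool calls with Cohere's Python SDK, not even in the documentation, and nothing I tried worked (it finishes in a single turn).

I plan to add a note once I figure it out, but for now I'll follow the notebook. I don't know LangChain well, so I'll only skim through it.

Install LangChain and its dependencies.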

!pip install --quiet langchain langchain_cohere langchain_experimental

The installed versions look like this.

!pip freeze | egrep "cohere|langchain"

cohere==5.2.2
langchain==0.1.14
langchain-cohere==0.1.0
langchain-community==0.0.31
langchain-core==0.1.40
langchain-experimental==0.0.56
langchain-text-splitters==0.0.1

First, set the API keys. Incidentally, Cohere's native Python SDK seems to read the API key from the environment variable CO_API_KEY, whereas LangChain apparently uses COHERE_API_KEY.

from google.colab import userdata
import os

os.environ["COHERE_API_KEY"] = userdata.get('CO_API_KEY')
os.environ["TAVILY_API_KEY"] = userdata.get('TAVILY_API_KEY')
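Since the two libraries appear to look for different variable names, one option is to mirror whichever name is set into the other. A minimal sketch; `mirror_cohere_key` is a hypothetical helper, and the demo uses a plain dict instead of the real environment so that nothing is overwritten.

```python
import os


def mirror_cohere_key(env=os.environ):
    """Hypothetical helper: make CO_API_KEY and COHERE_API_KEY hold the same value."""
    key = env.get("CO_API_KEY") or env.get("COHERE_API_KEY")
    if key is None:
        raise KeyError("neither CO_API_KEY nor COHERE_API_KEY is set")
    env["CO_API_KEY"] = key
    env["COHERE_API_KEY"] = key
    return key


# Demo on a plain dict; pass nothing to operate on os.environ instead.
demo_env = {"CO_API_KEY": "dummy-key"}
mirror_cohere_key(demo_env)
print(sorted(demo_env))
```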

Now let's set up the tools.

First, Tavily search.

from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.pydantic_v1 import BaseModel, Field

internet_search = TavilySearchResults()
internet_search.name = "internet_search"
internet_search.description = "インターネットから取得したテキストクエリに関連するドキュメントのスニペットのリストを返します。"

class TavilySearchInput(BaseModel):
   query: str = Field(description="インターネットを検索するクエリ")

internet_search.args_schema = TavilySearchInput

PythonREPL.

from langchain.agents import Tool
from langchain_experimental.utilities import PythonREPL

python_repl = PythonREPL()
python_tool = Tool(
   name="python_repl",
   description="pythonコードを実行し、結果を返します。コードは対話モードなしで静的なサンドボックス環境で実行されるので、結果を出力するか、ファイルに保存してください。",
   func=python_repl.run,
)
python_tool.name = "python_interpreter"

class ToolInput(BaseModel):
   code: str = Field(description="実行するPythonコード")

python_tool.args_schema = ToolInput

A function that randomly either multiplies or adds two integers.

from langchain_core.tools import tool
import random
from langchain_core.pydantic_v1 import BaseModel, Field

@tool
def random_operation_tool(a: int, b: int):
 """複数の整数をランダムに演算する"""
 coin_toss = random.uniform(0, 1)
 if coin_toss > 0.5:
   return {'output': a*b}
 else:
   return {'output': a+b}

random_operation_tool.name = "random_operation" # use python case
random_operation_tool.description = "複数の整数をランダムに演算する"

class random_operation_inputs(BaseModel):
   a: int = Field(description="1番目の整数値")
   b: int = Field(description="2番目の整数値")

random_operation_tool.args_schema = random_operation_inputs

Vector search. The index is built from Paul Graham's essay.

!pip --quiet install faiss-cpu tiktoken

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_cohere import CohereEmbeddings
from langchain.tools.retriever import create_retriever_tool

embd = CohereEmbeddings()

urls = [
    "https://paulgraham.com/best.html",
]

docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=512, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)

vectorstore = FAISS.from_documents(
    documents=doc_splits,
    embedding=embd,
)

vectorstore_retriever = vectorstore.as_retriever()

vectorstore_search = create_retriever_tool(
    retriever=vectorstore_retriever,
    name="vectorstore_search",
    description="優れたエッセイの書き方に関するポール・グレアムの情報を含むベクトルストアから、関連情報を取得する",
)

The tools are ready. Initialize the ReAct agent that uses them.

from langchain.agents import AgentExecutor
from langchain_cohere.react_multi_hop.agent import create_cohere_react_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_cohere.chat_models import ChatCohere

# LLM
llm = ChatCohere(model="command-r-plus", temperature=0.3)

# Preamble
preamble = """
優れたエッセイの書き方に関するポール・グラハムの情報を含むベクターストアから、関連情報を取得する。
"""

# Prompt template
prompt = ChatPromptTemplate.from_template("{input}")

# Create the ReAct agent
agent = create_cohere_react_agent(
   llm=llm,
   tools=[internet_search, vectorstore_search, python_tool],
   prompt=prompt,
)

agent_executor = AgentExecutor(agent=agent,
                               tools=[internet_search, vectorstore_search, python_tool],
                               verbose=True)

kun432

Now a query.

response = agent_executor.invoke({
    "input": "ローマ帝国についてのエッセイを書きたいです。エッセイを書くコツがあれば教えてください。面白い事実があれば教えてください。",
    "preamble": preamble,
})

response['output']

Let's go through the output step by step.

> Entering new AgentExecutor chain...

I will search for 'ポール・グラハム エッセイの書き方' and 'ローマ帝国 面白い事実' and write an answer based on the information I find.

First, the vector search.

{'tool_name': 'vectorstore_search', 'parameters': {'query': 'ポール・グラハム エッセイの書き方'}}
Morris, Courtenay Pipkin, and Harj Taggar for reading drafts of
this.

can end up on a local maximum. If the most valuable question is
preceded by a boring one, you'll overlook it. But I can't imagine
a better strategy. There's no lookahead except by writing. So use
a greedy algorithm and a lot of time.[6]
I ended up reattaching the first 5 of the 17 paragraphs, and
discarding the rest.[7]
Stephen Fry confessed to making use of this phenomenon when
taking exams at Oxford. He had in his head a standard essay about
some general literary topic, and he would find a way to turn the
exam question toward it and then just reproduce it again.Strictly speaking it's the graph of ideas that would be highly
(snip)

Next, the Tavily search.

{'tool_name': 'internet_search', 'parameters': {'query': 'ローマ帝国 面白い事実'}}
[{'url': 'https://XXXXXXXXX', 'content': 'ローマ帝国の起源は紀元前8世紀ごろ。ラテン人という民族がイタリア半島に移住してローマに王国を築いたのが始まりでした。. この時のローマは7人の王様によって統治されており戦争をしながら内政を行なっていきローマ帝国の基礎を作り上げたのです。'}, {'url': 'https://XXXXXXXXX', 'content': 'ローマ帝国とは紀元前8世紀ごろから1000年以上も繁栄した大帝国。しかし、誕生までに約700年もの歳月を費やした長い苦難の歴史がありました。ローマ帝国の歴史を一から紐解き、成り立ちから滅亡までを解説していきます。'},(snip)

From here the answer is generated. You can also see that citations are produced.

Answer: ポール・グラハムによると、エッセイを書くコツは、まずは書き始めてみること、そして何度も書き直すことです。また、エッセイは発見のためのものであるため、書き始める前にすべての内容を決めてしまうのではなく、書きながら発見していくことが大切です。

ローマ帝国についてのエッセイを書くにあたり、以下のような面白い事実を含めることができます。

- ローマ帝国は紀元前8世紀ごろから1000年以上も繁栄した大帝国です。
- 誕生までに約700年もの歳月を費やしました。
- イタリア半島に誕生した都市国家から、地中海にまたがる領域国家へと発展しました。
- 395年に東西に分裂し、西ローマ帝国は476年にゲルマン人によって滅ぼされました。
Grounded answer: ポール・グラハムによると、エッセイを書くコツは、まずは<co: 0>書き始めてみること</co: 0>、そして<co: 0>何度も書き直す</co: 0>ことです。また、エッセイは<co: 0>発見のためのものである</co: 0>ため、<co: 0>書き始める前にすべての内容を決めてしまうのではなく、書きながら発見していく</co: 0>ことが大切です。

ローマ帝国についてのエッセイを書くにあたり、以下のような面白い事実を含めることができます。

- ローマ帝国は<co: 1,2>紀元前8世紀ごろ</co: 1,2>から<co: 2,5>1000年以上も繁栄した大帝国</co: 2,5>です。
- <co: 2>誕生までに約700年もの歳月を費やした</co: 2>長い歴史があります。
- <co: 3>イタリア半島に誕生した都市国家</co: 3>から、<co: 3,5>地中海にまたがる領域国家へと発展</co: 3,5>しました。
- <co: 4>395年に東西に分裂</co: 4>し、<co: 4>西ローマ帝国は476年にゲルマン人によって滅ぼされた</co: 4>。

> Finished chain.

Final answer

ポール・グラハムによると、エッセイを書くコツは、まずは書き始めてみること、そして何度も書き直すことです。また、エッセイは発見のためのものであるため、書き始める前にすべての内容を決めてしまうのではなく、書きながら発見していくことが大切です。

ローマ帝国についてのエッセイを書くにあたり、以下のような面白い事実を含めることができます。

ローマ帝国は紀元前8世紀ごろから1000年以上も繁栄した大帝国です。

誕生までに約700年もの歳月を費やしました。\n- イタリア半島に誕生した都市国家から、地中海にまたがる領域国家へと発展しました。

395年に東西に分裂し、西ローマ帝国は476年にゲルマン人によって滅ぼされました。

This example doesn't really seem to need multiple steps.
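The "Grounded answer" lines in the log above mark cited spans inline as `<co: N>...</co: N>`. If you wanted offsets like the ones the native SDK returns, the markup could be post-processed roughly as follows. This is a sketch based only on the markup visible in this log; `parse_grounded_answer` is a hypothetical helper and I have not checked it against nested or malformed tags.

```python
import re

# Matches <co: 1,2>cited text</co: 1,2>; the closing tag must repeat the same ids.
CO_TAG = re.compile(r"<co: ([\d,]+)>(.*?)</co: \1>", re.DOTALL)


def parse_grounded_answer(grounded):
    """Hypothetical helper: strip <co: N> tags and return (plain_text, citations)."""
    plain_parts = []
    citations = []
    cursor = 0      # read position in the tagged string
    plain_len = 0   # length of the plain text built so far
    for match in CO_TAG.finditer(grounded):
        before = grounded[cursor:match.start()]
        plain_parts.append(before)
        plain_len += len(before)
        cited = match.group(2)
        citations.append({
            "start": plain_len,
            "end": plain_len + len(cited),
            "text": cited,
            "doc_ids": [int(x) for x in match.group(1).split(",")],
        })
        plain_parts.append(cited)
        plain_len += len(cited)
        cursor = match.end()
    plain_parts.append(grounded[cursor:])
    return "".join(plain_parts), citations


sample = "ローマ帝国は<co: 1,2>紀元前8世紀ごろ</co: 1,2>から<co: 2,5>1000年以上も繁栄した大帝国</co: 2,5>です。"
plain, cites = parse_grounded_answer(sample)
print(plain)
for c in cites:
    print(c)
```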

kun432

Next query. This one is a multi-step example.

response = agent_executor.invoke({
    "input": "10と20をランダムに演算した結果を計算してください。次に、その数とその素因数に関する楽しい事実をいくつか見つけてください。回答は日本語で。",
    "preamble": preamble,
})

response['output']

> Entering new AgentExecutor chain...
WARNING:langchain_experimental.utilities.python:Python REPL can execute arbitrary code. Use with caution.

The Python REPL runs first (random_operation_tool was not included in the tools list passed to the agent above).

I will use Python to generate a random number between 10 and 20, then I will find the prime factors of that number and search for fun facts about it.
{'tool_name': 'python_interpreter', 'parameters': {'code': 'import random\nimport math\n\nnumber = random.randint(10, 20)\n\nprint("Random number: ", number)\n\nprime_factors = math.prime_factors(number)\nprint("Prime factors: ", prime_factors)'}}
AttributeError("module 'math' has no attribute 'prime_factors'")
I received an error message, so I will rewrite the code to use the sympy library instead.
{'tool_name': 'python_interpreter', 'parameters': {'code': 'import random\nimport sympy\n\nnumber = random.randint(10, 20)\n\nprint("Random number: ", number)\n\nprime_factors = sympy.primefactors(number)\nprint("Prime factors: ", prime_factors)'}}
Random number:  16
Prime factors:  [2]

A random number, 16, was obtained, so next it is used for a Tavily search.

The random number is 16, so I will now search for fun facts about the number 16.
{'tool_name': 'internet_search', 'parameters': {'query': 'fun facts about the number 16'}}
[{'url': 'https://www.factscrush.com/2022/11/facts-about-number-16.html', 'content': 'Facts About Number 16. In the age-old game of chess each player has sixteen pieces. There are also sixteen pawns in the game. Sum of the first four odd numbers is sixteen.(snip)

Generate the answer.

Answer: ランダムな演算の結果は**16**です。16の素因数は2です。

16にまつわる楽しい事実をいくつか紹介します。

- チェスでは、各プレイヤーが16個の駒を持ち、16個のポーンがあります。
- 16は完全性と完全な完成の象徴です。
- 16は平方数です（4 x 4 = 16）。
- 16はインドの通貨制度において、1ルピーが16アンナに相当していた1865年から1957年までの間、幸運の数字でした。
- 人間の成人は、上下の顎にそれぞれ16本の歯があり、合計32本です。
Grounded answer: ランダムな演算の結果は**<co: 1>16</co: 1>**です。16の素因数は<co: 1>2</co: 1>です。

16にまつわる楽しい事実をいくつか紹介します。

- <co: 2>チェスでは、各プレイヤーが16個の駒を持ち、16個のポーンがあります</co: 2>。
- <co: 5>16は完全性と完全な完成の象徴です</co: 5>。
- <co: 5>16は平方数です（4 x 4 = 16</co: 5>）。
- <co: 5>16はインドの通貨制度において、1ルピーが16アンナに相当していた1865年から1957年までの間、幸運の数字でした</co: 5>。
- <co: 6>人間の成人は、上下の顎にそれぞれ16本の歯があり、合計32本です</co: 6>。

Final answer

ランダムな演算の結果は16です。16の素因数は2です。
16にまつわる楽しい事実をいくつか紹介します。

チェスでは、各プレイヤーが16個の駒を持ち、16個のポーンがあります。

16は完全性と完全な完成の象徴です。

16は平方数です（4 x 4 = 16）。

16はインドの通貨制度において、1ルピーが16アンナに相当していた1865年から1957年までの間、幸運の数字でした。

人間の成人は、上下の顎にそれぞれ16本の歯があり、合計32本です。

kun432

Next, a query that doesn't need any tool.

response = agent_executor.invoke({
    "input": "こんにちは！調子はどう？",
    "preamble": preamble,
})

response['output']

> Entering new AgentExecutor chain...

I will respond to the user's greeting.
{'tool_name': 'directly_answer', 'parameters': {}}
directly_answer is not a valid tool, try one of [internet_search, vectorstore_search, python_interpreter].Relevant Documents: None
Cited Documents: None
Answer: こんにちは！私はAIアシスタントです。調子は良いです。あなたはどうですか？
Grounded answer: こんにちは！私はAIアシスタントです。調子は良いです。あなたはどうですか？

> Finished chain.

こんにちは！私はAIアシスタントです。調子は良いです。あなたはどうですか？

kun432

An example that runs Tavily searches sequentially.

response = agent_executor.invoke({
    "input": "設立当初'Sound of Music'という社名だった会社は、何年に上場しましたか？あと、その会社の2000年と2010年の株価は？回答は日本語で。",
    "preamble": preamble,
})

response['output']

> Entering new AgentExecutor chain...

I will search for the company 'Sound of Music' and its listing date, and then search for its stock price in 2000 and 2010.
{'tool_name': 'internet_search', 'parameters': {'query': "company 'Sound of Music' listing date"}}
[{'url': 'https://fortune.com/2015/03/02/sound-of-music-50-years/', 'content': 'The musical, starring Julie Andrews, turned 50 years old Monday. (snip)
I could not find any information about a company called 'Sound of Music'. I will now search for a company with a similar name.

{'tool_name': 'internet_search', 'parameters': {'query': "company 'Sound of Music' similar name listing date"}}
[{'url': 'https://www.soocial.com/audio-business-names/', 'content': 'Here are some of the coolest audio business name ideas that can help your brand become instantly recognizable and memorable. Audio Boom. (snip)
I found that Best Buy Co., Inc. was originally called Sound of Music. I will now search for its listing date and stock price in 2000 and 2010.

{'tool_name': 'internet_search', 'parameters': {'query': 'Best Buy Co., Inc. listing date'}}
[{'url': 'https://corporate.bestbuy.com/best-buy-reports-q4-fy23-results/', 'content': 'Among the factors that could cause actual results and outcomes to differ materially from those contained in such forward-looking statements are the following:(snip)

{'tool_name': 'internet_search', 'parameters': {'query': 'Best Buy Co., Inc. stock price 2000'}}
[{'url': 'https://investors.bestbuy.com/investor-relations/stock-info/quote-and-chart/', 'content': "Stock Quote: NYSE ; Intraday High 80.85. 52 Week High 86.11 ; Intraday Low 78.37. 52 Week Low 62.30 ; Today's Open 80.33. Previous Close 79.31."}, {'url': 'https://companiesmarketcap.com/best-buy/stock-price-history/', 'content': 'Stock price history for Best Buy (BBY). Highest end of day price: $138.00 USD on 2021-11-22. Lowest end of day price: $0.14 USD on 1985-05-02\xa0...'}, {'url': 'https://www.macrotrends.net/stocks/charts/BBY/best-buy/stock-price-history', 'content': 'Historical daily share price chart and data for Best Buy since 1985 adjusted for splits and dividends. The latest closing stock price for Best Buy as of\xa0...'}, {'url': 'https://www.netcials.com/stock-price-chart-history-nyse/BBY-Best-Buy-Co-Inc/', 'content': '1 Best Buy Co Inc (BBY) 20 Years Stock Chart History ; 2000, 16.24 (17.6%), 23.199 (29.07%) ; 2001, 14.80 (-8.87%), 20.1319 (-13.22%) ; 2002, 14.59 (-1.42%)\xa0...'}, {'url': 'https://finance.yahoo.com/quote/BBY/history/', 'content': 'Discover historical prices for BBY stock on Yahoo Finance. View daily, weekly or monthly format back to when Best Buy Co., Inc. stock was issued.'}]

{'tool_name': 'internet_search', 'parameters': {'query': 'Best Buy Co., Inc. stock price 2010'}}
[{'url': 'https://finance.yahoo.com/quote/BBY/history', 'content': 'Discover historical prices for BBY stock on Yahoo Finance. View daily, wee(snip)

Cited Documents: 9,18,20,21,22,23,24
Answer: 設立当初『サウンド・オブ・ミュージック』という社名だった会社は、ベスト・バイ・カンパニーです。ベスト・バイは1985年5月2日に上場しました。2000年の株価は16.24ドル、2010年の株価は17.59ドルです。
Grounded answer: 設立当初『サウンド・オブ・ミュージック』という社名だった会社は、<co: 9>ベスト・バイ・カンパニー</co: 9>です。ベスト・バイは<co: 18,20>1985年5月2日</co: 18,20>に上場しました。2000年の株価は<co: 18>16.24ドル</co: 18>、2010年の株価は<co: 21,22,23,24>17.59ドル</co: 21,22,23,24>です。

> Finished chain.

設立当初『サウンド・オブ・ミュージック』という社名だった会社は、ベスト・バイ・カンパニーです。ベスト・バイは1985年5月2日に上場しました。2000年の株価は16.24ドル、2010年の株価は17.59ドルです。

kun432

An example where the agent searches based on the flow of the conversation.

from langchain_core.messages import HumanMessage, AIMessage

chat_history = [
    HumanMessage(content="CRMをオラクルに切り替えることを検討しています。"),
    AIMessage(content="いい考えだと思うよ！なにかお手伝いできることはありますか？"),
    HumanMessage(content="彼らのサービス提供内容について、見つけることができるすべての情報を私に報告して。回答は日本語で。"),
]

prompt = ChatPromptTemplate.from_messages(chat_history)

agent = create_cohere_react_agent(
    llm=llm,
    tools=[internet_search, vectorstore_search, python_tool],
    prompt=prompt,
)

agent_executor = AgentExecutor(agent=agent, tools=[internet_search, vectorstore_search, python_tool], verbose=True)

response = agent_executor.invoke({
    "preamble": preamble,
})

response['output']

> Entering new AgentExecutor chain...

I will search for 'Oracle CRM services' and relay the information I find to the user in Japanese.
{'tool_name': 'internet_search', 'parameters': {'query': 'Oracle CRM services'}}
[{'url': 'https://www.oracle.com/cx/service/', 'content': 'Oracle Service is a unified platform of apps, data, and capabilities enabling effortless self-service, agent-assisted service, and field service workflows tailored to your industry and use case. (snip)

Cited Documents: 0,1,2,3,4
Answer: オラクルのCRMサービスは、顧客データの収集、リンク、分析を行います。これには、連絡先情報、会社代表者とのやり取り、購入、サービスリクエスト、資産、見積もり/提案が含まれます。オラクルは、業界やユースケースに合わせて調整されたセルフサービス、エージェント支援サービス、フィールドサービスワークフローを可能にするアプリ、データ、機能の統合プラットフォームを提供しています。オラクルのCRMは、製薬セールス向けにパーソナライズされたコンテンツ配信ツールを提供し、マーケティング、セールス、カスタマーサービスのアプリケーションを完全に統合して、ブランドオーナー、パートナー、顧客、エンドコンシューマー間の複雑なやり取りと関係を管理します。オラクルは、データを活用してCXを改革し、オファーを立ち上げ、顧客を獲得・維持し、オムニチャネルeコマースとカスタマーケアを提供し、サービスを履行・収益化します。オラクルのCRMは、AI強化されたガイド付き販売により、セールスチームのスピードと効率を向上させ、統合されたクリーンで正確なファーストパーティおよびサードパーティの顧客データを通じて、顧客の360度ビューを提供します。オラクル クラウド CX は、マーケティング、セールス、サービス、バックオフィス アプリケーション全体で顧客の行動、取引、人口統計データを接続し、データ ファースト アプローチを採用しています。
Grounded answer: オラクルのCRMサービスは、<co: 2>顧客データの収集、リンク、分析</co: 2>を行います。これには、<co: 2>連絡先情報、会社代表者とのやり取り、購入、サービスリクエスト、資産、見積もり/提案</co: 2>が含まれます。オラクルは、<co: 0>業界やユースケースに合わせて調整された</co: 0><co: 0>セルフサービス、エージェント支援サービス、フィールドサービスワークフローを可能にするアプリ、データ、機能の統合プラットフォーム</co: 0>を提供しています。オラクルのCRMは、<co: 1>製薬セールス向けにパーソナライズされたコンテンツ配信ツールを提供し、マーケティング、セールス、カスタマーサービスのアプリケーションを完全に統合</co: 1>して、<co: 1>ブランドオーナー、パートナー、顧客、エンドコンシューマー間の複雑なやり取りと関係を管理</co: 1>します。オラクルは、<co: 1>データを活用してCXを改革し、オファーを立ち上げ、顧客を獲得・維持し、オムニチャネルeコマースとカスタマーケアを提供し、サービスを履行・収益化</co: 1>します。オラクルのCRMは、<co: 3>AI強化されたガイド付き販売により、セールスチームのスピードと効率を向上させ</co: 3>、<co: 3>統合されたクリーンで正確なファーストパーティおよびサードパーティの顧客データを通じて、顧客の360度ビューを提供</co: 3>します。オラクル クラウド CX は、<co: 4>マーケティング、セールス、サービス、バックオフィス アプリケーション全体で顧客の行動、取引、人口統計データを接続し、データ ファースト アプローチを採用</co: 4>しています。

> Finished chain.

オラクルのCRMサービスは、顧客データの収集、リンク、分析を行います。これには、連絡先情報、会社代表者とのやり取り、購入、サービスリクエスト、資産、見積もり/提案が含まれます。オラクルは、業界やユースケースに合わせて調整されたセルフサービス、エージェント支援サービス、フィールドサービスワークフローを可能にするアプリ、データ、機能の統合プラットフォームを提供しています。オラクルのCRMは、製薬セールス向けにパーソナライズされたコンテンツ配信ツールを提供し、マーケティング、セールス、カスタマーサービスのアプリケーションを完全に統合して、ブランドオーナー、パートナー、顧客、エンドコンシューマー間の複雑なやり取りと関係を管理します。オラクルは、データを活用してCXを改革し、オファーを立ち上げ、顧客を獲得・維持し、オムニチャネルeコマースとカスタマーケアを提供し、サービスを履行・収益化します。オラクルのCRMは、AI強化されたガイド付き販売により、セールスチームのスピードと効率を向上させ、統合されたクリーンで正確なファーストパーティおよびサードパーティの顧客データを通じて、顧客の360度ビューを提供します。オラクルクラウド CX は、マーケティング、セールス、サービス、バックオフィスアプリケーション全体で顧客の行動、取引、人口統計データを接続し、データファーストアプローチを採用しています。

kun432

So for now, multi-step does seem to work as expected when using LangChain? Still, I'd like to write it with the native Python client if possible, but I have no idea how.

kun432

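As noted earlier, at this point there is no implementation example of multi-step tool calls with Cohere's Python SDK, not even in the documentation, and nothing I tried worked (it finishes in a single turn).

I'd like to write it with the native Python client if possible, but I have no idea how.

For the record, here is what I tried.

I prepared the following database.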

product_list = {
    'スマートフォン': 'E1001',
    'ノートパソコン': 'E1002',
    'タブレット': 'E1003',
    'Tシャツ': 'C1001',
    'ジーンズ':'C1002',
    'ジャケット': 'C1003',
}

product_catalog = {
    'E1001': {'price': 500, 'stock_level': 20},
    'E1002': { 'price': 1000, 'stock_level': 15},
    'E1003': {'price': 300, 'stock_level': 25},
    'C1001': {'price': 20, 'stock_level': 100},
    'C1002': {'price': 50, 'stock_level': 80},
    'C1003': {'price': 100, 'stock_level': 40},
}

Prepare two functions to access them.

- Get the product ID from the product name
- Get the product info from the product ID

def get_product_id_from_product_name(product_name):
    return {"product_name": product_name, "product_id": product_list[product_name]}

def get_product_info_from_product_id(product_id):
    return {"product_id": product_id, "product_info": product_catalog[product_id]}

functions_map = {
    "get_product_id_from_product_name": get_product_id_from_product_name,
    "get_product_info_from_product_id": get_product_info_from_product_id,
}

Run them.

print(get_product_id_from_product_name("タブレット"))
print(get_product_info_from_product_id("E1003"))

{'product_name': 'タブレット', 'product_id': 'E1003'}
{'product_id': 'E1003', 'product_info': {'price': 300, 'stock_level': 25}}

In short, to find out "the price of the tablet", you first have to get the product ID and then use that ID to get the product info, which takes multiple steps.
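Written out by hand, the two-step dependency looks like the sketch below: the second call cannot be built until the first one has returned. The data is cut down to the one product needed so that the block runs on its own.

```python
# Reduced copies of the lookup tables above (one product only).
product_list = {"タブレット": "E1003"}
product_catalog = {"E1003": {"price": 300, "stock_level": 25}}


def get_product_id_from_product_name(product_name):
    return {"product_name": product_name, "product_id": product_list[product_name]}


def get_product_info_from_product_id(product_id):
    return {"product_id": product_id, "product_info": product_catalog[product_id]}


# Step 1: name -> ID
step1 = get_product_id_from_product_name("タブレット")
# Step 2: the ID from step 1 is the only valid argument for this call
step2 = get_product_info_from_product_id(step1["product_id"])
print(step2)
```

This is the ordering the model has to reproduce: it cannot know `E1003` when it plans its first tool call.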

Now define these as tools.

tools = [
    {
        "name": "get_product_id_from_product_name",
        "description": "「商品名」（product_name）から「商品ID」(product_id)を取得する。",
        "parameter_definitions": {
            "product_name": {
                "description": "「商品ID」(product_id)を取得するための「商品名」を指定する。「商品名」は一般名詞で指定する必要がある。ex. タブレット、ジャケット、等",
                "type": "str",
                "required": True
            }
        }
    },
    {
        "name": "get_product_info_from_product_id",
        "description": "「商品ID」(product_id)から「商品情報（価格、在庫）」（product_info）を取得する。",
        "parameter_definitions": {
            "product_id": {
                "description": "「商品情報（価格、在庫）」（product_info）を取得するための「商品ID」を指定する。「商品ID」はアルファベット1文字＋数字3文字で指定すること。それ以外の形式ではエラーになる。ex. X0002、等。",
                "type": "str",
                "required": True
            },
        }
    }
]

Try a query.

# preamble containing instructions about the task and the desired style for the output.
preamble = """
## 指示とコンテキスト

あなたは、人々の質問やその他のリクエストにインタラクティブに答える手助けをします。
あなたは、あらゆる種類のトピックに関する非常に幅広い要求を尋ねられるでしょう。
幅広い検索エンジンや類似のツールが用意されており、それらを使って答えを調べます。

ユーザーのニーズにできる限り応えることに集中する必要があります。

## 回答のスタイル

ユーザーから別の回答スタイルを要求されない限り、適切な文法とスペルを使い、完全な文章で回答する必要があります。
"""

# user request
message = "タブレットの在庫を教えて。"

first_response = co.chat(
    message=message,
    tools=tools,
    preamble=preamble,
    model="command-r"
)

print("\n".join(str(tool_call) for tool_call in first_response.tool_calls))

Result

name='get_product_id_from_product_name' parameters={'product_name': 'タブレット'}
name='get_product_info_from_product_id' parameters={'product_id': ''}

Other runs returned:

name='get_product_id_from_product_name' parameters={'product_name': 'タブレット'}
name='get_product_info_from_product_id' parameters={'product_id': 'X0002'}

name='get_product_id_from_product_name' parameters={'product_name': 'タブレット'}
name='get_product_info_from_product_id' parameters={'product_id': '商品ID'}

The results vary like this and are not stable. Prompting might improve this, though.
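With placeholder arguments such as `''` or `'X0002'`, calling the function directly would raise `KeyError` and stop the loop. A minimal sketch of a more defensive dispatcher follows; `run_tool_call` is a hypothetical helper, and `SimpleNamespace` only stands in for the SDK's tool call object (which exposes `name` and `parameters` in the output above).

```python
from types import SimpleNamespace

product_catalog = {"E1003": {"price": 300, "stock_level": 25}}


def get_product_info_from_product_id(product_id):
    return {"product_id": product_id, "product_info": product_catalog[product_id]}


functions_map = {"get_product_info_from_product_id": get_product_info_from_product_id}


def run_tool_call(tool_call, functions_map):
    """Hypothetical helper: run one tool call, returning an error dict instead of raising."""
    func = functions_map.get(tool_call.name)
    if func is None:
        return {"error": f"unknown tool: {tool_call.name}"}
    try:
        return func(**tool_call.parameters)
    except (KeyError, TypeError) as exc:
        # KeyError: value not in the lookup table; TypeError: wrong parameter names.
        return {"error": f"{type(exc).__name__}: {exc}"}


good = SimpleNamespace(name="get_product_info_from_product_id", parameters={"product_id": "E1003"})
guessed = SimpleNamespace(name="get_product_info_from_product_id", parameters={"product_id": "X0002"})
unknown = SimpleNamespace(name="directly_answer", parameters={})

for call in (good, guessed, unknown):
    print(call.name, "->", run_tool_call(call, functions_map))
```

Returning the error as a tool output at least makes the failed guess visible, though it does not by itself make the exchange multi-step.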

Also, suppose that only

name='get_product_id_from_product_name' parameters={'product_name': 'タブレット'}

is returned. This alone does not complete the task, so a follow-up exchange is needed.

Execute the tool and get the result.

import json

tool_results = []

for tool_call in first_response.tool_calls:
    print(f"= ツール {tool_call.name} を次のパラメータで実行: {tool_call.parameters}")
    output = functions_map[tool_call.name](**tool_call.parameters)
    outputs = [output]
    print(f"== ツール実行結果: {outputs}")
    tool_results.append({
        "call": tool_call.dict(),
        "outputs": outputs
    })

print("次のステップでモデルに返されるツール実行結果:")
print(json.dumps(tool_results, indent=4, ensure_ascii=False))

Result.

= ツール get_product_id_from_product_name を次のパラメータで実行: {'product_name': 'タブレット'}
== ツール実行結果: [{'product_name': 'タブレット', 'product_id': 'E1003'}]
次のステップでモデルに返されるツール実行結果:
[
    {
        "call": {
            "name": "get_product_id_from_product_name",
            "parameters": {
                "product_name": "タブレット"
            }
        },
        "outputs": [
            {
                "product_name": "タブレット",
                "product_id": "E1003"
            }
        ]
    }
]

Send the result back to the model.

second_response = co.chat(
   tools=tools,
   message=message,
   preamble=preamble,
   model="command-r",
   tool_results=tool_results,
)

print(second_response)

Answer

text='タブレットの在庫を調べてみたところ、E1003 の商品番号が割り当てられているようです。'
generation_id='4218935a-fe41-49fa-832b-5837c99dd449'
citations=[ChatCitation(start=18, end=23, text='E1003', document_ids=['get_product_id_from_product_name:0:0'])] documents=[{'id': 'get_product_id_from_product_name:0:0', 'product_id': 'E1003', 'product_name': 'タブレット', 'tool_name': 'get_product_id_from_product_name'}]
is_search_required=None
search_queries=None search_results=None
finish_reason='COMPLETE'
tool_calls=None
chat_history=[ChatMessage(role='USER', message='タブレットの在庫を調べて。'), ChatMessage(role='CHATBOT', message='タブレットの在庫を調べてみたところ、E1003 の商品番号が割り当てられているようです。')]
response_id='0223569c-a409-40a4-b02f-d1b4af06d790'
meta={'api_version': {'version': '1'},
'billed_units': {'input_tokens': 188, 'output_tokens': 27},
'tokens': {'input_tokens': 809, 'output_tokens': 27}}

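The required tools were not executed in multiple steps, so naturally the answer is not correct.

Judging from the parameters of Cohere's Python client, tool execution results are passed via tool_results. With OpenAI, Anthropic and others, these seem to be included in the conversation history, but that does not appear to be the case with Cohere. As a result, once the tool_calls have been processed as above, there is no way to continue to the next step.

In other words, I suspect there is no interface for a multi-step exchange.

I think the LangChain example works because the agent passes tool results along and has the model think about the next action; that is, the flow is assembled on the client side. That's fine if you use something like LangChain, but if you don't, you have to control this part yourself.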

kun432

Hmm, the notebook has been updated, but can multi-step really be done without the LangChain agent?

kun432

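I tried it, but it still didn't work. I still can't figure out how, in terms of the interface, to pass an "intermediate step" like the following, where you hand over the previous tool result and receive the next tool inference.

Premise
- There are tools A and B
- Tool B takes the result of tool A as its argument (so multi-step inference is required)
Flow
- Pass tools A/B and the query to the model
- The model recommends tool A and returns its parameters
- Execute tool A and return the result to the model, but...
  - If it is returned via tool_results, no further tool inference happens and a final answer comes back
  - Judging from the SDK and API, including tool results in the chat history appears to be discouraged
- It ends with a hallucinated answer instead of becoming multi-step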
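The control flow I was looking for in the flow above can be sketched as a client-side loop: keep calling the model with the accumulated tool results until it returns text instead of another tool call. The sketch below is not the Cohere API. `fake_model` is a hard-coded, hypothetical stand-in for a model that supports intermediate steps, used only to make the loop runnable.

```python
product_list = {"タブレット": "E1003"}
product_catalog = {"E1003": {"price": 300, "stock_level": 25}}

functions_map = {
    "get_product_id_from_product_name": lambda product_name: {
        "product_name": product_name, "product_id": product_list[product_name]},
    "get_product_info_from_product_id": lambda product_id: {
        "product_id": product_id, "product_info": product_catalog[product_id]},
}


def fake_model(message, tool_results):
    """Hypothetical stub: picks the next tool call from the results seen so far."""
    known = {r["call"]["name"]: r["outputs"][0] for r in tool_results}
    if "get_product_id_from_product_name" not in known:
        return {"tool_call": {"name": "get_product_id_from_product_name",
                              "parameters": {"product_name": "タブレット"}}}
    if "get_product_info_from_product_id" not in known:
        product_id = known["get_product_id_from_product_name"]["product_id"]
        return {"tool_call": {"name": "get_product_info_from_product_id",
                              "parameters": {"product_id": product_id}}}
    info = known["get_product_info_from_product_id"]["product_info"]
    return {"text": f"在庫は{info['stock_level']}です。"}


def run_multi_step(message, functions_map, model=fake_model, max_steps=5):
    """Loop until the model answers in text; returns (answer, tool_results)."""
    tool_results = []
    for _ in range(max_steps):
        reply = model(message, tool_results)
        if "text" in reply:
            return reply["text"], tool_results
        call = reply["tool_call"]
        output = functions_map[call["name"]](**call["parameters"])
        tool_results.append({"call": call, "outputs": [output]})
    raise RuntimeError("no final answer within max_steps")


answer, trace = run_multi_step("タブレットの在庫を教えて。", functions_map)
print(answer)
print([step["call"]["name"] for step in trace])
```

The missing piece in my experiments is the middle branch: a response that says "call tool B next" after tool A's result has been sent back.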

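When using a LangChain agent, the agent manages these steps, so in a sense it is natural that this works. With OpenAI's and Anthropic's Function Calling, these steps are properly managed within the exchange with the model even without LangChain (or so it appears). Compared with these, my honest impression is that Cohere's Function Calling cannot currently be said to support multi-step.

I can't tell whether this is a model issue or an interface issue.

I filed an issue for now.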

kun432

Sounds like it's coming soon. Looking forward to it.

It was first introduced into langchain as an experimental mode, instructions for how to implement with the API&SDK coming soon

This scrap was closed on 2024/04/06.