Closed3ヶ月前にクローズ5

推論能力を強化するOpenAI API互換プロキシ「optillm」を試す

proxy

OpenAI

LLM

optillm

kun432

https://github.com/codelion/optillm

 optillmoptillmは、LLMの精度とパフォーマンスを向上させることができるいくつかの最先端の技術を実装した、OpenAI API互換の最適化推論プロキシです。現在の焦点は、コーディング、論理、数学的なクエリを越えた推論を向上させる技術の実装にあります。推論時に追加の計算を行うことで、これらの技術をさまざまなタスクに適用し、フロンティアモデルを上回ることも可能です。
optillmは透過プロキシであり、OpenAI APIと互換性のあるチャット補完エンドポイントを持つLLM APIまたはプロバイダーであれば、どのようなものでも動作します。また、optillmは同じくOpenAI APIと互換性のあるチャット補完エンドポイントを公開しています。これにより、既存のツールやフレームワークに簡単に統合することができます。使用したいLLMがOpenAI API対応エンドポイントを持たない場合(GoogleやAnthropicなど)、ほとんどのLLMをサポートするLiteLLMプロキシサーバーを使用することができます。
以下のシーケンス図は、リクエストとレスポンスがoptillmを経由する様子を示しています。


optillmの使用を示すシーケンス図
図中:
Aは、optillmの結果を使用したい既存のツール(oobaboogaなど)、フレームワーク(patchworkなど)、または独自のコードです。OpenAIクライアントSDKを使用して、直接使用することができます。
Bは、optillmサービス(直接実行またはDockerコンテナ内で実行)で、base_urlにリクエストを送信します。
Cは、OpenAI APIと互換性のあるチャット補完エンドポイントを提供する任意のサービスです。


 実装されているテクニック


技術
識別子
説明


Agent
agent
以下のアプローチのどれを取るかを決定し、結果を組み合わせる

Monte Carlo Tree Search
mcts
チャットレスポンスの意思決定にMCTSを使用

Best of N Sampling
bon
複数の応答を生成し、最良のものを選択

Mixture of Agents
moa
複数の批評からの応答を組み合わせる

Round Trip Optimization
rto
ラウンドトリップ処理を通じて応答を最適化

Z3 Solver
z3
論理的推論にZ3定理証明器を利用

Self-Consistency
self_consistency
高度な自己一貫性メソッドを実装

PV Game
pvg
推論時に証明者-検証者ゲームアプローチを適用

R* Algorithm
rstar
問題解決のためにR*アルゴリズムを実装

CoT with Reflection
cot_reflection
<thinking>、<reflection>、<output>セクションを用いた思考の連鎖推論を実装

PlanSearch
plansearch
自然言語での問題解決のための候補プランに対する探索アルゴリズムを実装

LEAP
leap
少数の例から任務特有の原則を学習

技術	識別子	説明
Agent	`agent`	以下のアプローチのどれを取るかを決定し、結果を組み合わせる
Monte Carlo Tree Search	`mcts`	チャットレスポンスの意思決定にMCTSを使用
Best of N Sampling	`bon`	複数の応答を生成し、最良のものを選択
Mixture of Agents	`moa`	複数の批評からの応答を組み合わせる
Round Trip Optimization	`rto`	ラウンドトリップ処理を通じて応答を最適化
Z3 Solver	`z3`	論理的推論にZ3定理証明器を利用
Self-Consistency	`self_consistency`	高度な自己一貫性メソッドを実装
PV Game	`pvg`	推論時に証明者-検証者ゲームアプローチを適用
R* Algorithm	`rstar`	問題解決のためにR*アルゴリズムを実装
CoT with Reflection	`cot_reflection`	<thinking>、<reflection>、<output>セクションを用いた思考の連鎖推論を実装
PlanSearch	`plansearch`	自然言語での問題解決のための候補プランに対する探索アルゴリズムを実装
LEAP	`leap`	少数の例から任務特有の原則を学習

kun432

READMEに従ってインストール。今回はローカルのMac上で。

レポジトリクローン

$ git clone https://github.com/codelion/optillm.git && cd optillm

Python仮想環境を作成

$ python -m venv .venv
$ source .venv/bin/activate

パッケージインストール

$ pip install -r requirements.txt

OpenAIのAPIキーを環境変数にセットする。

$ export OPENAI_API_KEY="*******************"

optillmのプロキシサーバを起動

$ python optillm.py

2024-09-18 20:00:48,547 - INFO - Starting server with approach: auto
2024-09-18 20:00:48,547 - INFO - Server configuration: {'approach': 'auto', 'mcts_simulations': 2, 'mcts_exploration': 0.2, 'mcts_depth': 1, 'best_of_n': 3, 'model': 'gpt-4o-mini', 'rstar_max_depth': 3, 'rstar_num_rollouts': 5, 'rstar_c': 1.4, 'n': 1, 'base_url': '', 'api_key': '', 'return_full_response': False, 'port': 8000, 'simulations': 2, 'exploration': 0.2, 'depth': 1}
 * Serving Flask app 'optillm'
 * Debug mode: off
2024-09-18 20:00:48,636 - INFO - WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:8000
 * Running on http://192.168.***.***:8000
2024-09-18 20:00:48,636 - INFO - Press CTRL+C to quit

プロキシは8000番ポートで起動し、デフォルトだとgpt-4o-miniが使用される様子。

別ターミナルを開いて同じディレクトリに移動し（venvのactivateもお忘れなく）、サンプルのスクリプトを２つ作成する。

$ mkdir work && cd work
$ touch sample_openai.py
$ touch sample_optillm.py

こちらはoptillmに関係なく、普通にOpenAI APIに問い合わせるスクリプト。比較用に作成している。題材としてアリス問題を問い合わせるものにしてある。

work/sample_openai.py

import os
import json
from openai import OpenAI

OPENAI_KEY = os.environ.get("OPENAI_API_KEY")
client = OpenAI(api_key=OPENAI_KEY)

query = """
アリスには4人の兄弟と1人の姉妹がいます。アリスの兄弟には何人の姉妹がいますか？
"""

response = client.chat.completions.create(
  model="gpt-4o-mini-2024-07-18",
  messages=[
    { "role": "user", "content": query }
  ],
  temperature=0
)

print(response.choices[0].message.content)

で、こちらがoptillmを使うスクリプト。OPENAI_BASE_URLを書き換えてoptillmに向けているのと、モデル名の前に使用したい推論テクニックを指定しているところが違い。今回は複数のテクニックを組み合わせるagentにしている。

work/sample_optillm.py

import os
import json
from openai import OpenAI

OPENAI_KEY = os.environ.get("OPENAI_API_KEY")
# BASE_URLをoptillmのプロキシに向ける
OPENAI_BASE_URL = "http://localhost:8000/v1"
client = OpenAI(api_key=OPENAI_KEY, base_url=OPENAI_BASE_URL)

query = """
アリスには4人の兄弟と1人の姉妹がいます。アリスの兄弟には何人の姉妹がいますか？
"""

response = client.chat.completions.create(
  # モデル名の前に識別子を付与
  model="agent-gpt-4o-mini-2024-07-18",
  messages=[
    { "role": "user", "content": query }
  ],
  temperature=0
)

print(response.choices[0].message.content)

ではそれぞれ実行してみる。別ターミナルにしているので、OpenAI APIキーを環境変数にセットするのをお忘れなく。

$ export OPENAI_API_KEY="*******************"

まずOpenAI APIをそのまま使うパターン。

$ time python work/sample_openai.py

アリスには4人の兄弟と1人の姉妹がいます。アリス自身が1人の姉妹であるため、アリスの兄弟にはアリスを含めて2人の姉妹がいます。したがって、アリスの兄弟には1人の姉妹がいます。

real	0m1.544s
user	0m0.251s
sys	0m0.049s

推論過程はあっているのに、最終回答だけが間違っている。

次にoptillmを使うパターン。

$ time python work/sample_optillm.py

アリスの兄弟には2人の姉妹がいます。アリス自身が1人の姉妹であり、もう1人の姉妹が存在します。

real	0m23.549s
user	0m0.248s
sys	0m0.049s

時間がかかっているが、こちらは正解。

ログを見てみると、z3、self_consistency、 cot_reflectionが使用しており、最後にそれらの結果を解析した結果として最終判断が行われているのがわかる。

2024-09-19 11:48:35,206 - INFO - Received request to /v1/chat/completions
2024-09-19 11:48:35,206 - INFO - Using approach agent, with gpt-4o-mini-2024-07-18
2024-09-19 11:48:35,206 - INFO - Attempt 1/3
2024-09-19 11:48:38,989 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-09-19 11:48:38,991 - INFO - Suggested approach: z3
2024-09-19 11:48:38,991 - INFO - Explanation: The Z3 Solver is ideal for logical reasoning and constraint satisfaction problems. In this task, we need to analyze the relationships between siblings to determine how many sisters Alice's brothers have. Using logical reasoning to deduce the relationships can effectively solve this problem.
2024-09-19 11:48:38,991 - INFO - Suggested approach: self_consistency
2024-09-19 11:48:38,991 - INFO - Explanation: The Self-Consistency approach is beneficial for tasks that require coherent and consistent reasoning across multiple steps. This task involves understanding familial relationships, and ensuring that the reasoning is consistent throughout the analysis of Alice's siblings will lead to an accurate conclusion.
2024-09-19 11:48:38,991 - INFO - Suggested approach: cot_reflection
2024-09-19 11:48:38,991 - INFO - Explanation: The Chain of Thought with Reflection approach is useful for complex reasoning tasks that benefit from step-by-step thinking. This task requires breaking down the relationships and reflecting on the implications of Alice having four brothers and one sister, making this approach suitable for arriving at the correct answer.
2024-09-19 11:48:38,991 - INFO - Selected approaches: ['z3', 'self_consistency', 'cot_reflection']
2024-09-19 11:48:43,700 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-09-19 11:48:44,929 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-09-19 11:48:44,931 - INFO - Response from z3: アリスには4人の兄弟と1人の姉妹がいます。アリス自身が姉妹の1人なので、アリスの兄弟にはアリスを含めて2人の姉妹がいます。したがって、アリスの兄弟には1人の姉妹がいます。...
2024-09-19 11:48:45,849 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-09-19 11:48:48,377 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-09-19 11:48:49,945 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-09-19 11:48:51,038 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-09-19 11:48:52,198 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-09-19 11:48:52,205 - INFO - Advanced Self-Consistency Results:
2024-09-19 11:48:52,205 - INFO - Total responses: 5
2024-09-19 11:48:52,205 - INFO - Number of unique clusters: 5
2024-09-19 11:48:52,205 - INFO - Response from self_consistency: アリスには4人の兄弟と1人の姉妹がいるので、アリスの兄弟にとっては、アリス自身が1人の妹になります。したがって、アリスの兄弟には1人の姉妹がいます。...
2024-09-19 11:48:54,352 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-09-19 11:48:54,355 - INFO - CoT with Reflection :
<thinking>
アリスは4人の兄弟と1人の姉妹を持っています。アリスの兄弟の視点から見て、彼らはアリスを「姉妹」として認識します。したがって、アリスの兄弟はアリス1人の姉妹を持っていることになります。アリス自身が兄弟に対しては、アリスが1人の姉妹であることを考慮する必要がありますが、アリスの兄弟の姉妹の人数はアリスだけです。

したがって、アリスの兄弟には1人の姉妹がいます。
</thinking>
<reflection>
思考プロセスは正しいです。アリスの兄弟は、アリスを含めて1人の姉妹を持っているため、結論は正確です。特に誤りは見当たりません。
</reflection>
<output>
アリスの兄弟には1人の姉妹がいます。
</output>
2024-09-19 11:48:54,355 - INFO - Final output :
アリスの兄弟には1人の姉妹がいます。
2024-09-19 11:48:54,355 - INFO - Response from cot_reflection: アリスの兄弟には1人の姉妹がいます。...
2024-09-19 11:48:58,448 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-09-19 11:48:58,453 - INFO - Reflection completed successfully
2024-09-19 11:48:58,453 - INFO - Final response generated
2024-09-19 11:48:58,453 - INFO - Analysis: {'z3': "This response correctly identifies that Alice has 4 brothers and 1 sister. It also states that Alice herself is 1 sister, which is a bit confusing in the context. The conclusion that Alice's brothers have 2 sisters is incorrect as it does not account for Alice being one of the sisters. Overall, while it provides some correct information, it ultimately leads to an incorrect conclusion.", 'self_consistency': "This response is similar to the z3 response in that it correctly identifies the number of brothers and sisters Alice has. It states that Alice herself is 1 sister, which is accurate. However, it also incorrectly concludes that Alice's brothers have 1 sister, which is misleading since they have 2 sisters (Alice and her one sister). The response is consistent but ultimately incorrect.", 'cot_reflection': "This response is the most straightforward and correctly states that Alice's brothers have 1 sister, which is Alice herself. However, it fails to mention the existence of Alice's other sister, which is a significant oversight. While it is concise, it lacks completeness."}
2024-09-19 11:48:58,453 - INFO - Explanation: I arrived at the final response by synthesizing the correct elements from the various approaches. I recognized that while some responses correctly identified the number of siblings, they failed to account for both of Alice's sisters. The final response clarifies that Alice has 2 sisters in total, which includes herself and another sister, thus providing a complete and accurate answer to the user's query.
2024-09-19 11:48:58,454 - INFO - 127.0.0.1 - - [19/Sep/2024 11:48:58] "POST /v1/chat/completions HTTP/1.1" 200 -

なお、ここはちょっと恣意的な記載をしていて、optillmでもアリス問題を必ず解けるようになるというわけではなくて、実際に成功した確率はかなり低かった。自分がテストで繰り返した肌感からはこんな感じ。

OpenAI（gpt-4o-mini）だと20回やっても1度も成功しない
optillm（gpt-4o-mini+agent）だと20回で1回成功するかどうか

このあたりはクエリの内容やユースケースにもよっても変わってくるし、使用する推論テクニックによっても変わってくる（agentの場合だと、ざっと見た感じ、ほとんどz3、self_consistency、cot_reflectionの組み合わせになっていたた）と思う。

また、上記のようなワンショットの結果だけだと評価できないと思うので、何かしらのベンチマーク・評価手法を使う、もしくは自分のユースケースに組み込んでみて、というところで評価するのが良いと思う。

なお、READMEにはoptillmを使用した場合の評価結果が記載されているので、そちらも参考に。

kun432

Ollamaで使う

optillmは透過プロキシであり、OpenAI APIと互換性のあるチャット補完エンドポイントを持つLLM APIまたはプロバイダーであれば、どのようなものでも動作します。

とあるので、Ollamaでも使えるし、LiteLLMあたりを入れればなんでもいける（プロキシにプロキシを重ねる感じになるけども）

Ollamaの場合だと、こんな感じでoptillmのプロキシ先をOllamaのエンドポイントにして起動。optillmプロキシの起動時に使用可能な引数はここを参考に。

$ python optillm.py --base-url http://localhost:11434/v1

スクリプトからはこんな感じで。BASE_URLの設定が複数出てきてちょっと混乱しそうになるけども。

import os
import json
from openai import OpenAI

OPENAI_KEY = "ollama"
OPENAI_BASE_URL = "http://localhost:8000/v1"
client = OpenAI(api_key=OPENAI_KEY, base_url=OPENAI_BASE_URL)

query = """
アリスには4人の兄弟と1人の姉妹がいます。アリスの兄弟には何人の姉妹がいますか？
"""

response = client.chat.completions.create(
  model="cot_reflection-llama3.1",
  messages= [
    { "role": "user", "content": query }
  ],
  temperature=0
)

print(response.choices[0].message.content)

<thinking>
まず、アリèsに4人がいるので、この内の1人があねでするということで、もう1人はきょうだいいのいくらいるぞ。 brothers(つづらを) に Brothers, Sister が3 つなら sisters には1 と sisters を 1 - Brothers.count のように解くはずだから、

  brothers =3

 sisse  =5 - brothes = brothers - Count = count sisters は、3- brothers です。
</thinking>
<reflection>
ここでは正しくないのは何かなので、
</reflection>

<outputs>
 3人の姉妹です
</output>

回答の質はちょっと置いておくとして、以下のIssueを見ている限りは、うまく動かない推論テクニックもある様子。自分もいろいろ試してみたけども、失敗するものがちらほらあった。

あと、ollamaのモデル名がユーザ名/モデル名のパターンだとうまくいかなかった。

kun432

Dockerでもoptillmは実行できる。こちらのほうが使い勝手はいいかも。

kun432

 まとめメリット
いろんな推論テクニックを採用できる
プロキシとして入れることができるので、既存のプロンプトやコードを書き換えなくて済む
推論テクニックの追加・拡張もできそうに思える
デメリット
推論にかかるレスポンスが遅くなる（それは当然）
推論にかかるコストが増える（それは当然）
ストリーミングには現時点で非対応
ちょっとデバッグが大変かも？
プロキシとしてはLiteLLMがすでにあるけれども、あちらは複数のモデルのAPIレイヤーの抽象化。プロンプトも含めた推論レイヤーの抽象化は一般的にはフレームワークがやるところではあるけど、プロキシとして実装することでいろいろメリットが出てくるというのが興味深い。こういうプロキシの使い方は他にも色々応用ができそう。
実運用で使うにはまだ足りないけど、今後に期待したい。

このスクラップは3ヶ月前にクローズされました