
Reading Langchain.rb

rerost

Conclusions I want to reach

  • Understand the concepts and how they map onto roles
  • What can it actually do?
rerost

Trying it out

rerost

Summarization

irb(main):009:1" llm.summarize(text: "WantedlyではGoやRails、Python書いてマイクロサービスを作成しています。
irb(main):010:1" データサイエンティストと一緒にユーザーに価値を届けることをしています。
irb(main):011:1" 実際に行っていることとしては次で、
irb(main):012:1" - 検索に関するロジックの改善・整理
irb(main):013:1" - 作成した推薦モデルを本番環境に安全に出せる環境を作る
irb(main):014:1" - 推薦モデルを作成できるように、ログの充実やドメイン知識の理解
irb(main):015:1"
irb(main):016:1" などをやっています。").completion
=> "Wantedly is creating microservices using Go, Rails, and Python. They work with data scientists to deliver value to users. They focus on improving and organizing search logic, creating a safe environment for deploying recommendation models, and enhancing logs and domain knowledge to enable the creation of recommendation models."

Not a great example, but the text comes back summarized in English (probably because the built-in prompt template is written that way).
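For completeness, the llm object in this session was set up the same way as in the later snippets:

require "langchain"

# Same setup used throughout this scrap.
llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])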

rerost

Prompt Management

Overall, prompt management is decoupled from the LLM implementation, so it looks easy to reuse.
The part that actually talks to ChatGPT seems likely to keep changing its interface(?), so using only this piece on its own is also an option.

Template

irb(main):028:0> prompt = Langchain::Prompt::PromptTemplate.new(template: "Tell me a {adjective} joke about {content}.", input_variables: ["adjective", "content"])
irb(main):029:0> prompt.format(adjective: "funny", content: "chickens") # "Tell me a funny joke about chickens."
=> "Tell me a funny joke about chickens."
irb(main):030:0> prompt.input_variables
=> ["adjective", "content"]

The template feature is provided separately from the LLM; there doesn't seem to be anything LLM-specific about it. If anything:

  • Prompts can be managed as JSON, which should make things like multi-language support easier
  • Few-shot learning is supposedly easy to do

JSON

Apparently it can be used like this:

{
  "_type": "prompt",
  "input_variables": ["adjective", "content"],
  "template": "Tell me a {adjective} joke about {content}."
}

https://github.com/andreibondarev/langchainrb/blob/0.9.0/spec/fixtures/prompt/prompt_template.json
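Per the project README, a JSON file like this can apparently be loaded back into a prompt object; a minimal sketch (the file path is just an example):

require "langchain"

# Sketch: load a JSON-serialized prompt back into a PromptTemplate.
# (Langchain::Prompt.load_from_path is taken from the langchainrb README.)
prompt = Langchain::Prompt.load_from_path(file_path: "prompt_template.json")
prompt.format(adjective: "funny", content: "chickens") # => "Tell me a funny joke about chickens."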

A couple of open questions:

  • How many values of _type are there?
  • Isn't declaring input_variables a bit tedious?

Few Shot Learning

require "langchain"

prompt = Langchain::Prompt::FewShotPromptTemplate.new(
  prefix: "プレイヤー数を教えてください",
  suffix: "Input: {game_name}\nOutput:",
  example_prompt: Langchain::Prompt::PromptTemplate.new(
    input_variables: ["input", "output"],
    template: "Input: {input}\nOutput: {output}"
  ),
  examples: [
    { "input": "リバーシ", "output": "2人" },
    { "input": "テトリス", "output": "1~4人" },
  ],
  input_variables: ["game_name"]
)

p prompt.format(game_name: "テニス")

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

p llm.complete(prompt: prompt.format(game_name: "テニス")).completion

$ bundle exec ruby few_shot_learning.rb
"プレイヤー数を教えてください\n\nInput: リバーシ\nOutput: 2人\n\nInput: テトリス\nOutput: 1~4人\n\nInput: テニス\nOutput:"
"2人"

That's how it works. Handy.

Incidentally, you can also save it:

prompt.save(file_path: "./hoge.json")

$ cat hoge.json | jq .
{
  "_type": "few_shot",
  "input_variables": [
    "game_name"
  ],
  "prefix": "プレイヤー数を教えてください",
  "example_prompt": {
    "_type": "prompt",
    "input_variables": [
      "input",
      "output"
    ],
    "template": "Input: {input}\nOutput: {output}"
  },
  "examples": [
    {
      "input": "リバーシ",
      "output": "2人"
    },
    {
      "input": "テトリス",
      "output": "1~4人"
    }
  ],
  "suffix": "Input: {game_name}\nOutput:"
}
rerost

Output Parser

The flow seems to be:

  1. Build a prompt that provides a JSON Schema and asks the model to follow it
  2. Parse the response that comes back

This is separate from OpenAI's JSON Mode and does not depend on it.
https://platform.openai.com/docs/guides/text-generation/json-mode

It was added around 2023-06, so things may have changed since then.
https://github.com/andreibondarev/langchainrb/pull/208

It looks handy for LLMs that don't offer a JSON Mode. Conversely, when you do want JSON Mode, it might be better to call the API directly; see the sketch after this paragraph.
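For reference, a minimal sketch of using JSON Mode directly through the ruby-openai gem rather than the Output Parser (the model name and prompt here are illustrative; JSON Mode needs a compatible model and the word "JSON" somewhere in the messages):

require "openai"
require "json"

# Sketch: request JSON Mode directly via ruby-openai instead of the
# Output Parser. Model and prompt are illustrative, not from the scrap above.
client = OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"])
response = client.chat(
  parameters: {
    model: "gpt-3.5-turbo-1106",
    response_format: { type: "json_object" },
    messages: [{ role: "user", content: "Return a JSON object with keys name and age for a fictional character." }]
  }
)
puts JSON.parse(response.dig("choices", 0, "message", "content"))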

require "langchain"

json_schema = {
  type: "object",
  properties: {
    name: {
      type: "string",
      description: "Persons name"
    },
    age: {
      type: "number",
      description: "Persons age"
    },
    interests: {
      type: "array",
      items: {
        type: "object",
        properties: {
          interest: {
            type: "string",
            description: "A topic of interest"
          },
          levelOfInterest: {
            type: "number",
            description: "A value between 0 and 100 of how interested the person is in this interest"
          }
        },
        required: ["interest", "levelOfInterest"],
        additionalProperties: false
      },
      minItems: 1,
      maxItems: 3,
      description: "A list of the person's interests"
    }
  },
  required: ["name", "age", "interests"],
  additionalProperties: false
}
parser = Langchain::OutputParsers::StructuredOutputParser.from_json_schema(json_schema)
prompt = Langchain::Prompt::PromptTemplate.new(template: "Generate details of a fictional character.\n{format_instructions}\nCharacter description: {description}", input_variables: ["description", "format_instructions"])
prompt_text = prompt.format(description: "Korean chemistry student", format_instructions: parser.get_format_instructions)
puts "---prompt_text--"
puts prompt_text

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
llm_response = llm.complete(prompt: prompt_text).completion
puts "---llm_response--"
puts llm_response

puts "---parse_result--"
puts parser.parse(llm_response)

~/go/src/github.com/rerost/tmp/langchainrb_test rerost/langchainrb*
(arm64) $  bundle exec ruby output_parser.rb
---prompt_text--
Generate details of a fictional character.
You must format your output as a JSON value that adheres to a given "JSON Schema" instance.

"JSON Schema" is a declarative language that allows you to annotate and validate JSON documents.

For example, the example "JSON Schema" instance {"properties": {"foo": {"description": "a list of test words", "type": "array", "items": {"type": "string"}}, "required": ["foo"]}
would match an object with one required property, "foo". The "type" property specifies "foo" must be an "array", and the "description" property semantically describes it as "a list of test words". The items within "foo" must be strings.
Thus, the object {"foo": ["bar", "baz"]} is a well-formatted instance of this example "JSON Schema". The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Your output will be parsed and type-checked according to the provided schema instance, so make sure all fields in your output match the schema exactly and there are no trailing commas!

Here is the JSON Schema instance your output must adhere to. Include the enclosing markdown codeblock:
```json
{"type":"object","properties":{"name":{"type":"string","description":"Persons name"},"age":{"type":"number","description":"Persons age"},"interests":{"type":"array","items":{"type":"object","properties":{"interest":{"type":"string","description":"A topic of interest"},"levelOfInterest":{"type":"number","description":"A value between 0 and 100 of how interested the person is in this interest"},"required":["interest","levelOfInterest"],"additionalProperties":false},"minItems":1,"maxItems":3,"description":"A list of the person's interests"},"required":["name","age","interests"],"additionalProperties":false}
```

Character description: Korean chemistry student
---llm_response--
{
  "name": "Ji-hyun Kim",
  "age": 21,
  "interests": [
    {
      "interest": "Organic Chemistry",
      "levelOfInterest": 85
    },
    {
      "interest": "Physical Chemistry",
      "levelOfInterest": 70
    },
    {
      "interest": "Analytical Chemistry",
      "levelOfInterest": 60
    }
  ]
}
---parse_result--
{"name"=>"Ji-hyun Kim", "age"=>21, "interests"=>[{"interest"=>"Organic Chemistry", "levelOfInterest"=>85}, {"interest"=>"Physical Chemistry", "levelOfInterest"=>70}, {"interest"=>"Analytical Chemistry", "levelOfInterest"=>60}]}
rerost

Assistant

You can create an assistant (an LLM with tools and prior conversation attached) like this:

require "langchain"

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

thread = Langchain::Thread.new

assistant = Langchain::Assistant.new(
  llm: llm,
  thread: thread,
  tools: [
    Langchain::Tool::RubyCodeInterpreter.new
  ]
)

assistant.add_message(content: "フィボナッチ数列の100個目を教えてください。またできるだけ効率的な探し方をしてください")
puts assistant.run(auto_tool_execution: true).map { |message| {role: message.role, content: message.content} }

Like this, it can generate and execute Ruby code:

~/go/src/github.com/rerost/tmp/langchainrb_test rerost/langchainrb* 7s
(arm64) $  bundle exec ruby assistants.rb
I, [2024-02-04T21:17:50.254519 #81332]  INFO -- : [LangChain.rb] [Langchain::Tool::RubyCodeInterpreter]: Executing "def fibonacci(n)
  fib = [0, 1]
  (2..n).each do |i|
    fib[i] = fib[i-1] + fib[i-2]
  end
  fib[n]
end

fibonacci(100)"
{:role=>"user", :content=>"フィボナッチ数列の100個目を教えてください。またできるだけ効率的な探し方をしてください"}
{:role=>"assistant", :content=>""}
{:role=>"tool", :content=>"354224848179261915075"}
{:role=>"assistant", :content=>"フィボナッチ数列の100個目は、354224848179261915075です。この結果は、効率的な方法で計算されました。"}
This scrap was closed on 2024-02-04.