
Reading Langchain.rb

rerost

Conclusions I want to reach

  • Understand the concepts and how they map onto roles
  • What can it actually do?
rerost

Trying it out

rerost

Summarization

irb(main):009:1" llm.summarize(text: "WantedlyではGoやRails、Python書いてマイクロサービスを作成しています。
irb(main):010:1" データサイエンティストと一緒にユーザーに価値を届けることをしています。
irb(main):011:1" 実際に行っていることとしては次で、
irb(main):012:1" - 検索に関するロジックの改善・整理
irb(main):013:1" - 作成した推薦モデルを本番環境に安全に出せる環境を作る
irb(main):014:1" - 推薦モデルを作成できるように、ログの充実やドメイン知識の理解
irb(main):015:1"
irb(main):016:1" などをやっています。").completion
=> "Wantedly is creating microservices using Go, Rails, and Python. They work with data scientists to deliver value to users. They focus on improving and organizing search logic, creating a safe environment for deploying recommendation models, and enhancing logs and domain knowledge to enable the creation of recommendation models."

Not a great example, but the text comes back summarized in English (probably because the built-in prompt template is written that way).
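For completeness, the llm object in this session was set up the same way as in the later snippets:

require "langchain"

# Same setup used throughout this scrap.
llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])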

rerost

Prompt Management

Overall, prompt management is decoupled from the LLM implementation, so it looks easy to reuse.
The part that actually talks to ChatGPT seems likely to keep changing its interface(?), so using only this piece on its own is also an option.

Template

irb(main):028:0> prompt = Langchain::Prompt::PromptTemplate.new(template: "Tell me a {adjective} joke about {content}.", input_variables: ["adjective", "content"])
irb(main):029:0> prompt.format(adjective: "funny", content: "chickens") # "Tell me a funny joke about chickens."
=> "Tell me a funny joke about chickens."
irb(main):030:0> prompt.input_variables
=> ["adjective", "content"]

The template feature is provided separately from the LLM; there doesn't seem to be anything LLM-specific about it. If anything:

  • Prompts can be managed as JSON, which should make things like multi-language support easier
  • Few-shot learning is supposedly easy to do

JSON

Apparently it can be used like this:

{
  "_type": "prompt",
  "input_variables": ["adjective", "content"],
  "template": "Tell me a {adjective} joke about {content}."
}

https://github.com/andreibondarev/langchainrb/blob/0.9.0/spec/fixtures/prompt/prompt_template.json
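Per the project README, a JSON file like this can apparently be loaded back into a prompt object; a minimal sketch (the file path is just an example):

require "langchain"

# Sketch: load a JSON-serialized prompt back into a PromptTemplate.
# (Langchain::Prompt.load_from_path is taken from the langchainrb README.)
prompt = Langchain::Prompt.load_from_path(file_path: "prompt_template.json")
prompt.format(adjective: "funny", content: "chickens") # => "Tell me a funny joke about chickens."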

A couple of open questions:

  • How many values of _type are there?
  • Isn't declaring input_variables a bit tedious?

Few Shot Learning

require "langchain"

prompt = Langchain::Prompt::FewShotPromptTemplate.new(
  prefix: "プレイヤー数を教えてください",
  suffix: "Input: {game_name}\nOutput:",
  example_prompt: Langchain::Prompt::PromptTemplate.new(
    input_variables: ["input", "output"],
    template: "Input: {input}\nOutput: {output}"
  ),
  examples: [
    { "input": "リバーシ", "output": "2人" },
    { "input": "テトリス", "output": "1~4人" },
  ],
  input_variables: ["game_name"]
)

p prompt.format(game_name: "テニス")

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

p llm.complete(prompt: prompt.format(game_name: "テニス")).completion

$ bundle exec ruby few_shot_learning.rb
"プレイヤー数を教えてください\n\nInput: リバーシ\nOutput: 2人\n\nInput: テトリス\nOutput: 1~4人\n\nInput: テニス\nOutput:"
"2人"

That's how it works. Handy.

Incidentally, you can also save it:

prompt.save(file_path: "./hoge.json")

$ cat hoge.json | jq .
{
  "_type": "few_shot",
  "input_variables": [
    "game_name"
  ],
  "prefix": "プレイヤー数を教えてください",
  "example_prompt": {
    "_type": "prompt",
    "input_variables": [
      "input",
      "output"
    ],
    "template": "Input: {input}\nOutput: {output}"
  },
  "examples": [
    {
      "input": "リバーシ",
      "output": "2人"
    },
    {
      "input": "テトリス",
      "output": "1~4人"
    }
  ],
  "suffix": "Input: {game_name}\nOutput:"
}
rerost

Output Parser

The flow seems to be:

  1. Build a prompt that provides a JSON Schema and asks the model to follow it
  2. Parse the response that comes back

This is separate from OpenAI's JSON Mode and does not depend on it.
https://platform.openai.com/docs/guides/text-generation/json-mode

It was added around 2023-06, so things may have changed since then.
https://github.com/andreibondarev/langchainrb/pull/208

It looks handy for LLMs that don't offer a JSON Mode. Conversely, when you do want JSON Mode, it might be better to call the API directly; see the sketch after this paragraph.
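For reference, a minimal sketch of using JSON Mode directly through the ruby-openai gem rather than the Output Parser (the model name and prompt here are illustrative; JSON Mode needs a compatible model and the word "JSON" somewhere in the messages):

require "openai"
require "json"

# Sketch: request JSON Mode directly via ruby-openai instead of the
# Output Parser. Model and prompt are illustrative, not from the scrap above.
client = OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"])
response = client.chat(
  parameters: {
    model: "gpt-3.5-turbo-1106",
    response_format: { type: "json_object" },
    messages: [{ role: "user", content: "Return a JSON object with keys name and age for a fictional character." }]
  }
)
puts JSON.parse(response.dig("choices", 0, "message", "content"))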

require "langchain"

json_schema = {
  type: "object",
  properties: {
    name: {
      type: "string",
      description: "Persons name"
    },
    age: {
      type: "number",
      description: "Persons age"
    },
    interests: {
      type: "array",
      items: {
        type: "object",
        properties: {
          interest: {
            type: "string",
            description: "A topic of interest"
          },
          levelOfInterest: {
            type: "number",
            description: "A value between 0 and 100 of how interested the person is in this interest"
          }
        },
        required: ["interest", "levelOfInterest"],
        additionalProperties: false
      },
      minItems: 1,
      maxItems: 3,
      description: "A list of the person's interests"
    }
  },
  required: ["name", "age", "interests"],
  additionalProperties: false
}
parser = Langchain::OutputParsers::StructuredOutputParser.from_json_schema(json_schema)
prompt = Langchain::Prompt::PromptTemplate.new(template: "Generate details of a fictional character.\n{format_instructions}\nCharacter description: {description}", input_variables: ["description", "format_instructions"])
prompt_text = prompt.format(description: "Korean chemistry student", format_instructions: parser.get_format_instructions)
puts "---prompt_text--"
puts prompt_text

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
llm_response = llm.complete(prompt: prompt_text).completion
puts "---llm_response--"
puts llm_response

puts "---parse_result--"
puts parser.parse(llm_response)

~/go/src/github.com/rerost/tmp/langchainrb_test rerost/langchainrb*
(arm64) $  bundle exec ruby output_parser.rb
---prompt_text--
Generate details of a fictional character.
You must format your output as a JSON value that adheres to a given "JSON Schema" instance.

"JSON Schema" is a declarative language that allows you to annotate and validate JSON documents.

For example, the example "JSON Schema" instance {"properties": {"foo": {"description": "a list of test words", "type": "array", "items": {"type": "string"}}, "required": ["foo"]}
would match an object with one required property, "foo". The "type" property specifies "foo" must be an "array", and the "description" property semantically describes it as "a list of test words". The items within "foo" must be strings.
Thus, the object {"foo": ["bar", "baz"]} is a well-formatted instance of this example "JSON Schema". The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Your output will be parsed and type-checked according to the provided schema instance, so make sure all fields in your output match the schema exactly and there are no trailing commas!

Here is the JSON Schema instance your output must adhere to. Include the enclosing markdown codeblock:
```json
{"type":"object","properties":{"name":{"type":"string","description":"Persons name"},"age":{"type":"number","description":"Persons age"},"interests":{"type":"array","items":{"type":"object","properties":{"interest":{"type":"string","description":"A topic of interest"},"levelOfInterest":{"type":"number","description":"A value between 0 and 100 of how interested the person is in this interest"},"required":["interest","levelOfInterest"],"additionalProperties":false},"minItems":1,"maxItems":3,"description":"A list of the person's interests"},"required":["name","age","interests"],"additionalProperties":false}
```

Character description: Korean chemistry student
---llm_response--
{
  "name": "Ji-hyun Kim",
  "age": 21,
  "interests": [
    {
      "interest": "Organic Chemistry",
      "levelOfInterest": 85
    },
    {
      "interest": "Physical Chemistry",
      "levelOfInterest": 70
    },
    {
      "interest": "Analytical Chemistry",
      "levelOfInterest": 60
    }
  ]
}
---parse_result--
{"name"=>"Ji-hyun Kim", "age"=>21, "interests"=>[{"interest"=>"Organic Chemistry", "levelOfInterest"=>85}, {"interest"=>"Physical Chemistry", "levelOfInterest"=>70}, {"interest"=>"Analytical Chemistry", "levelOfInterest"=>60}]}
rerost

Assistant

You can create an assistant (an LLM with tools and prior conversation attached) like this:

require "langchain"

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

thread = Langchain::Thread.new

assistant = Langchain::Assistant.new(
  llm: llm,
  thread: thread,
  tools: [
    Langchain::Tool::RubyCodeInterpreter.new
  ]
)

assistant.add_message(content: "フィボナッチ数列の100個目を教えてください。またできるだけ効率的な探し方をしてください")
puts assistant.run(auto_tool_execution: true).map { |message| {role: message.role, content: message.content} }

Like this, it can generate and execute Ruby code:

~/go/src/github.com/rerost/tmp/langchainrb_test rerost/langchainrb* 7s
(arm64) $  bundle exec ruby assistants.rb
I, [2024-02-04T21:17:50.254519 #81332]  INFO -- : [LangChain.rb] [Langchain::Tool::RubyCodeInterpreter]: Executing "def fibonacci(n)
  fib = [0, 1]
  (2..n).each do |i|
    fib[i] = fib[i-1] + fib[i-2]
  end
  fib[n]
end

fibonacci(100)"
{:role=>"user", :content=>"フィボナッチ数列の100個目を教えてください。またできるだけ効率的な探し方をしてください"}
{:role=>"assistant", :content=>""}
{:role=>"tool", :content=>"354224848179261915075"}
{:role=>"assistant", :content=>"フィボナッチ数列の100個目は、354224848179261915075です。この結果は、効率的な方法で計算されました。"}
This scrap was closed on 2024-02-04.