
Notes from trying out LangChain.js, including experiments connecting it to WordPress

hidetaka okamoto

Using it ChatGPT-style

import { ChatOpenAI } from "langchain/chat_models/openai";
import { HumanChatMessage, SystemChatMessage } from "langchain/schema";

export const run = async () => {
  const chat = new ChatOpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY
  });
  return chat.call([
    new SystemChatMessage("You're a helpful assistant that translates Japanese to English"),
    new HumanChatMessage("こんにちは")
  ]).then(result => result.text)
};
run().then(console.log)

`SystemChatMessage` seems to correspond to `role: system`, and `HumanChatMessage` to `role: user`.
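If that reading is right, the message classes line up with the OpenAI chat roles roughly like this (a sketch of my assumption, not something taken from the LangChain docs):

```javascript
// Assumed mapping from LangChain message classes to OpenAI chat roles.
// AIChatMessage is included based on the experiment further below.
const toOpenAIRole = (messageClassName) => ({
  SystemChatMessage: "system",
  HumanChatMessage: "user",
  AIChatMessage: "assistant",
}[messageClassName]);
```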

For "text GPT said earlier in the conversation", apparently `AIChatMessage` is the thing to use?

  // AIChatMessage is imported from "langchain/schema" as well
  return chat.call([
    new SystemChatMessage("You're a helpful assistant that translates Japanese to English"),
    new HumanChatMessage("こんにちは"),
    new AIChatMessage("Hello! That's \"Konnichiwa\" in Japanese."),
    new HumanChatMessage("ほかの言い方を、英語で教えてください。"),
  ]).then(result => result.text)
Response:

"Konnichiwa" can also be translated to "Good afternoon" in English.

To reuse a prompt, use a template:

  const translationPrompt = ChatPromptTemplate.fromPromptMessages([
    SystemMessagePromptTemplate.fromTemplate(
      "You're a helpful assistant that translates {input_language} to {output_language}"
    ),
    HumanMessagePromptTemplate.fromTemplate(
      "{text}"
    )
  ])

Calling it looks like this:

await chat.generatePrompt([
    await translationPrompt.formatPromptValue({
      input_language: 'Japanese',
      output_language: 'English',
      text: 'こんばんは'
    }),
    await translationPrompt.formatPromptValue({
      input_language: 'English',
      output_language: 'Japanese',
      text: "I like sushi."
    })
  ])

Result:

[
  {
    "text": "Good evening!",
    "message": {
      "text": "Good evening!"
    }
  }
]
[
  {
    "text": "私は寿司が好きです。",
    "message": {
      "text": "私は寿司が好きです。"
    }
  }
]
To use a specific model, you combine the pieces: the prompt template plus the chat model, wired together with an LLMChain.

  const chat = new ChatOpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY
  });
  const translationPrompt = ChatPromptTemplate.fromPromptMessages([
    SystemMessagePromptTemplate.fromTemplate(
      "You're a helpful assistant that translates {input_language} to {output_language}"
    ),
    HumanMessagePromptTemplate.fromTemplate(
      "{text}"
    )
  ])
  const chain = new LLMChain({
    prompt: translationPrompt,
    llm: chat
  })
  return chain.call({
    input_language: 'Japanese',
    output_language: 'English',
    text: 'こんばんは'
  })
It can apparently keep memory, too.
Where that memory lives looks like the key question for production use.
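As a mental model (hypothetical shape, not the LangChain API): BufferMemory holds the history in process memory, so running this across requests or servers means serializing and restoring the message list yourself, along these lines:

```javascript
// Hypothetical sketch: a conversation history that can be persisted and
// restored, illustrating what "where to put the memory" means in practice.
class SerializableHistory {
  constructor(messages = []) {
    this.messages = messages; // [{ role, text }]
  }
  add(role, text) {
    this.messages.push({ role, text });
  }
  toJSON() {
    return JSON.stringify(this.messages);
  }
  static fromJSON(json) {
    return new SerializableHistory(JSON.parse(json));
  }
}
```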

  const chat = new ChatOpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY
  });
  const chatPrompt = ChatPromptTemplate.fromPromptMessages([
    SystemMessagePromptTemplate.fromTemplate(
      "The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know."
    ),
    new MessagesPlaceholder("history"),
    HumanMessagePromptTemplate.fromTemplate("{input}"),
  ]);
  
  const chain = new ConversationChain({
    memory: new BufferMemory({ returnMessages: true, memoryKey: "history" }),
    prompt: chatPrompt,
    llm: chat,
  });
  const responseH = await chain.call({
    input: "My home is in Osaka. And I want to go to Tokyo. How should I do?",
  });
  
  console.log(responseH);
  const responseI = await chain.call({
    input: "How about return to my home?",
  });
  
  console.log(responseI);

Result:

{
  response: 'There are several ways to travel from Osaka to Tokyo. The most common way is to take the Shinkansen bullet train, which takes about 2.5 hours and costs around 14,000 yen. Alternatively, you can take a domestic flight from Osaka to Tokyo, which takes about 1.5 hours and costs around 10,000 yen. Another option is to take a highway bus, which takes about 8-9 hours and costs around 4,000-6,000 yen. Finally, you can also drive or take a taxi, but this will take significantly longer and be more expensive.'
}
{
  response: 'You can use the same transportation options to return to Osaka from Tokyo. If you took the Shinkansen bullet train to Tokyo, you can take the same train back to Osaka. If you took a domestic flight, you can book a return flight from Tokyo to Osaka. If you took a highway bus, you can book a return ticket from Tokyo to Osaka. And if you drove or took a taxi, you can use the same method to return to Osaka.'
}
Feeding it additional data

Starting with local files.

  const chat = new ChatOpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY
  });
  const loader = new DirectoryLoader(
    '/Users/sandbox/langchain-ts-starter/dummy-data',
    {
      '.md': path => new TextLoader(path),
      '.json': path => new JSONLoader(path)
    }
  )
  const docs = await loader.load()
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })
  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
  writeFileSync('./dummy-data.json', JSON.stringify(vectorStoreData))
  return vectorStoreData

This produces `dummy-data.json`.

Using it looks like this:

import { ChatOpenAI } from "langchain/chat_models/openai";
import { JSONLoader } from "langchain/document_loaders";
import { HNSWLib } from "langchain/vectorstores";
import { VectorStoreToolkit, createVectorStoreAgent } from "langchain/agents";
import { OpenAIEmbeddings } from "langchain/embeddings";

const runAnswer = async () => {
  const chat = new ChatOpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })
  const jsonLoader = new JSONLoader('./dummy-data.json')
  const d = await jsonLoader.load()
  const vectorStoreData = await HNSWLib.fromDocuments(d, embeddings)
  const vectorStoreTool = new VectorStoreToolkit({
    name: 'demo',
    description: 'demo',
    vectorStore: vectorStoreData,
  }, chat)
  const agent = createVectorStoreAgent(chat, vectorStoreTool)
  return agent.call({
      input: 'What does the Gutenberg project licensed by?'
//    input: `How we can get start the Guternberg project?`
//    input: "What is the `Gutenberg`?"
  })
}
Example responses:

{
  output: 'To get started with the Gutenberg project, one can download the Gutenberg plugin and start testing the block editor features.',
  intermediateSteps: [
    {
      action: [Object],
      observation: 'The Gutenberg project is a new paradigm in WordPress site building and publishing that aims to revolutionize the entire publishing experience. It is in the second phase of a four-phase process that will touch every piece of WordPress, including Editing, Customization (which includes Full Site Editing, Block Patterns, Block Directory, and Block-based themes), Collaboration, and Multilingual. The project is focused on a new editing experience, the block editor, which introduces a modular approach to pages and posts. Each piece of content in the editor, from a paragraph to an image gallery to a headline, is its own block. The Gutenberg plugin gives you the latest version of the block editor so you can join in testing bleeding-edge features, start playing with blocks, and maybe get inspired to build your own.'
    }
  ]
}

{
  output: 'The Gutenberg project is licensed under the GNU General Public License version 2 or any later version.',
  intermediateSteps: [
    {
      action: [Object],
      observation: 'The Gutenberg project is released under the terms of the GNU General Public License version 2 or (at your option) any later version. You can find the complete license in the LICENSE.md file.'
    }
  ]
}
Running it: slow.
It might be better to stream the response back, or handle the embedding on OpenAI's side.

In any case, querying against vector data written out to a JSON file seems to work.
Next I want to try out more of the loaders.
If external REST APIs, RSS feeds, CSV data, and so on become usable, it feels like something real could be built.

Loading data from a website

https://js.langchain.com/docs/modules/indexes/document_loaders/examples/web_loaders/web_cheerio

npm i cheerio

The code looks like this:

import { CheerioWebBaseLoader } from "langchain/document_loaders/web/cheerio";

const loader = new CheerioWebBaseLoader(
  "https://news.ycombinator.com/item?id=34817881"
);
Sometimes it hits an error:

/node_modules/langchain/dist/document_loaders/web/cheerio.js:39
            signal: timeout ? AbortSignal.timeout(timeout) : undefined,
                                          ^

TypeError: AbortSignal.timeout is not a function

Googling turned up polyfill-style code:

https://gamliela.com/blog/advanced-testing-with-jest

For now, trying this:


+  if (!AbortSignal.timeout) {
+    AbortSignal.timeout = (ms) => {
+      const controller = new AbortController();
+      setTimeout(() => controller.abort(new DOMException("TimeoutError")), ms);
+      return controller.signal;
+    };
+  }

const loader = new CheerioWebBaseLoader(
  "https://news.ycombinator.com/item?id=34817881"
);

The background seems to be a JSDOM-vs-browser-API gap, so this may also depend on the Node.js version (native `AbortSignal.timeout` landed in Node v17.3.0, which would fit).

Encountered on Node v17.2.0.

Next, a "that's more than I can process" error:

    data: {
      error: {
        message: "This model's maximum context length is 8191 tokens, however you requested 20651 tokens (20651 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.",
        type: 'invalid_request_error',
        param: null,
        code: null
      }
    }

Attach a selector so it doesn't ingest extra data:

  const loader = new CheerioWebBaseLoader(
    "https://zenn.dev/hideokamoto/scraps/744cfdd408d5bc",
+    {
+      selector: 'main'
+    }
  );

Conversion to vector data goes through a JSON file, as before:


  const docs = await loader.load()
+  const embeddings = new OpenAIEmbeddings({
+    openAIApiKey: process.env.OPENAI_API_KEY
+  })
+  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
+  writeFileSync('./dummy-data.json', JSON.stringify(vectorStoreData))
Ask a question the same way as in the step above:


  const chat = new ChatOpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })
  const jsonLoader = new JSONLoader('./dummy-data.json')
  const d = await jsonLoader.load()
  const vectorStoreData = await HNSWLib.fromDocuments(d, embeddings)
  const vectorStoreTool = new VectorStoreToolkit({
    name: 'demo',
    description: 'demo',
    vectorStore: vectorStoreData,
  }, chat)
  const agent = createVectorStoreAgent(chat, vectorStoreTool)
  return agent.call({
      input: '「過去にGPTが発言したテキスト」を利用する方法を、日本語で教えて'
  })

Response:

{
  output: 'To access past texts generated by GPT in Japanese, you can use the `AIChatMessage` class.',
  intermediateSteps: [
    {
      action: [Object],
      observation: "According to the context provided, you can use the `AIChatMessage` class to access past texts generated by GPT. Here's an example code snippet:\n" +
        '\n' +
        '```\n' +
        'return chat.call([\n' +
        `  new SystemChatMessage("You're a helpful assistant that translates Japanese to English"),\n` +
        '  new HumanChatMessage("こんにちは"),\n' +
        `  new AIChatMessage("Hello! That's \\"Konnichiwa\\" in Japanese."),\n` +
        '  new HumanChatMessage("ほかの言い方を、英語で教えてください。"),\n' +
        ']).then(result => result.text)\n' +
        '```\n' +
        '\n' +
        'In this example, the `AIChatMessage` class is used to access the text generated by GPT in response to the previous user message.'
    }
  ]
}
For comparison, load a different set of vector data and ask the same question:

{
  output: "There is no specific tool for accessing past statements made by GPT in Japanese, but you can use OpenAI's GPT models through their API to generate new text in Japanese based on a prompt or topic of your choice.",
  intermediateSteps: [
    {
      action: [Object],
      observation: "OpenAI's GPT models are capable of generating text in Japanese, but there is no specific search function for past statements made by GPT in Japanese. However, you can use the GPT models to generate new text in Japanese based on a prompt or topic of your choice. You can access the GPT models through OpenAI's API, which requires an API key. Once you have an API key, you can use it to make requests to the API and generate text in Japanese."
    }
  ]
}

Since LangChain.js no longer comes up in the answer, it seems fair to say "answer generation using an arbitrary web page's data is working."
Still, it's slow. If a single page takes this long, "feed it my entire site" probably needs a different approach.

Googling turned up the suggestion that "creating chunks might help":

https://ict-worker.com/ai/langchain-chunk.html

First, take a look inside the loaded docs:

  const loader = new CheerioWebBaseLoader(
    "https://zenn.dev/hideokamoto/scraps/744cfdd408d5bc",
    {
      selector: 'main'
    }
  );
  const docs = await loader.load()

  console.log(docs)

Result:

[
  Document {
    pageContent: 'hidetaka okamoto4日前\n' +
      ' ChatGPT的な使い方をする場合\n' +
      'import { ChatOpenAI } from "langchain/chat_models/openai";\n' +
      'import { HumanChatMessage, SystemChatMessage } from "langchain/schema";\n' +
      '\n' +
      'export const run = async () => {\n' +
      '  const chat = new ChatOpenAI({\n' +
      '    temperature: 0,\n' +
      '    openAIApiKey: process.env.OPENAI_API_KEY\n' +
      '  });\n' +
      '  return chat.call([\n' +
      `    new SystemChatMessage("You're a helpful assistant that translates Japanese to English"),\n` +
      '    new HumanChatMessage("こんにちは")\n' +
      '  ]).then(result => result.text)\n' +
      '};\n' +

Splitting on `\n` looks like it would work.

Splitting it roughly:

  const docs = await loader.load()

+  const splitter = new CharacterTextSplitter({
+    separator: "\n",
+    chunkSize: 7,
+    chunkOverlap: 3,
+  });
+  const output = await splitter.createDocuments([docs[0].pageContent]);
+  console.log(output)

Result:

  Document {
    pageContent: "output_language: 'English',",
    metadata: { loc: [Object] }
  },
  Document {
    pageContent: "text: 'こんばんは'",
    metadata: { loc: [Object] }
  },
  Document { pageContent: '})', metadata: { loc: [Object] } },
  Document {
    pageContent: 'hidetaka okamoto4日前記憶もできるらしい。',
    metadata: { loc: [Object] }
  },
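For reference, chunkSize and chunkOverlap are measured in characters, which explains the tiny fragments above. The splitting is conceptually a sliding window; this is a simplified sketch of that idea, not LangChain's exact algorithm:

```javascript
// Simplified character-window splitter: chunkSize characters per chunk,
// with chunkOverlap characters shared between neighboring chunks.
function splitByChars(text, chunkSize, chunkOverlap) {
  const chunks = [];
  const step = chunkSize - chunkOverlap;
  for (let i = 0; i < text.length; i += step) {
    chunks.push(text.slice(i, i + chunkSize));
    if (i + chunkSize >= text.length) break;
  }
  return chunks;
}
```

With chunkSize 7 and chunkOverlap 3, even a short string shatters into fragments, so a larger chunkSize is likely needed for retrieval to work well.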

Re-run result. Not quite what I expected:

{
  output: "While there is no specific tool for accessing past statements made by GPT in Japanese, you can use OpenAI's GPT models through their API to generate new text in Japanese based on a prompt or topic of your choice. You can access OpenAI's GPT models through their API, which requires an API key. Once you have an API key, you can use it to make requests to the API and generate text in various languages, including Japanese. You can find more information about OpenAI's API on their website at https://api.openai.com/v1.",
  intermediateSteps: [
    {
      action: [Object],
      observation: "There is no specific tool for accessing past statements made by GPT in Japanese. However, you can use OpenAI's GPT models through their API to generate new text in Japanese based on a prompt or topic of your choice."
    },
    {
      action: [Object],
      observation: "You can access OpenAI's GPT models through their API, which requires an API key. Once you have an API key, you can use it to make requests to the API and generate text in various languages, including Japanese. However, there is no specific tool for accessing past statements made by GPT in Japanese. You can use the GPT models to generate new text in Japanese based on a prompt or topic of your choice. You can find more information about OpenAI's API on their website at https://api.openai.com/v1."
    }
  ]
}
Noticed partway through that the loader itself has a loadAndSplit method.
That may be the simpler route.


  const loader = new CheerioWebBaseLoader(
    "https://zenn.dev/hideokamoto/scraps/744cfdd408d5bc",
    {
      selector: 'main'
    }
  );
  const docs = await loader.loadAndSplit(new RecursiveCharacterTextSplitter({
    chunkSize: 7,
    chunkOverlap: 3,
  }));
  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
That aside, still not the answer I expected...

{
  output: 'It is recommended to try searching on search engines or forums that specialize in natural language processing or AI technology for more information on how to access past statements made by GPT in Japanese.',
  intermediateSteps: [
    {
      action: [Object],
      observation: 'Based on the context provided, it seems that the given links do not provide information on how to search for past statements made by GPT in Japanese. It is recommended to try searching on search engines or forums that specialize in natural language processing or AI technology for more information.'
    }
  ]
}
For a change of pace, trying one of the preset chains:

https://js.langchain.com/docs/modules/chains/other_chains/summarization


  if (!AbortSignal.timeout) {
    AbortSignal.timeout = (ms) => {
      const controller = new AbortController();
      setTimeout(() => controller.abort(new DOMException("TimeoutError")), ms);
      return controller.signal;
    };
  }
  const loader = new CheerioWebBaseLoader(
    "https://zenn.dev/hideokamoto/scraps/744cfdd408d5bc",
    {
      selector: 'main'
    }
  );
  const docs = await loader.loadAndSplit(new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
  }));
  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const chain = loadSummarizationChain(model, { type: 'map_reduce' })
  return chain.call({
    input_documents: docs
  })
Well, it works, more or less. In English, though.

{
  text: ' This article provides instructions on how to use ChatGPT, a chatbot model based on OpenAI, to generate text in Japanese. It uses modules from the langchain library, such as SystemChatMessage and HumanChatMessage, to create a conversation. It also provides a template for a chatbot that translates between Japanese and English, and a code snippet to create a dummy-data.json file. Additionally, it discusses an error that occurs when trying to update a web page using the CheerioWebBaseLoader, and suggests using a JSON file to convert to vector data. Finally, it suggests searching on search engines or forums that specialize in natural language processing or AI technology for more information.'
}
Once the mechanism is clear, the prompts can be overridden:

  if (!AbortSignal.timeout) {
    AbortSignal.timeout = (ms) => {
      const controller = new AbortController();
      setTimeout(() => controller.abort(new DOMException("TimeoutError")), ms);
      return controller.signal;
    };
  }
  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });

  const webLoader = new CheerioWebBaseLoader(
    "https://zenn.dev/hideokamoto/scraps/744cfdd408d5bc",
    {
      selector: 'main'
    }
  )
  const docs = await webLoader.loadAndSplit()
  const prompt = new PromptTemplate({
    inputVariables: ['text'],
    template: `
    Write a concise summary in Japanese of the following:
    "{text}"
    CONCISE SUMMARY:
    `
  })
  const chain = loadSummarizationChain(model, {
    type: 'map_reduce',
    combineMapPrompt: prompt,
    combinePrompt: prompt,
  })
  return chain.call({
    input_documents: docs
  })
It complained the input was too long, so I fed it a different article.

{
  text: ' hidetaka okamotoは1ヶ月前、URLをバッチで監視したいと考えていました。そのために、Sitemap XMLのデータからURLを引っ張り、Diffをとる必要がありました。Cloudfare Workers + R2を使うことで、URLの数が1000件超える場合にも対応できるようになりました。'
}

Now it's in Japanese.

For ingesting JSON data, the JSON Agent Toolkit also looks promising:

https://js.langchain.com/docs/modules/agents/toolkits/json

Code that asks about the WP REST API:

  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const response = await fetch('https://<some WordPress site>/wp-json')
  const json = await response.json()
  console.log(json)
  const jsonTool = new JsonToolkit(new JsonSpec(json as any))
  const executor = createJsonAgent(model, jsonTool)
  return executor.call({
    input: 'What are the required parameters in the request body to the POST /wp/v2/post request?'
  })

The response. It's properly talking about the API:

{
  output: 'The required parameters for the POST /wp/v2/post request are listed on the WordPress developer website at https://developer.wordpress.org/rest-api/.',
  intermediateSteps: [
    {
      action: [Object],
      observation: 'name, description, url, home, gmt_offset, timezone_string, namespaces, authentication, routes, site_logo, site_icon, _links'
    },
    {
      action: [Object],
      observation: '{"help":[{"href":"https://developer.wordpress.org/rest-api/"}]}'
    },
    {
      action: [Object],
      observation: '{"href":"https://developer.wordpress.org/rest-api/"}'
    }
  ]
}
Trying to feed it post data overflows the token limit.

  const response = await fetch('https://<some WordPress site>/wp-json/wp/v2/posts?per_page=2')
  const json = await response.json()
  const embededJsonData = (json as any).map((data: any) => {
    return data.content.rendered
  })
  const jsonTool = new JsonToolkit(new JsonSpec(embededJsonData as any))
  const executor = createJsonAgent(model, jsonTool)
  return executor.call({
    input: 'What are the required parameters in the request body to the POST /wp/v2/post request?'
  })

Error

    data: {
      error: {
        message: "This model's maximum context length is 4097 tokens, however you requested 5884 tokens (5628 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.",
        type: 'invalid_request_error',
        param: null,
        code: null
      }
    }
Trying the WebBrowser tool as well:

https://js.langchain.com/docs/modules/agents/tools/webbrowser


  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })

  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const browser = new WebBrowser({model, embeddings})
  const result = await browser.call(
    `"https://hidetaka.dev","who is Hidetaka Okamoto"`
  );

  console.log(result);
  return browser.call(
    `"https://js.langchain.com/docs/modules/agents/tools/webbrowser","How we can load the external data? Please answer in Japanese."`
  )

Response:

Relevant Links:
- [Hidetaka.dev](https://hidetaka.dev/)
- [About](https://hidetaka.dev/about)
- [Articles](https://hidetaka.dev/articles)
- [Projects](https://hidetaka.dev/projects)
- [OSS](https://hidetaka.dev/oss)


Relevant Links:
- [Python](https://github.com/hwchase17/langchain)
- [JS/TS](https://github.com/hwchase17/langchainjs)
- [Homepage](https://langchain.com)
- [Blog](https://blog.langchain.dev)
- [Copyright](https://langchain.com/copyright)

It does answer, but not quite what I had in mind.
Might be usable for related-article features...?

Something called RetrievalQAChain showed up on the Cloudflare blog, so let's try it:

https://js.langchain.com/docs/modules/chains/index_related_chains/retrieval_qa

  • Create Documents
  • Convert them to vectors with the Embedding API
  • Store them in a VectorStore
  • Use a Retriever to query the store

I think that's roughly the right mental model.


  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })
  if (!AbortSignal.timeout) {
    AbortSignal.timeout = (ms) => {
      const controller = new AbortController();
      setTimeout(() => controller.abort(new DOMException("TimeoutError")), ms);
      return controller.signal;
    };
  }
  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const loader = new CheerioWebBaseLoader(
    "https://zenn.dev/hideokamoto/scraps/744cfdd408d5bc",
    {
      selector: 'main'
    }
  );
  const docs = await loader.loadAndSplit()
  
  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
  const chain = RetrievalQAChain.fromLLM(model, vectorStoreData.asRetriever())
  return chain.call({
    query: "CheerioWebBaseLoaderの使い方を教えて"
  })

Result:

{
  text: ' CheerioWebBaseLoaderを使うには、まずnpm i cheerioを実行してCheerioをインストールします。次に、以下のようなコードを書きます。\n' +
    '\n' +
    'import { CheerioWebBaseLoader } from "langchain/document_loaders/web/cheerio";\n' +
    '\n' +
    'const loader = new CheerioWebBaseLoader(\n' +
    '  "https://news.ycombinator.com/item?id=34817881"\n' +
    ');\n' +
    '\n' +
    'また、タイムアウトエラーを回避するために、以下のコードを追加することもできます。\n' +
    '\n' +
    'if (!AbortSignal.timeout) {\n' +
    '  AbortSignal.timeout = (ms) => {\n' +
    '    const controller = new AbortController();\n' +
    '    setTimeout(() => controller.abort(new DOMException("TimeoutError")), ms);\n' +
    '    return controller.signal;\n' +
    '  };\n' +
    '}'
}

Now this looks promising.

It also worked with a different splitter:

  const docs = await loader.loadAndSplit(
  new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
  }));
Next challenge: targeting a REST API's JSON response.

The final form should look like this:

  const jsonLoader = new JSONLoader(jsonData)
  const docs = await jsonLoader.loadAndSplit()
  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
  const chain = RetrievalQAChain.fromLLM(model, vectorStoreData.asRetriever())
  return chain.call({
    query: "Astroの使い方を教えて"
  })

So the next step is figuring out how to pass a `fetch` response into `JSONLoader`.

`JSONLoader` accepts `string | Blob`:

https://github.com/hwchase17/langchainjs/blob/f97565659c33011ca6a8bc7c5da11aba251911d7/langchain/src/document_loaders/fs/json.ts#L7

export class JSONLoader extends TextLoader {
  public pointers: string[];

  constructor(filePathOrBlob: string | Blob, pointers: string | string[] = []) {
    super(filePathOrBlob);
    this.pointers = Array.isArray(pointers) ? pointers : [pointers];
  }

However, the `string` here means a file path.
So passing in `JSON.stringify`-ed data and the like just errors.

https://github.com/hwchase17/langchainjs/blob/f97565659c33011ca6a8bc7c5da11aba251911d7/langchain/src/document_loaders/fs/text.ts#L18-L22

  public async load(): Promise<Document[]> {
    let text: string;
    let metadata: Record<string, string>;
    if (typeof this.filePathOrBlob === "string") {
      const { readFile } = await TextLoader.imports();
      text = await readFile(this.filePathOrBlob, "utf8");
      metadata = { source: this.filePathOrBlob };
    } 
So the `Blob` side is the one to focus on.
Which makes the task: "convert the fetch response into a Blob."

By this point, Googling turns up workable approaches.

This time I tried this one from Stack Overflow:

https://stackoverflow.com/questions/53929108/how-to-convert-a-javascript-object-to-utf-8-blob-for-download

  const response = await fetch('https://wp-api.wp-kyoto.net/wp-json/wp/v2/posts')
  const jsonResponse = await response.json()

  // Drop fields we probably won't use, to save tokens
  const indexData = (jsonResponse as any).map((data: any) => data.content.rendered)

  // Convert to a Blob
  const jsonString = JSON.stringify(indexData)
  const bytes = new TextEncoder().encode(jsonString)
  const jsonBlobData = new Blob([bytes], {
    type: 'application/json;charset=utf-8'
  })
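The encoding step can be sanity-checked on its own; UTF-8 round-trips the Japanese content losslessly, so the Blob built from these bytes carries valid JSON (self-contained check, no LangChain involved):

```javascript
// JSON → UTF-8 bytes → JSON round-trip; the Blob built from these bytes
// therefore carries valid UTF-8 JSON.
const original = ["こんばんは", "I like sushi."];
const encoded = new TextEncoder().encode(JSON.stringify(original));
const decoded = JSON.parse(new TextDecoder().decode(encoded));
```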
If `ReferenceError: Blob is not defined` comes up, add `import { Blob } from "buffer";`.

However, that Blob's type apparently doesn't always match the Blob LangChain's JSONLoader expects, so I resorted to `any`:

  const jsonBlobData = new Blob([bytes], {
    type: 'application/json;charset=utf-8'
  })
  const jsonLoader = new JSONLoader(jsonBlobData as any as globalThis.Blob)
  const docs = await jsonLoader.loadAndSplit()
The finished version:


  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })
  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });

  const response = await fetch('https://<Your WordPress site URL>/wp-json/wp/v2/posts')
  const jsonResponse = await response.json()
  const indexData = (jsonResponse as any).map((data: any) => data.content.rendered)
  const jsonString = JSON.stringify(indexData)
  const bytes = new TextEncoder().encode(jsonString)
  const jsonBlobData = new Blob([bytes], {
    type: 'application/json;charset=utf-8'
  })
  const jsonLoader = new JSONLoader(jsonBlobData as any as globalThis.Blob)
  const docs = await jsonLoader.loadAndSplit()
  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
  const chain = RetrievalQAChain.fromLLM(model, vectorStoreData.asRetriever())
  return chain.call({
    query: "Astroの使い方を教えて"
  })

Result:

{
  text: ' Astroを使うと、ブランチとビルドの設定を行うことができます。また、独自のタグを作成して、Markdocコンテンツを読み込むことも可能です。'
}

The answer is terse, but the behavior seems to be what I intended.

Now, splitting saving and querying.

Saving only:

const updateIndex = async () => {
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })

  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });

  const response = await fetch('https://<Your WordPress site URL>/wp-json/wp/v2/posts')
  const jsonResponse = await response.json()
  const indexData = (jsonResponse as any).map((data: any) => data.content.rendered)
  const jsonString = JSON.stringify(indexData)
  const bytes = new TextEncoder().encode(jsonString)
  const jsonBlobData = new Blob([bytes], {
    type: 'application/json;charset=utf-8'
  })
  const jsonLoader = new JSONLoader(jsonBlobData as any as globalThis.Blob)
  const docs = await jsonLoader.loadAndSplit()

  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
  writeFileSync('./dummy-data3.json', JSON.stringify(vectorStoreData))
}

Querying / searching only:

const runAnswer = async () => {
  const model = new ChatOpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })
  const jsonLoader = new JSONLoader('./dummy-data3.json')
  const d = await jsonLoader.load()
  const vectorStoreData = await HNSWLib.fromDocuments(d, embeddings)
  const chain = RetrievalQAChain.fromLLM(model, vectorStoreData.asRetriever())
  return chain.call({
    query: "AstroをCloudflare Pagesにデプロイする方法を教えて"
  })
}

Search result:

{
  text: 'この記事によると、Astroで構築したアプリのデプロイ設定を行うには、ブランチとビルドの設定を行う必要があります。また、Cloudflare Pagesへのデプロイでは、Account / Cloudflare Pages / Editを選び、ビルド・デプロイログを見守ることができます。詳細については、記事を参照してください。'
}

Working properly.

Threw 100 posts at it, not expecting much, and it still worked.

{
  text: 'AstroをCloudflare Pagesにデプロイする方法は以下の手順になります。\n' +
    '\n' +
    '1. Astroでビルドを行います。\n' +
    '2. Cloudflare Pagesにログインし、新しいプロジェクトを作成します。\n' +
    '3. プロジェクトの設定で、ビルドコマンドとデプロイ先のディレクトリを指定します。\n' +
    '4. デプロイを実行します。\n' +
    '\n' +
    '具体的な手順は以下の通りです。\n' +
    '\n' +
    '1. Astroでビルドを行います。\n' +
    '```\n' +
    'npm run build\n' +
    '```\n' +
    '\n' +
    '2. Cloudflare Pagesにログインし、新しいプロジェクトを作成します。\n' +
    '\n' +
    '3. プロジェクトの設定で、ビルドコマンドとデプロイ先のディレクトリを指定します。\n' +
    'ビルドコマンドには、Astroでビルドしたファイルを指定します。\n' +
    'デプロイ先のディレクトリには、Astroでビルドしたファイルが含まれるディレクトリを指定します。\n' +
    '\n' +
    '4. デプロイを実行します。\n' +
    '```\n' +
    'npx wrangler pages publish dist\n' +
    '```\n' +
    '\n' +
    '以上がAstroをCloudflare Pagesにデプロイする手順になります。'
}
DynamicTool, I think it was called.
I'd also like to try the kind of tool that calls an external API from inside the tool itself.
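A sketch of that idea: DynamicTool takes a `{ name, description, func }` shape, so the tool body would just wrap an HTTP call. Everything below is hypothetical (the weather endpoint does not exist), and the fetcher is injected so the sketch runs without a network or the langchain package:

```javascript
// Hypothetical external-API tool in the { name, description, func } shape
// that DynamicTool accepts. fetchJson is injected for testability.
const makeWeatherTool = (fetchJson) => ({
  name: "current-weather",
  description: "Returns the current weather for a given city.",
  func: async (city) => {
    // Hypothetical endpoint; a real tool would call an actual API here.
    const url = `https://example.com/weather?city=${encodeURIComponent(city)}`;
    const data = await fetchJson(url);
    return `It is ${data.condition} in ${city}.`;
  },
});
```

In a real setup, the same object literal would be passed to `new DynamicTool({...})` and handed to an agent, with `fetchJson` replaced by a real `fetch` wrapper.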


https://js.langchain.com/docs/modules/agents/toolkits/openapi
Tried playing with the OpenAPI Agent Toolkit.

The idea: feed it something like Stripe's OpenAPI spec:

https://github.com/stripe/openapi

Following the sample code produced an error:

    data: {
      error: {
        message: "This model's maximum context length is 4097 tokens, however you requested 5490 tokens (5234 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.",
        type: 'invalid_request_error',
        param: null,
        code: null
      }
    }

Maybe the API spec is just too big.
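It is: the full Stripe spec is several megabytes of JSON, far beyond a 4097-token window. A cheap way to catch this before making the API call is a rough token estimate. A sketch using the common ~4-characters-per-token rule of thumb (this is not OpenAI's real tokenizer, just a sanity check):

```typescript
// Rough heuristic: roughly 4 characters per token for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Would prompt + completion fit in the model's context window?
const fitsInContext = (
  prompt: string,
  maxTokens = 4097,
  completionTokens = 256
): boolean => estimateTokens(prompt) + completionTokens <= maxTokens;
```

With the entire Stripe spec as the prompt, this check fails immediately, which matches the error above.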


The implementation itself looks fairly simple:

https://github.com/hwchase17/langchainjs/blob/6df22523fd8bd546ded1ea9bf5314abca88983d8/langchain/src/agents/agent_toolkits/openapi/openapi.ts#L30-L46

export class OpenApiToolkit extends RequestsToolkit {
  constructor(jsonSpec: JsonSpec, llm: BaseLanguageModel, headers?: Headers) {
    super(headers);
    const jsonAgent = createJsonAgent(llm, new JsonToolkit(jsonSpec));
    this.tools = [
      ...this.tools,
      new DynamicTool({
        name: "json_explorer",
        func: async (input: string) => {
          const result = await jsonAgent.call({ input });
          return result.output as string;
        },
        description: JSON_EXPLORER_DESCRIPTION,
      }),
    ];
  }
}
  • Somehow get it to read the JSON OpenAPI spec
  • Then hit the API via DynamicTool

If both of those work, it might come together...?


I forced it to read the spec anyway.

  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })
  const response = await fetch('https://raw.githubusercontent.com/stripe/openapi/master/openapi/spec3.json')
  const jsonData = await response.json()
  const jsonString = JSON.stringify(jsonData)
  const bytes = new TextEncoder().encode(jsonString)
  const jsonBlobData = new Blob([bytes], {
    type: 'application/json;charset=utf-8'
  })
  const jsonLoader = new JSONLoader(jsonBlobData as any as globalThis.Blob)
  const docs = await jsonLoader.loadAndSplit()
  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
  const chain = RetrievalQAChain.fromLLM(model, vectorStoreData.asRetriever())
  return chain.call({
    query: "How to list the product data?"
  })

For the time spent, a pretty underwhelming answer.

{
  text: ' You can list the product data by combining the ProductList and price_data_with_product_data and then looping through the list of line items in the cart.'
}

Careful: loading the docs the way the sample code does overflows the token limit.


-  const docs = await loader.load();
+  const docs = await loader.loadAndSplit();
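The difference is that `loadAndSplit()` chunks each document before embedding, so no single piece exceeds the context window. A simplified sketch of the idea (fixed-size windows with overlap; LangChain's actual splitter is smarter about breaking on separators):

```typescript
// My own simplified splitter: cut text into overlapping fixed-size
// chunks so each one fits comfortably in the model's context window.
const splitText = (
  text: string,
  chunkSize = 1000,
  overlap = 200
): string[] => {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
};
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one of the two chunks.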

  const loader = new GithubRepoLoader(
    "https://github.com/hwchase17/langchainjs",
    { branch: "main", recursive: false, unknown: "warn" }
  );
  const docs = await loader.loadAndSplit();
  console.log({ docs });
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })
  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
  const chain = RetrievalQAChain.fromLLM(model, vectorStoreData.asRetriever())
  return chain.call({
    query: "LangChain.jsとはなに?"
  })

This is how it looks as a one-shot query.


The response:

{
  text: ' LangChain.jsは、大規模言語モデル(LLMs)を使用して、以前に作成できなかったようなアプリケーションを構築する開発者を支援するためのライブラリです。'
}

Casually asking how to use a repository I'm curious about:

  const loader = new GithubRepoLoader(
    "https://github.com/wpkyoto/stripe-pwa-elements",
    { branch: "main", recursive: false, unknown: "warn" }
  );
  const docs = await loader.loadAndSplit();
  console.log({ docs });
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })
  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
  const chain = RetrievalQAChain.fromLLM(model, vectorStoreData.asRetriever())
  return chain.call({
    query: "このライブラリの使い方を、サンプルコード付きで紹介してください。"
  })
{
  text: ' 以下のサンプルコードを使用して、<stripe-payment-sheet> コンポーネントを使用してください。\n' +
    '\n' +
    '<div id="payment-sheet-container"></div>\n' +
    '\n' +
    '<script type="module">\n' +
    "  import { StripePaymentSheet } from '@stripe-elements/stripe-elements';\n" +
    '\n' +
    '  const paymentSheet = new StripePaymentSheet({\n' +
    "    container: '#payment-sheet-container',\n" +
    '    onToken: (token) => {\n' +
    "      document.querySelector('#result').innerHTML = `Token: ${token.id}`;\n" +
    '    },\n' +
    '  });\n' +
    '</script>'
}
  return chain.call({
    query: "このライブラリのチュートリアルを、Markdown形式で1000文字程度にて作成してください。"
  })

Hey, come on lol

{ text: " I don't know." }

Let me try being more specific.

  return chain.call({
    query: [
      `このライブラリのチュートリアルを作成します。次のステップについて、それぞれサンプルコードや手順を紹介してください`,
      '- Step1: npmでのインストール方法',
      '- Step2: HTMLタグとして埋め込む方法',
      '- Step3: JavaScriptで埋め込む方法'
    ].join('\n')
  })

The result:

{
  text: ' \n' +
    'Step1: npmでのインストール方法\n' +
    'npm install stripe-elements\n' +
    '\n' +
    'Step2: HTMLタグとして埋め込む方法\n' +
    '<div id="stripe"></div>\n' +
    '\n' +
    'Step3: JavaScriptで埋め込む方法\n' +
    "const stripe = Stripe('YOUR_STRIPE_PUBLISHABLE_API_KEY');\n" +
    'const stripeElement = stripe.elements();\n' +
    "const stripeElement = stripeElement.create('card');\n" +
    "stripeElement.mount('#stripe');"
}

Something like "auto-generate a docs site for a hastily-built library!" still seems a bit out of reach.


It might manage to answer small usage questions, though?

  return chain.call({
    query: [
      'stripe-payment-sheetタグで、formタグのsubmit操作時に処理を走らせる方法を教えてください。'
    ].join('\n')
  })
{
  text: ' stripe-payment-sheetタグのshould-use-default-form-submit-action属性をfalseに設定し、formSubmitイベントを追加して、stripe.createPaymentMethodを実行します。'
}