
Notes from trying out LangChain.js, including experiments connecting it to WordPress

hidetaka okamoto

Using it ChatGPT-style

import { ChatOpenAI } from "langchain/chat_models/openai";
import { HumanChatMessage, SystemChatMessage } from "langchain/schema";

export const run = async () => {
  const chat = new ChatOpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY
  });
  return chat.call([
    new SystemChatMessage("You're a helpful assistant that translates Japanese to English"),
    new HumanChatMessage("こんにちは")
  ]).then(result => result.text)
};
run().then(console.log)

`SystemChatMessage` seems to correspond to `role: system`, and `HumanChatMessage` to `role: user`.
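If that reading is right, the message classes line up with the OpenAI chat roles roughly like this (a sketch of my assumption, not something taken from the LangChain docs):

```javascript
// Assumed mapping from LangChain message classes to OpenAI chat roles.
// AIChatMessage is included based on the experiment further below.
const toOpenAIRole = (messageClassName) => ({
  SystemChatMessage: "system",
  HumanChatMessage: "user",
  AIChatMessage: "assistant",
}[messageClassName]);
```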

For "text GPT said earlier in the conversation", apparently `AIChatMessage` is the thing to use?

  // AIChatMessage is imported from "langchain/schema" as well
  return chat.call([
    new SystemChatMessage("You're a helpful assistant that translates Japanese to English"),
    new HumanChatMessage("こんにちは"),
    new AIChatMessage("Hello! That's \"Konnichiwa\" in Japanese."),
    new HumanChatMessage("ほかの言い方を、英語で教えてください。"),
  ]).then(result => result.text)
Response:

"Konnichiwa" can also be translated to "Good afternoon" in English.

To reuse a prompt, use a template:

  const translationPrompt = ChatPromptTemplate.fromPromptMessages([
    SystemMessagePromptTemplate.fromTemplate(
      "You're a helpful assistant that translates {input_language} to {output_language}"
    ),
    HumanMessagePromptTemplate.fromTemplate(
      "{text}"
    )
  ])

Calling it looks like this:

await chat.generatePrompt([
    await translationPrompt.formatPromptValue({
      input_language: 'Japanese',
      output_language: 'English',
      text: 'こんばんは'
    }),
    await translationPrompt.formatPromptValue({
      input_language: 'English',
      output_language: 'Japanese',
      text: "I like sushi."
    })
  ])

Result:

[
  {
    "text": "Good evening!",
    "message": {
      "text": "Good evening!"
    }
  }
]
[
  {
    "text": "私は寿司が好きです。",
    "message": {
      "text": "私は寿司が好きです。"
    }
  }
]
To use a specific model, you combine the pieces: the prompt template plus the chat model, wired together with an LLMChain.

  const chat = new ChatOpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY
  });
  const translationPrompt = ChatPromptTemplate.fromPromptMessages([
    SystemMessagePromptTemplate.fromTemplate(
      "You're a helpful assistant that translates {input_language} to {output_language}"
    ),
    HumanMessagePromptTemplate.fromTemplate(
      "{text}"
    )
  ])
  const chain = new LLMChain({
    prompt: translationPrompt,
    llm: chat
  })
  return chain.call({
    input_language: 'Japanese',
    output_language: 'English',
    text: 'こんばんは'
  })
It can apparently keep memory, too.
Where that memory lives looks like the key question for production use.
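As a mental model (hypothetical shape, not the LangChain API): BufferMemory holds the history in process memory, so running this across requests or servers means serializing and restoring the message list yourself, along these lines:

```javascript
// Hypothetical sketch: a conversation history that can be persisted and
// restored, illustrating what "where to put the memory" means in practice.
class SerializableHistory {
  constructor(messages = []) {
    this.messages = messages; // [{ role, text }]
  }
  add(role, text) {
    this.messages.push({ role, text });
  }
  toJSON() {
    return JSON.stringify(this.messages);
  }
  static fromJSON(json) {
    return new SerializableHistory(JSON.parse(json));
  }
}
```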

  const chat = new ChatOpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY
  });
  const chatPrompt = ChatPromptTemplate.fromPromptMessages([
    SystemMessagePromptTemplate.fromTemplate(
      "The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know."
    ),
    new MessagesPlaceholder("history"),
    HumanMessagePromptTemplate.fromTemplate("{input}"),
  ]);
  
  const chain = new ConversationChain({
    memory: new BufferMemory({ returnMessages: true, memoryKey: "history" }),
    prompt: chatPrompt,
    llm: chat,
  });
  const responseH = await chain.call({
    input: "My home is in Osaka. And I want to go to Tokyo. How should I do?",
  });
  
  console.log(responseH);
  const responseI = await chain.call({
    input: "How about return to my home?",
  });
  
  console.log(responseI);

Result:

{
  response: 'There are several ways to travel from Osaka to Tokyo. The most common way is to take the Shinkansen bullet train, which takes about 2.5 hours and costs around 14,000 yen. Alternatively, you can take a domestic flight from Osaka to Tokyo, which takes about 1.5 hours and costs around 10,000 yen. Another option is to take a highway bus, which takes about 8-9 hours and costs around 4,000-6,000 yen. Finally, you can also drive or take a taxi, but this will take significantly longer and be more expensive.'
}
{
  response: 'You can use the same transportation options to return to Osaka from Tokyo. If you took the Shinkansen bullet train to Tokyo, you can take the same train back to Osaka. If you took a domestic flight, you can book a return flight from Tokyo to Osaka. If you took a highway bus, you can book a return ticket from Tokyo to Osaka. And if you drove or took a taxi, you can use the same method to return to Osaka.'
}
Feeding it additional data

Starting with local files.

  const chat = new ChatOpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY
  });
  const loader = new DirectoryLoader(
    '/Users/sandbox/langchain-ts-starter/dummy-data',
    {
      '.md': path => new TextLoader(path),
      '.json': path => new JSONLoader(path)
    }
  )
  const docs = await loader.load()
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })
  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
  writeFileSync('./dummy-data.json', JSON.stringify(vectorStoreData))
  return vectorStoreData

This produces `dummy-data.json`.

Using it looks like this:

import { ChatOpenAI } from "langchain/chat_models/openai";
import { JSONLoader } from "langchain/document_loaders";
import { HNSWLib } from "langchain/vectorstores";
import { VectorStoreToolkit, createVectorStoreAgent } from "langchain/agents";
import { OpenAIEmbeddings } from "langchain/embeddings";

const runAnswer = async () => {
  const chat = new ChatOpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })
  const jsonLoader = new JSONLoader('./dummy-data.json')
  const d = await jsonLoader.load()
  const vectorStoreData = await HNSWLib.fromDocuments(d, embeddings)
  const vectorStoreTool = new VectorStoreToolkit({
    name: 'demo',
    description: 'demo',
    vectorStore: vectorStoreData,
  }, chat)
  const agent = createVectorStoreAgent(chat, vectorStoreTool)
  return agent.call({
      input: 'What does the Gutenberg project licensed by?'
//    input: `How we can get start the Guternberg project?`
//    input: "What is the `Gutenberg`?"
  })
}
Example responses:

{
  output: 'To get started with the Gutenberg project, one can download the Gutenberg plugin and start testing the block editor features.',
  intermediateSteps: [
    {
      action: [Object],
      observation: 'The Gutenberg project is a new paradigm in WordPress site building and publishing that aims to revolutionize the entire publishing experience. It is in the second phase of a four-phase process that will touch every piece of WordPress, including Editing, Customization (which includes Full Site Editing, Block Patterns, Block Directory, and Block-based themes), Collaboration, and Multilingual. The project is focused on a new editing experience, the block editor, which introduces a modular approach to pages and posts. Each piece of content in the editor, from a paragraph to an image gallery to a headline, is its own block. The Gutenberg plugin gives you the latest version of the block editor so you can join in testing bleeding-edge features, start playing with blocks, and maybe get inspired to build your own.'
    }
  ]
}

{
  output: 'The Gutenberg project is licensed under the GNU General Public License version 2 or any later version.',
  intermediateSteps: [
    {
      action: [Object],
      observation: 'The Gutenberg project is released under the terms of the GNU General Public License version 2 or (at your option) any later version. You can find the complete license in the LICENSE.md file.'
    }
  ]
}
Running it: slow.
It might be better to stream the response back, or handle the embedding on OpenAI's side.

In any case, querying against vector data written out to a JSON file seems to work.
Next I want to try out more of the loaders.
If external REST APIs, RSS feeds, CSV data, and so on become usable, it feels like something real could be built.

Loading data from a website

https://js.langchain.com/docs/modules/indexes/document_loaders/examples/web_loaders/web_cheerio

npm i cheerio

The code looks like this:

import { CheerioWebBaseLoader } from "langchain/document_loaders/web/cheerio";

const loader = new CheerioWebBaseLoader(
  "https://news.ycombinator.com/item?id=34817881"
);
Sometimes it hits an error:

/node_modules/langchain/dist/document_loaders/web/cheerio.js:39
            signal: timeout ? AbortSignal.timeout(timeout) : undefined,
                                          ^

TypeError: AbortSignal.timeout is not a function

Googling turned up polyfill-style code:

https://gamliela.com/blog/advanced-testing-with-jest

For now, trying this:


+  if (!AbortSignal.timeout) {
+    AbortSignal.timeout = (ms) => {
+      const controller = new AbortController();
+      setTimeout(() => controller.abort(new DOMException("TimeoutError")), ms);
+      return controller.signal;
+    };
+  }

const loader = new CheerioWebBaseLoader(
  "https://news.ycombinator.com/item?id=34817881"
);

The background seems to be a JSDOM-vs-browser-API gap, so this may also depend on the Node.js version (native `AbortSignal.timeout` landed in Node v17.3.0, which would fit).

Encountered on Node v17.2.0.

Next, a "that's more than I can process" error:

    data: {
      error: {
        message: "This model's maximum context length is 8191 tokens, however you requested 20651 tokens (20651 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.",
        type: 'invalid_request_error',
        param: null,
        code: null
      }
    }

Attach a selector so it doesn't ingest extra data:

  const loader = new CheerioWebBaseLoader(
    "https://zenn.dev/hideokamoto/scraps/744cfdd408d5bc",
+    {
+      selector: 'main'
+    }
  );

Conversion to vector data goes through a JSON file, as before:


  const docs = await loader.load()
+  const embeddings = new OpenAIEmbeddings({
+    openAIApiKey: process.env.OPENAI_API_KEY
+  })
+  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
+  writeFileSync('./dummy-data.json', JSON.stringify(vectorStoreData))
Ask a question the same way as in the step above:


  const chat = new ChatOpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })
  const jsonLoader = new JSONLoader('./dummy-data.json')
  const d = await jsonLoader.load()
  const vectorStoreData = await HNSWLib.fromDocuments(d, embeddings)
  const vectorStoreTool = new VectorStoreToolkit({
    name: 'demo',
    description: 'demo',
    vectorStore: vectorStoreData,
  }, chat)
  const agent = createVectorStoreAgent(chat, vectorStoreTool)
  return agent.call({
      input: '「過去にGPTが発言したテキスト」を利用する方法を、日本語で教えて'
  })

Response:

{
  output: 'To access past texts generated by GPT in Japanese, you can use the `AIChatMessage` class.',
  intermediateSteps: [
    {
      action: [Object],
      observation: "According to the context provided, you can use the `AIChatMessage` class to access past texts generated by GPT. Here's an example code snippet:\n" +
        '\n' +
        '```\n' +
        'return chat.call([\n' +
        `  new SystemChatMessage("You're a helpful assistant that translates Japanese to English"),\n` +
        '  new HumanChatMessage("こんにちは"),\n' +
        `  new AIChatMessage("Hello! That's \\"Konnichiwa\\" in Japanese."),\n` +
        '  new HumanChatMessage("ほかの言い方を、英語で教えてください。"),\n' +
        ']).then(result => result.text)\n' +
        '```\n' +
        '\n' +
        'In this example, the `AIChatMessage` class is used to access the text generated by GPT in response to the previous user message.'
    }
  ]
}
For comparison, load a different set of vector data and ask the same question:

{
  output: "There is no specific tool for accessing past statements made by GPT in Japanese, but you can use OpenAI's GPT models through their API to generate new text in Japanese based on a prompt or topic of your choice.",
  intermediateSteps: [
    {
      action: [Object],
      observation: "OpenAI's GPT models are capable of generating text in Japanese, but there is no specific search function for past statements made by GPT in Japanese. However, you can use the GPT models to generate new text in Japanese based on a prompt or topic of your choice. You can access the GPT models through OpenAI's API, which requires an API key. Once you have an API key, you can use it to make requests to the API and generate text in Japanese."
    }
  ]
}

Since LangChain.js no longer comes up in the answer, it seems fair to say "answer generation using an arbitrary web page's data is working."
Still, it's slow. If a single page takes this long, "feed it my entire site" probably needs a different approach.

Googling turned up the suggestion that "creating chunks might help":

https://ict-worker.com/ai/langchain-chunk.html

First, take a look inside the loaded docs:

  const loader = new CheerioWebBaseLoader(
    "https://zenn.dev/hideokamoto/scraps/744cfdd408d5bc",
    {
      selector: 'main'
    }
  );
  const docs = await loader.load()

  console.log(docs)

Result:

[
  Document {
    pageContent: 'hidetaka okamoto4日前\n' +
      ' ChatGPT的な使い方をする場合\n' +
      'import { ChatOpenAI } from "langchain/chat_models/openai";\n' +
      'import { HumanChatMessage, SystemChatMessage } from "langchain/schema";\n' +
      '\n' +
      'export const run = async () => {\n' +
      '  const chat = new ChatOpenAI({\n' +
      '    temperature: 0,\n' +
      '    openAIApiKey: process.env.OPENAI_API_KEY\n' +
      '  });\n' +
      '  return chat.call([\n' +
      `    new SystemChatMessage("You're a helpful assistant that translates Japanese to English"),\n` +
      '    new HumanChatMessage("こんにちは")\n' +
      '  ]).then(result => result.text)\n' +
      '};\n' +

Splitting on `\n` looks like it would work.

Splitting it roughly:

  const docs = await loader.load()

+  const splitter = new CharacterTextSplitter({
+    separator: "\n",
+    chunkSize: 7,
+    chunkOverlap: 3,
+  });
+  const output = await splitter.createDocuments([docs[0].pageContent]);
+  console.log(output)

Result:

  Document {
    pageContent: "output_language: 'English',",
    metadata: { loc: [Object] }
  },
  Document {
    pageContent: "text: 'こんばんは'",
    metadata: { loc: [Object] }
  },
  Document { pageContent: '})', metadata: { loc: [Object] } },
  Document {
    pageContent: 'hidetaka okamoto4日前記憶もできるらしい。',
    metadata: { loc: [Object] }
  },
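For reference, chunkSize and chunkOverlap are measured in characters, which explains the tiny fragments above. The splitting is conceptually a sliding window; this is a simplified sketch of that idea, not LangChain's exact algorithm:

```javascript
// Simplified character-window splitter: chunkSize characters per chunk,
// with chunkOverlap characters shared between neighboring chunks.
function splitByChars(text, chunkSize, chunkOverlap) {
  const chunks = [];
  const step = chunkSize - chunkOverlap;
  for (let i = 0; i < text.length; i += step) {
    chunks.push(text.slice(i, i + chunkSize));
    if (i + chunkSize >= text.length) break;
  }
  return chunks;
}
```

With chunkSize 7 and chunkOverlap 3, even a short string shatters into fragments, so a larger chunkSize is likely needed for retrieval to work well.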

Re-run result. Not quite what I expected:

{
  output: "While there is no specific tool for accessing past statements made by GPT in Japanese, you can use OpenAI's GPT models through their API to generate new text in Japanese based on a prompt or topic of your choice. You can access OpenAI's GPT models through their API, which requires an API key. Once you have an API key, you can use it to make requests to the API and generate text in various languages, including Japanese. You can find more information about OpenAI's API on their website at https://api.openai.com/v1.",
  intermediateSteps: [
    {
      action: [Object],
      observation: "There is no specific tool for accessing past statements made by GPT in Japanese. However, you can use OpenAI's GPT models through their API to generate new text in Japanese based on a prompt or topic of your choice."
    },
    {
      action: [Object],
      observation: "You can access OpenAI's GPT models through their API, which requires an API key. Once you have an API key, you can use it to make requests to the API and generate text in various languages, including Japanese. However, there is no specific tool for accessing past statements made by GPT in Japanese. You can use the GPT models to generate new text in Japanese based on a prompt or topic of your choice. You can find more information about OpenAI's API on their website at https://api.openai.com/v1."
    }
  ]
}
Noticed partway through that the loader itself has a loadAndSplit method.
That may be the simpler route.


  const loader = new CheerioWebBaseLoader(
    "https://zenn.dev/hideokamoto/scraps/744cfdd408d5bc",
    {
      selector: 'main'
    }
  );
  const docs = await loader.loadAndSplit(new RecursiveCharacterTextSplitter({
    chunkSize: 7,
    chunkOverlap: 3,
  }));
  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
That aside, still not the answer I expected...

{
  output: 'It is recommended to try searching on search engines or forums that specialize in natural language processing or AI technology for more information on how to access past statements made by GPT in Japanese.',
  intermediateSteps: [
    {
      action: [Object],
      observation: 'Based on the context provided, it seems that the given links do not provide information on how to search for past statements made by GPT in Japanese. It is recommended to try searching on search engines or forums that specialize in natural language processing or AI technology for more information.'
    }
  ]
}
For a change of pace, trying one of the preset chains:

https://js.langchain.com/docs/modules/chains/other_chains/summarization


  if (!AbortSignal.timeout) {
    AbortSignal.timeout = (ms) => {
      const controller = new AbortController();
      setTimeout(() => controller.abort(new DOMException("TimeoutError")), ms);
      return controller.signal;
    };
  }
  const loader = new CheerioWebBaseLoader(
    "https://zenn.dev/hideokamoto/scraps/744cfdd408d5bc",
    {
      selector: 'main'
    }
  );
  const docs = await loader.loadAndSplit(new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
  }));
  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const chain = loadSummarizationChain(model, { type: 'map_reduce' })
  return chain.call({
    input_documents: docs
  })
Well, it works, more or less. In English, though.

{
  text: ' This article provides instructions on how to use ChatGPT, a chatbot model based on OpenAI, to generate text in Japanese. It uses modules from the langchain library, such as SystemChatMessage and HumanChatMessage, to create a conversation. It also provides a template for a chatbot that translates between Japanese and English, and a code snippet to create a dummy-data.json file. Additionally, it discusses an error that occurs when trying to update a web page using the CheerioWebBaseLoader, and suggests using a JSON file to convert to vector data. Finally, it suggests searching on search engines or forums that specialize in natural language processing or AI technology for more information.'
}
Once the mechanism is clear, the prompts can be overridden:

  if (!AbortSignal.timeout) {
    AbortSignal.timeout = (ms) => {
      const controller = new AbortController();
      setTimeout(() => controller.abort(new DOMException("TimeoutError")), ms);
      return controller.signal;
    };
  }
  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });

  const webLoader = new CheerioWebBaseLoader(
    "https://zenn.dev/hideokamoto/scraps/744cfdd408d5bc",
    {
      selector: 'main'
    }
  )
  const docs = await webLoader.loadAndSplit()
  const prompt = new PromptTemplate({
    inputVariables: ['text'],
    template: `
    Write a concise summary in Japanese of the following:
    "{text}"
    CONCISE SUMMARY:
    `
  })
  const chain = loadSummarizationChain(model, {
    type: 'map_reduce',
    combineMapPrompt: prompt,
    combinePrompt: prompt,
  })
  return chain.call({
    input_documents: docs
  })
It complained the input was too long, so I fed it a different article.

{
  text: ' hidetaka okamotoは1ヶ月前、URLをバッチで監視したいと考えていました。そのために、Sitemap XMLのデータからURLを引っ張り、Diffをとる必要がありました。Cloudfare Workers + R2を使うことで、URLの数が1000件超える場合にも対応できるようになりました。'
}

Now it's in Japanese.

For ingesting JSON data, the JSON Agent Toolkit also looks promising:

https://js.langchain.com/docs/modules/agents/toolkits/json

Code that asks about the WP REST API:

  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const response = await fetch('https://<some WordPress site>/wp-json')
  const json = await response.json()
  console.log(json)
  const jsonTool = new JsonToolkit(new JsonSpec(json as any))
  const executor = createJsonAgent(model, jsonTool)
  return executor.call({
    input: 'What are the required parameters in the request body to the POST /wp/v2/post request?'
  })

The response. It's properly talking about the API:

{
  output: 'The required parameters for the POST /wp/v2/post request are listed on the WordPress developer website at https://developer.wordpress.org/rest-api/.',
  intermediateSteps: [
    {
      action: [Object],
      observation: 'name, description, url, home, gmt_offset, timezone_string, namespaces, authentication, routes, site_logo, site_icon, _links'
    },
    {
      action: [Object],
      observation: '{"help":[{"href":"https://developer.wordpress.org/rest-api/"}]}'
    },
    {
      action: [Object],
      observation: '{"href":"https://developer.wordpress.org/rest-api/"}'
    }
  ]
}
Trying to feed it post data overflows the token limit.

  const response = await fetch('https://<some WordPress site>/wp-json/wp/v2/posts?per_page=2')
  const json = await response.json()
  const embededJsonData = (json as any).map((data: any) => {
    return data.content.rendered
  })
  const jsonTool = new JsonToolkit(new JsonSpec(embededJsonData as any))
  const executor = createJsonAgent(model, jsonTool)
  return executor.call({
    input: 'What are the required parameters in the request body to the POST /wp/v2/post request?'
  })

Error

    data: {
      error: {
        message: "This model's maximum context length is 4097 tokens, however you requested 5884 tokens (5628 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.",
        type: 'invalid_request_error',
        param: null,
        code: null
      }
    }
Trying the WebBrowser tool as well:

https://js.langchain.com/docs/modules/agents/tools/webbrowser


  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })

  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const browser = new WebBrowser({model, embeddings})
  const result = await browser.call(
    `"https://hidetaka.dev","who is Hidetaka Okamoto"`
  );

  console.log(result);
  return browser.call(
    `"https://js.langchain.com/docs/modules/agents/tools/webbrowser","How we can load the external data? Please answer in Japanese."`
  )

Response:

Relevant Links:
- [Hidetaka.dev](https://hidetaka.dev/)
- [About](https://hidetaka.dev/about)
- [Articles](https://hidetaka.dev/articles)
- [Projects](https://hidetaka.dev/projects)
- [OSS](https://hidetaka.dev/oss)


Relevant Links:
- [Python](https://github.com/hwchase17/langchain)
- [JS/TS](https://github.com/hwchase17/langchainjs)
- [Homepage](https://langchain.com)
- [Blog](https://blog.langchain.dev)
- [Copyright](https://langchain.com/copyright)

It does answer, but not quite what I had in mind.
Might be usable for related-article features...?

Something called RetrievalQAChain showed up on the Cloudflare blog, so let's try it:

https://js.langchain.com/docs/modules/chains/index_related_chains/retrieval_qa

  • Create Documents
  • Convert them to vectors with the Embedding API
  • Store them in a VectorStore
  • Use a Retriever to query the store

I think that's roughly the right mental model.


  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })
  if (!AbortSignal.timeout) {
    AbortSignal.timeout = (ms) => {
      const controller = new AbortController();
      setTimeout(() => controller.abort(new DOMException("TimeoutError")), ms);
      return controller.signal;
    };
  }
  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const loader = new CheerioWebBaseLoader(
    "https://zenn.dev/hideokamoto/scraps/744cfdd408d5bc",
    {
      selector: 'main'
    }
  );
  const docs = await loader.loadAndSplit()
  
  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
  const chain = RetrievalQAChain.fromLLM(model, vectorStoreData.asRetriever())
  return chain.call({
    query: "CheerioWebBaseLoaderの使い方を教えて"
  })

Result:

{
  text: ' CheerioWebBaseLoaderを使うには、まずnpm i cheerioを実行してCheerioをインストールします。次に、以下のようなコードを書きます。\n' +
    '\n' +
    'import { CheerioWebBaseLoader } from "langchain/document_loaders/web/cheerio";\n' +
    '\n' +
    'const loader = new CheerioWebBaseLoader(\n' +
    '  "https://news.ycombinator.com/item?id=34817881"\n' +
    ');\n' +
    '\n' +
    'また、タイムアウトエラーを回避するために、以下のコードを追加することもできます。\n' +
    '\n' +
    'if (!AbortSignal.timeout) {\n' +
    '  AbortSignal.timeout = (ms) => {\n' +
    '    const controller = new AbortController();\n' +
    '    setTimeout(() => controller.abort(new DOMException("TimeoutError")), ms);\n' +
    '    return controller.signal;\n' +
    '  };\n' +
    '}'
}

Now this looks promising.

It also worked with a different splitter:

  const docs = await loader.loadAndSplit(
  new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
  }));
Next challenge: targeting a REST API's JSON response.

The final form should look like this:

  const jsonLoader = new JSONLoader(jsonData)
  const docs = await jsonLoader.loadAndSplit()
  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
  const chain = RetrievalQAChain.fromLLM(model, vectorStoreData.asRetriever())
  return chain.call({
    query: "Astroの使い方を教えて"
  })

So the next step is figuring out how to pass a `fetch` response into `JSONLoader`.

`JSONLoader` accepts `string | Blob`:

https://github.com/hwchase17/langchainjs/blob/f97565659c33011ca6a8bc7c5da11aba251911d7/langchain/src/document_loaders/fs/json.ts#L7

export class JSONLoader extends TextLoader {
  public pointers: string[];

  constructor(filePathOrBlob: string | Blob, pointers: string | string[] = []) {
    super(filePathOrBlob);
    this.pointers = Array.isArray(pointers) ? pointers : [pointers];
  }

However, the `string` here means a file path.
So passing in `JSON.stringify`-ed data and the like just errors.

https://github.com/hwchase17/langchainjs/blob/f97565659c33011ca6a8bc7c5da11aba251911d7/langchain/src/document_loaders/fs/text.ts#L18-L22

  public async load(): Promise<Document[]> {
    let text: string;
    let metadata: Record<string, string>;
    if (typeof this.filePathOrBlob === "string") {
      const { readFile } = await TextLoader.imports();
      text = await readFile(this.filePathOrBlob, "utf8");
      metadata = { source: this.filePathOrBlob };
    } 
So the `Blob` side is the one to focus on.
Which makes the task: "convert the fetch response into a Blob."

By this point, Googling turns up workable approaches.

This time I tried this one from Stack Overflow:

https://stackoverflow.com/questions/53929108/how-to-convert-a-javascript-object-to-utf-8-blob-for-download

  const response = await fetch('https://wp-api.wp-kyoto.net/wp-json/wp/v2/posts')
  const jsonResponse = await response.json()

  // Drop fields we probably won't use, to save tokens
  const indexData = (jsonResponse as any).map((data: any) => data.content.rendered)

  // Convert to a Blob
  const jsonString = JSON.stringify(indexData)
  const bytes = new TextEncoder().encode(jsonString)
  const jsonBlobData = new Blob([bytes], {
    type: 'application/json;charset=utf-8'
  })
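The encoding step can be sanity-checked on its own; UTF-8 round-trips the Japanese content losslessly, so the Blob built from these bytes carries valid JSON (self-contained check, no LangChain involved):

```javascript
// JSON → UTF-8 bytes → JSON round-trip; the Blob built from these bytes
// therefore carries valid UTF-8 JSON.
const original = ["こんばんは", "I like sushi."];
const encoded = new TextEncoder().encode(JSON.stringify(original));
const decoded = JSON.parse(new TextDecoder().decode(encoded));
```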
If `ReferenceError: Blob is not defined` comes up, add `import { Blob } from "buffer";`.

However, that Blob's type apparently doesn't always match the Blob LangChain's JSONLoader expects, so I resorted to `any`:

  const jsonBlobData = new Blob([bytes], {
    type: 'application/json;charset=utf-8'
  })
  const jsonLoader = new JSONLoader(jsonBlobData as any as globalThis.Blob)
  const docs = await jsonLoader.loadAndSplit()
The finished version:


  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })
  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });

  const response = await fetch('https://<Your WordPress site URL>/wp-json/wp/v2/posts')
  const jsonResponse = await response.json()
  const indexData = (jsonResponse as any).map((data: any) => data.content.rendered)
  const jsonString = JSON.stringify(indexData)
  const bytes = new TextEncoder().encode(jsonString)
  const jsonBlobData = new Blob([bytes], {
    type: 'application/json;charset=utf-8'
  })
  const jsonLoader = new JSONLoader(jsonBlobData as any as globalThis.Blob)
  const docs = await jsonLoader.loadAndSplit()
  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
  const chain = RetrievalQAChain.fromLLM(model, vectorStoreData.asRetriever())
  return chain.call({
    query: "Astroの使い方を教えて"
  })

Result:

{
  text: ' Astroを使うと、ブランチとビルドの設定を行うことができます。また、独自のタグを作成して、Markdocコンテンツを読み込むことも可能です。'
}

The answer is terse, but the behavior seems to be what I intended.

Now, splitting saving and querying.

Saving only:

const updateIndex = async () => {
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })

  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });

  const response = await fetch('https://<Your WordPress site URL>/wp-json/wp/v2/posts')
  const jsonResponse = await response.json()
  const indexData = (jsonResponse as any).map((data: any) => data.content.rendered)
  const jsonString = JSON.stringify(indexData)
  const bytes = new TextEncoder().encode(jsonString)
  const jsonBlobData = new Blob([bytes], {
    type: 'application/json;charset=utf-8'
  })
  const jsonLoader = new JSONLoader(jsonBlobData as any as globalThis.Blob)
  const docs = await jsonLoader.loadAndSplit()

  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
  writeFileSync('./dummy-data3.json', JSON.stringify(vectorStoreData))
}

Querying / searching only:

const runAnswer = async () => {
  const model = new ChatOpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })
  const jsonLoader = new JSONLoader('./dummy-data3.json')
  const d = await jsonLoader.load()
  const vectorStoreData = await HNSWLib.fromDocuments(d, embeddings)
  const chain = RetrievalQAChain.fromLLM(model, vectorStoreData.asRetriever())
  return chain.call({
    query: "AstroをCloudflare Pagesにデプロイする方法を教えて"
  })
}

Search result:

{
  text: 'この記事によると、Astroで構築したアプリのデプロイ設定を行うには、ブランチとビルドの設定を行う必要があります。また、Cloudflare Pagesへのデプロイでは、Account / Cloudflare Pages / Editを選び、ビルド・デプロイログを見守ることができます。詳細については、記事を参照してください。'
}

Working properly.

Threw 100 posts at it, not expecting much, and it still worked.

{
  text: 'AstroをCloudflare Pagesにデプロイする方法は以下の手順になります。\n' +
    '\n' +
    '1. Astroでビルドを行います。\n' +
    '2. Cloudflare Pagesにログインし、新しいプロジェクトを作成します。\n' +
    '3. プロジェクトの設定で、ビルドコマンドとデプロイ先のディレクトリを指定します。\n' +
    '4. デプロイを実行します。\n' +
    '\n' +
    '具体的な手順は以下の通りです。\n' +
    '\n' +
    '1. Astroでビルドを行います。\n' +
    '```\n' +
    'npm run build\n' +
    '```\n' +
    '\n' +
    '2. Cloudflare Pagesにログインし、新しいプロジェクトを作成します。\n' +
    '\n' +
    '3. プロジェクトの設定で、ビルドコマンドとデプロイ先のディレクトリを指定します。\n' +
    'ビルドコマンドには、Astroでビルドしたファイルを指定します。\n' +
    'デプロイ先のディレクトリには、Astroでビルドしたファイルが含まれるディレクトリを指定します。\n' +
    '\n' +
    '4. デプロイを実行します。\n' +
    '```\n' +
    'npx wrangler pages publish dist\n' +
    '```\n' +
    '\n' +
    '以上がAstroをCloudflare Pagesにデプロイする手順になります。'
}
DynamicTool, I think it was called.
I'd also like to try the kind of tool that calls an external API from inside the tool itself.
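A sketch of that idea: DynamicTool takes a `{ name, description, func }` shape, so the tool body would just wrap an HTTP call. Everything below is hypothetical (the weather endpoint does not exist), and the fetcher is injected so the sketch runs without a network or the langchain package:

```javascript
// Hypothetical external-API tool in the { name, description, func } shape
// that DynamicTool accepts. fetchJson is injected for testability.
const makeWeatherTool = (fetchJson) => ({
  name: "current-weather",
  description: "Returns the current weather for a given city.",
  func: async (city) => {
    // Hypothetical endpoint; a real tool would call an actual API here.
    const url = `https://example.com/weather?city=${encodeURIComponent(city)}`;
    const data = await fetchJson(url);
    return `It is ${data.condition} in ${city}.`;
  },
});
```

In a real setup, the same object literal would be passed to `new DynamicTool({...})` and handed to an agent, with `fetchJson` replaced by a real `fetch` wrapper.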


https://js.langchain.com/docs/modules/agents/toolkits/openapi
Tried playing with the OpenAPI Agent Toolkit.

The idea: feed it something like Stripe's OpenAPI spec:

https://github.com/stripe/openapi

Following the sample code produced an error:

    data: {
      error: {
        message: "This model's maximum context length is 4097 tokens, however you requested 5490 tokens (5234 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.",
        type: 'invalid_request_error',
        param: null,
        code: null
      }
    }

Maybe the API spec is just too big.
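It is: the full Stripe spec is several megabytes of JSON, far beyond a 4097-token window. A cheap way to catch this before making the API call is a rough token estimate. A sketch using the common ~4-characters-per-token rule of thumb (this is not OpenAI's real tokenizer, just a sanity check):

```typescript
// Rough heuristic: roughly 4 characters per token for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Would prompt + completion fit in the model's context window?
const fitsInContext = (
  prompt: string,
  maxTokens = 4097,
  completionTokens = 256
): boolean => estimateTokens(prompt) + completionTokens <= maxTokens;
```

With the entire Stripe spec as the prompt, this check fails immediately, which matches the error above.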


The implementation itself looks fairly simple:

https://github.com/hwchase17/langchainjs/blob/6df22523fd8bd546ded1ea9bf5314abca88983d8/langchain/src/agents/agent_toolkits/openapi/openapi.ts#L30-L46

export class OpenApiToolkit extends RequestsToolkit {
  constructor(jsonSpec: JsonSpec, llm: BaseLanguageModel, headers?: Headers) {
    super(headers);
    const jsonAgent = createJsonAgent(llm, new JsonToolkit(jsonSpec));
    this.tools = [
      ...this.tools,
      new DynamicTool({
        name: "json_explorer",
        func: async (input: string) => {
          const result = await jsonAgent.call({ input });
          return result.output as string;
        },
        description: JSON_EXPLORER_DESCRIPTION,
      }),
    ];
  }
}
  • Somehow get it to read the JSON OpenAPI spec
  • Then hit the API via DynamicTool

If both of those work, it might come together...?


I forced it to read the spec anyway.

  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })
  const response = await fetch('https://raw.githubusercontent.com/stripe/openapi/master/openapi/spec3.json')
  const jsonData = await response.json()
  const jsonString = JSON.stringify(jsonData)
  const bytes = new TextEncoder().encode(jsonString)
  const jsonBlobData = new Blob([bytes], {
    type: 'application/json;charset=utf-8'
  })
  const jsonLoader = new JSONLoader(jsonBlobData as any as globalThis.Blob)
  const docs = await jsonLoader.loadAndSplit()
  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
  const chain = RetrievalQAChain.fromLLM(model, vectorStoreData.asRetriever())
  return chain.call({
    query: "How to list the product data?"
  })

For the time spent, a pretty underwhelming answer.

{
  text: ' You can list the product data by combining the ProductList and price_data_with_product_data and then looping through the list of line items in the cart.'
}

Careful: loading the docs the way the sample code does overflows the token limit.


-  const docs = await loader.load();
+  const docs = await loader.loadAndSplit();
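The difference is that `loadAndSplit()` chunks each document before embedding, so no single piece exceeds the context window. A simplified sketch of the idea (fixed-size windows with overlap; LangChain's actual splitter is smarter about breaking on separators):

```typescript
// My own simplified splitter: cut text into overlapping fixed-size
// chunks so each one fits comfortably in the model's context window.
const splitText = (
  text: string,
  chunkSize = 1000,
  overlap = 200
): string[] => {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
};
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one of the two chunks.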

  const loader = new GithubRepoLoader(
    "https://github.com/hwchase17/langchainjs",
    { branch: "main", recursive: false, unknown: "warn" }
  );
  const docs = await loader.loadAndSplit();
  console.log({ docs });
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })
  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
  const chain = RetrievalQAChain.fromLLM(model, vectorStoreData.asRetriever())
  return chain.call({
    query: "LangChain.jsとはなに?"
  })

This is how it looks as a one-shot query.


The response:

{
  text: ' LangChain.jsは、大規模言語モデル(LLMs)を使用して、以前に作成できなかったようなアプリケーションを構築する開発者を支援するためのライブラリです。'
}

Casually asking how to use a repository I'm curious about:

  const loader = new GithubRepoLoader(
    "https://github.com/wpkyoto/stripe-pwa-elements",
    { branch: "main", recursive: false, unknown: "warn" }
  );
  const docs = await loader.loadAndSplit();
  console.log({ docs });
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  })
  const model = new OpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  const vectorStoreData = await MemoryVectorStore.fromDocuments(docs, embeddings)
  const chain = RetrievalQAChain.fromLLM(model, vectorStoreData.asRetriever())
  return chain.call({
    query: "このライブラリの使い方を、サンプルコード付きで紹介してください。"
  })
{
  text: ' 以下のサンプルコードを使用して、<stripe-payment-sheet> コンポーネントを使用してください。\n' +
    '\n' +
    '<div id="payment-sheet-container"></div>\n' +
    '\n' +
    '<script type="module">\n' +
    "  import { StripePaymentSheet } from '@stripe-elements/stripe-elements';\n" +
    '\n' +
    '  const paymentSheet = new StripePaymentSheet({\n' +
    "    container: '#payment-sheet-container',\n" +
    '    onToken: (token) => {\n' +
    "      document.querySelector('#result').innerHTML = `Token: ${token.id}`;\n" +
    '    },\n' +
    '  });\n' +
    '</script>'
}
  return chain.call({
    query: "このライブラリのチュートリアルを、Markdown形式で1000文字程度にて作成してください。"
  })

Hey, come on lol

{ text: " I don't know." }

Let me try being more specific.

  return chain.call({
    query: [
      `このライブラリのチュートリアルを作成します。次のステップについて、それぞれサンプルコードや手順を紹介してください`,
      '- Step1: npmでのインストール方法',
      '- Step2: HTMLタグとして埋め込む方法',
      '- Step3: JavaScriptで埋め込む方法'
    ].join('\n')
  })

The result:

{
  text: ' \n' +
    'Step1: npmでのインストール方法\n' +
    'npm install stripe-elements\n' +
    '\n' +
    'Step2: HTMLタグとして埋め込む方法\n' +
    '<div id="stripe"></div>\n' +
    '\n' +
    'Step3: JavaScriptで埋め込む方法\n' +
    "const stripe = Stripe('YOUR_STRIPE_PUBLISHABLE_API_KEY');\n" +
    'const stripeElement = stripe.elements();\n' +
    "const stripeElement = stripeElement.create('card');\n" +
    "stripeElement.mount('#stripe');"
}

Something like "auto-generate a docs site for a hastily-built library!" still seems a bit out of reach.


It might manage to answer small usage questions, though?

  return chain.call({
    query: [
      'stripe-payment-sheetタグで、formタグのsubmit操作時に処理を走らせる方法を教えてください。'
    ].join('\n')
  })
{
  text: ' stripe-payment-sheetタグのshould-use-default-form-submit-action属性をfalseに設定し、formSubmitイベントを追加して、stripe.createPaymentMethodを実行します。'
}