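Using Gemini with LangChain

Personally, I had the impression that the Vertex AI Gemini models tended to get left behind in LangChain, but at some point they became reasonably usable, so I am looking into how to work with them and recording samples here.

Until now I have been using the Vertex AI API directly and have not looked at LangChain much, so these notes also cover basic usage.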

chat: https://python.langchain.com/docs/integrations/chat/google_vertex_ai_palm/

Richard_R

sample1
From the official documentation: simple chat generation

import vertexai
from langchain_google_vertexai import ChatVertexAI

vertexai.init(
    project="xxx",
    location="asia-northeast1"
)

llm = ChatVertexAI(
    model="gemini-1.5-flash-002",
    temperature=0,
    max_tokens=None,
    max_retries=6,
    stop=None,
)

messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
print(ai_msg.content)

Authentication

gcloud auth application-default login

Run this command in a terminal to complete authentication for the SDK (Application Default Credentials). The script also needs to call vertexai.init.

sample2
From the official documentation: a chain that uses a prompt template

import vertexai
from langchain_google_vertexai import ChatVertexAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that translates {input_language} to {output_language}.",
        ),
        ("human", "{input}"),
    ]
)

# llm is the Gemini model defined in sample1
chain = prompt | llm
result = chain.invoke(
    {
        "input_language": "English",
        "output_language": "Japanese",
        "input": "I love programming.",
    }
)
print(result.content)

There is something called LCEL (LangChain Expression Language). The prompt | llm pipe in sample2 is an example of it.

stream: stream back chunks of the response
invoke: call the chain on an input
batch: call the chain on a list of inputs

sample3
Multimodal input using an image

There are two ways to do this:

Pass the image as a base64-encoded byte string
Pass the image URL

Byte string pattern

from langchain_core.messages import HumanMessage
import base64
import httpx

image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
        },
    ],
)
# llm is the Gemini model defined in sample1
response = llm.invoke([message])
print(response.content)

URL pattern

message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)
response = llm.invoke([message])
print(response.content)

Note that the URL pattern is only available for models that support it.
Gemini appears to support it.
The documentation also shows a pattern with multiple images.
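
As a rough sketch of the multiple-image case, the content list can be built with a small helper. The function names to_data_url and build_image_content are hypothetical, the example.com URL is a placeholder, and mixing a data URL with a plain URL in one message is an assumption on my part rather than something confirmed in the documentation. Only the standard library is used here.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL (same form as the byte string pattern)."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def build_image_content(prompt: str, image_refs: list[str]) -> list[dict]:
    """Build a message content list: one text part followed by one part per image."""
    content = [{"type": "text", "text": prompt}]
    for ref in image_refs:
        content.append({"type": "image_url", "image_url": {"url": ref}})
    return content


# Placeholder inputs: three dummy bytes and a placeholder URL
content = build_image_content(
    "are these two images the same?",
    [to_data_url(b"\xff\xd8\xff"), "https://example.com/second.jpg"],
)
print(content)
```

The resulting list would be passed as HumanMessage(content=content) and sent with llm.invoke([message]), in the same way as the single-image samples.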

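I looked into it a bit, and using a PDF as input is not impossible but seems to involve some extra hurdles.
This is one area where the Vertex AI SDK has an advantage.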