Outputting structured data from Amazon Bedrock with LangChain

T-KND

Published 2024/08/14

Introduction

I once needed an LLM accessed through Amazon Bedrock to return structured data, such as JSON.
I learned that LangChain's with_structured_output() can do this when combined with a Pydantic class (or a TypedDict class), so I'm writing it up here.

I worked through this using this documentation as a reference.

The chat models (providers) that support with_structured_output() are listed in this documentation.

Environment

Python: 3.12.3
langchain: 0.2.13

Using with_structured_output() with a Pydantic class

The advantage of a Pydantic class is that the LLM's output gets validated. If a required field is missing or a field has the wrong type, an error is raised.
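To make "validated" concrete, here is a minimal stdlib-only sketch of the two checks just mentioned: a missing required field and a wrongly typed field. This is not Pydantic's implementation. `JokeSketch` and `validate_joke` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class JokeSketch:
    """Hypothetical stand-in for the article's Pydantic Joke model."""

    setup: str
    punchline: str
    rating: Optional[int] = None


def validate_joke(data: dict[str, Any]) -> JokeSketch:
    """Hypothetical validator that raises the kinds of errors described above."""
    for name in ("setup", "punchline"):
        # Required field missing -> error.
        if name not in data:
            raise ValueError(f"{name}: field required")
        # Wrong type -> error.
        if not isinstance(data[name], str):
            raise TypeError(f"{name}: expected str, got {type(data[name]).__name__}")
    rating = data.get("rating")  # optional: absent or None is fine
    # bool is a subclass of int, so exclude it explicitly.
    if rating is not None and (isinstance(rating, bool) or not isinstance(rating, int)):
        raise TypeError(f"rating: expected int or None, got {type(rating).__name__}")
    return JokeSketch(setup=data["setup"], punchline=data["punchline"], rating=rating)
```

For example, `validate_joke({"setup": "...", "punchline": "..."})` succeeds with `rating=None`, while `validate_joke({"setup": "..."})` raises `ValueError`. Pydantic itself is more lenient in places. For instance, Pydantic v1 coerces the string "7" to 7 for an int field, whereas this sketch is deliberately strict.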

sample.ipynb

import boto3
from langchain_aws import ChatBedrock
from typing import Optional
from langchain_core.pydantic_v1 import BaseModel, Field

# LLM configuration
boto3_session = boto3.Session(
    profile_name="PROFILE_NAME",
)
bedrock_client = boto3_session.client(
    service_name="bedrock-runtime",
    region_name="REGION_NAME",
)
llm = ChatBedrock(
    client=bedrock_client,
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    model_kwargs={
        "max_tokens": 4096,
        "temperature": 0,
        "top_p": 1,
    },
)

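# Schema (type) of the LLM output; the docstring and descriptions are passed to the model
class Joke(BaseModel):
    """A joke to tell the user."""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline of the joke")
    rating: Optional[int] = Field(
        default=None, description="How funny the joke is, from 1 to 10"
    )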

structured_llm = llm.with_structured_output(Joke)
res: Joke = structured_llm.invoke("Please come up with a joke about cats.")

print("Setup: ", res.setup)
print("Punchline: ", res.punchline)
print("Rating: ", res.rating)

Output

Setup:  While my cat was sleeping, I tried to climb onto the bed...
Punchline:  The cat went "Meow!" and scratched me.
Rating:  7

Using with_structured_output() with a TypedDict class

If you don't need the LLM's output to be validated, or you want to be able to stream it, this pattern appears to be the better fit.
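"No validation" is literal here. A TypedDict only describes the shape of a dict for type checkers and schema generation, and at runtime it is a plain dict. The stdlib-only snippet below uses a hypothetical `JokeDict` with the same keys as the article's `Joke`. It shows that wrongly typed or incomplete data passes through unchanged. That is also why reading an optional key with `.get()` is the safer choice.

```python
from typing import Optional, TypedDict


class JokeDict(TypedDict):
    """Hypothetical minimal TypedDict with the same keys as the article's Joke."""

    setup: str
    punchline: str
    rating: Optional[int]


# Calling a TypedDict class just builds a plain dict: no type or key checks happen.
unchecked = JokeDict(setup=123, punchline=None)  # wrong types and no "rating" key

print(type(unchecked) is dict)  # True
print(unchecked)                # {'setup': 123, 'punchline': None}
# unchecked["rating"] would raise KeyError; .get() returns None instead.
print(unchecked.get("rating"))  # None
```

A static type checker would flag the `JokeDict(setup=123, ...)` call, but nothing stops it at runtime.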

sample2.ipynb

import boto3
from langchain_aws import ChatBedrock
from typing import Optional
from typing_extensions import Annotated, TypedDict

# LLM configuration
boto3_session = boto3.Session(
    profile_name="PROFILE_NAME",
)
bedrock_client = boto3_session.client(
    service_name="bedrock-runtime",
    region_name="REGION_NAME",
)
llm = ChatBedrock(
    client=bedrock_client,
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    model_kwargs={
        "max_tokens": 4096,
        "temperature": 0,
        "top_p": 1,
    },
)

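class Joke(TypedDict):
    """A joke to tell the user."""

    # Annotated[type, default, description]: ... means required, None means optional
    setup: Annotated[str, ..., "The setup of the joke"]
    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], None, "How funny the joke is, from 1 to 10"]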

structured_llm = llm.with_structured_output(Joke)
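res: Joke = structured_llm.invoke("Please come up with a joke about cats.")

print("Setup: ", res["setup"])
print("Punchline: ", res["punchline"])
# "rating" is optional, so the model may omit it; .get() avoids a KeyError
print("Rating: ", res.get("rating"))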

Output

Setup:  While my cat was sleeping, I tried to climb onto the bed...
Punchline:  The cat angrily went "Meowww!" and scratched me.
Rating:  7
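As I understand it, with_structured_output() on ChatBedrock turns the schema into a tool definition and has Claude call that tool. In each `Annotated[...]`, the second element is the default (`...` meaning required) and the third is the field description.

The stdlib-only sketch below imitates that conversion. `to_tool_schema` and its output layout are hypothetical, not LangChain's API or the exact Bedrock tool format. The sketch also shows why each field must be written as `setup: Annotated[str, ...]`. With `setup: str = Annotated[...]`, the annotation is plain `str`, so the description never reaches the schema.

```python
from typing import Annotated, Any, Optional, TypedDict, get_args, get_origin, get_type_hints


class Joke(TypedDict):
    """A joke to tell the user."""

    setup: Annotated[str, ..., "The setup of the joke"]
    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], None, "How funny the joke is, from 1 to 10"]


# Hypothetical mapping from Python types to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def to_tool_schema(td: type) -> dict[str, Any]:
    """Hypothetical helper: build a JSON-Schema-like tool spec from an Annotated TypedDict."""
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, hint in get_type_hints(td, include_extras=True).items():
        prop: dict[str, Any] = {}
        default: Any = ...  # a plain annotation (no Annotated) counts as required
        base = hint
        if get_origin(hint) is Annotated:
            # Annotated[type, default, description]: read the metadata positionally.
            base, *meta = get_args(hint)
            if meta:
                default = meta[0]
            if len(meta) > 1:
                prop["description"] = meta[1]
        # Optional[X] is Union[X, None]; keep the non-None member.
        non_none = [a for a in get_args(base) if a is not type(None)]
        prop["type"] = _JSON_TYPES.get(non_none[0] if non_none else base, "string")
        properties[name] = prop
        if default is ...:  # Ellipsis means "no default", i.e. required
            required.append(name)
    return {
        "name": td.__name__,
        "description": td.__doc__,
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


print(to_tool_schema(Joke))
```

In this sketch, `setup` and `punchline` end up in `required` and `rating` does not. That mirrors the `...` versus `None` distinction in the class definition.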

Next, let's try retrieving the LLM output as a stream.

sample2.ipynb

for chunk in structured_llm.stream("Please come up with a joke about cats."):
    print(chunk)

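{'setup': 'While my cat was sleeping, I tried to climb onto the bed...', 'punchline': 'The cat yelled "Meowww!" at me.', 'rating': 7}

I expected the object to be built up chunk by chunk, as in the documentation's example below. Instead, I received only a single chunk containing the complete object.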

{}
{'setup': ''}
{'setup': 'Why'}
{'setup': 'Why was'}
{'setup': 'Why was the'}
{'setup': 'Why was the cat'}
{'setup': 'Why was the cat sitting'}
{'setup': 'Why was the cat sitting on'}
{'setup': 'Why was the cat sitting on the'}
{'setup': 'Why was the cat sitting on the computer'}
{'setup': 'Why was the cat sitting on the computer?'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': ''}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on the'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on the mouse'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on the mouse!'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on the mouse!', 'rating': 7}
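The documentation's output is a series of progressively longer dicts. As I understand it, LangChain produces them by re-parsing the partially streamed JSON of the tool-call arguments after each chunk. Below is a simplified, stdlib-only sketch of that idea, not LangChain's parser.

`close_partial_json`, `try_parse_partial`, and `stream_partial_objects` are hypothetical helpers. They close any open string and brackets and try to parse. They skip prefixes that simple closing cannot fix, such as a half-written key.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def close_partial_json(text: str) -> str:
    """Append the quote and brackets needed to close a truncated JSON prefix."""
    closers: list[str] = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
    return text + ('"' if in_string else "") + "".join(reversed(closers))


def try_parse_partial(text: str) -> Optional[Any]:
    """Parse a JSON prefix, or return None if simple closing is not enough."""
    try:
        return json.loads(close_partial_json(text))
    except json.JSONDecodeError:
        return None


def stream_partial_objects(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield a snapshot of the object each time the parsed result changes."""
    buffer = ""
    last: Any = None
    for chunk in chunks:
        buffer += chunk
        parsed = try_parse_partial(buffer)
        if parsed is not None and parsed != last:
            last = parsed
            yield parsed


# Usage with hand-written chunks (a stand-in for streamed tool-call arguments).
chunks = ['{"setup": "Why', ' was', ' the cat?", "punch', 'line": "Because', '"}']
for snapshot in stream_partial_objects(chunks):
    print(snapshot)
```

This prints three snapshots: `{'setup': 'Why'}`, `{'setup': 'Why was'}`, and finally the complete dict. The prefix ending in the half-written key `"punch` is skipped. Each snapshot holds everything received so far, so the last one is the final object. If the whole JSON arrives as a single chunk, the loop yields just one dict.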

Conclusion

When I tried to get structured data using prompts alone, I had trouble with wrong keys and broken double or single quotes. With this approach, it looks like I won't have to implement that handling myself.
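For comparison, the prompt-only approach means writing checks like the following yourself. This is a hypothetical stdlib-only sketch; `parse_prompted_json` and `REQUIRED_KEYS` are illustrative names. `json.loads` rejects single-quoted keys and strings, and a misspelled key only surfaces if you check for it explicitly.

```python
import json
from typing import Any

# Keys we would ask for in the prompt (illustrative assumption).
REQUIRED_KEYS = ("setup", "punchline")


def parse_prompted_json(text: str) -> dict[str, Any]:
    """Hypothetical hand-written parser for a prompt-only approach."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # e.g. single-quoted keys/strings are not valid JSON
        raise ValueError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        # e.g. the model wrote "punch_line" instead of "punchline"
        raise ValueError(f"missing keys: {missing}")
    return data


for text in (
    '{"setup": "a", "punchline": "b"}',
    "{'setup': 'a', 'punchline': 'b'}",
    '{"setup": "a", "punch_line": "b"}',
):
    try:
        print("ok:", parse_prompted_json(text))
    except ValueError as err:
        print("error:", err)
```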

Thank you for reading this far.

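NCDC Engineer Blog

We are the engineering team at NCDC Inc. (ncdc.co.jp/). For open engineering positions and an introduction to the tech stack we use, please see here (github.com/ncdcdev/recruitment)! *People other than engineers may also post articles.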