
Using an image as prompt input for the DALL-E 3 API

takekawa tomoki

Published 2024/06/01

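Background

The DALL-E 3 API cannot take an image as part of the prompt.
Still, I want to get the information in an image into the prompt somehow.
So I route the request through GPT-4o and pass its output to DALL-E 3 as the prompt.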

Environment

DALL-E 3 and GPT-4o are already deployed on Azure OpenAI Service
Python: 3.10.12
openai: 1.30.3
The following image is used as the input.

Flow

Send the image to GPT-4o and get a text description of it. Then send that description to DALL-E 3 to generate a new image.
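The flow above can be sketched as a small function that chains two steps. This is only an illustration: `build_prompt`, `image_to_image`, and the `describe` / `generate` callables are hypothetical names that stand in for the GPT-4o and DALL-E 3 calls shown in the steps below. The 4000-character cap is my understanding of the DALL-E 3 prompt limit and should be treated as an assumption to check against the current API documentation.

```python
from typing import Callable

# Assumed DALL-E 3 prompt length limit in characters; verify against the API docs.
MAX_PROMPT_CHARS = 4000


def build_prompt(instruction: str, description: str,
                 max_chars: int = MAX_PROMPT_CHARS) -> str:
    """Join the fixed instruction and the description, trimming the description to fit."""
    room = max(0, max_chars - len(instruction))
    return instruction + description[:room]


def image_to_image(
    image_path: str,
    describe: Callable[[str], str],
    generate: Callable[[str], str],
    instruction: str = "",
) -> str:
    """Run the two-step flow: image -> text description -> generated image URL."""
    description = describe(image_path)
    prompt = build_prompt(instruction, description)
    return generate(prompt)


if __name__ == "__main__":
    # Stub callables stand in for the real GPT-4o and DALL-E 3 requests.
    url = image_to_image(
        "input.png",
        describe=lambda path: f"a description of {path}",
        generate=lambda prompt: f"https://example.invalid/?len={len(prompt)}",
        instruction="Create an icon from this description: ",
    )
    print(url)
```

Passing the two API calls in as callables keeps the flow itself testable without network access. A long GPT-4o description is trimmed rather than sent as-is.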

Steps

Get the endpoints and keys for DALL-E 3 and GPT-4o
Run the following code

import openai
import base64
import os
import requests
import json
from openai import AzureOpenAI

client = AzureOpenAI(
    api_version="2023-05-15",  # fixed value
    api_key="<key>",
    azure_endpoint="<endpoint>"
)


#Open the image file and encode it as a base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
    
base64_image = encode_image("<file path>")  # enter the path to the input image

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {"type": "text", "text": "Describe this image:"},
            {"type": "image_url", "image_url": {
                "url": f"data:image/png;base64,{base64_image}"}
            }
        ]}
    ],
    temperature=0.0,
)

gpt_response = response.choices[0].message.content
print(gpt_response)
# Send the prompt to DALL-E 3
dalle_client = AzureOpenAI(
    api_version="2024-02-01",  # fixed value
    api_key="<key>",
    azure_endpoint="<endpoint>"
)

result = dalle_client.images.generate(
    model="Dalle3",
    prompt=f"You are a specialist in creating Teams icons. Based on the following image description, create a tasteful icon. {gpt_response}",  # use the GPT-4o result as the prompt
    n=1
)

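# Convert the response to a dict via JSON
json_response = json.loads(result.model_dump_json())

# Set the directory where the image will be saved
image_dir = os.path.join(os.curdir, 'images')

# Create the directory if it does not exist
if not os.path.isdir(image_dir):
    os.mkdir(image_dir)

# Build the path the image will be saved to
image_path = os.path.join(image_dir, 'generated_image.png')

# Download the generated image
image_url = json_response["data"][0]["url"]
download = requests.get(image_url, timeout=60)
download.raise_for_status()  # fail here rather than save an error page as a PNG
generated_image = download.content

# Write the image to a file
with open(image_path, "wb") as image_file:
    image_file.write(generated_image)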

print(f"Image saved to {image_path}")

Confirm that the image was generated
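One detail in the code above: the data URL is hardcoded as `image/png`, so a JPEG or WebP input would be labeled with the wrong type. A minimal sketch of a helper that guesses the MIME type from the file name using the standard `mimetypes` module follows. `to_data_url` is a hypothetical name, not part of the `openai` package, and falling back to `image/png` is an assumption made here to match the original code.

```python
import base64
import mimetypes


def to_data_url(image_path: str, default_mime: str = "image/png") -> str:
    """Encode an image file as a base64 data URL, guessing the MIME type from its name."""
    mime, _ = mimetypes.guess_type(image_path)
    # Fall back to the default when the type is unknown or not an image.
    if mime is None or not mime.startswith("image/"):
        mime = default_mime
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be used as the `url` value of the `image_url` message part in place of the f-string in the original code.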

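Summary

With this workaround, I managed to feed image information into the DALL-E 3 API and get an image back.
I hope it soon becomes possible to output images from the GPT-4 API and to input images to the DALL-E 3 API directly.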

Headwaters

The tech blog of Headwaters Co., Ltd. We post widely on Data & AI and app modernization, including AI agents, generative AI, LLMs, Azure services and certifications, IoT, and XR.
