Trying out the OpenAI API's JSON mode and image input mode

mizchi

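The November 6 announcement added several features I had been wanting, so I'm taking them for a test swing.

The type definitions in the `openai` npm package haven't caught up yet, so the requests below are sent while ignoring the types.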

JSON Mode

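With the model gpt-4-1106-preview, you can now explicitly request JSON as the output format. From what I tried, the request fails with an error unless the prompt contains the word "json".

A working example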

import "dotenv/config";
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function main() {
  const completion = await openai.chat.completions.create({
    messages: [
      {
        role: "system",
        content: `Create dummy json object by given typescript type.
type Output = {
  id: number;
  name: string;
  age: number;
};`,
      },
    ],
    model: "gpt-4-1106-preview",
    // @ts-ignore
    response_format: {
      type: "json_object",
    },
  });

  for (const choice of completion.choices) {
    // Log the whole choice (not just message.content) so finish_reason is visible too
    console.log(choice);
  }
}
main().catch(console.error);

/* Output
{
  index: 0,
  message: {
    role: 'assistant',
    content: '{\n  "id": 1,\n  "name": "John Doe",\n  "age": 30\n}'
  },
  finish_reason: 'stop'
}
*/
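JSON mode makes the content parseable, but as far as I know it does not promise that the object matches the type described in the prompt, so it seems worth checking the shape before using it. Below is a minimal sketch of that check. `isOutput` and `parseOutput` are hypothetical helper names, not part of the `openai` package, and the `Output` type is copied from the prompt above.

```typescript
// Same shape as the type given in the system prompt.
type Output = { id: number; name: string; age: number };

// Hypothetical type guard: checks each field at runtime.
function isOutput(value: unknown): value is Output {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "number" &&
    typeof v.name === "string" &&
    typeof v.age === "number"
  );
}

// Hypothetical helper: parse choice.message.content and validate it.
function parseOutput(content: string | null): Output {
  if (content === null) throw new Error("empty message content");
  const parsed: unknown = JSON.parse(content);
  if (!isOutput(parsed)) throw new Error("response does not match Output");
  return parsed;
}

// The content string from the output above.
const sample = '{\n  "id": 1,\n  "name": "John Doe",\n  "age": 30\n}';
console.log(parseOutput(sample).name); // prints "John Doe"
```

In the script above this would be called as `parseOutput(choice.message.content)`. If `finish_reason` is not `stop`, the JSON may be cut off and `JSON.parse` will throw.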

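Image input with gpt-4-vision-preview

The model can now take images as input.

Here I point it at the URL of an image that is already hosted and have it describe the picture.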

import "dotenv/config";
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function main() {
  // The image is referenced by URL, so no local file needs to be read here.
  const completion = await openai.chat.completions.create({
    model: "gpt-4-vision-preview",
    messages: [
      {
        role: "user",
        // @ts-ignore
        content: [
          {
            type: "text",
            text: "What’s in this image?",
          },
          {
            type: "image_url",
            image_url: {
              url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
            },
          },
        ],
      },
    ],
    max_tokens: 150,
  });

  for (const choice of completion.choices) {
    console.log(choice.message.content);
  }
}
main().catch(console.error);

Output

The image shows a wooden boardwalk traversing through a lush green field. The perspective of the photo has the boardwalk leading from the foreground into the distance, giving a sense of depth and inviting the viewer to imagine walking along the path. On either side of the walkway is tall grass, and beyond the grass are various shrubs and trees. Above, the sky is blue with wispy clouds scattered throughout, indicating fair weather. This scene is characteristic of a natural, possibly wetland area, where boardwalks are often constructed to allow people to explore without disturbing the delicate ecosystem. The lighting suggests that it could be either late afternoon or early morning, judging from the softness and the angle of the sunlight.

Incidentally, the price appears to be determined by width*height, so you could probably save money by lowering the resolution.
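To get a feel for how much resolution matters, here is a rough estimator. It is a sketch based on my recollection of OpenAI's vision guide at the time, not something verified in this scrap: the image is scaled to fit within 2048x2048, then shrunk so its shortest side is at most 768px, then charged 170 tokens per 512px tile plus a flat 85 tokens, while `detail: "low"` costs a flat 85 tokens. All of those constants, the function name `estimateImageTokens`, and the choice not to upscale small images are assumptions.

```typescript
type Detail = "low" | "high";

// Hypothetical estimator; the constants are assumptions, check current docs.
function estimateImageTokens(
  width: number,
  height: number,
  detail: Detail = "high"
): number {
  // Low detail is assumed to be a flat cost regardless of size.
  if (detail === "low") return 85;

  // Step 1: shrink (never enlarge) to fit inside a 2048x2048 square.
  const fit = Math.min(1, 2048 / Math.max(width, height));
  let w = width * fit;
  let h = height * fit;

  // Step 2: shrink (never enlarge) so the shortest side is at most 768px.
  const shrink = Math.min(1, 768 / Math.min(w, h));
  w *= shrink;
  h *= shrink;

  // Step 3: count 512px tiles and add the flat base cost.
  const tiles = Math.ceil(w / 512) * Math.ceil(h / 512);
  return 85 + 170 * tiles;
}

console.log(estimateImageTokens(1024, 1024)); // prints 765
console.log(estimateImageTokens(2048, 4096)); // prints 1105
console.log(estimateImageTokens(2048, 4096, "low")); // prints 85
```

If this model of the pricing is right, cost grows in tile-sized steps rather than smoothly with width*height, and resizing below a tile boundary before upload is where the savings come from.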

Image input with gpt-4-vision-preview via base64

This time the image is passed in as base64.
I saved a picture of a cat locally as image.png and sent that.

import fs from "fs";
import path from "path";
import { fileURLToPath } from "url";
import "dotenv/config";
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// fileURLToPath handles Windows drive letters and percent-encoded characters
const __dirname = path.dirname(fileURLToPath(import.meta.url));

async function main() {
  const b64Image = fs.readFileSync(path.join(__dirname, "./image.png"), {
    encoding: "base64",
  });

  const completion = await openai.chat.completions.create({
    model: "gpt-4-vision-preview",
    messages: [
      {
        role: "user",
        // @ts-ignore
        content: [
          {
            type: "text",
            text: "What’s in this image?",
          },
          {
            type: "image_url",
            image_url: {
              // image.png is a PNG, so declare image/png rather than image/jpeg
              url: `data:image/png;base64,${b64Image}`,
            },
          },
        ],
      },
    ],
    max_tokens: 150,
  });

  for (const choice of completion.choices) {
    console.log(choice.message.content);
  }
}
main().catch(console.error);

Output

The image shows a close-up of a cat with a yellow background. The cat has a white face with some tabby markings, particularly around the ears and the top of the head. It has striking green eyes and appears to be looking directly at the camera, giving it a focused or curious expression.
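Since the data URL carries a MIME type, it is easy for it to drift from the actual file format when the file name changes. A small sketch of deriving it from the extension follows. `mimeFromFilename` and `toDataUrl` are hypothetical helpers, and the list of extensions is my assumption about which formats the vision model accepts.

```typescript
// Hypothetical helper: map a file extension to an image MIME type.
function mimeFromFilename(filename: string): string {
  const ext = filename.slice(filename.lastIndexOf(".") + 1).toLowerCase();
  switch (ext) {
    case "png":
      return "image/png";
    case "jpg":
    case "jpeg":
      return "image/jpeg";
    case "gif":
      return "image/gif";
    case "webp":
      return "image/webp";
    default:
      throw new Error(`unsupported image extension: ${ext}`);
  }
}

// Hypothetical helper: build the data URL used in image_url.url.
function toDataUrl(base64: string, filename: string): string {
  return `data:${mimeFromFilename(filename)};base64,${base64}`;
}

console.log(toDataUrl("AAAA", "image.png")); // prints "data:image/png;base64,AAAA"
```

In the script above, `url: toDataUrl(b64Image, "image.png")` would replace the hand-written template string.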

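What I actually wanted to do is generate markup automatically, so uploading an image of the built result together with fix instructions and having the model regenerate the markup looks like it could work nicely.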
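One way that loop might be wired up is sketched below: a function that packs the current markup, a screenshot, and a fix instruction into a single user message in the same shape as the requests above. This is only an illustration of the idea. `buildFixRequest`, the local types, and the prompt wording are all hypothetical, and nothing here has been run against the API.

```typescript
// Local types mirroring the message shape used in the requests above.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type UserMessage = { role: "user"; content: Array<TextPart | ImagePart> };

// Hypothetical builder: combine markup, screenshot, and instruction.
function buildFixRequest(
  screenshotDataUrl: string,
  currentMarkup: string,
  instruction: string
): UserMessage[] {
  const text =
    "Here is the current markup:\n" +
    currentMarkup +
    "\n\nThe attached image shows how it renders. " +
    instruction;
  return [
    {
      role: "user",
      content: [
        { type: "text", text },
        { type: "image_url", image_url: { url: screenshotDataUrl } },
      ],
    },
  ];
}

const messages = buildFixRequest(
  "data:image/png;base64,AAAA", // placeholder screenshot
  "<button>OK</button>",
  "Make the button wider and return only the revised markup."
);
console.log(messages[0].content.length); // prints 2
```

The returned array would be passed as `messages` to `openai.chat.completions.create` with `model: "gpt-4-vision-preview"`, and the response fed back in as `currentMarkup` on the next round.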