How TypeChat Generates Prompts and Executes Code

After finishing this, I realized it covers mostly the same ground as this article: https://zenn.dev/ptna/articles/a3882d095fa685

Since it is already written, I'll publish it as a take from a different angle: less about understanding the concept, more about digging into the code.

Rather than using TypeChat as is, I read its source as a reference for how it builds prompts for code generation.

What is TypeChat?

TypeChat is a prompt generator from Microsoft that converts natural language into code execution steps.
You define an API schema in TypeScript, and it translates natural language input into a sequence of calls to that API.

My rough understanding:

  • The programmer implements the callable API schema and the interpreter that executes it.
  • ChatGPT takes the input and returns it as a sequence of API call steps.
  • The interpreter runs those steps.

Simple Usage and What's Happening Under the Hood

Let's implement a simple calculator based on examples/math.
The validator part is omitted here. The real library also validates the model's output and asks it for corrections.

import { createLanguageModel, createProgramTranslator, evaluateJsonProgram, getData } from "typechat";

const model = createLanguageModel({
  OPENAI_API_KEY: "your api key",
  OPENAI_MODEL: 'gpt-3.5-turbo',
  OPENAI_ENDPOINT: 'https://api.openai.com/v1/chat/completions'
});

const schema = `// This is a schema for writing programs that evaluate expressions.
export type API = {
  // Add two numbers
  add(x: number, y: number): number;
  // Subtract two numbers
  sub(x: number, y: number): number;
  // Multiply two numbers
  mul(x: number, y: number): number;
  // Divide two numbers
  div(x: number, y: number): number;
  // Negate a number
  neg(x: number): number;
  // Identity function
  id(x: number): number;
  // Unknown request
  unknown(text: string): number;
}
`;

const translator = createProgramTranslator(model, schema);
async function handleCall(func: string, args: any[]): Promise<unknown> {
    console.log(`${func}(${args.map(arg => typeof arg === "number" ? arg : JSON.stringify(arg, undefined, 2)).join(", ")})`);
    switch (func) {
        case "add":
            return args[0] + args[1];
        case "sub":
            return args[0] - args[1];
        case "mul":
            return args[0] * args[1];
        case "div":
            return args[0] / args[1];
        case "neg":
            return -args[0];
        case "id":
            return args[0];
    }
    return NaN;
}

async function main(request: string) {
  const response = await translator.translate(request);
  console.log("----------");
  console.log(JSON.stringify(response, null, 2));
  // translate() returns a success/error result; stop here if translation failed.
  if (!response.success) {
    console.error(response.message);
    return;
  }

  const program = response.data;
  console.log("----------");
  console.log(getData(translator.validator.createModuleTextFromJson(program)));

  const result = await evaluateJsonProgram(program, handleCall);
  console.log("----------");
  console.log(`Result: ${typeof result === "number" ? result : "Error"}`);
}

const request = process.argv[2];
console.log("Translating program...", request);
main(request).catch(console.error);

Running it with the request 1+2 prints:

{
  "success": true,
  "data": {
    "@steps": [
      {
        "@func": "add",
        "@args": [
          1,
          2
        ]
      }
    ]
  }
}
----------
import { API } from "./schema";
function program(api: API) {
  return api.add(1, 2);
}
add(1, 2)
----------

Let's break this down.

TypeChat Execution Flow

  • Prompt: Defines the function call schema and instructs the AI to convert user input into that format.
  • User: Defines the API to be exposed to the AI.
  • AI: Converts user input into function call steps for the API.
  • User: Executes the function call steps using an interpreter.

[Figure: simple diagram of this flow]

TypeChat formats the input and output and sends the request to ChatGPT.
The user implements the API schema and the execution of the resulting function call steps.

Keep this model in mind when writing requests.

Prompt

The schema for the program format is defined as follows:

src/program.ts
const programSchemaText = `// A program consists of a sequence of function calls that are evaluated in order.
export type Program = {
    "@steps": FunctionCall[];
}

// A function call specifies a function name and a list of argument expressions. Arguments may contain
// nested function calls and result references.
export type FunctionCall = {
    // Name of the function
    "@func": string;
    // Arguments for the function, if any
    "@args"?: Expression[];
};

// An expression is a JSON value, a function call, or a reference to the result of a preceding expression.
export type Expression = JsonValue | FunctionCall | ResultReference;

// A JSON value is a string, a number, a boolean, null, an object, or an array. Function calls and result
// references can be nested in objects and arrays.
export type JsonValue = string | number | boolean | null | { [x: string]: Expression } | Expression[];

// A result reference represents the value of an expression from a preceding step.
export type ResultReference = {
    // Index of the previous expression in the "@steps" array
    "@ref": number;
};
`;

This describes the structure of the function execution steps as TypeScript types, and the prompt instructs the model to output a program in that shape as JSON.
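
To make the shape concrete, here is a hand-written program for (1 + 2) * 3 expressed two ways: once with a nested call and once with a result reference. The type declarations mirror the schema text above. The `isFunctionCall` and `isResultReference` helpers are hypothetical names introduced only for this illustration and are not part of TypeChat.

```typescript
// Types mirroring the schema text above.
type Program = { "@steps": FunctionCall[] };
type FunctionCall = { "@func": string; "@args"?: Expression[] };
type ResultReference = { "@ref": number };
type Expression = JsonValue | FunctionCall | ResultReference;
type JsonValue = string | number | boolean | null | { [x: string]: Expression } | Expression[];

// (1 + 2) * 3 with the addition nested inside the multiplication's arguments.
const nested: Program = {
  "@steps": [
    { "@func": "mul", "@args": [{ "@func": "add", "@args": [1, 2] }, 3] },
  ],
};

// The same computation as two steps; { "@ref": 0 } points at the result of step 0.
const withRef: Program = {
  "@steps": [
    { "@func": "add", "@args": [1, 2] },
    { "@func": "mul", "@args": [{ "@ref": 0 }, 3] },
  ],
};

// Hypothetical helpers that tell the three kinds of expression apart.
function isFunctionCall(e: Expression): e is FunctionCall {
  return typeof e === "object" && e !== null && !Array.isArray(e) && "@func" in e;
}

function isResultReference(e: Expression): e is ResultReference {
  return typeof e === "object" && e !== null && !Array.isArray(e) && "@ref" in e;
}

const firstArg = withRef["@steps"][1]["@args"]?.[0] ?? null;
console.log(isResultReference(firstArg)); // true
```

Both forms describe the same computation; the reference form simply names an intermediate result by its position in "@steps".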

User: Defining Public APIs for AI

// This is a schema for writing programs that evaluate expressions.
export type API = {
  // Add two numbers
  add(x: number, y: number): number;
  // Subtract two numbers
  sub(x: number, y: number): number;
  // Multiply two numbers
  mul(x: number, y: number): number;
  // Divide two numbers
  div(x: number, y: number): number;
  // Negate a number
  neg(x: number): number;
  // Identity function
  id(x: number): number;
  // Unknown request
  unknown(text: string): number;
}

The AI recognizes this as a callable API.
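
Because the schema is ordinary TypeScript, one option is to declare the same type in code and let the compiler check an implementation against it. The following is a minimal sketch of that idea. The `api` object and the `dispatch` function are hypothetical names for illustration, and this is not how the TypeChat example itself is written.

```typescript
// The same API type as in the schema, declared in code so the implementation is type-checked.
type API = {
  add(x: number, y: number): number;
  sub(x: number, y: number): number;
  mul(x: number, y: number): number;
  div(x: number, y: number): number;
  neg(x: number): number;
  id(x: number): number;
  unknown(text: string): number;
};

// A missing or mistyped member here is a compile-time error.
const api: API = {
  add: (x, y) => x + y,
  sub: (x, y) => x - y,
  mul: (x, y) => x * y,
  div: (x, y) => x / y,
  neg: (x) => -x,
  id: (x) => x,
  unknown: () => NaN,
};

// Hypothetical dispatcher: looks the function up by name instead of using a switch.
async function dispatch(func: string, args: unknown[]): Promise<unknown> {
  // Only names defined directly on `api` are callable; anything else yields NaN.
  if (!Object.prototype.hasOwnProperty.call(api, func)) return NaN;
  const fn = api[func as keyof API] as (...a: unknown[]) => number;
  return fn(...args);
}

dispatch("add", [1, 2]).then((result) => console.log(result)); // 3
```

The trade-off is that arguments are still unchecked at runtime, so the model's output should be validated before it reaches the dispatcher.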

Executing the Request

Ultimately, a request like the following is constructed and passed to ChatGPT.

You are a service that translates user requests into programs represented as JSON using the following TypeScript definitions:
```
// A program consists of a sequence of function calls that are evaluated in order.
export type Program = {
    "@steps": FunctionCall[];
}

// A function call specifies a function name and a list of argument expressions. Arguments may contain
// nested function calls and result references.
export type FunctionCall = {
    // Name of the function
    "@func": string;
    // Arguments for the function, if any
    "@args"?: Expression[];
};

// An expression is a JSON value, a function call, or a reference to the result of a preceding expression.
export type Expression = JsonValue | FunctionCall | ResultReference;

// A JSON value is a string, a number, a boolean, null, an object, or an array. Function calls and result
// references can be nested in objects and arrays.
export type JsonValue = string | number | boolean | null | { [x: string]: Expression } | Expression[];

// A result reference represents the value of an expression from a preceding step.
export type ResultReference = {
    // Index of the previous expression in the "@steps" array
    "@ref": number;
};
```
The programs can call functions from the API defined in the following TypeScript definitions:
```
// This is a schema for writing programs that evaluate expressions.
export type API = {
  // Add two numbers
  add(x: number, y: number): number;
  // Subtract two numbers
  sub(x: number, y: number): number;
  // Multiply two numbers
  mul(x: number, y: number): number;
  // Divide two numbers
  div(x: number, y: number): number;
  // Negate a number
  neg(x: number): number;
  // Identity function
  id(x: number): number;
  // Unknown request
  unknown(text: string): number;
}
```
The following is a user request:
"""
1+2
"""
The following is the user request translated into a JSON program object with 2 spaces of indentation and no properties with the value undefined:

At this point, the following output is obtained.

{
  "success": true,
  "data": {
    "@steps": [
      {
        "@func": "add",
        "@args": [
          1,
          2
        ]
      }
    ]
  }
}
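
The request text shown earlier in this section is plain string assembly. Here is a minimal sketch that reproduces its wording. `buildProgramPrompt` is a hypothetical name, the two schema strings passed in are shortened stand-ins, and TypeChat's internal function may be organized differently.

```typescript
// Hypothetical helper: assembles a prompt with the same wording as the request shown above.
function buildProgramPrompt(programSchema: string, apiSchema: string, request: string): string {
  const fence = "`".repeat(3); // three backticks, built here to keep this listing readable
  return [
    "You are a service that translates user requests into programs represented as JSON using the following TypeScript definitions:",
    fence,
    programSchema.trimEnd(),
    fence,
    "The programs can call functions from the API defined in the following TypeScript definitions:",
    fence,
    apiSchema.trimEnd(),
    fence,
    "The following is a user request:",
    '"""',
    request,
    '"""',
    "The following is the user request translated into a JSON program object with 2 spaces of indentation and no properties with the value undefined:",
    "",
  ].join("\n");
}

// Shortened stand-in schemas, just to show the assembled shape.
const prompt = buildProgramPrompt(
  'export type Program = { "@steps": FunctionCall[] };\n',
  "export type API = { add(x: number, y: number): number };\n",
  "1+2",
);
console.log(prompt);
```

Ending the prompt with a sentence that announces the JSON object nudges the model to continue with the JSON itself rather than with commentary.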

Interpreter

Since the model only returns execution steps, you need to implement the interpreter that executes them yourself.

As this is a math example, the handler implements the four arithmetic operations plus negation and identity. Anything else, including unknown, falls through to NaN.

async function handleCall(func: string, args: any[]): Promise<unknown> {
    console.log(`${func}(${args.map(arg => typeof arg === "number" ? arg : JSON.stringify(arg, undefined, 2)).join(", ")})`);
    switch (func) {
        case "add":
            return args[0] + args[1];
        case "sub":
            return args[0] - args[1];
        case "mul":
            return args[0] * args[1];
        case "div":
            return args[0] / args[1];
        case "neg":
            return -args[0];
        case "id":
            return args[0];
    }
    return NaN;
}

Passing the program and this handler to evaluateJsonProgram yields the result.
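
Conceptually, evaluating a JSON program means walking "@steps" in order, resolving nested calls and "@ref" values, and delegating each call to the handler. The following is a simplified sketch of that idea and not TypeChat's actual evaluateJsonProgram. `evaluateSketch`, `CallHandler`, and `isRecord` are hypothetical names, and treating the last step's result as the program's result is an assumption based on the `return api.add(1, 2)` in the generated module text shown earlier.

```typescript
type FunctionCall = { "@func": string; "@args"?: unknown[] };
type Program = { "@steps": FunctionCall[] };
type CallHandler = (func: string, args: unknown[]) => Promise<unknown>;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

async function evaluateSketch(program: Program, onCall: CallHandler): Promise<unknown> {
  const results: unknown[] = []; // results[i] holds the value produced by step i

  async function evaluate(expr: unknown): Promise<unknown> {
    if (Array.isArray(expr)) {
      const out: unknown[] = [];
      for (const item of expr) out.push(await evaluate(item));
      return out;
    }
    if (!isRecord(expr)) return expr; // string, number, boolean, or null

    const ref = expr["@ref"];
    if (typeof ref === "number") {
      // A reference may only point at a step that has already been evaluated.
      if (ref < 0 || ref >= results.length) throw new Error(`Invalid @ref: ${ref}`);
      return results[ref];
    }

    const func = expr["@func"];
    if (typeof func === "string") {
      const rawArgs = expr["@args"];
      const args: unknown[] = [];
      for (const arg of Array.isArray(rawArgs) ? rawArgs : []) args.push(await evaluate(arg));
      return onCall(func, args);
    }

    // Plain object: evaluate each property value.
    const obj: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(expr)) obj[key] = await evaluate(value);
    return obj;
  }

  for (const step of program["@steps"]) results.push(await evaluate(step));
  return results.length > 0 ? results[results.length - 1] : undefined;
}

// (1 + 2) * 3, with the second step referring to the first step's result.
const program: Program = {
  "@steps": [
    { "@func": "add", "@args": [1, 2] },
    { "@func": "mul", "@args": [{ "@ref": 0 }, 3] },
  ],
};

const calc: CallHandler = async (func, args) => {
  const [x, y] = args as number[];
  switch (func) {
    case "add": return x + y;
    case "mul": return x * y;
    default: return NaN;
  }
};

evaluateSketch(program, calc).then((result) => console.log(result)); // logs 9
```

Arguments are evaluated before the call that uses them, which is why a handler like handleCall only ever sees plain values.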

Implementing a Puppeteer Driver

As a second example, here is an API for controlling a browser through Puppeteer:

// Relative import, since this script sits in src/ next to the library's entry point.
import { createLanguageModel, createProgramTranslator, evaluateJsonProgram, getData } from ".";
import puppeteer from 'puppeteer';

const model = createLanguageModel({
  OPENAI_API_KEY: process.env.OPENAI_API_KEY,
  OPENAI_MODEL: 'gpt-3.5-turbo',
  OPENAI_ENDPOINT: 'https://api.openai.com/v1/chat/completions'
});

async function createCallHandler() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({width: 1080, height: 1024});
  return async (funcName: string, args: any[]) => {
    switch (funcName) {
      case 'goto': {
        const [url] = args;
        await page.goto(url);
        await page.waitForSelector('body');
        break; // without this, execution falls through into the next case
      }
      case 'type': {
        const [selector, text] = args;
        await page.waitForSelector(selector);
        await page.type(selector, text);
        break;
      }
      case 'click': {
        const [selector] = args;
        await page.waitForSelector(selector);
        await page.click(selector);
        break;
      }
    }
  };
}

const schema = `// This is a schema for a browser control API built on Puppeteer
export type API = {
  // Open a URL
  goto(url: string): void,
  // Type a value into the element matching the selector
  type(selector: string, value: string): void,
  // Click the element matching the selector
  click(selector: string): void
}
`;

const translator = createProgramTranslator(model, schema);

async function main(request: string) {
  const response = await translator.translate(request);
  console.log("----------");
  console.log(JSON.stringify(response, null, 2));
  // Stop here if translation failed; only a successful result carries data.
  if (!response.success) {
    console.error(response.message);
    return;
  }

  const program = response.data;
  console.log("----------");
  console.log(getData(translator.validator.createModuleTextFromJson(program)));

  const handler = await createCallHandler();
  await evaluateJsonProgram(program, handler);
  console.log("----------");
}

const request = process.argv[2];
main(request).catch(console.error);

Strictly speaking, the functions in the schema should be declared as returning Promises, and this code is only meant to convey the general idea rather than to be run as is, but the translation step produces the following output.

$ pnpm tsx src/__browser.ts 'Open Google, type "puppeteer" into the input form, and click the search button'
{
  "success": true,
  "data": {
    "@steps": [
      {
        "@func": "goto",
        "@args": [
          "https://www.google.com"
        ]
      },
      {
        "@func": "type",
        "@args": [
          "input[name='q']",
          "puppeteer"
        ]
      },
      {
        "@func": "click",
        "@args": [
          "input[name='btnK']"
        ]
      }
    ]
  }
}
----------
import { API } from "./schema";
function program(api: API) {
  const step1 = api.goto("https://www.google.com");
  const step2 = api.type("input[name='q']", "puppeteer");
  return api.click("input[name='btnK']");
}

I have not checked whether the selectors are correct. In practice, you would add a phase that fetches the page's HTML and searches it for the right elements, and at that point you would probably want some kind of "state" for storing intermediate results.

In the end, the standard JSON Program starts to feel insufficient, so you will likely end up building your own format.

What Can Be Learned from TypeChat

Providing an API schema and restricting code generation to sequences of calls against that API yields high-accuracy output.
I won't go into detail here, but ChatGPT Plugins are designed the same way, so this can be considered a best practice.

One insight from the validator is that telling the model how its output failed makes automatic retries possible to some extent.
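
That retry idea can be sketched as a loop that feeds the validation error back into the next prompt. This is a minimal illustration under stated assumptions and not TypeChat's implementation: `translateWithRepair`, `complete`, and `validate` are hypothetical names, the repair wording is made up for the example, and the model is replaced by a fake that fails once.

```typescript
type Result<T> = { success: true; data: T } | { success: false; message: string };

// Hypothetical retry loop: on failure, resend the prompt together with the bad output and the reason.
async function translateWithRepair<T>(
  prompt: string,
  complete: (prompt: string) => Promise<string>,
  validate: (text: string) => Result<T>,
  maxAttempts = 2,
): Promise<Result<T>> {
  let currentPrompt = prompt;
  let last: Result<T> = { success: false, message: "no attempts made" };
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const text = await complete(currentPrompt);
    last = validate(text);
    if (last.success) return last;
    // Made-up repair wording: show the model its output and why it was rejected.
    currentPrompt =
      `${prompt}\n${text}\n` +
      `The JSON above is invalid for the following reason:\n${last.message}\n` +
      `The following is a revised JSON object:\n`;
  }
  return last;
}

// Fake model for demonstration: the first reply is malformed, the second is valid JSON.
const replies = ["{ not json", '{"@steps": []}'];
let calls = 0;
const fakeComplete = async (_prompt: string) => replies[Math.min(calls++, replies.length - 1)];

// Validation here is only JSON parsing; a real validator would also check the schema.
const parseJson = (text: string): Result<unknown> => {
  try {
    return { success: true, data: JSON.parse(text) };
  } catch (e) {
    return { success: false, message: e instanceof Error ? e.message : String(e) };
  }
};

const outcome = translateWithRepair("Translate: 1+2", fakeComplete, parseJson);
outcome.then((r) => console.log(r.success, calls)); // true 2
```

Capping the number of attempts matters: a model that keeps producing the same invalid output would otherwise loop forever.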
