Closed on 2024/12/16

Trying out Stagehand (Playwright × AI)

What

While looking into automated browser operation with Playwright and AI, I came across Stagehand, so I'm giving it a try.
https://github.com/browserbase/stagehand

Setup

Create the project

$ mkdir stagehand-test
$ cd stagehand-test
$ npm init
$ npm install @browserbasehq/stagehand zod

Add .env

OPENAI_API_KEY=sk-xxxxxxx...

Install the browsers

$ npm exec playwright install

Install tsx

This is for running .ts files.

$ npm install -D tsx

Create src/index.ts

I used the following code from the README as-is.

import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

const stagehand = new Stagehand({
  env: "LOCAL",
});

await stagehand.init();
await stagehand.page.goto("https://github.com/browserbase/stagehand");
await stagehand.act({ action: "click on the contributors" });
const contributor = await stagehand.extract({
  instruction: "extract the top contributor",
  schema: z.object({
    username: z.string(),
    url: z.string(),
  }),
});
await stagehand.close();
console.log(`Our favorite contributor is ${contributor.username}`);

Run it.

$ npx tsx src/index.ts

It fails with an error.

ERROR: Top-level await is currently not supported with the "cjs" output format

Switching the project to ESM fixes this, so add the following to package.json.

+  "type": "module",

Run it again.

$ npx tsx src/index.ts

Reading the execution log

The run produced logs like the following.

2024-12-15T20:20:27.713Z::[stagehand:openai] response {"response":{"value":"{\"id\":\"chatcmpl-AepOEKCge9h6GglnJafgei3RcTZfb\",\"object\":\"chat.completion\",\"created\":1734294026,\"model\":\"gpt-4o-2024-08-06\",\"choices\":[{\"index\":0,\"message\":{\"role\":\"assistant\",\"content\":null,\"tool_calls\":[{\"id\":\"call_fdfjHX21BEywvqJ44E7Haq9y\",\"type\":\"function\",\"function\":{\"name\":\"doAction\",\"arguments\":\"{\\\"method\\\":\\\"click\\\",\\\"element\\\":68,\\\"args\\\":[],\\\"step\\\":\\\"Clicked on the 'Contributors' link to navigate to the contributors section of the repository.\\\",\\\"why\\\":\\\"The 'Contributors' link is expected to lead to the section where contributors are listed, which is the user's goal to access.\\\",\\\"completed\\\":true}\"}}],\"refusal\":null},\"logprobs\":null,\"finish_reason\":\"tool_calls\"}],\"usage\":{\"prompt_tokens\":5131,\"completion_tokens\":73,\"total_tokens\":5204,\"prompt_tokens_details\":{\"cached_tokens\":0,\"audio_tokens\":0},\"completion_tokens_details\":{\"reasoning_tokens\":0,\"audio_tokens\":0,\"accepted_prediction_tokens\":0,\"rejected_prediction_tokens\":0}},\"system_fingerprint\":\"fp_a79d8dac1f\"}","type":"object"},"requestId":{"value":"nhqxlx97dem","type":"string"}}
2024-12-15T20:20:27.714Z::[stagehand:action] received response from LLM {"response":{"value":"{\"method\":\"click\",\"element\":68,\"args\":[],\"step\":\"Clicked on the 'Contributors' link to navigate to the contributors section of the repository.\",\"why\":\"The 'Contributors' link is expected to lead to the section where contributors are listed, which is the user's goal to access.\",\"completed\":true}","type":"object"}}

2024-12-15T20:20:27.714Z::[stagehand:action] executing method {"method":{"value":"click","type":"string"},"elementId":{"value":"68","type":"integer"},"xpaths":{"value":"[\"/html/body[1]/div[1]/div[4]/div[1]/main[1]/turbo-frame[1]/div[1]/div[1]/div[1]/nav[1]/a[2]\",\"//nav[@aria-label='Insights']/a[2]\"]","type":"object"},"args":{"value":"[]","type":"object"}}

2024-12-15T20:20:27.728Z::[stagehand:action] page URL before action {"url":{"value":"https://github.com/browserbase/stagehand/pulse","type":"string"}}
2024-12-15T20:20:27.770Z::[stagehand:action] clicking element, checking for page navigation {"xpath":{"value":"/html/body[1]/div[1]/div[4]/div[1]/main[1]/turbo-frame[1]/div[1]/div[1]/div[1]/nav[1]/a[2]","type":"string"}}

2024-12-15T20:20:29.272Z::[stagehand:action] clicked element {"newOpenedTab":{"value":"no new tabs opened","type":"string"}}

2024-12-15T20:20:29.274Z::[stagehand:action] finished waiting for (possible) page navigation 
2024-12-15T20:20:29.274Z::[stagehand:action] new page detected with URL {"url":{"value":"https://github.com/browserbase/stagehand/graphs/contributors","type":"string"}}
2024-12-15T20:20:29.660Z::[stagehand:action] action marked as completed, verifying if this is true... {"action":{"value":"click on the contributors","type":"string"}}

2024-12-15T20:20:29.716Z::[stagehand:openai] creating chat completion {"options":{"value":"{\"messages\":[{\"role\":\"system\",\"content\":\"\\nYou are a browser automation assistant. The job has given you a goal and a list of steps that have been taken so far. Your job is to determine if the user's goal has been completed based on the provided information.\\n\\n# Input\\nYou will receive:\\n1. The user's goal: A clear description of what the user wants to achieve.\\n2. Steps taken so far: A list of actions that have been performed up to this point.\\n3. An image of the current page\\n\\n# Your Task\\nAnalyze the provided information to determine if the user's goal has been fully completed.\\n\\n# Output\\nReturn a boolean value:\\n- true: If the goal has been definitively completed based on the steps taken and the current page.\\n- false: If the goal has not been completed or if there's any uncertainty about its completion.\\n\\n# Important Considerations\\n- False positives are okay. False negatives are not okay.\\n- Look for evidence of errors on the page or something having gone wrong in completing the goal. 
If one does not exist, return true.\\n\"},{\"role\":\"user\",\"content\":\"\\n# My Goal\\nclick on the contributors\\n\\n# Steps You've Taken So Far\\n\\n## Step: Clicked on the 'Insights' tab to navigate to the insights section of the repository.\\n  Element: <a id=\\\"insights-tab\\\" class=\\\"UnderlineNav-item no-wrap js-responsive-underlinenav-item js-selected-navigation-item\\\" href=\\\"/browserbase/stagehand/pulse\\\" data-tab-item=\\\"i6insights-tab\\\" data-selected-links=\\\"repo_graphs repo_contributors dependency_graph dependabot_updates pulse people community /browserbase/stagehand/pulse\\\" data-pjax=\\\"#repo-content-pjax-container\\\" data-turbo-frame=\\\"repo-content-turbo-frame\\\" data-analytics-event=\\\"{\\\"category\\\"\\n  Action: click\\n  Reasoning: The 'Insights' tab typically contains information about contributors, which is the goal to access.\\n  Result (Important): Page URL changed from https://github.com/browserbase/stagehand to https://github.com/browserbase/stagehand/pulse\\n\\n## Step: Clicked on the 'Contributors' link to navigate to the contributors section of the repository.\\n  Element: <a class=\\\"js-selected-navigation-item menu-item\\\" href=\\\"/browserbase/stagehand/graphs/contributors\\\" data-selected-links=\\\" /browserbase/stagehand/graphs/contributors\\\">Contributors</a>\\n  Action: click\\n  Reasoning: The 'Contributors' link is expected to lead to the section where contributors are listed, which is the user's goal to access.\\n  Result (Important): Page URL changed from https://github.com/browserbase/stagehand/pulse to 
https://github.com/browserbase/stagehand/graphs/contributors\\n\\n\\n\"}],\"temperature\":0.1,\"top_p\":1,\"frequency_penalty\":0,\"presence_penalty\":0,\"response_model\":{\"name\":\"Verification\",\"schema\":{\"_def\":{\"unknownKeys\":\"strip\",\"catchall\":{\"_def\":{\"typeName\":\"ZodNever\"},\"~standard\":{\"version\":1,\"vendor\":\"zod\"}},\"typeName\":\"ZodObject\"},\"~standard\":{\"version\":1,\"vendor\":\"zod\"},\"_cached\":null}},\"requestId\":\"nhqxlx97dem\"}","type":"object"},"modelName":{"value":"gpt-4o","type":"string"}}

2024-12-15T20:20:29.718Z::[stagehand:openai] creating chat completion {"openAiOptions":{"value":"{\"messages\":[{\"role\":\"system\",\"content\":\"\\nYou are a browser automation assistant. The job has given you a goal and a list of steps that have been taken so far. Your job is to determine if the user's goal has been completed based on the provided information.\\n\\n# Input\\nYou will receive:\\n1. The user's goal: A clear description of what the user wants to achieve.\\n2. Steps taken so far: A list of actions that have been performed up to this point.\\n3. An image of the current page\\n\\n# Your Task\\nAnalyze the provided information to determine if the user's goal has been fully completed.\\n\\n# Output\\nReturn a boolean value:\\n- true: If the goal has been definitively completed based on the steps taken and the current page.\\n- false: If the goal has not been completed or if there's any uncertainty about its completion.\\n\\n# Important Considerations\\n- False positives are okay. False negatives are not okay.\\n- Look for evidence of errors on the page or something having gone wrong in completing the goal. 
If one does not exist, return true.\\n\"},{\"role\":\"user\",\"content\":\"\\n# My Goal\\nclick on the contributors\\n\\n# Steps You've Taken So Far\\n\\n## Step: Clicked on the 'Insights' tab to navigate to the insights section of the repository.\\n  Element: <a id=\\\"insights-tab\\\" class=\\\"UnderlineNav-item no-wrap js-responsive-underlinenav-item js-selected-navigation-item\\\" href=\\\"/browserbase/stagehand/pulse\\\" data-tab-item=\\\"i6insights-tab\\\" data-selected-links=\\\"repo_graphs repo_contributors dependency_graph dependabot_updates pulse people community /browserbase/stagehand/pulse\\\" data-pjax=\\\"#repo-content-pjax-container\\\" data-turbo-frame=\\\"repo-content-turbo-frame\\\" data-analytics-event=\\\"{\\\"category\\\"\\n  Action: click\\n  Reasoning: The 'Insights' tab typically contains information about contributors, which is the goal to access.\\n  Result (Important): Page URL changed from https://github.com/browserbase/stagehand to https://github.com/browserbase/stagehand/pulse\\n\\n## Step: Clicked on the 'Contributors' link to navigate to the contributors section of the repository.\\n  Element: <a class=\\\"js-selected-navigation-item menu-item\\\" href=\\\"/browserbase/stagehand/graphs/contributors\\\" data-selected-links=\\\" /browserbase/stagehand/graphs/contributors\\\">Contributors</a>\\n  Action: click\\n  Reasoning: The 'Contributors' link is expected to lead to the section where contributors are listed, which is the user's goal to access.\\n  Result (Important): Page URL changed from https://github.com/browserbase/stagehand/pulse to https://github.com/browserbase/stagehand/graphs/contributors\\n\\n\\n\"},{\"role\":\"user\",\"content\":[{\"type\":\"image_url\",\"image_url\":{\"url\":\"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQA。。。==\"}},{\"type\":\"text\",\"text\":\"This is a screenshot of the whole visible 
page.\"}]}],\"temperature\":0.1,\"top_p\":1,\"frequency_penalty\":0,\"presence_penalty\":0,\"model\":\"gpt-4o\"}","type":"object"}}

2024-12-15T20:20:31.664Z::[stagehand:openai] response {"response":{"value":"{\"id\":\"chatcmpl-AepOJ9NWtKxiG5EWW2jIeb6XCwg3y\",\"object\":\"chat.completion\",\"created\":1734294031,\"model\":\"gpt-4o-2024-08-06\",\"choices\":[{\"index\":0,\"message\":{\"role\":\"assistant\",\"content\":\"{\\\"completed\\\":true}\",\"refusal\":null},\"logprobs\":null,\"finish_reason\":\"stop\"}],\"usage\":{\"prompt_tokens\":1727,\"completion_tokens\":5,\"total_tokens\":1732,\"prompt_tokens_details\":{\"cached_tokens\":0,\"audio_tokens\":0},\"completion_tokens_details\":{\"reasoning_tokens\":0,\"audio_tokens\":0,\"accepted_prediction_tokens\":0,\"rejected_prediction_tokens\":0}},\"system_fingerprint\":\"fp_a79d8dac1f\"}","type":"object"},"requestId":{"value":"nhqxlx97dem","type":"string"}}

2024-12-15T20:20:31.666Z::[stagehand:action] action completion verification result {"action":{"value":"click on the contributors","type":"string"},"result":{"value":"true","type":"boolean"}}
2024-12-15T20:20:31.666Z::[stagehand:action] action completed successfully 
2024-12-15T20:20:31.667Z::[stagehand:extract] running extract {"instruction":{"value":"extract the top contributor","type":"string"},"requestId":{"value":"bcrmjsioyw4","type":"string"},"modelName":{"value":"gpt-4o","type":"string"}}
2024-12-15T20:20:31.667Z::[stagehand:extraction] starting extraction {"instruction":{"value":"extract the top contributor","type":"string"}}

2024-12-15T20:20:31.844Z::[stagehand:extraction] received output from processDom. {"chunk":{"value":"0","type":"integer"},"chunks_left":{"value":"2","type":"integer"},"chunks_total":{"value":"2","type":"integer"}}

2024-12-15T20:20:31.844Z::[stagehand:openai] creating chat completion {"options":{"value":"{\"messages\":[{\"role\":\"system\",\"content\":\"You are extracting content on behalf of a user. You will be given: 1. An instruction 2. A list of DOM elements to extract from Print the exact text from the DOM elements with all symbols, characters, and endlines as is. Print null or an empty string if no new information is found. \"...}}


2024-12-15T20:20:31.845Z::[stagehand:openai] creating chat completion {"openAiOptions":{"value":"{\"messages\":[{\"role\":\"system\",\"content\":\"You are extracting content on behalf of a user. You will be given: 1. An instruction 2. A list of DOM elements to extract from Print the exact text from the DOM elements with all symbols, characters, and endlines as is. Print null or an empty string if no new information is found. \"}...{\\\"location\\\":\\\"footer\\\",\\\"action\\\":\\\"dont_share_info\\\",\\\"context\\\":\\\"subfooter\\\",\\\"tag\\\":\\\"link\\\",\\\"label\\\":\\\"dont_share_info_link_subfooter_footer\\\"}\\\">Do not share my personal information</button>\\n112:Do not share my personal information\\n\"}],\"temperature\":0.1,\"top_p\":1,\"frequency_penalty\":0,\"presence_penalty\":0,\"model\":\"gpt-4o\"}","type":"object"}}

2024-12-15T20:20:32.890Z::[stagehand:openai] response {"response":{"value":"{\"id\":\"chatcmpl-AepOKAOUXkPs6MLuRL1TScKAvhRAB\",\"object\":\"chat.completion\",\"created\":1734294032,\"model\":\"gpt-4o-2024-08-06\",\"choices\":[{\"index\":0,\"message\":{\"role\":\"assistant\",\"content\":\"{\\\"username\\\":\\\"browserbase\\\",\\\"url\\\":\\\"/browserbase\\\"}\",\"refusal\":null},\"logprobs\":null,\"finish_reason\":\"stop\"}],\"usage\":{\"prompt_tokens\":4397,\"completion_tokens\":12,\"total_tokens\":4409,\"prompt_tokens_details\":{\"cached_tokens\":0,\"audio_tokens\":0},\"completion_tokens_details\":{\"reasoning_tokens\":0,\"audio_tokens\":0,\"accepted_prediction_tokens\":0,\"rejected_prediction_tokens\":0}},\"system_fingerprint\":\"fp_9faba9f038\"}","type":"object"},"requestId":{"value":"bcrmjsioyw4","type":"string"}}


2024-12-15T20:20:32.891Z::[stagehand:openai] creating chat completion {"options":{"value":"{\"messages\":[{\"role\":\"system\",\"content\":\"You are tasked with refining and filtering information for the final output based on newly extracted and previously extracted content. Your responsibilities are:\\n1. Remove exact duplicates for elements in arrays and objects.\\n2. For text fields, append or update relevant text if the new content is an extension, replacement, or continuation.\\n3. For non-text fields (e.g., numbers, booleans), update with new values if they differ.\\n4. Add any completely new fields or objects.\\n\\nReturn the updated content that includes both the previous content and the new, non-duplicate, or extended information.\"},{\"role\":\"user\",\"content\":\"Instruction: extract the top contributor\\nPreviously extracted content: {}\\nNewly extracted content: {\\n  \\\"username\\\": \\\"browserbase\\\",\\n  \\\"url\\\": \\\"/browserbase\\\"\\n}\\nRefined content:\"}],\"response_model\":...

2024-12-15T20:20:32.892Z::[stagehand:openai] creating chat completion {"openAiOptions":{"value":"{\"messages\":[{\"role\":\"system\",\"content\":\"You are tasked with refining and filtering information for the final output based on newly extracted and previously extracted content. Your responsibilities are:\\n1. Remove exact duplicates for elements in arrays and objects.\\n2. For text fields, append or update relevant text if the new content is an extension, replacement, or continuation.\\n3. For non-text fields (e.g., numbers, booleans), update with new values if they differ.\\n4. Add any completely new fields or objects.\\n\\nReturn the updated content that includes both the previous content and the new, non-duplicate, or extended information.\"},{\"role\":\"user\",\"content\":\"Instruction: extract the top contributor\\nPreviously extracted content: {}\\nNewly extracted content: {\\n  \\\"username\\\": \\\"browserbase\\\",\\n  \\\"url\\\": \\\"/browserbase\\\"\\n}\\nRefined content:\"}],\"temperature\":0.1,\"top_p\":1,\"frequency_penalty\":0,\"presence_penalty\":0,\"model\":\"gpt-4o\"}","type":"object"}}


2024-12-15T20:20:33.614Z::[stagehand:openai] response {"response":{"value":"{\"id\":\"chatcmpl-AepOLsCtqViUuUyXYqKDqXLVAsvqZ\",\"object\":\"chat.completion\",\"created\":1734294033,\"model\":\"gpt-4o-2024-08-06\",\"choices\":[{\"index\":0,\"message\":{\"role\":\"assistant\",\"content\":\"{\\\"username\\\":\\\"browserbase\\\",\\\"url\\\":\\\"/browserbase\\\"}\",\"refusal\":null},\"logprobs\":null,\"finish_reason\":\"stop\"}],\"usage\":{\"prompt_tokens\":221,\"completion_tokens\":12,\"total_tokens\":233,\"prompt_tokens_details\":{\"cached_tokens\":0,\"audio_tokens\":0},\"completion_tokens_details\":{\"reasoning_tokens\":0,\"audio_tokens\":0,\"accepted_prediction_tokens\":0,\"rejected_prediction_tokens\":0}},\"system_fingerprint\":\"fp_a79d8dac1f\"}","type":"object"},"requestId":{"value":"bcrmjsioyw4","type":"string"}}



2024-12-15T20:20:33.615Z::[stagehand:openai] creating chat completion {"options":{"value":"{\"messages\":[{\"role\":\"system\",\"content\":\"You are an AI assistant tasked with evaluating the progress and completion status of an extraction task.\\nAnalyze the extraction response and determine if the task is completed or if more information is needed.\\n\\nStrictly abide by the following criteria:\\n1. Once the instruction has been satisfied by the current extraction response, ALWAYS set completion status to true and stop processing, regardless of remaining chunks.\\n2. Only set completion status to false if BOTH of these conditions are true:\\n   - The instruction has not been satisfied yet\\n   - There are still chunks left to process (chunksTotal > chunksSeen)\"},{\"role\":\"user\",\"content\":\"Instruction: extract the top contributor\\nExtracted content: {\\n  \\\"username\\\": \\\"browserbase\\\",\\n  \\\"url\\\": \\\"/browserbase\\\"\\n}\\nchunksSeen: 0\\nchunksTotal: 2\"}],..."modelName":{"value":"gpt-4o","type":"string"}}

2024-12-15T20:20:33.617Z::[stagehand:openai] creating chat completion {"openAiOptions":{"value":"{\"messages\":[{\"role\":\"system\",\"content\":\"You are an AI assistant tasked with evaluating the progress and completion status of an extraction task.\\nAnalyze the extraction response and determine if the task is completed or if more information is needed.\\n\\nStrictly abide by the following criteria:\\n1. Once the instruction has been satisfied by the current extraction response, ALWAYS set completion status to true and stop processing, regardless of remaining chunks.\\n2. Only set completion status to false if BOTH of these conditions are true:\\n   - The instruction has not been satisfied yet\\n   - There are still chunks left to process (chunksTotal > chunksSeen)\"},{\"role\":\"user\",\"content\":\"Instruction: extract the top contributor\\nExtracted content: {\\n  \\\"username\\\": \\\"browserbase\\\",\\n  \\\"url\\\": \\\"/browserbase\\\"\\n}\\nchunksSeen: 0\\nchunksTotal: 2\"}],\"temperature\":0.1,\"top_p\":1,\"frequency_penalty\":0,\"presence_penalty\":0,\"model\":\"gpt-4o\"}","type":"object"}}
2024-12-15T20:20:34.306Z::[stagehand:openai] response {"response":{"value":"{\"id\":\"chatcmpl-AepOLMjD8FsTkCJrVxsjmNHnlilV6\",\"object\":\"chat.completion\",\"created\":1734294033,\"model\":\"gpt-4o-2024-08-06\",\"choices\":[{\"index\":0,\"message\":{\"role\":\"assistant\",\"content\":\"{\\\"progress\\\":\\\"The top contributor has been identified as 'browserbase'.\\\",\\\"completed\\\":true}\",\"refusal\":null},\"logprobs\":null,\"finish_reason\":\"stop\"}],\"usage\":{\"prompt_tokens\":260,\"completion_tokens\":19,\"total_tokens\":279,\"prompt_tokens_details\":{\"cached_tokens\":0,\"audio_tokens\":0},\"completion_tokens_details\":{\"reasoning_tokens\":0,\"audio_tokens\":0,\"accepted_prediction_tokens\":0,\"rejected_prediction_tokens\":0}},\"system_fingerprint\":\"fp_9faba9f038\"}","type":"object"},"requestId":{"value":"bcrmjsioyw4","type":"string"}}
2024-12-15T20:20:34.307Z::[stagehand:extraction] received extraction response {"extraction_response":{"value":"{\"username\":\"browserbase\",\"url\":\"/browserbase\",\"metadata\":{\"progress\":\"The top contributor has been identified as 'browserbase'.\",\"completed\":true}}","type":"object"}}
2024-12-15T20:20:34.307Z::[stagehand:extraction] got response {"extraction_response":{"value":"{\"username\":\"browserbase\",\"url\":\"/browserbase\",\"metadata\":{\"progress\":\"The top contributor has been identified as 'browserbase'.\",\"completed\":true}}","type":"object"}}


Our favorite contributor is browserbase

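Log categories

Roughly, the logs fall into the following categories.

OpenAI interaction logs ([stagehand:openai])
JSON containing the model's response, token usage, finish_reason, and so on.

Browser automation (Stagehand) action logs ([stagehand:action])
Logs about the actions that were executed (clicks, URL transitions, etc.), for example the URL before and after clicking an element and information about the clicked element.

DOM analysis and extraction logs ([stagehand:extraction] / [stagehand:extract])
Logs about retrieving DOM elements: the list of DOM elements (0:, 1:, 2:, ...), their text content, and the extracted information.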
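To get an overview of a long run, the lines can be grouped by their bracketed category. Below is a minimal sketch, assuming every log line has the form `<timestamp>::[<category>] <message>` as in the output above; `parseLogLine` and `countByCategory` are names made up for this note and are not part of Stagehand.

```typescript
// Parsed form of one log line (shape inferred from the output above).
interface LogLine {
  timestamp: string;
  category: string; // e.g. "stagehand:openai"
  message: string; // free text, possibly followed by a JSON payload
}

// Matches "<timestamp>::[<category>] <message>".
const LOG_LINE = /^(\S+?)::\[([^\]]+)\]\s*(.*)$/;

function parseLogLine(line: string): LogLine | null {
  const m = LOG_LINE.exec(line);
  if (m === null) return null;
  return { timestamp: m[1], category: m[2], message: m[3] };
}

// Count lines per category; lines that do not match the pattern are ignored.
function countByCategory(lines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of lines) {
    const parsed = parseLogLine(line);
    if (parsed === null) continue;
    counts.set(parsed.category, (counts.get(parsed.category) ?? 0) + 1);
  }
  return counts;
}

const sample = [
  "2024-12-15T20:20:31.666Z::[stagehand:action] action completed successfully ",
  '2024-12-15T20:20:31.667Z::[stagehand:extraction] starting extraction {"instruction":{"value":"extract the top contributor","type":"string"}}',
  "Our favorite contributor is browserbase",
];
const counts = countByCategory(sample);
console.log(counts.get("stagehand:action"), counts.get("stagehand:extraction"));
```

The last sample line is the script's own console.log output, so it has no category and is skipped.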

Cost

The logs were large, so I was worried about API usage, but the run cost only $0.06.
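As a rough cross-check, the usage fields in the [stagehand:openai] responses can be summed and priced. The sketch below uses the token counts from the five responses shown in the log above. The per-token prices are an assumption on my part (USD 2.50 per 1M input tokens and USD 10 per 1M output tokens for gpt-4o-2024-08-06), so check the current price list before relying on them.

```typescript
interface Usage {
  promptTokens: number;
  completionTokens: number;
}

// usage values copied from the [stagehand:openai] responses in the log above
const usages: Usage[] = [
  { promptTokens: 5131, completionTokens: 73 }, // act: doAction tool call
  { promptTokens: 1727, completionTokens: 5 }, // act: completion verification
  { promptTokens: 4397, completionTokens: 12 }, // extract
  { promptTokens: 221, completionTokens: 12 }, // refine
  { promptTokens: 260, completionTokens: 19 }, // metadata
];

// ASSUMPTION: prices in USD per 1M tokens, not taken from the log
const INPUT_USD_PER_MILLION = 2.5;
const OUTPUT_USD_PER_MILLION = 10;

function sumUsage(list: Usage[]): Usage {
  return list.reduce(
    (acc, u) => ({
      promptTokens: acc.promptTokens + u.promptTokens,
      completionTokens: acc.completionTokens + u.completionTokens,
    }),
    { promptTokens: 0, completionTokens: 0 },
  );
}

function estimateCostUsd(list: Usage[]): number {
  const total = sumUsage(list);
  return (
    (total.promptTokens * INPUT_USD_PER_MILLION +
      total.completionTokens * OUTPUT_USD_PER_MILLION) /
    1_000_000
  );
}

console.log(sumUsage(usages), estimateCostUsd(usages));
```

Under these assumed prices the five visible calls come to roughly $0.03. The pasted log does not include every request of the run (the earlier "Insights" click step is only referenced, not shown), so this is a lower bound rather than a reconstruction of the $0.06.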

Looking at the prompts

It is a goal-driven LLM setup that uses function calling.
https://github.com/browserbase/stagehand/blob/473ca3fe7a8e97cf4337ddfa43aba8f0fdf93412/lib/prompt.ts

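actSystemPrompt
A system prompt for performing Playwright actions based on the goal the user wants to achieve.
It is given the user's goal, the steps taken so far, and the current list of DOM elements.

It uses two tools, doAction and skipSection.
When the goal can be judged as achieved, completed is set to true.

verifyActCompletionSystemPrompt
Judges whether the goal has been completed, based on the user's goal, the list of steps, and an image.

Tools
The tool definitions for doAction and skipSection are as follows.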
export const actTools: Array<OpenAI.ChatCompletionTool> = [
  {
    type: "function",
    function: {
      name: "doAction",
      description:
        "execute the next playwright step that directly accomplishes the goal",
      parameters: {
        type: "object",
        required: ["method", "element", "args", "step", "completed"],
        properties: {
          method: {
            type: "string",
            description: "The playwright function to call.",
          },
          element: {
            type: "number",
            description: "The element number to act on",
          },
          args: {
            type: "array",
            description: "The required arguments",
            items: {
              type: "string",
              description: "The argument to pass to the function",
            },
          },
          step: {
            type: "string",
            description:
              "human readable description of the step that is taken in the past tense. Please be very detailed.",
          },
          why: {
            type: "string",
            description:
              "why is this step taken? how does it advance the goal?",
          },
          completed: {
            type: "boolean",
            description:
              "true if the goal should be accomplished after this step",
          },
        },
      },
    },
  },
  {
    type: "function",
    function: {
      name: "skipSection",
      description:
        "skips this area of the webpage because the current goal cannot be accomplished here",
      parameters: {
        type: "object",
        properties: {
          reason: {
            type: "string",
            description: "reason that no action is taken",
          },
        },
      },
    },
  },
];
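On the receiving side, the arguments of a doAction call arrive as a JSON string (see the tool_calls entry in the earlier log). Here is a minimal sketch of validating that string against the required fields listed in the definition above. `DoActionArgs` and `parseDoActionArgs` are hypothetical names for illustration, not Stagehand's own code.

```typescript
// Mirrors the doAction parameters above; "why" is not in the required list.
interface DoActionArgs {
  method: string;
  element: number;
  args: string[];
  step: string;
  why?: string;
  completed: boolean;
}

// Returns the parsed arguments, or null if the JSON is malformed or a required field is missing.
function parseDoActionArgs(raw: string): DoActionArgs | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const v = value as Record<string, unknown>;

  const method = v["method"];
  const element = v["element"];
  const args = v["args"];
  const step = v["step"];
  const why = v["why"];
  const completed = v["completed"];

  if (typeof method !== "string") return null;
  if (typeof element !== "number") return null;
  if (!Array.isArray(args) || !args.every((a) => typeof a === "string")) return null;
  if (typeof step !== "string") return null;
  if (typeof completed !== "boolean") return null;

  return {
    method,
    element,
    args: args as string[],
    step,
    why: typeof why === "string" ? why : undefined,
    completed,
  };
}

// Arguments shaped like the doAction call in the log (step/why text shortened).
const rawArgs =
  '{"method":"click","element":68,"args":[],"step":"Clicked on the Contributors link.","why":"It leads to the contributors list.","completed":true}';
console.log(parseDoActionArgs(rawArgs));
```

In the log, the element number (68) is later resolved to XPath candidates before the click is executed. That lookup is outside this sketch.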

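extract
A prompt for extracting information (text and DOM elements) from a web page.

refine
A prompt for tidying up the extracted content.
It builds a prompt that compares the existing extraction result (previously extracted) with the newly obtained one (newly extracted), removes duplicates, updates or adds information, and produces the final cleaned-up data.

metadata
A prompt for judging whether the extraction task has completed.

observe
A prompt for picking the elements that match the given condition out of the list of DOM candidates and returning them as an array.

ask
A prompt for answering the user's question briefly and simply.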

Trying it out

I'll try logging in to My Page on our own site.

https://www.piano.or.jp/

Variables can be supplied like this.

await stagehand.act({
  action: "enter %username% into the username field",
  variables: {
    username: "john.doe@example.com",
  },
});

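Scenario to test

Open My Page from the PTNA top page.
Log in.
Get the logged-in user's name.

The login name can be read from the header after logging in.

My name is not actually Kuroda (黒田). It is just a name I made up for the test account.

<REDACTED> stands in for the real values.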

import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

const stagehand = new Stagehand({
  env: "LOCAL",
});

await stagehand.init();
await stagehand.page.goto("https://www.piano.or.jp/");
// "Open My Page"
await stagehand.act({ action: "マイページを開く" });

// "Enter %email% as the email address and %password% as the password."
await stagehand.act({
  action: "メールアドレスに %email% 、パスワードに %password%を 入力する。",
  variables: {
    email: "<REDACTED>",
    password: "<REDACTED>",
  }
});

// "Click the login button"
await stagehand.act({
  action: "ログインボタンをクリックする",
});

// "Get the login name"
const { loginName } = await stagehand.extract({
  instruction: "ログイン名を取得する",
  schema: z.object({
    loginName: z.string(),
  }),
});

console.log(loginName);

Run it.

$ npx tsx src/index.ts

It spent about 30 seconds exploring, logged in, and then

2024-12-15T21:33:44.541Z::[stagehand:extraction] got response {"extraction_response":{"value":"{\"loginName\":\"黒田 テスト\",\"metadata\":{\"progress\":\"ログイン名 \\\"黒田 テスト\\\" が抽出されました。\",\"completed\":true}}","type":"object"}}
黒田 テスト

the login name was retrieved correctly.

Digging a little deeper

Models

At the time of writing, the supported models are:

  private modelToProviderMap: { [key in AvailableModel]: ModelProvider } = {
    "gpt-4o": "openai",
    "gpt-4o-mini": "openai",
    "gpt-4o-2024-08-06": "openai",
    "o1-mini": "openai",
    "o1-preview": "openai",
    "claude-3-5-sonnet-latest": "anthropic",
    "claude-3-5-sonnet-20240620": "anthropic",
    "claude-3-5-sonnet-20241022": "anthropic",
  };

So Gemini cannot be used.

cache

Setting enableCaching to true in init() enables it. When I tried it,

tmp/.cache/action_cache.json
tmp/.cache/llm_calls.json

were generated.

{
  "2e35710e9ed9ff37dad921a339e72db4c422f64937b90f15dbafe52e0ccc4c38": {
    "data": {
      "id": "chatcmpl-AeqcnukK903dPRbqv18j0fyzP6YgI",
      "object": "chat.completion",
      "created": 1734298773,
      "model": "gpt-4o-2024-08-06",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": null,
            "tool_calls": [
              {
                "id": "call_Qxhb4vefxyzOAs17tRkFcseK",
                "type": "function",
                "function": {
                  "name": "skipSection",
                  "arguments": "{\"reason\":\"There are no input fields for email or password in the current DOM elements.\"}"
                }
              }
            ],
            "refusal": null
          },
          "logprobs": null,
          "finish_reason": "tool_calls"
        }
      ],...
}
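The key in the cache file above is 64 hex characters long, which looks like a digest of the request. The underlying idea, reusing a stored response when an identical request comes in, can be sketched as below. This is not Stagehand's implementation: `stableStringify` and `cachedCall` are hypothetical names, the key is the serialized request itself rather than a hash, and the call is synchronous to keep the example short.

```typescript
type CacheStore = Map<string, unknown>;

// Serialize with sorted object keys so that property order does not change the cache key.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  if (typeof value === "object" && value !== null) {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value) ?? "null";
}

// Return the cached result for an identical request, otherwise compute and store it.
function cachedCall<T>(store: CacheStore, request: unknown, compute: () => T): T {
  const key = stableStringify(request);
  if (store.has(key)) return store.get(key) as T;
  const result = compute();
  store.set(key, result);
  return result;
}

const store: CacheStore = new Map();
let llmCalls = 0;
const fakeLlm = (): string => {
  llmCalls += 1; // stands in for a real, billable API request
  return "skipSection";
};

cachedCall(store, { model: "gpt-4o", temperature: 0.1 }, fakeLlm);
cachedCall(store, { temperature: 0.1, model: "gpt-4o" }, fakeLlm); // same request, different key order
console.log(llmCalls);
```

A cache like this only helps while the request stays byte-for-byte equivalent, so a page whose DOM changes between runs would produce a different key and a fresh call.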

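useVision option

Available on the act and observe methods. It takes one of true, false, or fallback.
It controls whether a screenshot is passed to the model for image recognition.

Variables

Values are passed in through placeholders.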

await stagehand.act({
  action: "enter %username% into the username field",
  variables: {
    username: "john.doe@example.com",
  },
});
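To make the %name% convention concrete, here is a minimal sketch of placeholder substitution. `fillPlaceholders` is a hypothetical helper written for this note. It only illustrates the string format and says nothing about where in its pipeline Stagehand actually performs the substitution.

```typescript
// Replace each %name% with variables[name]; unknown placeholders are left untouched.
function fillPlaceholders(template: string, variables: Record<string, string>): string {
  return template.replace(/%([A-Za-z_][A-Za-z0-9_]*)%/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(variables, name) ? variables[name] : match,
  );
}

console.log(
  fillPlaceholders("enter %username% into the username field", {
    username: "john.doe@example.com",
  }),
);
```

Leaving unknown placeholders as they are makes a misspelled variable name easy to spot in the resulting instruction.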

extract

Retrieves data from the page. The result can be validated with a Zod schema.

const price = await stagehand.extract({
  instruction: "extract the price of the item",
  schema: z.object({
    price: z.number(),
  }),
});

observe

Returns the actions that are possible on the current page.
Taking the Google search page as an example:

await stagehand.init();
await stagehand.page.goto("https://www.google.co.jp");
await stagehand.observe({ instruction: "可能なActionを日本語で具体的に羅列して" })

2024-12-15T21:50:35.275Z::[stagehand:observation] found elements {"elements":{"value":"[{\"description\":\"Googleについてリンクをクリックする\",\"selector\":\"xpath=/html/body[1]/div[1]/div[1]/a[1]\"},{\"description\":\"ストアリンクをクリックする\",\"selector\":\"xpath=/html/body[1]/div[1]/div[1]/a[2]\"},{\"description\":\"Gmailリンクをクリックする\",\"selector\":\"xpath=/html/body[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/a[1]\"},{\"description\":\"画像リンクをクリックする\",\"selector\":\"xpath=/html/body[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[2]/a[1]\"},{\"description\":\"ログインリンクをクリックする\",\"selector\":\"xpath=/html/body[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[2]/a[1]\"},{\"description\":\"Google 検索ボタンをクリックする\",\"selector\":\"xpath=/html/body[1]/div[1]/div[3]/form[1]/div[1]/div[1]/div[3]/center[1]/input[1]\"},{\"description\":\"I'm Feeling Luckyボタンをクリックする\",\"selector\":\"xpath=/html/body[1]/div[1]/div[3]/form[1]/div[1]/div[1]/div[3]/center[1]/input[2]\"},{\"description\":\"ホリデーセール特価を見るリンクをクリックする\",\"selector\":\"xpath=/html/body[1]/div[1]/div[4]/div[1]/div[1]/div[1]/div[1]/div[3]/div[1]/promo-middle-slot[1]/div[1]/a[1]\"},{\"description\":\"Englishリンクをクリックする\",\"selector\":\"xpath=/html/body[1]/div[1]/div[4]/div[3]/div[1]/a[1]\"},{\"description\":\"広告リンクをクリックする\",\"selector\":\"xpath=/html/body[1]/div[1]/div[6]/div[2]/div[1]/a[1]\"},{\"description\":\"ビジネスリンクをクリックする\",\"selector\":\"xpath=/html/body[1]/div[1]/div[6]/div[2]/div[1]/a[2]\"},{\"description\":\"検索の仕組みリンクをクリックする\",\"selector\":\"xpath=/html/body[1]/div[1]/div[6]/div[2]/div[1]/a[3]\"},{\"description\":\"プライバシーリンクをクリックする\",\"selector\":\"xpath=/html/body[1]/div[1]/div[6]/div[2]/div[2]/a[1]\"},{\"description\":\"規約リンクをクリックする\",\"selector\":\"xpath=/html/body[1]/div[1]/div[6]/div[2]/div[2]/a[2]\"},{\"description\":\"設定を開く\",\"selector\":\"xpath=/html/body[1]/div[1]/div[6]/div[2]/div[2]/span[1]/span[1]/g-popup[1]/div[1]\"}]","type":"object"}}
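Each observed element pairs a description with a selector, so the list can be filtered before deciding what to do next. Here is a minimal sketch, assuming the value returned by observe() has the same shape as the elements in the log above. `ObservedElement` and `findByKeyword` are names made up for this note.

```typescript
interface ObservedElement {
  description: string;
  selector: string;
}

// A few entries copied from the "found elements" log above.
const observed: ObservedElement[] = [
  {
    description: "Gmailリンクをクリックする",
    selector: "xpath=/html/body[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/a[1]",
  },
  {
    description: "Google 検索ボタンをクリックする",
    selector: "xpath=/html/body[1]/div[1]/div[3]/form[1]/div[1]/div[1]/div[3]/center[1]/input[1]",
  },
  {
    description: "I'm Feeling Luckyボタンをクリックする",
    selector: "xpath=/html/body[1]/div[1]/div[3]/form[1]/div[1]/div[1]/div[3]/center[1]/input[2]",
  },
];

// Keep only the elements whose description contains the keyword.
function findByKeyword(elements: ObservedElement[], keyword: string): ObservedElement[] {
  return elements.filter((e) => e.description.includes(keyword));
}

// "ボタン" means "button".
const buttons = findByKeyword(observed, "ボタン");
for (const b of buttons) {
  console.log(b.description, "->", b.selector);
}
```

The selectors carry an xpath= prefix, which is the form Playwright's page.locator() accepts, so a matching entry could presumably be passed to it directly.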

Tips
The README has a collection of tips.
https://github.com/browserbase/stagehand/tree/main?tab=readme-ov-file#prompting-tips
Keep each step small and specific.

Related

fuji-web
fuji-web appears to be a similar AI-driven browser automation tool (it is mentioned in Stagehand's README). It is a Chrome extension.
https://github.com/normal-computing/fuji-web

ZeroStep
Built on a similar idea, there is also @zerostep/playwright.
https://zerostep.com/
It looks like this.
import { test, expect } from '@playwright/test'
import { ai } from '@zerostep/playwright'

test.describe('GitHub', () => {
  test('verify the number of labels in a repo', async ({ page }) => {
    await page.goto('https://github.com/zerostep-ai/zerostep')
    await ai(`Click on the Issues tabs`, { page, test })

    await page.waitForURL('https://github.com/zerostep-ai/zerostep/issues')
    await ai('Click on Labels', { page, test })

    await page.waitForURL('https://github.com/zerostep-ai/zerostep/labels')
    const numLabels = await ai('How many labels are listed?', { page, test })

    expect(parseInt(numLabels)).toEqual(9)
  })
})
Note that on the free plan the ai method is limited to 500 calls per month.
https://github.com/zerostep-ai/zerostep

Auto Playwright
The code looks like this.
import { test, expect } from "@playwright/test";
import { auto } from "auto-playwright";

test("auto Playwright example", async ({ page }) => {
  await page.goto("/");

  // `auto` can query data
  // In this case, the result is plain-text contents of the header
  const headerText = await auto("get the header text", { page, test });

  // `auto` can perform actions
  // In this case, auto will find and fill in the search text input
  await auto(`Type "${headerText}" in the search box`, { page, test });

  // `auto` can assert the state of the website
  // In this case, the result is a boolean outcome
  const searchInputHasHeaderText = await auto(`Is the contents of the search box equal to "${headerText}"?`, { page, test });

  expect(searchInputHasHeaderText).toBe(true);
});
https://github.com/lucgagan/auto-playwright

I'd like to see more discussion around applying AI to Playwright, so I wrote this up.

This scrap was closed on 2024/12/16.