🐨

workflow-use, a new browser-automation agent, looks very promising

ikebowsan

Published 2025/05/25

https://github.com/browser-use/workflow-use
The Browser Use team has released a new browser-automation agent.

I found it extremely appealing, so here is an introduction.

Conventional browser-automation agents

Conventional browser-automation agents, browser-use included, operate the browser based on natural-language instructions from the user.

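The AI agent performs a task by looping over these steps: capture the screen and fetch the DOM → analyze the capture → infer which element to click → operate it with Playwright.
I use these agents a lot myself at the moment, but they have a few issues.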
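To make that loop concrete, here is a minimal sketch of the observe → infer → act cycle. It is not browser-use's actual code: `FakeBrowser` and `infer_next_action` are hypothetical stand-ins for the Playwright-driven browser and the LLM call, so the example runs without either.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Observation:
    screenshot: bytes
    dom: str


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a Playwright-driven browser."""
    clicked: List[str] = field(default_factory=list)

    def observe(self) -> Observation:
        # A real agent would take a screenshot and dump the DOM here.
        return Observation(screenshot=b"", dom="<button id='register'>Register</button>")

    def click(self, selector: str) -> None:
        self.clicked.append(selector)


def infer_next_action(task: str, obs: Observation, history: List[str]) -> Optional[str]:
    """Hypothetical stand-in for the LLM call: returns a selector, or None when done."""
    if "#register" in history:
        return None
    return "#register"


def run_agent(task: str, browser: FakeBrowser, max_steps: int = 5) -> int:
    """Observe -> infer -> act loop; returns how many model calls were made."""
    model_calls = 0
    for _ in range(max_steps):
        obs = browser.observe()          # capture + DOM
        model_calls += 1                 # every iteration costs one model call
        selector = infer_next_action(task, obs, browser.clicked)
        if selector is None:             # the model decided the task is finished
            break
        browser.click(selector)          # act on the page
    return model_calls
```

Even in this toy version, every iteration goes through the model once, which is where both the fluctuation and the latency discussed next come from.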

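Actions inevitably fluctuate

This is unavoidable as long as the instructions are given in natural language, but fine-grained instructions such as "click X and then do Y" require prompt tuning, and the results also depend on how capable the model is.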

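Processing takes time

Because the agent repeats the cycle of capturing the screen and fetching the DOM → analyzing the capture → inferring which element to click → operating it with Playwright, processing inevitably takes a long time.

That is fine when running on a local PC, but it gets painful once you deploy it to a VM or similar and use it in production.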

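workflow-use

The newly released workflow-use can address the issues listed above.

Instead of driving the browser through natural language as before, it operates the browser according to a predefined workflow.
The user defines that workflow in advance.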

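The functionality for defining a workflow is also built into workflow-use: you run a command and then perform, as a demonstration, the actions you want the AI agent to carry out.

workflow-use records that demonstration and outputs the workflow in JSON format.

In addition, parts that are likely to vary (input fields, select boxes, and so on) are extracted as variables, so you can change them every time you have the AI run the browser operation.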
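As a rough illustration of what "extracted as variables" means, the recorded JSON stores values such as `{first_name}` instead of what was typed during recording. The sketch below fills such placeholders for each run. It is my own minimal version, not workflow-use's implementation, and `fill_placeholders` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def fill_placeholders(steps: List[Dict[str, Any]], inputs: Dict[str, str]) -> List[Dict[str, Any]]:
    """Return copies of the steps with {name} placeholders in "value" replaced."""
    filled = []
    for step in steps:
        step = dict(step)  # shallow copy so the recorded workflow stays untouched
        value = step.get("value")
        if isinstance(value, str):
            step["value"] = value.format(**inputs)
        filled.append(step)
    return filled


# Two steps shaped like the recorder's output (selectors simplified).
recorded_steps = [
    {"type": "input", "cssSelector": "input[id=\"firstName\"]", "value": "{first_name}"},
    {"type": "click", "cssSelector": "span"},
]

# The same recording can be reused with different values on each run.
first_run = fill_placeholders(recorded_steps, {"first_name": "Taro"})
second_run = fill_placeholders(recorded_steps, {"first_name": "Hanako"})
```

Only the `value` field is substituted here, so selectors and URLs in the recording are left exactly as recorded.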

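Benefits

No fluctuation caused by natural language
Execution follows a predetermined flow, so token consumption is low and processing is very fast
You can edit the workflow yourself afterwards
You can also hand just one part of the flow over to the AI agent's reasoning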
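The last two points can be pictured as a simple dispatcher: recorded steps are replayed as-is, and only steps marked for the agent go through the model. This is a sketch under assumptions, not the library's code. The step type name `"agent"` and the `replay` / `ask_agent` callbacks are hypothetical; the `task` and `max_steps` fields follow the `_run_agent_step` excerpt shown later in this article.

```python
from typing import Any, Callable, Dict, List


def run_workflow(
    steps: List[Dict[str, Any]],
    replay: Callable[[Dict[str, Any]], None],
    ask_agent: Callable[[str, int], None],
) -> int:
    """Run steps in order and return how many of them needed the model."""
    model_calls = 0
    for step in steps:
        if step["type"] == "agent":  # assumed name for an agent-driven step
            ask_agent(step["task"], step.get("max_steps") or 5)
            model_calls += 1
        else:
            replay(step)  # deterministic replay, no model involved
    return model_calls


steps = [
    {"type": "navigation", "url": "https://example.com/events"},
    {"type": "agent", "task": "Click the registration link for the newest event", "max_steps": None},
    {"type": "input", "cssSelector": "#firstName", "value": "Taro"},
]

replayed: List[Dict[str, Any]] = []
delegated: List[tuple] = []
calls = run_workflow(steps, replayed.append, lambda task, max_steps: delegated.append((task, max_steps)))
```

In this example only one of the three steps reaches the model, which is the source of the token and speed savings listed above.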

Drawbacks

Very fragile when the site's behavior or UI changes
Not distributed as a package

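When to use which

As the points above show, workflow-use is not a strict upgrade over browser-use.

You need to choose between them depending on the task you want done through the browser.
browser-use:
When you want it to research something
When you want it to fetch the latest information
When some fluctuation in how the task is carried out is acceptable
workflow-use:
When you want an AI agent to automate a system that offers no API (business or internal systems, for example)
When the UI rarely changes, as with legacy systems
When the work you want done, and the buttons and sites to click, are decided in advance
When you want to avoid AI-induced fluctuation as much as possible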

How to use

Prerequisites

Currently only OpenAI is supported.

I use Azure OpenAI, so the code needs some modification.

1. Environment setup

Go to the following repository and run git clone.

https://github.com/browser-use/workflow-use

Run the commands as described in the README.

https://github.com/browser-use/workflow-use/blob/main/README.md

2. Switching to Azure OpenAI

Search the whole project for ChatOpenAI in your editor.

Then change ChatOpenAI to AzureChatOpenAI, as in the following.
cli.py
import typer
from langchain_openai import AzureChatOpenAI

from workflow_use.builder.service import BuilderService
from workflow_use.controller.service import WorkflowController
from workflow_use.recorder.service import RecordingService  # Added import
from workflow_use.workflow.service import Workflow

# Placeholder for recorder functionality
# from src.recorder.service import RecorderService

app = typer.Typer(
	name='workflow-cli',
	help='A CLI tool to create and run workflows.',
	add_completion=False,
	no_args_is_help=True,
)

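# Default LLM instance to None
llm_instance = None
try:
	llm_instance = AzureChatOpenAI(
		openai_api_version="2024-10-21",
		azure_endpoint="<your azure openai endpoint>",
		azure_deployment="gpt-4.1",
		model="gpt-4.1",
		validate_base_url=False,
		api_key="<your aoai api key>",
	)
except Exception as e:
	# Report the failure and leave llm_instance as None
	typer.secho(f'Error initializing LLM: {e}', fg=typer.colors.RED)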
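If you would rather not hard-code the endpoint and key in cli.py, one option is to build the same keyword arguments from environment variables. This is only a sketch: `azure_llm_kwargs` is a hypothetical helper, and the variable names `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, and `AZURE_OPENAI_DEPLOYMENT` are names chosen for this example.

```python
import os
from typing import Any, Dict, Mapping


def azure_llm_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build keyword arguments for AzureChatOpenAI from environment variables."""
    required = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "openai_api_version": "2024-10-21",
        "azure_endpoint": env["AZURE_OPENAI_ENDPOINT"],
        # Falls back to the deployment name used in this article.
        "azure_deployment": env.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4.1"),
        "model": "gpt-4.1",
        "validate_base_url": False,
        "api_key": env["AZURE_OPENAI_API_KEY"],
    }
```

In cli.py this would be used as `AzureChatOpenAI(**azure_llm_kwargs())`, which keeps the secrets out of the source tree.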

As it stands this still raises an error, so one more fix is needed.

Modify part of the code in workflow-use/workflow/service.py.
service.py
	async def _run_agent_step(self, step: AgenticWorkflowStep) -> AgentHistoryList | dict[str, Any]:
		"""Spin-up an Agent based on step dictionary."""
		if self.llm is None:
			raise ValueError("An 'llm' instance must be supplied for agent-based steps")

		task: str = step.task
		max_steps: int = step.max_steps or 5

		agent = Agent(
			task=task,
			llm=self.llm,
			browser=self.browser,
			browser_context=self.browser_context,
			# ↓↓ Add the following
			tool_calling_method="function_calling",
			use_vision=True,  # Consider making this configurable via WorkflowStep schema
		)
		return await agent.run(max_steps=max_steps)

3. Trying it out

This time I will try the task "register for Microsoft's latest event".

First, create the workflow.
Run the following command:
python cli.py create-workflow

This launches an incognito window of the default browser.

Enter the link, click the register button to open a new tab, and fill in the information.








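After pressing the register button and reproducing everything you want the AI agent to do, go back to the terminal and press Ctrl + C.

You will then be asked a few questions for registering the workflow, so answer them as appropriate.

A workflow like the following was created.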
{
  "workflow_analysis": "This workflow automates the registration process for a Microsoft Virtual Training Event. It begins by navigating to the event listing page, then locates and clicks on a specific event registration link, taking the user to the registration form. The workflow proceeds to fill out the registration form fields such as first name, last name, email, phone number, job title, company name, country, and company size, and finally submits the form. Key dynamic inputs are the user's first name, last name, email, phone number, job title, company name, country, and company size, which must be provided for the automation to be reusable and flexible. Agentic steps are not required, as the form fields use stable selectors and all form values are directly determined by user input. Placeholder navigation and interaction steps (like excessive key presses and navigation refreshes) are condensed into streamlined deterministic actions for efficiency. All relevant values should be parameterized to allow the workflow's reuse for different registrants.",
  "name": "microsoft_virtual_training_event_registration",
  "description": "Automates registration for a Microsoft Dynamics 365 Virtual Training Day on the official Microsoft events site.",
  "version": "1.0",
  "steps": [
    {
      "description": "Navigate to the Microsoft Japan event listing page.",
      "output": null,
      "timestamp": null,
      "tabId": null,
      "type": "navigation",
      "url": "https://www.microsoft.com/ja-jp/events/top/training-days?activetab=a1%3aprimaryr3"
    },
    {
      "description": "Click the registration link for the Microsoft Dynamics 365 Virtual Training Day event.",
      "output": null,
      "timestamp": null,
      "tabId": null,
      "type": "click",
      "cssSelector": "a.cta[role*=\" \"][aria-label*=\"\u767b\u9332: 5 \u6708 26\u65e5\uff08\u6708\uff09\u30015 \u6708 27\u65e5\uff08\u706b\uff09...Microsoft Dynamics 365 Virtual Training Day: \u57fa\u790e(ERP)...\"]",
      "xpath": "id(\"tableV2Row-uida79j7n\")/td[3]/div[1]/div[1]/a[1]",
      "elementTag": "A",
      "elementText": "\u767b\u9332"
    },
    {
      "description": "Navigate to the specific event registration form.",
      "output": null,
      "timestamp": null,
      "tabId": null,
      "type": "navigation",
      "url": "https://msevents.microsoft.com/event?id=1307512370"
    },
    {
      "description": "Fill in the first name field.",
      "output": null,
      "timestamp": null,
      "tabId": null,
      "type": "input",
      "cssSelector": "input.form-control[type=\"text\"][id=\"firstName\"]",
      "value": "{first_name}",
      "xpath": "id(\"firstName\")",
      "elementTag": "INPUT"
    },
    {
      "description": "Fill in the last name field.",
      "output": null,
      "timestamp": null,
      "tabId": null,
      "type": "input",
      "cssSelector": "input.form-control[type=\"text\"][id=\"lastName\"]",
      "value": "{last_name}",
      "xpath": "id(\"lastName\")",
      "elementTag": "INPUT"
    },
    {
      "description": "Fill in the email field.",
      "output": null,
      "timestamp": null,
      "tabId": null,
      "type": "input",
      "cssSelector": "input.form-control[type=\"email\"][id=\"email\"]",
      "value": "{email}",
      "xpath": "id(\"email\")",
      "elementTag": "INPUT"
    },
    {
      "description": "Fill in the phone number field.",
      "output": null,
      "timestamp": null,
      "tabId": null,
      "type": "input",
      "cssSelector": "input.form-control[type=\"text\"][id=\"640735e1-9bbc-ea11-a812-000d3a321938\"]",
      "value": "{phone}",
      "xpath": "id(\"640735e1-9bbc-ea11-a812-000d3a321938\")",
      "elementTag": "INPUT"
    },
    {
      "description": "Fill in the job title field.",
      "output": null,
      "timestamp": null,
      "tabId": null,
      "type": "input",
      "cssSelector": "input.form-control[type=\"text\"][id=\"d4ffb6df-1ef5-e911-a98d-000d3a30dc0a\"]",
      "value": "{job_title}",
      "xpath": "id(\"d4ffb6df-1ef5-e911-a98d-000d3a30dc0a\")",
      "elementTag": "INPUT"
    },
    {
      "description": "Fill in the company name field.",
      "output": null,
      "timestamp": null,
      "tabId": null,
      "type": "input",
      "cssSelector": "input.form-control[type=\"text\"][id=\"46165aee-289a-ea11-a811-000d3a5cec97\"]",
      "value": "{company_name}",
      "xpath": "id(\"46165aee-289a-ea11-a811-000d3a5cec97\")",
      "elementTag": "INPUT"
    },
    {
      "description": "Select the country from the dropdown.",
      "output": null,
      "timestamp": null,
      "tabId": null,
      "type": "input",
      "cssSelector": "select.form-select[id=\"6ccc2d02-2b9a-ea11-a811-000d3a5cec97\"][aria-label=\"Country\"]",
      "value": "{country}",
      "xpath": "id(\"6ccc2d02-2b9a-ea11-a811-000d3a5cec97\")",
      "elementTag": "SELECT"
    },
    {
      "description": "Choose the company size from the dropdown.",
      "output": null,
      "timestamp": null,
      "tabId": null,
      "type": "input",
      "cssSelector": "select.form-select[id=\"b9f74f51-4664-ef11-a671-000d3a355617\"][aria-label*=\"What best describes the size of your organization?\"]",
      "value": "{company_size}",
      "xpath": "id(\"b9f74f51-4664-ef11-a671-000d3a355617\")",
      "elementTag": "SELECT"
    },
    {
      "description": "Click the submit button to complete registration.",
      "output": null,
      "timestamp": null,
      "tabId": null,
      "type": "click",
      "cssSelector": "span",
      "xpath": "id(\"submitbtn\")/span[1]",
      "elementTag": "SPAN",
      "elementText": "\u767b\u9332\u3059\u308b"
    }
  ],
  "input_schema": [
    {
      "name": "first_name",
      "type": "string",
      "required": true
    },
    {
      "name": "last_name",
      "type": "string",
      "required": true
    },
    {
      "name": "email",
      "type": "string",
      "required": true
    },
    {
      "name": "phone",
      "type": "string",
      "required": true
    },
    {
      "name": "job_title",
      "type": "string",
      "required": true
    },
    {
      "name": "company_name",
      "type": "string",
      "required": true
    },
    {
      "name": "country",
      "type": "string",
      "required": true
    },
    {
      "name": "company_size",
      "type": "string",
      "required": true
    }
  ]
}
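Since the workflow is plain JSON that you can edit by hand, it is easy to end up with a `{placeholder}` in a step that is no longer declared in `input_schema`. The sketch below is a small consistency check you could run after editing. It is not part of workflow-use, the helper names are hypothetical, and the inline JSON is a trimmed-down stand-in for the file above.

```python
import json
from string import Formatter
from typing import Any, Dict, Set


def placeholder_names(workflow: Dict[str, Any]) -> Set[str]:
    """Collect every {name} placeholder used in the steps' "value" fields."""
    names: Set[str] = set()
    for step in workflow["steps"]:
        value = step.get("value")
        if isinstance(value, str):
            # Formatter().parse yields (literal, field_name, spec, conversion) tuples.
            names.update(name for _, name, _, _ in Formatter().parse(value) if name)
    return names


def undeclared_placeholders(workflow: Dict[str, Any]) -> Set[str]:
    """Placeholders used in steps but missing from input_schema."""
    declared = {item["name"] for item in workflow["input_schema"]}
    return placeholder_names(workflow) - declared


workflow = json.loads("""
{
  "steps": [
    {"type": "navigation", "url": "https://msevents.microsoft.com/event?id=1307512370"},
    {"type": "input", "cssSelector": "#firstName", "value": "{first_name}"},
    {"type": "input", "cssSelector": "#email", "value": "{email}"}
  ],
  "input_schema": [
    {"name": "first_name", "type": "string", "required": true}
  ]
}
""")

problems = undeclared_placeholders(workflow)
```

An empty result means every placeholder in the steps has a matching entry in `input_schema`.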

Let's run it.

Run the following command:
python cli.py run-workflow tmp/<your workflow>.workflow.json

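Because the form values are recognized as variables, the CLI prompts the user for the values to enter.

I can't upload a video, so this shows only a small part of it, but the workflow ran exactly the same way I had operated the browser myself!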
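Conceptually, that prompt can be driven entirely by the `input_schema` section of the JSON. The following is a minimal sketch of the idea, not the CLI's actual code. `collect_inputs` is a hypothetical helper, and the `ask` parameter exists only so the function can be exercised without a terminal.

```python
from typing import Any, Callable, Dict, List


def collect_inputs(
    input_schema: List[Dict[str, Any]],
    ask: Callable[[str], str] = input,
) -> Dict[str, str]:
    """Ask for each declared input; keep asking while a required one is empty."""
    values: Dict[str, str] = {}
    for item in input_schema:
        name = item["name"]
        answer = ask(f"{name}: ").strip()
        while item.get("required") and not answer:
            answer = ask(f"{name} (required): ").strip()
        if answer:
            values[name] = answer
    return values


schema = [
    {"name": "first_name", "type": "string", "required": True},
    {"name": "email", "type": "string", "required": True},
]

# Scripted answers stand in for keyboard input; the empty one is asked again.
scripted = iter(["Taro", "", "taro@example.com"])
values = collect_inputs(schema, ask=lambda prompt: next(scripted))
```

Calling `collect_inputs(schema)` without the `ask` argument falls back to the built-in `input`, which is closer to what you see in the terminal.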

Headwaters

The tech blog of Headwaters Co., Ltd. We post broadly on Data & AI and app modernization, including AI agents, generative AI, LLMs, Azure services and certifications, IoT, and XR!
