E2E Testing with Generative AI: Research Notes on AutoE2E

Published on 2025/09/29

Introduction

I came across a study on automatically generating End-to-End tests with generative AI, so let's take a closer look at the details.

https://note.com/smorisaki/n/n2db887de1f39
The code used in the study above is available here:

https://github.com/parsaalian/autoe2e/tree/main
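It appears to walk through the screens and build a feature list and a state transition diagram.

However, as far as I could tell, the code that generates the test cases does not seem to be public.

(Someone who reached the same conclusion has opened an issue.)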

https://github.com/parsaalian/autoe2e/issues/1
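Note: to save on generative AI costs, I have only checked this by reading the code, without running it.

Please be aware that there may be misreadings or analysis errors.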

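Overview

The AutoE2E workflow looks like this:

[Figure: AutoE2E workflow diagram, quoted from the source]

Starting from the base URL specified in the config, it actually visits each URL, extracts the Actions available from buttons and forms, uses an LLM to predict which features exist on which screen (state), records them, and builds a feature list and a state transition diagram.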
https://github.com/parsaalian/autoe2e/blob/main/main.py
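Simplified flow

1. Starting from the initial URL, collect the operations available on the page, such as buttons and form submissions (hereafter "Actions"), and build a State.
2. Push the State onto the crawl queue.
3. Repeat the following until the crawl queue is empty:
   - Take a State from the queue.
   - Use the LLM to create the State's context.
   - For every Action in the State:
     - Decide whether the Action is a critical Action whose effects cannot be undone.
     - If the Action is not critical:
       - Actually execute the Action.
         - For a form Action, have the LLM fill in the values needed to submit it.
       - Collect the Actions on the resulting screen and build a State.
         - If the State is new, add it to the crawl queue and the state transition information, and skip feature extraction.
     - If feature extraction is needed:
       - Extract features from the current State and Action.
         - Add the extracted features to ActionFunctionDB and FunctionDB as needed.
       - Extract features from the current State and Action together with the previous State and Action.
         - Add the extracted features to ActionFunctionDB and FunctionDB as needed.
       - Update the FunctionDB scores.
     - Decide whether the Action is the final Action.
4. Build a graph with States as nodes and Actions as edges.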
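The queue-driven part of the flow above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `crawl`, the `execute` callback, and the `critical` flag (standing in for the LLM's verdict) are hypothetical names, and the feature-extraction and finality steps are left out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    name: str
    critical: bool = False  # stand-in for the LLM's critical-action verdict


@dataclass
class State:
    state_id: str
    actions: list[Action] = field(default_factory=list)


def crawl(start: State, execute) -> dict[str, list[tuple[str, str]]]:
    """Breadth-first crawl; returns edges as {state_id: [(action, next_state_id)]}."""
    queue = deque([start])
    seen = {start.state_id}
    graph: dict[str, list[tuple[str, str]]] = {}
    while queue:
        state = queue.popleft()
        edges = graph.setdefault(state.state_id, [])
        for action in state.actions:
            if action.critical:
                continue  # irreversible actions are never executed
            next_state = execute(state, action)
            edges.append((action.name, next_state.state_id))
            if next_state.state_id not in seen:
                seen.add(next_state.state_id)
                queue.append(next_state)
    return graph


# Toy application: two pages wired together by hand.
cart = State("cart", [Action("checkout", critical=True)])
home = State("home", [Action("open_cart")])
pages = {("home", "open_cart"): cart}
graph = crawl(home, lambda state, action: pages[(state.state_id, action.name)])
```

In this toy run the `checkout` action is skipped because it is marked critical, so the graph ends up with a single edge from `home` to `cart`.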

Data important for the analysis

MongoDB

MongoDB is used to manage the following two collections:
- FunctionDB
- ActionFunctionDB

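FunctionDB

Stores features together with their likelihood scores.

Attribute	Description
app	Application name
text	Feature name
embedding	Result of embedding text with OpenAIEmbeddings(model="text-embedding-3-large")
score	Cumulative likelihood of the feature
final	Flag indicating that this state×action observation has been judged "terminal"
executable	The feature is executable: at least one action is associated with it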


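ActionFunctionDB

Holds the scores of the features triggered by an action in each State.

Attribute	Description
app	Application name
url	URL of the State
state	State.get_id(BY_ACTIONS); the state_id that identifies the State
prev_state	state_id of the previous State
action	ID of the element
prev_action	ID of the previous Action
test_id	element.test_id of the Action
depth	Depth of the operation
type	SINGLE / DOUBLE (score computed using the previous State's information)
rank_score	The rank at which the LLM returned the feature (for that State×Action), converted into a geometric score
func_pointer	ID of the document returned when inserting into FunctionDB
final	Flag indicating that this state×action observation has been judged "terminal"
should_execute	Appears to be always True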
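To make the relationship between rank_score, func_pointer and the FunctionDB score more concrete, here is a small sketch using plain dictionaries instead of MongoDB. The exact formula is an assumption on my part: I treat the geometric score as `ratio ** rank` with a hypothetical ratio of 0.5, and the FunctionDB score as the sum of the rank scores pointing at that feature. Neither has been confirmed against the repository.

```python
def rank_score(rank: int, ratio: float = 0.5) -> float:
    """Geometric weight for a 0-based rank; ratio=0.5 is an assumed value."""
    return ratio ** rank


# FunctionDB stand-in: one document per feature, keyed by its ID.
function_db = {
    "f1": {"app": "shop", "text": "add item to cart", "score": 0.0},
}

# ActionFunctionDB stand-in: one row per (state, action, feature) observation.
action_function_db = [
    {"state": "s1", "action": "a1", "func_pointer": "f1", "rank_score": rank_score(0)},
    {"state": "s2", "action": "a7", "func_pointer": "f1", "rank_score": rank_score(2)},
]

# Accumulate each observation into the feature it points to (assumed update rule).
for row in action_function_db:
    function_db[row["func_pointer"]]["score"] += row["rank_score"]
```

A feature that the LLM ranks highly across many State×Action pairs accumulates a larger score under this assumption, which matches the description of score as a cumulative likelihood.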


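Main data-holding classes

Action

Action is the class that holds a button click or a form action.

Property	Type	Description
element	Element	The element of the Action
action_type	ActionType	FormActionType / ClickActionType
should_execute	bool
parent_state_id	str	state_id of the parent State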


State

State is the class that holds information about a state (screen).

Property	Type	Description
unique_id	str	Unique ID (uuid4)
evaluator	StateIdEvaluator	BY_UNIQUE / BY_URL / BY_DOM / BY_ACTIONS
url	str	URL of the page
dom	str	DOM of the page (driver.page_source)
actions	list[Action]	List of Actions the state has
crawl_path	CrawlPath	Information describing which States and Actions led to this State
context	str	Context of the State
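The evaluator decides what counts as "the same State". The sketch below illustrates the idea with a simplified State; the four evaluator names come from the table above, but the enum values, the `action_ids` field, and the SHA-256 hashing are my own assumptions for illustration, not the repository's implementation.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from enum import Enum


class StateIdEvaluator(Enum):
    BY_UNIQUE = "unique"
    BY_URL = "url"
    BY_DOM = "dom"
    BY_ACTIONS = "actions"


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class State:
    url: str
    dom: str
    action_ids: list[str] = field(default_factory=list)  # hypothetical simplification
    unique_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def get_id(self, evaluator: StateIdEvaluator) -> str:
        if evaluator is StateIdEvaluator.BY_UNIQUE:
            return self.unique_id
        if evaluator is StateIdEvaluator.BY_URL:
            return _digest(self.url)
        if evaluator is StateIdEvaluator.BY_DOM:
            return _digest(self.dom)
        # BY_ACTIONS: the same set of actions means the same state,
        # regardless of URL or DOM differences.
        return _digest("\n".join(sorted(self.action_ids)))


page1 = State("https://example.com/items?page=1", "<html>1</html>", ["btn-add", "btn-next"])
page2 = State("https://example.com/items?page=2", "<html>2</html>", ["btn-next", "btn-add"])
same_by_actions = (
    page1.get_id(StateIdEvaluator.BY_ACTIONS) == page2.get_id(StateIdEvaluator.BY_ACTIONS)
)
```

With BY_ACTIONS, two paginated list pages that offer the same operations collapse into one State, which keeps the crawl queue from growing with every URL variation.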


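Where the LLM is used

Creating the State context with the LLM

https://github.com/parsaalian/autoe2e/blob/main/main.py#L52-L57
extract_state_context uses a screenshot of the State, the previous State's context, and the HTML of the Action's element, together with the prompts below, to create the State's context.

This function produces a State context such as "a page that displays product search results."
System prompt: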

CONTEXT_EXTRACTION_SYSTEM_PROMPT
Given the provided information about a webpage, your task is to provide a brief and abstract description of the webpage's primary purpose or function.
Output Guidelines:
* Brevity: Keep the description concise (aim for 1-2 sentences).
* Abstraction: Avoid specific details or variable names. Use general terms to describe the content and function. (Example: Instead of "a page showing results for searching for a TV," say "a page displaying search results for a product query.")
* Focus on Purpose: Prioritize describing the main intent of the page. What is it designed for the user to do or learn?
* No Extra Explanations: Just provide the context. Avoid adding commentary or assumptions.
User prompt:

CONTEXT_EXTRACTION_USER_PROMPT
The description of the website is: {description}
The previous state was: {previous_state}
The previous action was: {previous_action}
description: "None"

previous_state: the previous State

previous_action: the outerHTML of the element belonging to the Action executed in the previous State
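Filling in the user prompt is plain string formatting. The sketch below assumes the three template lines are joined with newlines and uses a hypothetical helper name, `build_context_user_prompt`; the screenshot would be sent separately as an image and is not shown here.

```python
# Template text taken from the user prompt above; joining with "\n" is an assumption.
CONTEXT_EXTRACTION_USER_PROMPT = (
    "The description of the website is: {description}\n"
    "The previous state was: {previous_state}\n"
    "The previous action was: {previous_action}"
)


def build_context_user_prompt(
    previous_state: str,
    previous_action: str,
    description: str = "None",
) -> str:
    """Fill the template; description defaults to the literal string "None"."""
    return CONTEXT_EXTRACTION_USER_PROMPT.format(
        description=description,
        previous_state=previous_state,
        previous_action=previous_action,
    )


prompt = build_context_user_prompt(
    previous_state="a page displaying a list of products",
    previous_action='<button data-testid="search">Search</button>',
)
```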

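Checking whether an Action is critical

https://github.com/parsaalian/autoe2e/blob/main/main.py#L69
is_action_critical judges, from the Action's outerHTML, whether it is a critical Action whose effects cannot be undone, such as deleting an account or making a purchase.
System prompt: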

CRITICAL_ACTION_SYSTEM_PROMPT
Given an element in a web application, your task is to determine if the element is a critical action.
A critical action is an action that its effects are irreversible, such as deleting an account or making a purchase.
Please return a boolean value indicating if the element is a critical action. The boolean should be in Python format (True or False).
Just return the boolean and no further explanation.
User prompt:

The outerHTML of the Action's element
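Because the reply is free text that should contain only `True` or `False`, it needs to be parsed defensively. The following is one possible approach, not the repository's code: `parse_llm_bool` and `is_critical_reply` are hypothetical helpers, and treating an unparsable reply as critical is my own fail-safe choice.

```python
def parse_llm_bool(reply: str) -> bool:
    """Parse a reply that should be exactly 'True' or 'False' (Python format)."""
    text = reply.strip().strip("`'\".").lower()
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"not a boolean reply: {reply!r}")


def is_critical_reply(reply: str) -> bool:
    """Fail safe: if the reply cannot be parsed, treat the action as critical."""
    try:
        return parse_llm_bool(reply)
    except ValueError:
        return True


verdicts = [is_critical_reply(r) for r in ("True", " false\n", "It depends on the page.")]
```

Defaulting to "critical" means an ambiguous answer leads to skipping the Action rather than, say, deleting an account during the crawl.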

Filling in form values with the LLM for form Actions

https://github.com/parsaalian/autoe2e/blob/main/main.py#L73
create_form_filling_values uses the Action's outerHTML to produce the values that must be set in order to submit the form. The output is JSON.
System prompt:

FORM_VALUE_SYSTEM_PROMPT
Given a form element in a web application, your task is to generate a set of values so that the form can be submitted successfully.
The format for your response should be a JSON where the keys are the data-testid attributes of the input elements and the values are the values that should be filled in.
If the elements are radios or checkboxes, the values should be booleans.
If the elements are selects, the values should be the value attribute of the selected option.
Your response should be parsable by json.loads. Just include your response in the JSON, no additional information is needed. Avoid formatting the JSON for markdown or any other format.
User prompt:

The outerHTML of the Action's element
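Since the prompt keys the JSON by data-testid, one way to consume the reply is to parse it and keep only the keys that actually exist in the form. This is a sketch using the standard library only; `FieldIdCollector` and `parse_form_values` are hypothetical names, and dropping unknown keys is my own design choice rather than something the repository is known to do.

```python
import json
from html.parser import HTMLParser


class FieldIdCollector(HTMLParser):
    """Collects data-testid values of input, select and textarea elements."""

    def __init__(self) -> None:
        super().__init__()
        self.test_ids: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag in {"input", "select", "textarea"}:
            test_id = dict(attrs).get("data-testid")
            if test_id:
                self.test_ids.add(test_id)


def parse_form_values(reply: str, form_html: str) -> dict:
    """Parse the LLM reply and drop keys that do not match a field in the form."""
    values = json.loads(reply)
    if not isinstance(values, dict):
        raise ValueError("expected a JSON object keyed by data-testid")
    collector = FieldIdCollector()
    collector.feed(form_html)
    return {key: value for key, value in values.items() if key in collector.test_ids}


form_html = (
    '<form><input data-testid="email">'
    '<input type="checkbox" data-testid="agree"/></form>'
)
reply = '{"email": "user@example.com", "agree": true, "ghost": "x"}'
values = parse_form_values(reply, form_html)
```

Here the `ghost` key is discarded because no field in the form carries that data-testid, and the checkbox value arrives as a boolean as the prompt requests.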

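Feature extraction

https://github.com/parsaalian/autoe2e/blob/main/main.py#L102

https://github.com/parsaalian/autoe2e/blob/main/main.py#L119
extract_action_functionalities uses the current State's context and the Action's outerHTML (the previous Action's outerHTML can also be supplied when needed) to obtain up to five candidate features associated with the Action (e.g. "add item to cart").

The result is expected to be JSON like the following:
[
  {
    probability: (0.0 to 1.0) likelihood that the feature exists,
    feature: a concise description of the user action (e.g. "add item to cart")
  }
  ... up to five entries, in descending order of probability
]
System prompt: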

FUNCTIONALITY_EXTRACTION_SYSTEM_PROMPT
Given a webpage's purpose and content (webpage_context), the outerHTML of an action element (action_element), and optionally the user's last action that led to this state, your task is to infer the most likely functionalities associated with that action element.
These functionalities should be user-centric actions that produce measurable outcomes within the application, are testable through E2E testing, and are essential to the presence of the action element.

Output Format:
Your is enclosed in two tags:
<Reasoning>:
- An enumerated list of at most five functionalities potentially connected to the element.
- For each functionality, answer the following questions concisely:
    1. Would developers write E2E test cases for this in the real world? It should be non-navigational, not menu-related, and not validation.
    2. Is the functionality a final user goal in itself or is it always a step in doing something else?
    3. Is this overly abstract/vague? If so, break it down into more testable sub-functionalities.
- Avoid repeating the questions in your responses every time.
<Response>:
- A JSON array of objects, each containing:
    - probability: (0.0 to 1.0) Likelihood of this functionality exists.
    - feature: A concise description of the user action (e.g., "add item to cart").
- Sorted by probability in descending order.
- Parsable by `json.loads`.
- Can be an empty array if no valid functionalities are found.
User prompt:
{
    "webpage_context": webpage_context,
    "action_element": action_element,
    "previous_action": previous_action
}
Reference:
webpage_context: the State's context

action_element: the Action's outerHTML

previous_action: the outerHTML of the previous Action, if there is one
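The reply mixes a free-text Reasoning part with a JSON array after the Response tag, so the JSON has to be cut out before it can be loaded. The sketch below shows one way to do that; `extract_features` is a hypothetical helper, and the handling of a possible closing tag and the five-item cap are assumptions based on the prompt text rather than on the repository.

```python
import json


def extract_features(reply: str, limit: int = 5) -> list[dict]:
    """Pull the JSON array that follows <Response> and normalise it."""
    _, tag, tail = reply.partition("<Response>")
    if not tag:
        raise ValueError("no <Response> tag in reply")
    tail = tail.split("</Response>")[0]  # tolerate an optional closing tag
    start, end = tail.find("["), tail.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array after <Response>")
    items = json.loads(tail[start:end + 1])
    features = [
        {"probability": float(item["probability"]), "feature": str(item["feature"])}
        for item in items
        if 0.0 <= float(item["probability"]) <= 1.0
    ]
    # The prompt asks for descending order; sort anyway in case the model did not.
    features.sort(key=lambda item: item["probability"], reverse=True)
    return features[:limit]


reply = (
    "<Reasoning>:\n"
    "1. add item to cart: testable, a user goal, concrete.\n"
    "<Response>:\n"
    '[{"probability": 0.4, "feature": "save item for later"},'
    ' {"probability": 0.9, "feature": "add item to cart"}]'
)
features = extract_features(reply)
```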

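Adding extracted features to ActionFunctionDB and FunctionDB as needed

https://github.com/parsaalian/autoe2e/blob/main/main.py#L104

https://github.com/parsaalian/autoe2e/blob/main/main.py#L121
insert_functionalities inserts into or updates FunctionDB depending on whether the feature already exists. Because feature names come from the LLM as described above, they are not deterministic. The feature name is therefore vectorized with OpenAIEmbeddings, similar candidates are retrieved from FunctionDB, and the LLM is asked to judge whether they are the same feature.

If an existing feature matches, a feature name that takes both the existing name and the new name into account is created and registered as the feature name.

The LLM returns a reply like the following:
{
    "match": true if any feature in the list matches the base feature,
    "match_index": array of indices of the matched features; included only when there is a match,
    "combined_text": when there is a match, a concise description of that feature (redundant words may be omitted); included only when there is a match
}
System prompt: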

SIMILARITY_SYSTEM_PROMPT
Given a description of a software feature and a list of other software feature descriptions, your task is to determine if the initial feature matches any features in the list.

Output format:
Your analysis is enclosed in two tags:
<Reasoning>:
- For each item in the list, argue why the base feature and the feature in the list are or aren't describing the same action being performed in the app.
    - Are they exactly or semantically equivalent?
    - If they are different, how are they different?
- Avoid repeating the questions in your responses every time.
- Your analyses should be short and concise.
<Response>:
- A JSON object containing the following keys:
    - match: true If any feature in the list matches the base feature, false if not.
    - match_index: An array of indices of matched features in the list. Only include this key if there is a matching feature.
    - combined_text: If the features match, a concise description of that feature. Only include this key if there is a matching feature. You can omit some of the redundant words to keep this sentence simple.
- Parsable by `json.loads`.
User prompt:
f'Base feature:\n{base_functionality}\nThe list of functionalities:\n{functionalities}'
base_functionality: the feature name

functionalities: the list of feature names similar to base_functionality, separated by newlines
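The candidate lookup that precedes this LLM call can be illustrated with cosine similarity over stored embeddings. This is a pure-Python sketch with toy two-dimensional vectors; the real embeddings come from OpenAIEmbeddings, and the `top_k` and `threshold` parameters as well as the function names are hypothetical, not values taken from the repository.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_features(
    query: list[float],
    stored: list[dict],
    top_k: int = 3,
    threshold: float = 0.8,
) -> list[str]:
    """Return the texts of stored features whose embedding is close to the query."""
    scored = [(cosine_similarity(query, doc["embedding"]), doc["text"]) for doc in stored]
    scored = [pair for pair in scored if pair[0] >= threshold]
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]


stored = [
    {"text": "add item to cart", "embedding": [1.0, 0.0]},
    {"text": "log in", "embedding": [0.0, 1.0]},
]
candidates = nearest_features([0.9, 0.1], stored)
functionalities = "\n".join(candidates)  # newline-separated, as the user prompt expects
```

Only the candidates that survive this filter are passed to the LLM, which keeps the similarity prompt short even when FunctionDB holds many features.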

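Determining whether an Action is final

https://github.com/parsaalian/autoe2e/blob/main/main.py#L147
mark_final_functionalities runs an LLM call that returns booleans indicating whether the Action in the current State is the final action for each feature, then updates final in FunctionDB and ActionFunctionDB as needed.
System prompt: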

FINALITY_SYSTEM_PROMPT
Given the context of a webpage, an action element, and a list of features and scenarios, your task is to determine whether the action is the final action in the chain of actions for performing each of the features.

Output format:
Your analysis is enclosed in two tags:
<Reasoning>:
- For each feature in the list, argue why executing the action would or would not conclude the feature.
- Avoid repeating the description of the feature.
- Your analyses should be short and concise.
<Response>:
- An array of Python booleans, where index i is True if the action concludes feature i.
User prompt:
f'The context of the webpage is: {context}\nThe action element is: {clean_children_html(action_element)}\nThe list of functionalities:\n{functionalities}'
context: the current State's context

action_element: the Action's outerHTML

functionalities: the list of candidate features tied to the current State and Action, separated by newlines
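Note that this prompt asks for Python booleans, so `json.loads` would reject the reply (`True` is not valid JSON). A sketch of parsing it with `ast.literal_eval` instead follows; `parse_finality` is a hypothetical helper, and checking the length against the number of features is my own addition.

```python
import ast


def parse_finality(reply: str, expected: int) -> list[bool]:
    """Parse the Python boolean array that follows <Response>."""
    _, tag, tail = reply.partition("<Response>")
    if not tag:
        raise ValueError("no <Response> tag in reply")
    start, end = tail.find("["), tail.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no array after <Response>")
    flags = ast.literal_eval(tail[start:end + 1])  # safe: literals only, no code execution
    if (
        not isinstance(flags, list)
        or len(flags) != expected
        or not all(isinstance(flag, bool) for flag in flags)
    ):
        raise ValueError(f"expected {expected} booleans, got {flags!r}")
    return flags


reply = (
    "<Reasoning>:\n"
    "- Clicking the button places the order, which concludes the first feature.\n"
    "<Response>:\n"
    "[True, False]"
)
flags = parse_finality(reply, expected=2)
```

Index i of the result lines up with feature i in the functionalities list, so a length mismatch is treated as an error instead of being silently truncated.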

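Summary

I was able to confirm that AutoE2E combines ordinary crawling with an LLM to build, as exhaustively as it can, a state transition diagram and a feature list for the application.
Building the State context with the LLM from a screenshot plus the previous State and Action, and the approach of extracting features, retrieving similar ones, and updating them via combined_text, both seem like ideas that could be applied elsewhere.
On the other hand, considering user authentication, the number of LLM calls, and the actual creation of test cases, it is doubtful that the code on GitHub can be used as-is, so I expect it to serve mainly as a reference.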
