Web scraping AI agent

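An AI agent that can fetch HTML elements using natural-language queries.
Trying it out.
The goal is to automatically purchase a product on an e-commerce site.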
https://www.agentql.com/

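Pricing
This API plan is called the "Professional" plan and includes the following:
Monthly fee: $99

Per-request fee: $0.015

Monthly API request limit: 10,000 requests

Per-minute API request limit: 50 requests

In other words, this plan has a fixed monthly cost of $99 and covers up to 10,000 requests. Beyond that, each additional request is billed at $0.015. You can send at most 50 requests per minute.
For example, if usage exceeds the plan's allowance, the extra cost works out as follows:
With 10,500 requests, the overage is 500 requests × $0.015 = $7.50 in additional charges.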
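The overage arithmetic above can be sketched in a few lines of Python. The constants come from the plan figures quoted in these notes; the function names `overage_cost` and `monthly_cost` are made up for illustration.

```python
from decimal import Decimal

# Figures taken from the plan description above.
MONTHLY_FEE = Decimal("99")
INCLUDED_REQUESTS = 10_000
OVERAGE_PER_REQUEST = Decimal("0.015")


def overage_cost(requests_used: int) -> Decimal:
    """Extra charge for requests beyond the monthly allowance."""
    extra_requests = max(0, requests_used - INCLUDED_REQUESTS)
    return extra_requests * OVERAGE_PER_REQUEST


def monthly_cost(requests_used: int) -> Decimal:
    """Fixed fee plus any overage."""
    return MONTHLY_FEE + overage_cost(requests_used)


if __name__ == "__main__":
    print(overage_cost(10_500))   # overage for 500 extra requests
    print(monthly_cost(10_500))   # fixed fee + overage
```

`Decimal` is used instead of `float` so that amounts like $0.015 are not subject to binary rounding.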

Even the Professional plan doesn't seem to allow all that many calls.
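One rough way to stay under the 50-requests-per-minute cap is to space calls evenly on the client side. The sketch below is not part of AgentQL: `Throttle` is a hypothetical helper, and even spacing (one call every 1.2 seconds) is stricter than the stated limit actually requires.

```python
import time


class Throttle:
    """Spaces calls so that no more than `max_per_minute` happen per minute."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous call, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


if __name__ == "__main__":
    throttle = Throttle(max_per_minute=50)
    for i in range(3):
        throttle.wait()  # call this right before each API request
        print("request", i)
```

The clock and sleep functions are injectable only so the helper can be exercised without real waiting.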

しょーへー

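Got a demo working in the playground.
It retrieves elements even from a rough query like the one below.
It apparently returns XPaths too, so the setup will probably be: use AgentQL to get the XPath of things like the purchase button, then click the button with Selenium.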

{
  job_categories[]
  jobs[] {
    company_name
    role
  }
}

https://youtu.be/vzuXPQEHHcs

しょーへー

Trying the Quick Start.

しょーへー

There's apparently a Chrome extension for debugging.

It lets you easily try out how queries behave in the browser.
Query this page with the AgentQL Debugger Chrome extension

The AgentQL Debugger lets you write and test queries in real-time on web pages, without needing to spin up the Python SDK. It's perfect for debugging queries before putting them into production! Here's how to get started:

しょーへー

Once installed, an AgentQL tab is added to the browser's developer tools.

しょーへー

Running the query below seems to pick up the HTML element nicely.

{
    search_box
}

{
  "search_box": {
    "role": "text",
    "tf623_id": "299",
    "html_tag": "span",
    "name": "Search",
    "attributes": {
      "class": "DocSearch-Button-Placeholder"
    }
  }
}
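If only an element description like the one above is at hand, one rough way to pass it to another tool such as Selenium is to derive a CSS selector from `html_tag` and `attributes`. This is a sketch under assumptions: `to_css_selector` is a hypothetical helper, not an AgentQL function, and a selector built from the tag plus classes may match more than one element on the page.

```python
def to_css_selector(element: dict) -> str:
    """Build a simple CSS selector from an element description.

    Prefers the id attribute when present; otherwise falls back to
    the tag name followed by each class.
    """
    tag = element.get("html_tag", "*")
    attrs = element.get("attributes", {})
    if "id" in attrs:
        return f"{tag}#{attrs['id']}"
    classes = attrs.get("class", "").split()
    return tag + "".join(f".{name}" for name in classes)


# The search_box element exactly as printed above.
search_box = {
    "role": "text",
    "tf623_id": "299",
    "html_tag": "span",
    "name": "Search",
    "attributes": {"class": "DocSearch-Button-Placeholder"},
}

if __name__ == "__main__":
    print(to_css_selector(search_box))
```

The `tf623_id` field is left out of the selector on purpose, since it looks like an identifier assigned during the query rather than a stable attribute of the page.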

しょーへー

Query
You can apparently add a more detailed description inside ().

{
    headings(all the headings inside the article)[]
}

Result

{
  "headings": [
    "AgentQL Quick Start",
    "What's an AgentQL query?",
    "Get your API key",
    "Query this page with the AgentQL Debugger Chrome extension",
    "Try it out",
    "Perform the query with the AgentQL SDK",
    "Next steps",
    "Table of contents",
    "How would you rate your experience?"
  ]
}

It neatly retrieved only the page's headings.

しょーへー

Perform the query with the AgentQL SDK

Trying the Python SDK.

Create a virtual environment

python -m venv agentql
source agentql/bin/activate

Install the library and initialize

pip3 install agentql
agentql init

It prompts for the API key

Installing dependencies...
Get your AgentQL API key at https://dev.agentql.com
Enter your AgentQL API key:

Setup complete

しょーへー

demo code

import agentql
from playwright.sync_api import sync_playwright

# Initialise the browser
with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
    page = agentql.wrap(browser.new_page())
    page.goto("https://docs.agentql.com/quick-start")

    # Find "Search" button using Smart Locator
    search_button = page.get_by_prompt("search button")
    # Interact with the button
    search_button.click()

    # Define a query for modal dialog's search input
    SEARCH_BOX_QUERY = """
    {
        modal {
            search_box
        }
    }
    """

    # Get the modal's search input and fill it with "Quick Start"
    response = page.query_elements(SEARCH_BOX_QUERY)
    response.modal.search_box.type("Quick Start")

    # Define a query for the search results
    SEARCH_RESULTS_QUERY = """
    {
        modal {
            search_box
            search_results {
                items[]
            }
        }
    }
    """

    # Execute the query after the results have returned then click on the first one
    response = page.query_elements(SEARCH_RESULTS_QUERY)
    response.modal.search_results.items[0].click()

    # Used only for demo purposes. It allows you to see the effect of the script.
    page.wait_for_timeout(10000)

しょーへー

This script opens this site, docs.agentql.com, clicks the search button, fills in the search modal's input with "Quick Start," and clicks the first result, bringing you back to this page.
Demo walkthrough

Access the URL https://docs.agentql.com/quick-start
Find the search button and click it
Apparently get_by_prompt("search button") can find it.
  # Find "Search" button using Smart Locator
  search_button = page.get_by_prompt("search button")
  # Interact with the button
  search_button.click()
Type "Quick Start" and search
Retrieve the search results with the query below
  SEARCH_RESULTS_QUERY = """
  {
      modal {
          search_box
          search_results {
              items[]
          }
      }
  }
  """

しょーへー

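Running it produced an error.
Open the page → click the search bar → type the search text
works fine up to that point, but the search results aren't being retrieved.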

https://youtu.be/Gzqksk4S6I8

Traceback (most recent call last):
  File "/Users/tanabeshouhei/work/agnetql/agentql/example_script.py", line 41, in <module>
    response.modal.search_results.items[0].click()
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/Users/tanabeshouhei/work/agnetql/agentql/lib/python3.11/site-packages/agentql/ext/playwright/sync_api/response_proxy.py", line 36, in __getitem__
    return super().__getitem__(index)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tanabeshouhei/work/agnetql/agentql/lib/python3.11/site-packages/agentql/ext/playwright/_response_proxy_parent.py", line 31, in __getitem__
    return self._resolve_item(self._response_data[index], self._query_tree_node)  # type: ignore # returned value could be None, but to make static checker happy we ignore it
                              ~~~~~~~~~~~~~~~~~~~^^^^^^^
IndexError: list index out of range

The problem is in the call below. It fails to retrieve anything from the page after the search.

response = page.query_elements(SEARCH_RESULTS_QUERY)

Adding a debug print shows it comes back as an empty list

response = page.query_elements(SEARCH_RESULTS_QUERY)
print(response.modal.search_results.items)  # for debugging

しょーへー

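Trying a different query.
I'll first experiment in the Chrome extension to see whether the results can be retrieved, then work out the query.

The elements in the popup aren't being retrieved.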

しょーへー

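Reading the documentation on how to write queries

https://docs.agentql.com/agentql-query/best-practices
Looks like queries can be written quite flexibly
{
    login_btn(the one in header section)
    footer {
        social_media_links(The icons that lead to Facebook, Snapchat, etc.)[]
    }
}
The docs had this easy-to-follow example

There was an e-commerce site example too

So in the end, how do I get the elements in the popup?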

しょーへー

How about a query like this?

{
    modal {
        search_results {
            items[] {
                title
                link
            }
        }
    }
}

しょーへー

Got them!

しょーへー

Update the source code

    # Define a query for the search results
    SEARCH_RESULTS_QUERY = """
    {
        modal {
            search_box
            search_results {
                items[]
            }
        }
    }
    """

↓

    # Define a query for the search results
    SEARCH_RESULTS_QUERY = """
    {
        modal {
            search_results {
                items[] {
                    title
                    link
                }
            }
        }
    }
    """

しょーへー

Running it.
Nothing retrieved 😭

[]
No search results found.

Maybe it's because there's a lag before the search results are displayed?
Wait 3 seconds

    # Wait for 3 seconds
    page.wait_for_timeout(3000)

There it is! Got them!

[
  {
    "title": {
      "role": "text",
      "tf623_id": "1348",
      "html_tag": "span",
      "name": "AgentQL",
      "attributes": {}
    },
    "link": {
      "role": "link",
      "tf623_id": "1342",
      "html_tag": "a",
      "name": "",
      "attributes": {
        "class": "transition-colors duration-200 DocSearch-Hit--Child",
        "href": "/quick-start"
      }
    }
  },
  {
    "title": {
      "role": "text",
      "tf623_id": "1360",
      "html_tag": "span",
      "name": "What's an AgentQL query?",
      "attributes": {
        "class": "DocSearch-Hit-title"
      }
    },
    "link": {
      "role": "link",
      "tf623_id": "1354",
      "html_tag": "a",
      "name": "",
      "attributes": {
        "class": "transition-colors duration-200 DocSearch-Hit--Child",
        "href": "/quick-start#whats-an-agentql-query"
      }
    }
  },
  {
    "title": {
      "role": "text",
      "tf623_id": "1374",
      "html_tag": "span",
      "name": "Get your API key",
      "attributes": {
        "class": "DocSearch-Hit-title"
      }
    },
    "link": {
      "role": "link",
      "tf623_id": "1368",
      "html_tag": "a",
      "name": "",
      "attributes": {
        "class": "transition-colors duration-200 DocSearch-Hit--Child",
        "href": "/quick-start#get-your-api-key"
      }
    }
  },
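To make a result like this easier to eyeball, the title text and link target can be pulled out of each item. The sketch below works on plain dicts shaped like the printed output above; whether the SDK's response objects can be indexed the same way is an assumption, and `summarize_hits` is a made-up helper name.

```python
def summarize_hits(items):
    """Return (title text, href) pairs from items shaped like the output above."""
    return [
        (item["title"]["name"], item["link"]["attributes"].get("href"))
        for item in items
    ]


# First two items from the output above, trimmed to the fields used here.
SAMPLE_ITEMS = [
    {
        "title": {"name": "AgentQL", "attributes": {}},
        "link": {"name": "", "attributes": {"href": "/quick-start"}},
    },
    {
        "title": {"name": "What's an AgentQL query?", "attributes": {}},
        "link": {
            "name": "",
            "attributes": {"href": "/quick-start#whats-an-agentql-query"},
        },
    },
]

if __name__ == "__main__":
    for title, href in summarize_hits(SAMPLE_ITEMS):
        print(title, "->", href)
```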

しょーへー

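Adding the 3-second wait to the official demo code and running it.

It worked!!
Open the browser → click the search bar → type the search text → retrieve the search results → click a search result element
Being able to get that far looks quite promising.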
https://youtu.be/g2VGuCoF4Ko
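The fixed 3-second wait works here, but it is either too long or too short whenever the lag changes. A possible alternative is to retry the query until it returns something or a timeout expires. This is a generic sketch: `poll_until_nonempty` is a hypothetical helper, and it assumes the returned list supports `len()`.

```python
import time


def poll_until_nonempty(fetch, timeout_s=10.0, interval_s=0.5,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch() repeatedly until it returns a non-empty list.

    Raises TimeoutError if nothing has come back by the deadline.
    """
    deadline = clock() + timeout_s
    while True:
        items = fetch()
        if len(items) > 0:
            return items
        if clock() >= deadline:
            raise TimeoutError("no results before the timeout expired")
        sleep(interval_s)


if __name__ == "__main__":
    # Stand-in for the real query: empty twice, then one result.
    attempts = {"count": 0}

    def fake_fetch():
        attempts["count"] += 1
        return [] if attempts["count"] < 3 else ["first hit"]

    print(poll_until_nonempty(fake_fetch, interval_s=0.01))
```

In the demo script, `fetch` could be a lambda that runs `page.query_elements(SEARCH_RESULTS_QUERY)` and returns `modal.search_results.items`. Each retry presumably counts as another API request, so a longer interval may be the better trade-off against the request limits.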

しょーへー

Trying to automatically purchase a product on Amazon

The product below. I picked one that's fine to end up actually buying automatically.
https://amzn.asia/d/eDqSAJ3

しょーへー

Click the login button

I want to click the "こんにちは、ログイン" ("Hello, sign in") button in the header.

The query below worked.

    # Define a query to identify the login button
    LOGIN_BUTTON_QUERY = """
    {
      login_btn(the one in header section containing "ログイン")
    }
    """

It worked in Japanese too

login_btn(ヘッダーの中にあり、テキストが「ログイン」である)

しょーへー

Enter the email address

The query below worked

    # Get the email input form and type the email address
    response = page.query_elements(EMAIL_INPUT_QUERY)
    response.email_input_form.type("example@gmail.com")  # type the email address

    # Define a query to get the "Continue" (次に進む) button
    NEXT_BUTTON_QUERY = """
    {
      next_input_button(type=submit)
    }
    """

    # Get the "Continue" button and click it
    response = page.query_elements(NEXT_BUTTON_QUERY)
    print(response)  # print the response for debugging
    response.next_input_button.click()  # click "Continue"

しょーへー

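A hurdle appears

I want to press the close button, but I can't get the button's HTML element...
I can't find the element in the developer tools either.
Oh, maybe I can just ignore it and carry on??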

しょーへー

Even ignoring the popup above, the HTML elements are still retrieved.
But I can't type into the input element below...

response.password_input_form.type("password")  # type the password

{
  "password_input_form": {
    "role": "form",
    "tf623_id": "5150",
    "html_tag": "form",
    "name": "signIn",
    "attributes": {
      "name": "signIn",
      "method": "post",
      "novalidate": true,
      "action": "https://www.amazon.co.jp/ap/signin",
      "class": "auth-validate-form auth-real-time-validation a-spacing-none",
      "data-fwcim-id": "GK7iSGYJ"
    }
  }
}
{
  "login_btn": {
    "role": "button",
    "tf623_id": "5176",
    "html_tag": "input",
    "name": "",
    "attributes": {
      "id": "signInSubmit",
      "class": "a-button-input",
      "type": "submit",
      "aria-labelledby": "auth-signin-button-announce"
    }
  }
}

しょーへー

Worked around it by running the browser headless so that no window appears.

with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:

↓

with sync_playwright() as playwright, playwright.chromium.launch(headless=True) as browser:

しょーへー

Password entry done, and I got this far!!
Am I stuck??
This is plainly one of those things meant to block exactly this kind of automated purchasing.

しょーへー

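Can AI get past this kind of security check, I wonder?
Having ChatGPT try to solve it.

Output

{"text":"8460sd"}

Close, but at least it answered.

Had it output five candidate answers.
The W ends up being read as 0.

{"text_options": ["8460sd", "8460sD", "8460Sd", "8460sD", "8460sd"]}

Had it output five candidates for each character position.
Hmm, no good.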

{"text_candidates": [
{
"digit_1": ["8", "0", "6", "9", "3"],
"digit_2": ["4", "5", "7", "2", "1"],
"digit_3": ["6", "8", "0", "4", "2"],
"digit_4": ["s", "5", "$", "%", "&"],
"digit_5": ["d", "#", "@", "*", "+"],
"digit_6": ["2", "3", "0", "1", "4"]
}
]}