
Operating WebDriver Without Selenium - Part 3

Published 2024/01/08

.NET

Introduction

This article shows how to operate a browser through WebDriver.
For the basics of WebDriver, see Part 1 and Part 2.

Part1
Part2

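Note that the WebDriver documentation referenced here reflects its content as of January 2024. Reference material

The source code used in this article is published on GitHub. GitHub

Recap before Part 3

Caveats to check first

Keep the following caveat in mind when using WebDriver.

WebDriver depends on the browser version

If something does not work in your environment, check your browser version and use the WebDriver build that matches it.

Test environment

The examples were verified in the environment below. Use it as a reference when running them elsewhere.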

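MacBook
- Apple Silicon M1
- Sonoma 14.0
WebDriver
- chromedriver:119
Python
- Python 3.9.6

The examples run against chromedriver, but the WebDriver specification is essentially the same across drivers, so as long as the endpoints match they should also work with other drivers (for example edgedriver).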

Background for operating windows

To operate a browser window through WebDriver, you need an ID that identifies that window.
In WebDriver this ID is called the SessionId.
The SessionId is returned when you start a WebDriver session.

Endpoint: http://localhost:9515/session
Method: POST
Request JSON

{
    "capabilities":{}
}

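WebDriver generates the SessionId so that it is unique while the window is running, and a window launched by WebDriver is operated through this SessionId.

As long as the window launched by WebDriver still exists and the WebDriver session is still valid, you can keep operating the window with the same SessionId.
In other words, if your browser automation stops midway while the session is still valid, you do not need to obtain a new SessionId.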
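The request and response for starting a session can be sketched as two small helpers. This is only an illustration: the function names new_session_body and extract_session_id are made up for this example, and the optional always_match argument assumes the alwaysMatch field of the W3C capabilities object.

```python
import json


def new_session_body(always_match=None):
    """Build the JSON body for POST /session.

    always_match is an optional dict placed under capabilities.alwaysMatch
    (hypothetical convenience; an empty capabilities object is enough here).
    """
    capabilities = {}
    if always_match:
        capabilities["alwaysMatch"] = always_match
    return json.dumps({"capabilities": capabilities})


def extract_session_id(response_json):
    """Read the SessionId from an already parsed New Session response."""
    value = response_json.get("value") or {}
    session_id = value.get("sessionId")
    if not session_id:
        raise ValueError("no sessionId in response: {0}".format(response_json))
    return session_id
```

Failing loudly when sessionId is missing makes a driver that did not start easier to diagnose than a later AttributeError on None.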

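Background for operating elements

Put simply, an element is an object that belongs to a web page's document.
The precise definition is on MDN, so it is omitted here.
Element on MDN

To operate an element, you need an ID that identifies that element.
To obtain the ID of a particular element, POST one of the locator strategies below together with the value to search for.

Strategy	Keyword (value of "using")
CSS selector	css selector
Link text selector	link text
Partial link text selector	partial link text
Tag name	tag name
XPath selector	xpath

The endpoints used to find elements have the following form.

To find a single element
Endpoint: http://localhost:9515/session/:sessionId/element

To find multiple elements
Endpoint: http://localhost:9515/session/:sessionId/elements

Method: POST
The request JSON carries the locator in the form {"using": keyword, "value": selector}.

Once you have an element ID, operations on that element use endpoints of the form http://localhost:9515/session/:sessionId/element/:elementId/..., and the method and request JSON depend on the action.

To read the element ID out of the response, you must use a fixed key.
Specifically, the key is element-6066-11e4-a52e-4f735466cecf.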
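As a rough sketch of how the locator table and the fixed key fit together, the helpers below build the request body and read the ID back out. The names locator_body and element_id are hypothetical and exist only for this illustration.

```python
import json

# Fixed key under which WebDriver returns an element ID.
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"

# Keywords accepted in the "using" field (from the table above).
LOCATOR_STRATEGIES = (
    "css selector",
    "link text",
    "partial link text",
    "tag name",
    "xpath",
)


def locator_body(using, value):
    """Build the JSON body for POST .../element or .../elements."""
    if using not in LOCATOR_STRATEGIES:
        raise ValueError("unknown locator strategy: {0}".format(using))
    # json.dumps escapes any double quotes inside the selector.
    return json.dumps({"using": using, "value": value})


def element_id(reference):
    """Read the element ID from one element reference in a response."""
    return reference[ELEMENT_KEY]
```

Rejecting unknown keywords locally gives a clearer message than the invalid argument error the driver would otherwise return.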

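Background for operating frames

Older HTML used frame elements grouped by a frameset, while newer HTML uses iframe (inline frame). A frame is a mechanism for embedding one HTML document inside another.
Each frame displays its own page, so each is treated as a separate screen.

WebDriver treats each frame as a separate browsing context, so to find elements inside a frame you first have to switch to that frame.
In other words, a switch operation must be executed before reading the document inside a frame.

Endpoint: http://localhost:9515/session/:sessionId/frame
Method: POST
The request JSON depends on the action.

Note that each iframe (inline frame) loads its document on its own schedule.
This is why content inside an iframe may not be readable yet even after the window that hosts it has finished loading.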
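A minimal sketch of the two ideas above, under stated assumptions: switch_frame_body builds the body for the frame endpoint, whose id field per the W3C specification is null (top-level document), a frame index, or an element reference; wait_until is a generic polling helper for content that loads late. Both names are hypothetical.

```python
import json
import time

ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def switch_frame_body(target=None):
    """Build the body for POST /session/:sessionId/frame.

    target: None for the top-level document, an int frame index,
    or the element ID (str) of an iframe element.
    """
    if isinstance(target, bool):
        raise TypeError("target must be None, an int index, or an element ID")
    if target is None or isinstance(target, int):
        frame_id = target
    else:
        frame_id = {ELEMENT_KEY: target}
    return json.dumps({"id": frame_id})


def wait_until(probe, timeout=5.0, interval=0.2):
    """Call probe() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {0}s".format(timeout))
        time.sleep(interval)
```

In practice probe would be a function that switches to the frame and tries to find the element, returning None while the frame is still loading.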

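Basics of web scraping

The browser's developer tools let you check which tags and attributes to target.
The basic operations involved in web scraping are as follows.

Windows
- Get the window title
- Open a window
- Close a window
- Maximize a window
- Minimize a window
- Resize a window
- Move a window
Retrieval
- Get elements of a web page
- Get the page source
Input
- UI operations (clicks, checkboxes, text input)
Script execution
- Synchronous JavaScript execution
- Asynchronous JavaScript execution
Alerts
- Get an alert
- Dismiss an alert
- Get the alert text
Output
- File output
Parsing
- HTML
- XML
- JSON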

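Hands-on

That concludes the short recap. Now let's look at some sample code.
The samples in this article are written in Python.

What we will cover

Getting the SessionId
Operating windows
Getting elements
FindElements
Operating the UI
Executing scripts
Operating cookies

Getting the SessionId

* 0_SessionId - code used in this section

First, we learn how to get the SessionId, which everything else builds on.
Before writing the code, we collect the fixed parameters in a single module (config).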

config/const.py

PORT = 9515
WEB_DRIVER_URL = 'http://localhost:{0}/session'.format(PORT)

With the module above in place, create main.py, which imports it, and run it.

main.py

import requests  # requires e.g. "pip install requests"
import config.const as const

if __name__ == '__main__':

    print("Run WebDriver")

    res = requests.post(
        const.WEB_DRIVER_URL,
        headers={'Content-Type': 'application/json'},
        data='{"capabilities":{}}'
    ).json()

    sessionId = res.get("value").get("sessionId")

    print("json response {0}".format(res))
    print("{0} {1}".format("sessionId", sessionId))

    requests.delete(
        const.WEB_DRIVER_URL + '/' + sessionId,
        headers={'Content-Type': 'application/json'},
    )

python main.py

Output

Run WebDriver
json response {'value': {'capabilities': {'acceptInsecureCerts': False, 'browserName': 'chrome', 'browserVersion': '120.0.6099.129', 'chrome': {'chromedriverVersion': '119.0.6045.105 (38c72552c5e15ba9b3117c0967a0fd105072d7c6-refs/branch-heads/6045@{#1103})', 'userDataDir': '/var/folders/nk/26yk2_3d1dd382hvzshrs_3h0000gn/T/.org.chromium.Chromium.vYCiNc'}, 'fedcm:accounts': True, 'goog:chromeOptions': {'debuggerAddress': 'localhost:50582'}, 'networkConnectionEnabled': False, 'pageLoadStrategy': 'normal', 'platformName': 'mac', 'proxy': {}, 'setWindowRect': True, 'strictFileInteractability': False, 'timeouts': {'implicit': 0, 'pageLoad': 300000, 'script': 30000}, 'unhandledPromptBehavior': 'dismiss and notify', 'webauthn:extension:credBlob': True, 'webauthn:extension:largeBlob': True, 'webauthn:extension:minPinLength': True, 'webauthn:extension:prf': True, 'webauthn:virtualAuthenticators': True}, 'sessionId': 'ea7b68da6f7b932970282f929bf6f7f4'}}
sessionId ea7b68da6f7b932970282f929bf6f7f4

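The SessionId is different on every run.
You should have seen the browser launch for a brief moment.

This is because the DELETE request with the SessionId is sent immediately after the New Session request.
That is too quick to confirm the launch, so modify the code as follows and check again.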

main.py

import requests  # requires e.g. "pip install requests"
import config.const as const

# used for sleep
import os

if __name__ == '__main__':

    print("Run WebDriver")

    res = requests.post(
        const.WEB_DRIVER_URL,
        headers={'Content-Type': 'application/json'},
        data='{"capabilities":{}}'
    ).json()

    sessionId = res.get("value").get("sessionId")

    print("json response {0}".format(res))
    print("{0} {1}".format("sessionId", sessionId))

    for cnt in range(10):
        print("sleep {0}".format(cnt))
        os.system('sleep 1')

    requests.delete(
        const.WEB_DRIVER_URL + '/' + sessionId,
        headers={'Content-Type': 'application/json'},
    )

This launches the browser, keeps it open for 10 seconds, and then closes it.
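One thing to watch with this pattern: if an exception is raised between the POST and the DELETE, the session and its browser window are left open. A minimal sketch of guarding against that with a context manager follows. The name webdriver_session is hypothetical, and the http argument is assumed to be the requests module or any object with compatible post and delete functions.

```python
import contextlib


@contextlib.contextmanager
def webdriver_session(base_url, http):
    """Start a session, yield its SessionId, and always delete it on exit.

    base_url: the New Session endpoint, e.g. const.WEB_DRIVER_URL.
    http: the requests module, or an object with post()/delete() like it.
    """
    res = http.post(
        base_url,
        headers={'Content-Type': 'application/json'},
        data='{"capabilities":{}}'
    ).json()
    session_id = res["value"]["sessionId"]
    try:
        yield session_id
    finally:
        # Runs even when the body of the with block raises.
        http.delete(base_url + '/' + session_id)
```

Usage would look like `with webdriver_session(const.WEB_DRIVER_URL, requests) as sessionId:` followed by the operations on the window; the DELETE is then sent no matter how the block ends.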

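Operating windows

* 1_Window - code used in this section

Before getting into window operations, keep in mind that these are window operations performed through WebDriver. In other words, they assume you are operating a document opened as a web page.

To try out a series of window operations, create index.html in the same directory as the main.py shown below.
The following sections also use index.html to check how WebDriver behaves.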

index.html

<!DOCTYPE html>
<html lang="ja">

<head>
  <meta charset="UTF-8">
  <title>WebDriver Test Page</title>
</head>

<body>
  <div class="container">
    <h1>WebDriver Test Page</h1>
    <div class="window">
      <div class="window__header">
        <div class="window__title">Window</div>
        <div class="window__close">X</div>
      </div>
      <div class="window__content">
        <p>Run WebDriver</p>
      </div>
    </div>
  </div>
</body>

</html>

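Because this example favors clarity and fast startup, the page is deliberately simple
and may look plain. If you are comfortable with HTML, feel free to adjust it as needed.

Once index.html is ready, run the Python code below.
The code performs the following operations.

Launch the browser
Open a URL
Get the title
Get the window handles
Resize the window
Move the window
Minimize, maximize, and fullscreen the window

Some parts of the source code are redundant; this is intentional, for the sake of explanation.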

main.py

import requests  # requires e.g. "pip install requests"
import config.const as const

# used for sleep and for building the file path
import os

if __name__ == '__main__':

    print("Run WebDriver")

    res = requests.post(
        const.WEB_DRIVER_URL,
        headers={'Content-Type': 'application/json'},
        data='{"capabilities":{}}'
    ).json()

    sessionId = res.get("value").get("sessionId")
    open_url = os.path.join(os.getcwd(), "index.html")

    res = requests.post(
        "".join([const.WEB_DRIVER_URL, '/', sessionId, '/url']),
        headers={'Content-Type': 'application/json'},
        data='{"url": "' + "file://" + open_url + '"}'
    ).json()

    res = requests.get(
        "".join([const.WEB_DRIVER_URL, '/', sessionId, '/title']),
        headers={'Content-Type': 'application/json'},
    ).json()
    window_title = res.get("value")
    print("Window Title: {0}".format(window_title))

    res = requests.get(
        "".join([const.WEB_DRIVER_URL, '/', sessionId, '/window/handles']),
        headers={'Content-Type': 'application/json'},
    ).json()

    if res.get("value"):
        window_handles = res.get("value")

        res = requests.post(
            "".join([const.WEB_DRIVER_URL, '/', sessionId, '/window']),
            headers={'Content-Type': 'application/json'},
            data='{"handle": "' + window_handles[0] + '"}'
        ).json()
        print("Window Handle: {0}".format(window_handles[0]))
        for cnt in range(2):
            os.system('sleep 1')

        res = requests.post(
            "".join([const.WEB_DRIVER_URL, '/', sessionId, '/window/rect']),
            headers={'Content-Type': 'application/json'},
            data='{"handle": "' +
            window_handles[0] + '", "width": 800, "height": 600}'
        ).json()
        print("Window Rect: {0}".format(res.get("value")))
        for cnt in range(2):
            os.system('sleep 1')

        res = requests.post(
            "".join([const.WEB_DRIVER_URL, '/', sessionId, '/window/rect']),
            headers={'Content-Type': 'application/json'},
            data='{"handle": "' +
            window_handles[0] + '", "x": 0, "y": 0}'
        ).json()
        print("Window Rect: {0}".format(res.get("value")))
        for cnt in range(2):
            os.system('sleep 1')

        res = requests.post(
            "".join([const.WEB_DRIVER_URL, '/', sessionId, '/window/minimize']),
            headers={'Content-Type': 'application/json'},
            data='{"handle": "' + window_handles[0] + '"}'
        ).json()
        print("Window Minimize: {0}".format(res.get("value")))
        for cnt in range(2):
            os.system('sleep 1')

        res = requests.post(
            "".join([const.WEB_DRIVER_URL, '/', sessionId, '/window/maximize']),
            headers={'Content-Type': 'application/json'},
            data='{"handle": "' + window_handles[0] + '"}'
        ).json()
        print("Window Maximize: {0}".format(res.get("value")))

        res = requests.post(
            "".join([const.WEB_DRIVER_URL, '/',
                    sessionId, '/window/fullscreen']),
            headers={'Content-Type': 'application/json'},
            data='{"handle": "' + window_handles[0] + '"}'
        ).json()
        print("Window Fullscreen: {0}".format(res.get("value")))
        for cnt in range(5):
            os.system('sleep 1')

        res = requests.post(
            "".join([const.WEB_DRIVER_URL, '/', sessionId, '/window/maximize']),
            headers={'Content-Type': 'application/json'},
            data='{"handle": "' + window_handles[0] + '"}'
        ).json()
        print("Window Maximize: {0}".format(res.get("value")))
        for cnt in range(5):
            os.system('sleep 1')

    requests.delete(
        "".join([const.WEB_DRIVER_URL, '/', sessionId]),
        headers={'Content-Type': 'application/json'},
    )

    print("End WebDriver")

Output
* Results may differ depending on your environment.

Run WebDriver
Window Title: WebDriver Test Page
Window Handle: D32A2A54A3C250FCCB0285F955053142
Window Rect: {'height': 600, 'width': 800, 'x': 22, 'y': 47}
Window Rect: {'height': 600, 'width': 800, 'x': 0, 'y': 25}
Window Minimize: {'height': 600, 'width': 800, 'x': 0, 'y': 25}
Window Maximize: {'height': 875, 'width': 1440, 'x': 0, 'y': 25}
Window Fullscreen: {'height': 900, 'width': 1440, 'x': 0, 'y': 0}
Window Maximize: {'height': 875, 'width': 1440, 'x': 0, 'y': 25}
End WebDriver
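The sample above assembles URLs and JSON bodies by string concatenation, which is easy to follow but fragile when a path contains spaces or quotes. As a hedged alternative, the sketch below builds the same pieces with json.dumps and pathlib. The helper names endpoint, file_url, navigate_body, and rect_body are hypothetical, and WEB_DRIVER_URL simply repeats the value from config/const.py.

```python
import json
from pathlib import Path

WEB_DRIVER_URL = "http://localhost:9515/session"


def endpoint(session_id, *parts):
    """Join the session URL with further path segments."""
    return "/".join([WEB_DRIVER_URL, session_id, *parts])


def file_url(path):
    """Turn a local path into a file:// URL (special characters are escaped)."""
    return Path(path).resolve().as_uri()


def navigate_body(url):
    """Body for POST .../url."""
    return json.dumps({"url": url})


def rect_body(x=None, y=None, width=None, height=None):
    """Body for POST .../window/rect; only the given fields are sent."""
    fields = {"x": x, "y": y, "width": width, "height": height}
    return json.dumps({k: v for k, v in fields.items() if v is not None})
```

For example, `endpoint(sessionId, 'window', 'rect')` with `rect_body(width=800, height=600)` corresponds to the resize request in main.py.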

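Supplement: operating the UI of native applications

Besides web pages, you may sometimes want to operate the UI that the browser itself implements; that counts as operating a native application.

There are several ways to operate a native application's UI.

Window-handle-based UI operation using the Win32 API
Activity-based UI operation using an RPA product

If it fits your requirements, you can also access the APIs that the browser provides, for example through an extension.

Getting elements

* 2_Element - code used in this section

Various kinds of information can be read from an element. The main ones are:

Attribute values (the values of name, value, id, and so on)
innerText
innerHTML

This will feel familiar if you have done DOM (Document Object Model) programming in JavaScript.

To try out a series of element operations, modify index.html, main.py, and const.py.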

index.html

<!DOCTYPE html>
<html lang="ja">

<head>
  <meta charset="UTF-8">
  <title>WebDriver Test Page</title>
</head>

<body>
  <div class="container">
    <h1>WebDriver Test Page</h1>
    <a href="https://google.com/">google link</a>
    <input type="text" name="text" value="text">
    <input type="button" value="text change">
  </div>
</body>

</html>

The Python code to run is as follows.

main.py

import requests  # requires e.g. "pip install requests"
import config.const as const

# used for sleep and for building the file path
import os

if __name__ == '__main__':

    print("Run WebDriver")

    res = requests.post(
        const.WEB_DRIVER_URL,
        headers={'Content-Type': 'application/json'},
        data='{"capabilities":{}}'
    ).json()

    sessionId = res.get("value").get("sessionId")
    open_url = os.path.join(os.getcwd(), "index.html")

    res = requests.post(
        "".join([const.WEB_DRIVER_URL, '/', sessionId, '/url']),
        headers={'Content-Type': 'application/json'},
        data='{"url": "' + "file://" + open_url + '"}'
    ).json()

    res = requests.get(
        "".join([const.WEB_DRIVER_URL, '/', sessionId, '/title']),
        headers={'Content-Type': 'application/json'},
    ).json()
    window_title = res.get("value")
    print("Window Title: {0}".format(window_title))

    res = requests.get(
        "".join([const.WEB_DRIVER_URL, '/', sessionId, '/window/handles']),
        headers={'Content-Type': 'application/json'},
    ).json()

    if res.get("value"):
        window_handles = res.get("value")

        res = requests.post(
            "/".join([const.WEB_DRIVER_URL, sessionId, 'window']),
            headers={'Content-Type': 'application/json'},
            data='{"handle": "' + window_handles[0] + '"}'
        ).json()
        print("Window Handle: {0}".format(window_handles[0]))

        res = requests.post(
            "/".join([const.WEB_DRIVER_URL, sessionId, 'window', 'maximize']),
            headers={'Content-Type': 'application/json'},
            data='{"handle": "' + window_handles[0] + '"}'
        ).json()
        print("Window Maximize: {0}".format(res.get("value")))

    res = requests.post(
        "/".join([const.WEB_DRIVER_URL, sessionId, 'element']),
        headers={'Content-Type': 'application/json'},
        data='{"using": "css selector", "value": "' +
        "[type='" + "button" + "']" + '"}'
    ).json()
    btn_element = res.get("value").get(const.ELEMENT_KEY)
    print("Element: {0}".format(btn_element))

    res = requests.get(
        "/".join([const.WEB_DRIVER_URL, sessionId,
                  'element', btn_element, 'attribute', 'type']),
        headers={'Content-Type': 'application/json'},
    ).json()
    ui_type = res.get("value")
    print("Type: {0}".format(ui_type))

    res = requests.get(
        "/".join([const.WEB_DRIVER_URL, sessionId,
                  'element', btn_element, 'attribute', 'value']),
        headers={'Content-Type': 'application/json'},
    ).json()
    ui_value = res.get("value")
    print("Value: {0}".format(ui_value))

    res = requests.post(
        "/".join([const.WEB_DRIVER_URL, sessionId, 'element']),
        headers={'Content-Type': 'application/json'},
        data='{"using": "link text", "value":' + '"google link"' + "}"
    ).json()
    link_element = res.get("value").get(const.ELEMENT_KEY)
    print("Element: {0}".format(link_element))

    res = requests.get(
        "/".join([const.WEB_DRIVER_URL, sessionId,
                  'element', link_element, 'attribute', 'href']),
        headers={'Content-Type': 'application/json'},
    ).json()
    anchor_attr = res.get("value")
    print("Value: {0}".format(anchor_attr))

    res = requests.post(
        "/".join([const.WEB_DRIVER_URL, sessionId, 'element']),
        headers={'Content-Type': 'application/json'},
        data='{"using": "tag name", "value":' + '"h1"' + "}"
    ).json()
    h1_element = res.get("value").get(const.ELEMENT_KEY)
    print("Element: {0}".format(h1_element))

    res = requests.get(
        "/".join([const.WEB_DRIVER_URL, sessionId,
                  'element', h1_element, 'property', 'innerHTML']),
        headers={'Content-Type': 'application/json'},
    ).json()
    h1_innerHTML = res.get("value")
    print("Value: {0}".format(h1_innerHTML))

    res = requests.post(
        "/".join([const.WEB_DRIVER_URL, sessionId, 'element']),
        headers={'Content-Type': 'application/json'},
        data='{"using": "xpath", "value":' + '"/html/body/div/h1"' + "}"
    ).json()
    h1_xpath_element = res.get("value").get(const.ELEMENT_KEY)
    print("Element: {0}".format(h1_xpath_element))

    res = requests.get(
        "/".join([const.WEB_DRIVER_URL, sessionId,
                  'element', h1_xpath_element, 'property', 'innerHTML']),
        headers={'Content-Type': 'application/json'},
    ).json()
    h1_xpath_innerHTML = res.get("value")
    print("Value: {0}".format(h1_xpath_innerHTML))

    for cnt in range(5):
        os.system('sleep 1')

    requests.delete(
        "/".join([const.WEB_DRIVER_URL, sessionId]),
        headers={'Content-Type': 'application/json'},
    )

    print("End WebDriver")

Define ELEMENT_KEY (a constant) for reading element information from the response.

/config/const.py

PORT = 9515
WEB_DRIVER_URL = 'http://localhost:{0}/session'.format(PORT)
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"

Output

Run WebDriver
Window Title: WebDriver Test Page
Window Handle: 976AEC450ACAFD821F43A5264F9D7090
Window Maximize: {'height': 875, 'width': 1440, 'x': 0, 'y': 25}
Element: 6CEDEA9121B3BAF142636F8BAF3600F0_element_3
Type: button
Value: text change
Element: 6CEDEA9121B3BAF142636F8BAF3600F0_element_5
Value: https://google.com/
Element: 6CEDEA9121B3BAF142636F8BAF3600F0_element_6
Value: WebDriver Test Page
Element: 6CEDEA9121B3BAF142636F8BAF3600F0_element_6
Value: WebDriver Test Page
End WebDriver

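Looking at the last two lookups of the h1 tag, you can see that the ElementId is the same.
This shows that the ElementId does not change with the locator strategy used to find the element.

However, nothing guarantees that the element you get is the one you intended. FindElement returns only the first element that matches, so on its own it is not a stable way to pick an element when several could match.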
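The samples in this article read value straight from the response, which raises an unhelpful error when nothing matches. In the W3C format an error response carries an object with an error field in value, for example "no such element". The sketch below shows one way to handle that; WebDriverError, unwrap, and element_id_or_none are names invented for this illustration.

```python
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


class WebDriverError(Exception):
    """Raised when a response carries a WebDriver error object."""

    def __init__(self, error, message=""):
        super().__init__("{0}: {1}".format(error, message))
        self.error = error


def unwrap(response_json):
    """Return response["value"], raising if it is an error object."""
    value = response_json.get("value")
    if isinstance(value, dict) and "error" in value:
        raise WebDriverError(value["error"], value.get("message", ""))
    return value


def element_id_or_none(response_json):
    """Return the element ID of a Find Element response, or None if no match."""
    try:
        value = unwrap(response_json)
    except WebDriverError as exc:
        if exc.error == "no such element":
            return None
        raise
    return value[ELEMENT_KEY]
```

Treating only "no such element" as a normal outcome keeps other failures, such as an invalid selector or a dead session, visible as exceptions.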

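FindElements

* 3_Elements - code used in this section

FindElement works well when the target element can be pinned down,
but when several similar elements exist you need a way to single out the one to operate.

For example, index.html contains several input tags. If you want to click only the button among them,
a FindElement call with the tag name locator is not enough.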

index.html

<input type="text" name="text" value="text">
<input type="password" name="password" value="password">
<input type="checkbox" name="checkbox" value="checkbox">
<input type="button" onclick="changeText();" value="text change">

Modify main.py to identify each of the input tags.

main.py

import requests  # requires e.g. "pip install requests"
import config.const as const

# used for sleep and for building the file path
import os

if __name__ == '__main__':

    print("Run WebDriver")

    res = requests.post(
        const.WEB_DRIVER_URL,
        headers={'Content-Type': 'application/json'},
        data='{"capabilities":{}}'
    ).json()

    sessionId = res.get("value").get("sessionId")
    open_url = os.path.join(os.getcwd(), "index.html")

    res = requests.post(
        "".join([const.WEB_DRIVER_URL, '/', sessionId, '/url']),
        headers={'Content-Type': 'application/json'},
        data='{"url": "' + "file://" + open_url + '"}'
    ).json()

    res = requests.get(
        "".join([const.WEB_DRIVER_URL, '/', sessionId, '/title']),
        headers={'Content-Type': 'application/json'},
    ).json()
    window_title = res.get("value")
    print("Window Title: {0}".format(window_title))

    res = requests.get(
        "".join([const.WEB_DRIVER_URL, '/', sessionId, '/window/handles']),
        headers={'Content-Type': 'application/json'},
    ).json()

    if res.get("value"):
        window_handles = res.get("value")

        res = requests.post(
            "/".join([const.WEB_DRIVER_URL, sessionId, 'window']),
            headers={'Content-Type': 'application/json'},
            data='{"handle": "' + window_handles[0] + '"}'
        ).json()
        print("Window Handle: {0}".format(window_handles[0]))

        res = requests.post(
            "/".join([const.WEB_DRIVER_URL, sessionId, 'window', 'maximize']),
            headers={'Content-Type': 'application/json'},
            data='{"handle": "' + window_handles[0] + '"}'
        ).json()
        print("Window Maximize: {0}".format(res.get("value")))

    print("---")

    res = requests.post(
        "/".join([const.WEB_DRIVER_URL, sessionId, 'elements']),
        headers={'Content-Type': 'application/json'},
        data='{"using": "tag name", "value":' + '"input"' + "}"
    ).json()
    input_elements = res.get("value")

    for input_element in input_elements:
        print("Element: {0}".format(input_element.get(const.ELEMENT_KEY)))

        res = requests.get(
            "/".join([const.WEB_DRIVER_URL, sessionId,
                      'element', input_element.get(const.ELEMENT_KEY), 'property', 'type']),
            headers={'Content-Type': 'application/json'},
        ).json()
        input_element_type = res.get("value")

        if input_element_type == 'text':
            print("input text")

        if input_element_type == 'password':
            print("input password")

        if input_element_type == 'checkbox':
            print("input checkbox")

        if input_element_type == 'button':
            print("input button")

    for cnt in range(5):
        os.system('sleep 1')

    requests.delete(
        "/".join([const.WEB_DRIVER_URL, sessionId]),
        headers={'Content-Type': 'application/json'},
    )
    print("---")
    print("End WebDriver")

Output

Run WebDriver
Window Title: WebDriver Test Page
Window Handle: 31C204419C7DEB76D771DA4484BF915F
Window Maximize: {'height': 875, 'width': 1440, 'x': 0, 'y': 25}
---
Element: 61F819E04BB692360EC955C73426A743_element_3
input text
Element: 61F819E04BB692360EC955C73426A743_element_5
input password
Element: 61F819E04BB692360EC955C73426A743_element_7
input checkbox
Element: 61F819E04BB692360EC955C73426A743_element_8
input button
---
End WebDriver

As shown above, FindElements returns a list of elements,
so you can take the element IDs from the list one by one and inspect their properties.
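When the goal is to pick out only one kind of input, the loop can be folded into a small filter. This is a sketch, not part of the sample repository: element_ids_of_type is a hypothetical helper, and get_type stands for whatever function fetches the type property of an element ID (in main.py, the GET request to .../property/type).

```python
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def element_ids_of_type(references, get_type, wanted_type):
    """Return the IDs of the elements whose type equals wanted_type.

    references: the list returned by FindElements.
    get_type: callable taking an element ID and returning its type.
    """
    ids = []
    for reference in references:
        element_id = reference[ELEMENT_KEY]
        if get_type(element_id) == wanted_type:
            ids.append(element_id)
    return ids
```

Passing get_type in as an argument keeps the filter independent of how the HTTP request is made, which also makes it easy to try out without a running driver.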

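Operating the UI (preparation)

* 4_ui - code used in this section

Now that the basics of elements are clear, we would like to operate the UI,
but first we turn the operations written so far into a module so they are easy to call.

Here we create a module named req_driver, made up of three features and one config. Specifically, it is the module at the link below.

https://github.com/ymd65536/webdriver_py_samples/tree/main/4_ui/req_driver

Modify main.py to use this module.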

main.py

import req_driver.config.const as const
import req_driver.browser.window as win
import req_driver.using_elements.element as ele

# used for sleep and for building the file path
import os

if __name__ == '__main__':

    print("Run WebDriver")

    sessionId = win.start_session()
    open_url = os.path.join(os.getcwd(), "index.html")

    win.open_url(sessionId, open_url)
    window_title = win.get_title(sessionId)
    print("Window Title: {0}".format(window_title))
    window_handles = win.get_window_handles(sessionId)

    if window_handles:

        print(window_handles)
        sw = win.switch_to_window(sessionId, window_handles[0])

        maximize = win.window_maximize(sessionId, window_handles[0])
        print("Window Maximize: {0}".format(maximize))

    print("---")

    input_elements = ele.find_elements(sessionId, "tag name", "input")

    for input_element in input_elements:
        print("Element: {0}".format(input_element.get(const.ELEMENT_KEY)))

        input_element_type = ele.get_property(
            sessionId, input_element.get(const.ELEMENT_KEY), 'type')

        if ele.is_input_type_text(input_element_type):
            print("input text")

        if ele.is_input_type_password(input_element_type):
            print("input password")

        if ele.is_input_type_checkbox(input_element_type):
            print("input checkbox")

        if ele.is_input_type_button(input_element_type):
            print("input button")

    for cnt in range(5):
        os.system('sleep 1')

    win.delete_session(sessionId)

    print("---")
    print("End WebDriver")

Output

Run WebDriver
Window Title: WebDriver Test Page
['FE73E3AD34836CCB2601B46AB42D172C']
Window Maximize: {'height': 875, 'width': 1440, 'x': 0, 'y': 25}
---
Element: C26F34C12B1BC69C84BFA3CECE98320E_element_3
input text
Element: C26F34C12B1BC69C84BFA3CECE98320E_element_5
input password
Element: C26F34C12B1BC69C84BFA3CECE98320E_element_7
input checkbox
Element: C26F34C12B1BC69C84BFA3CECE98320E_element_8
input button
---
End WebDriver

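If the output is almost the same as the previous one, everything is working.

Operating the UI (practice)

* 4_ui - code used in this section

With the preparation done, let's operate the UI. First, define the operations on input tags in the req_driver module. The features to define are:

Click a button
Enter text
Check a checkbox

For checkboxes, we also implement an is_selected function that reports whether the box is checked.

The module that operates input tags, input_element.py, is as follows.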

input_element.py

import json

from ..config import const
from ..common import _request as req


def click_element(session_id, element_id):
    # Element Click takes no parameters; one request performs a full click
    # (pointer press and release) on the element.
    res = req.post(
        "/".join([const.WEB_DRIVER_URL, session_id,
                 'element', element_id, 'click']),
        headers=const.REQUEST_HEADERS,
        data='{}'
    )
    return res


def send_keys(session_id, element_id, keys):
    # Element Send Keys expects the text to type in the "text" field.
    # json.dumps escapes quotes and backslashes inside keys.
    res = req.post(
        "/".join([const.WEB_DRIVER_URL, session_id,
                  'element', element_id, 'value']),
        headers=const.REQUEST_HEADERS,
        data=json.dumps({"text": keys})
    )
    return res


def checkbox(session_id, element_id):
    # Clicking a checkbox toggles it; call is_selected to read the new state.
    return click_element(session_id, element_id)


def is_selected(session_id, element_id):
    # Returns True when the checkbox (or radio button/option) is selected.
    res = req.get(
        "/".join([const.WEB_DRIVER_URL, session_id,
                  'element', element_id, 'selected']),
        headers=const.REQUEST_HEADERS
    )
    return res

https://github.com/ymd65536/webdriver_py_samples/blob/main/4_ui/req_driver/using_elements

The key point of input_element.py is that it does more than send data: these endpoints make the browser operate the mouse and keyboard on the element, as a user would.

Element Click performs the whole click, pointer press and release, in a single request and takes no parameters. When a click through Selenium does not work, a commonly cited cause is that the window is not active; another possibility is that the pointer press never reaches the element.
Conversely, a pointer press is also what brings the window into the active state.

The individual pointerDown and pointerUp steps are action types of the Actions endpoint (/session/:sessionId/actions). pointerDown leaves the pointer button held down, so when you send it yourself you must follow it with pointerUp before operating the next piece of UI.
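For reference, in the W3C specification pointerDown and pointerUp are sent as items of a pointer input source to the Actions endpoint (POST /session/:sessionId/actions). The following is a minimal sketch of such a payload for a left click on an element; click_actions is a hypothetical helper and the input source ID "mouse" is an arbitrary label chosen for this example.

```python
import json

ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def click_actions(element_id, button=0):
    """Build the Actions payload for one click on the given element.

    button 0 is the left mouse button. Every pointerDown is paired
    with a pointerUp so the button is not left held down.
    """
    pointer_actions = [
        # Move to the center of the element (offset 0, 0 from its center).
        {"type": "pointerMove", "duration": 0,
         "origin": {ELEMENT_KEY: element_id}, "x": 0, "y": 0},
        {"type": "pointerDown", "button": button},
        {"type": "pointerUp", "button": button},
    ]
    return {
        "actions": [{
            "type": "pointer",
            "id": "mouse",
            "parameters": {"pointerType": "mouse"},
            "actions": pointer_actions,
        }]
    }


def click_actions_body(element_id, button=0):
    """Serialize the payload for use as the request body."""
    return json.dumps(click_actions(element_id, button))
```

A DELETE request to the same endpoint releases any keys or buttons that are still pressed, which is a useful safety net after a sequence that may have been interrupted.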

Now let's complete main.py using input_element.py.

main.py

import req_driver.config.const as const
import req_driver.browser.window as win
import req_driver.using_elements.element as ele
import req_driver.using_elements.input_element as input_ele

# used for sleep and for building the file path
import os

if __name__ == '__main__':

    print("Run WebDriver")

    sessionId = win.start_session()
    open_url = os.path.join(os.getcwd(), "index.html")

    win.open_url(sessionId, open_url)
    window_title = win.get_title(sessionId)
    print("Window Title: {0}".format(window_title))
    window_handles = win.get_window_handles(sessionId)

    if window_handles:

        print(window_handles)
        sw = win.switch_to_window(sessionId, window_handles[0])

        maximize = win.window_maximize(sessionId, window_handles[0])
        print("Window Maximize: {0}".format(maximize))

    print("---")

    input_elements = ele.find_elements(sessionId, "tag name", "input")

    for input_element in input_elements:
        print("Element: {0}".format(input_element.get(const.ELEMENT_KEY)))

        input_element_type = ele.get_property(
            sessionId, input_element.get(const.ELEMENT_KEY), 'type')

        if ele.is_input_type_text(input_element_type):
            print("input text")
            res = input_ele.send_keys(
                sessionId, input_element.get(const.ELEMENT_KEY), "test")
            os.system('sleep 1')

        if ele.is_input_type_password(input_element_type):
            print("input password")
            res = input_ele.send_keys(
                sessionId, input_element.get(const.ELEMENT_KEY), "password")

        if ele.is_input_type_checkbox(input_element_type):
            print("input checkbox")
            res = input_ele.is_selected(
                sessionId, input_element.get(const.ELEMENT_KEY))
            print(f"is checked {res}")

            res = input_ele.checkbox(
                sessionId, input_element.get(const.ELEMENT_KEY))

            res = input_ele.is_selected(
                sessionId, input_element.get(const.ELEMENT_KEY))
            print(f"is checked {res}")

        if ele.is_input_type_button(input_element_type):
            print("input button")
            res = input_ele.click_element(
                sessionId, input_element.get(const.ELEMENT_KEY))
            print(res)

    for cnt in range(5):
        os.system('sleep 1')

    win.delete_session(sessionId)

    print("---")
    print("End WebDriver")

To make the changes easier to see, also modify index.html.
Specifically, empty the value attributes of the text and password inputs.

index.html

<!DOCTYPE html>
<html lang="ja">

<head>
  <meta charset="UTF-8">
  <title>WebDriver Test Page</title>
</head>

<script>
  function changeText() {
    document.getElementsByName('text').item(0).value = 'WebDriver';
  }
</script>

<body>
  <div class="container">
    <h1>WebDriver Test Page</h1>
    <a href="https://google.com/">google link</a>
    <input type="text" name="text" value="">
    <input type="password" name="password" value="">
    <input type="checkbox" name="checkbox" value="checkbox">
    <input type="button" onclick="changeText();" value="text change">
  </div>
</body>

</html>

Run the script and check the output.

Output

Run WebDriver
Window Title: WebDriver Test Page
['82639A9C6040ECF3B9157CB6578AA0A2']
Window Maximize: {'height': 875, 'width': 1440, 'x': 0, 'y': 25}
---
Element: BB556EF8A20BC0209F7923B646A3BEA7_element_3
input text
Element: BB556EF8A20BC0209F7923B646A3BEA7_element_5
input password
Element: BB556EF8A20BC0209F7923B646A3BEA7_element_7
input checkbox
is checked False
is checked True
Element: BB556EF8A20BC0209F7923B646A3BEA7_element_8
input button
None
---
End WebDriver

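It happens quickly and is easy to miss, but the text box input, the checkbox check, and the button click should all have worked.

Executing scripts (background)

Having seen UI operations through WebDriver, you may have noticed that
another way to operate a web page's UI is to execute JavaScript directly.

Browsers expose the DOM as the interface to a web page, and manipulating the DOM from a program is called DOM programming. JavaScript lets you do this DOM programming.

Before the main topic, let's briefly review UI operation with JavaScript.
The index.html created earlier defines a function named changeText in a script tag.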

<script>
  function changeText() {
    document.getElementsByName('text').item(0).value = 'WebDriver';
  }
</script>

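The changeText function calls the getElementsByName method on the document object.

Reference: NodeList - Web technology for developers

The return value is of type NodeList. A NodeList is an array-like collection, so its entries can be accessed as item(N).

In other words, the single line document.getElementsByName('text').item(0).value means
"get the elements whose name attribute is text as a NodeList, and refer to the value property of the first entry in that NodeList".

Putting an equals sign after value assigns to that property, and on screen it looks as if text had been typed into the text box.

The code writes document, but window.document means the same thing.

Executing scripts (practice)

Now let's write some code and try it. First, modify index.html as follows.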

index.html

<!DOCTYPE html>
<html lang="ja">

<head>
  <meta charset="UTF-8">
  <title>WebDriver Test Page Execute Sync</title>
</head>

<script>
  function changeNameAttr() {
    document.getElementsByName('name_attr').item(0).value = 'WebDriver';
  }
  function changeIdAttr() {
    document.getElementById('id_attr').value = 'WebDriver Test Page';
  }
  function changeCheckbox() {
    document.getElementsByName('checkbox').item(0).checked = true;
  }
  function changeParagraph() {
    document.getElementsByName('paragraph').item(0).innerHTML = 'WebDriver Test Page';
  }
  function changeHead1() {
    document.getElementsByName('head1').item(0).innerHTML = '---Welcome WebDriver Test Page---';
  }
  function changeHead2() {
    document.getElementsByName('head2').item(0).innerHTML = 'Test Page';
  }
  function changeCounterSync() {
    let counter = document.getElementsByName('counter_sync').item(0);
    counter.innerHTML = 10;
  }
  function changeCounterAsync() {
    let counter = document.getElementsByName('counter_async').item(0);
    counter.innerHTML = 10;
  }
</script>

<body>
  <div class="container">
    <h1 name="head1">WebDriver Test Page1</h1>
    <h2 name="head2">WebDriver Test Page2</h2>
    <input type="text" name="name_attr" value="">
    <input type="text" id="id_attr" value="">
    <input type="checkbox" name="checkbox" value="checkbox">
    <p name="paragraph">ここに挿入</p>

    <div>
      counter_async:
      <p name="counter_async">0</p>
    </div>
    <div>
      counter_sync:
      <p name="counter_sync">0</p>
    </div>

  </div>
</body>

</html>

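The page above does not define any events, such as button handlers, so the JavaScript defined in it
will not run unless you invoke it yourself, for example from the browser's developer tools.

However, by sending the appropriate request to WebDriver, the JavaScript can be executed.
We add a feature to the req_driver module so that scripts in the web page can be executed.

There are two features to implement.

Execute the specified script
Execute the specified script asynchronously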

/script/javascript.py

import json

from ..config import const
from ..common import _request as req


def execute_script(session_id, script):
    # Sends the script to the execute/sync endpoint.
    # json.dumps escapes quotes and newlines inside the script text.
    return req.post(
        "/".join([const.WEB_DRIVER_URL, session_id, 'execute', 'sync']),
        headers=const.REQUEST_HEADERS,
        data=json.dumps({"script": script, "args": []})
    )


def execute_async_script(session_id, script):
    # Sends the script to the execute/async endpoint.
    return req.post(
        "/".join([const.WEB_DRIVER_URL, session_id, 'execute', 'async']),
        headers=const.REQUEST_HEADERS,
        data=json.dumps({"script": script, "args": []})
    )
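One detail about execute/async is easy to miss. In the W3C WebDriver specification, Execute Async Script passes a completion callback as the last element of arguments, and the command does not return until the script calls it or the session's script timeout expires. A script such as "changeCounterAsync();" never calls that callback, so the request may block until the timeout. The sketch below shows one way to append the callback call. with_async_callback and build_async_payload are hypothetical helpers, not part of req_driver.

```python
import json


def with_async_callback(script):
    """Append a call to the completion callback that WebDriver passes
    as the last argument of an async script."""
    return script.rstrip() + "\narguments[arguments.length - 1]();"


def build_async_payload(script, args=None):
    """Build the JSON request body for the execute/async endpoint."""
    return json.dumps({
        "script": with_async_callback(script),
        "args": list(args or []),
    })


if __name__ == "__main__":
    print(build_async_payload("changeCounterAsync();"))
```

The resulting string could be passed as the data argument of the req.post call in execute_async_script.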

Modify main.py to match javascript.py.

main.py

import req_driver.browser.window as win
import req_driver.script.javascript as js

# used for sleep (and for building the file path)
import os

if __name__ == '__main__':

    print("Run WebDriver")

    sessionId = win.start_session()
    open_url = os.path.join(os.getcwd(), "index.html")

    win.open_url(sessionId, open_url)
    window_title = win.get_title(sessionId)
    print("Window Title: {0}".format(window_title))
    window_handles = win.get_window_handles(sessionId)

    if window_handles:
        print(window_handles)
        sw = win.switch_to_window(sessionId, window_handles[0])

        maximize = win.window_maximize(sessionId, window_handles[0])
        print("Window Maximize: {0}".format(maximize))

    os.system('sleep 1')
    js.execute_script(sessionId, "changeHead1();")
    js.execute_script(sessionId, "changeHead2();")
    js.execute_script(sessionId, "changeNameAttr();")
    js.execute_script(sessionId, "changeCheckbox();")
    js.execute_script(sessionId, "changeParagraph();")

    js.execute_async_script(sessionId, "changeCounterAsync();")
    js.execute_script(sessionId, "changeCounterSync();")

    os.system('sleep 1')

    win.delete_session(sessionId)

    print("End WebDriver")

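If the value of each UI element is changed by the JavaScript, everything is working correctly.
Note that if the JavaScript takes arguments, they can be passed through the args array of the request body (the "args": [] part), but this did not work well at the time of writing, so it is not included in the sample code.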
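As a point of reference for the args field: in the W3C WebDriver specification, each entry of args is exposed to the script as arguments[0], arguments[1], and so on. The following is a minimal sketch of building such a request body with json.dumps, which also escapes any quotes inside the script and the arguments. build_execute_payload is a hypothetical helper, and this has not been verified against the environment used in this article.

```python
import json


def build_execute_payload(script, *args):
    """Build the JSON request body for the execute/sync endpoint."""
    return json.dumps({"script": script, "args": list(args)})


if __name__ == "__main__":
    # The script reads its first argument through arguments[0].
    script = "document.getElementById('id_attr').value = arguments[0];"
    print(build_execute_payload(script, 'He said "hi"'))
```

Building the body from a dict means a script that contains double quotes does not break the JSON.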

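Cookie operations (background)

Before working with cookies, let's go over the background. What is a cookie in the first place?
Here is the definition given by MDN.

An HTTP cookie (web cookie, browser cookie) is a small piece of data that a server sends to a user's web browser; it is stored in the browser and sent back to the same server with subsequent requests. Typically, it is used to tell whether two requests come from the same browser. For example, it can keep a user logged in. A cookie remembers stateful information for the stateless HTTP protocol.

Reference: Cookie - HTTP | MDN

Because HTTP is a stateless protocol, nothing in the protocol itself remembers what was exchanged with the server in earlier requests.
Cookies make it possible to keep that information as state.

From a programming point of view, a cookie is a data structure that stores data in key-value form, and cookies can be manipulated through the cookie property of the Document object.

Example of accessing a cookie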

document.cookie = "key=value";

Document.cookie | MDN
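Reading document.cookie returns the visible cookies as a single string of key=value pairs separated by semicolons. To illustrate that key-value structure from the Python side, the sketch below parses such a string with http.cookies.SimpleCookie from the standard library. parse_cookie_string is a hypothetical helper and the sample string is made up for illustration.

```python
from http.cookies import SimpleCookie


def parse_cookie_string(cookie_string):
    """Turn a 'k1=v1; k2=v2' string into a plain dict."""
    jar = SimpleCookie()
    jar.load(cookie_string)
    return {name: morsel.value for name, morsel in jar.items()}


if __name__ == "__main__":
    print(parse_cookie_string("key=value; theme=dark"))
```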

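Cookie operations (practice)

Note: 5_Cookie - code used in this chapter

One caveat before actually working with cookies: you cannot manipulate cookies for a domain other than that of the current document.
This means that the index.html used so far, which is opened with the file:// scheme and therefore has no domain, cannot be used for cookie operations.

Reference - InvalidCookieDomain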
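One way to guard against this error is to check the current URL before calling the cookie endpoints. The sketch below uses urllib.parse from the standard library. has_cookie_domain is a hypothetical helper, and the check is deliberately simple: it only looks at the scheme and the host name.

```python
from urllib.parse import urlparse


def has_cookie_domain(url):
    """Return True when the URL is http(s) and has a host name,
    i.e. there is a domain a cookie could belong to."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.hostname)


if __name__ == "__main__":
    print(has_cookie_domain("file:///Users/me/index.html"))
    print(has_cookie_domain("https://google.com"))
```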

With that caveat in mind, let's actually work with cookies.
First, create a feature named using_cookie in req_driver.

/req_driver/using_cookie/cookie.py

from ..config import const
from ..common import _request as req


def add_cookie(session_id, cookie):
    data = '{"cookie": ' + cookie + '}'

    res = req.post(
        "/".join([const.WEB_DRIVER_URL, session_id, 'cookie']),
        headers=const.REQUEST_HEADERS,
        data=data
    )
    print(res)

    return res


def get_all_cookies(session_id):
    return req.get(
        "/".join([const.WEB_DRIVER_URL, session_id, 'cookie']),
        headers=const.REQUEST_HEADERS,
    )


def named_cookie(session_id, name):
    return req.get(
        "/".join([const.WEB_DRIVER_URL, session_id, 'cookie', name]),
        headers=const.REQUEST_HEADERS,
    )


def delete_cookie(session_id, name):
    return req.delete(
        "/".join([const.WEB_DRIVER_URL, session_id, 'cookie', name]),
        headers=const.REQUEST_HEADERS,
    )


def delete_all_cookies(session_id):
    return req.delete(
        "/".join([const.WEB_DRIVER_URL, session_id, 'cookie']),
        headers=const.REQUEST_HEADERS,
    )

Next, modify open_url, which is defined in the browser package of req_driver.

/req_driver/browser/window.py

def open_url(session_id, url, file=True):
    # Local files are opened through the file:// scheme;
    # any other URL is passed to WebDriver as-is.
    target = "file://" + url if file else url

    res = req.post(
        "/".join([const.WEB_DRIVER_URL, session_id, 'url']),
        headers=const.REQUEST_HEADERS,
        data='{"url": "' + target + '"}'
    )

    return res
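Prefixing a path with file:// works for simple absolute paths, but characters such as spaces are not percent-encoded. As an alternative, pathlib can build the file URL. to_file_url below is a hypothetical helper, not part of req_driver.

```python
from pathlib import Path


def to_file_url(path):
    """Convert a local path to an absolute, percent-encoded file:// URL."""
    return Path(path).resolve().as_uri()


if __name__ == "__main__":
    print(to_file_url("index.html"))
```

Because the result already contains the scheme, it could be passed to open_url with file=False.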

Finally, modify main.py.

main.py

import req_driver.browser.window as win
import req_driver.using_cookie.cookie as cookie

# used for sleep
import os

if __name__ == '__main__':

    print("Run WebDriver")

    sessionId = win.start_session()
    open_url = "https://google.com"

    win.open_url(sessionId, open_url, file=False)
    window_title = win.get_title(sessionId)
    print("Window Title: {0}".format(window_title))
    window_handles = win.get_window_handles(sessionId)

    if window_handles:
        print(window_handles)
        sw = win.switch_to_window(sessionId, window_handles[0])

        maximize = win.window_maximize(sessionId, window_handles[0])
        print("Window Maximize: {0}".format(maximize))

    cookie.add_cookie(
        sessionId, '{"name":"ymd","value":"65536","domain":"google.com"}')

    cookies = cookie.get_all_cookies(sessionId)
    print("Cookies: {0}".format(cookies))

    print("--------------------")

    named_cookie = cookie.named_cookie(sessionId, "ymd")
    print("Named Cookie: {0}".format(named_cookie))

    delete_cookie = cookie.delete_cookie(sessionId, "ymd")

    print("--------------------")

    cookies = cookie.get_all_cookies(sessionId)
    print("Cookies: {0}".format(cookies))

    cookie.delete_all_cookies(sessionId)
    cookies = cookie.get_all_cookies(sessionId)
    print("Cookies: {0}".format(cookies))

    os.system('sleep 1')
    win.delete_session(sessionId)

    print("End WebDriver")

Output

Run WebDriver
Window Title: Google
['EBE06D95E6140DEAB9B0BEB71FF9B16E']
Window Maximize: {'height': 875, 'width': 1440, 'x': 0, 'y': 25}
None
Cookies: [{'domain': '.google.com', 'expiry': 1720478162, 'httpOnly': True, 'name': 'NID', 'path': '/', 'sameSite': 'None', 'secure': True, 'value': '511='}, {'domain': '.google.com', 'expiry': 1720218961, 'httpOnly': True, 'name': 'AEC', 'path': '/', 'sameSite': 'Lax', 'secure': True, 'value': 'Ackid'}, {'domain': '.google.com', 'httpOnly': False, 'name': 'ymd', 'path': '/', 'sameSite': 'Lax', 'secure': True, 'value': '65536'}, {'domain': '.google.com', 'expiry': 1707258980, 'httpOnly': False, 'name': '1P_JAR', 'path': '/', 'sameSite': 'None', 'secure': True, 'value': '2024-01-07-22'}]
--------------------
Named Cookie: {'domain': '.google.com', 'httpOnly': False, 'name': 'ymd', 'path': '/', 'sameSite': 'Lax', 'secure': True, 'value': '65536'}
--------------------
Cookies: [{'domain': '.google.com', 'expiry': 1720478162, 'httpOnly': True, 'name': 'NID', 'path': '/', 'sameSite': 'None', 'secure': True, 'value': '511='}, {'domain': '.google.com', 'expiry': 1720218961, 'httpOnly': True, 'name': 'AEC', 'path': '/', 'sameSite': 'Lax', 'secure': True, 'value': 'Ackid'}, {'domain': '.google.com', 'expiry': 1707258980, 'httpOnly': False, 'name': '1P_JAR', 'path': '/', 'sameSite': 'None', 'secure': True, 'value': '2024-01-07-22'}]
Cookies: []
End WebDriver

The output shows that a cookie was added for the google.com domain.
It also shows that the cookie added for the google.com domain could then be deleted.
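Since add_cookie takes the cookie as a JSON string, that string can also be generated from a dict rather than written by hand. A minimal sketch, where build_cookie is a hypothetical helper and only the name, value, and domain fields used in this article are handled:

```python
import json


def build_cookie(name, value, domain=None):
    """Return the cookie object as a JSON string for add_cookie."""
    cookie = {"name": name, "value": value}
    if domain is not None:
        cookie["domain"] = domain
    return json.dumps(cookie)


if __name__ == "__main__":
    print(build_cookie("ymd", "65536", "google.com"))
```

In main.py this could be used as cookie.add_cookie(sessionId, build_cookie("ymd", "65536", "google.com")).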

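Summary

In this article we implemented the following operations.

Obtaining a SessionId
Window operations
Getting elements
FindElements
UI operations
Script execution
Cookie operations

Basically the same operations as in Selenium are possible, and we also saw that working with WebDriver directly allows finer-grained control than Selenium. Because Selenium itself requires WebDriver to be running, a problem that is hard to solve through Selenium may be solvable by operating WebDriver directly.

Because of the length of the article, Alert and Frame operations were not covered this time, but they are possible too.
I plan to cover them in Part4.

Closing thoughts

It had been a while since I last touched WebDriver, and I had forgotten a lot, so the implementation took some effort.
That said, even though several years have passed since I wrote Part2, I was relieved to find that the design has not changed in that time. I hope this is useful to someone.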
