Translated by AI

The content below is an AI-generated translation. This is an experimental feature and may contain errors.

Getting Started with Browser Automation using Playwright MCP: A Beginner's Guide


Must-Read for Beginners! Starting Browser Automation with Playwright MCP

Introduction

When performing testing or automation of web applications, efficient browser operation is essential. While many automation tools have emerged over the years, "Playwright MCP," announced by Microsoft in 2025, is garnering attention as a next-generation browser automation tool designed with AI integration in mind.

The "MCP" in Playwright MCP stands for "Model Context Protocol," an open protocol that lets AI applications built on LLMs (Large Language Models) connect to external tools. Playwright MCP exposes browser operations through this protocol, so an AI can operate a browser directly. With this tool, you can drive a browser with natural-language instructions and integrate it with AI agents with little effort.

In this article, we will explain everything from the basic concepts of Playwright MCP to its installation and practical usage in a way that is easy for beginners to understand. If you are interested in browser automation or test automation, or if you want to explore new possibilities through AI integration, please read on to the end.

What is Playwright MCP?

Playwright Basics

Before understanding Playwright MCP, let's briefly explain Playwright itself.

Playwright is an open-source browser automation tool developed by Microsoft. It supports the Chromium, Firefox, and WebKit browser engines, allowing cross-browser testing with a single API. Browser operations can be automated from JavaScript/TypeScript as well as from Python, Java, and .NET.


What is MCP?

MCP stands for "Model Context Protocol," an open standard, introduced by Anthropic in late 2024, that defines how applications built on Large Language Models (LLMs) access external tools and data. Simply put, it is a common language that lets AI interact with external tools and data safely and efficiently.
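For a feel of what travels over the protocol: MCP messages are JSON-RPC 2.0, and a client invokes a tool with the tools/call method, passing the tool name and its arguments. The sketch below only builds such a request object. The helper name makeToolCallRequest is made up, browser_navigate with a url argument is used as the example tool, and an MCP client such as Claude Desktop normally builds and sends this message for you.

```javascript
// Build a JSON-RPC 2.0 request for the MCP "tools/call" method.
// The helper name is hypothetical; the tool and URL are only examples.
function makeToolCallRequest(id, toolName, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

const request = makeToolCallRequest(1, "browser_navigate", {
  url: "https://example.com",
});

// {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"browser_navigate","arguments":{"url":"https://example.com"}}}
console.log(JSON.stringify(request));
```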

Features of Playwright MCP

Playwright MCP is an MCP server that makes Playwright's browser automation available to AI as tools. It has the following features:

  1. Utilization of the Accessibility Tree: Uses the structural information of web pages instead of pixel-based image recognition.
  2. LLM-friendly: Designed for easy integration with AI models.
  3. Fast and Lightweight: Processing is faster than visual recognition via screenshots.
  4. Deterministic Operations: Reliable element identification and operation.

Differences from Traditional Browser Automation

Compared to traditional browser automation tools, Playwright MCP has the following differences:

| Traditional Approach | Playwright MCP |
| --- | --- |
| Direct access to HTML/DOM | Utilizes the accessibility tree |
| Image recognition or coordinate-based operation is mainstream | Structure-based operation is mainstream |
| Additional implementation required for AI integration | Designed assuming AI integration |
| Requires specification of complex selectors | Enables instructions close to natural language |

What is the Accessibility Tree?

The core technology of Playwright MCP is the "Accessibility Tree." This represents the semantic structure of a web page, rather than its visual representation.

The accessibility tree is the same as what assistive technologies like screen readers use, containing information such as the roles of elements on the page (buttons, links, text inputs, etc.), names, states, and relationships.

[Figure: Conceptual diagram of the accessibility tree]

By using this accessibility tree, Playwright MCP identifies and operates on elements based on their meaning rather than their appearance on the page. This makes automation more stable and more resistant to visual changes in the UI.
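To make "identifying elements by meaning" concrete, here is a minimal sketch. The node shape { role, name, children } and the function name findByRoleAndName are simplifications for illustration, not Playwright MCP's internal data structure.

```javascript
// A simplified, hypothetical accessibility-tree node: a role, an accessible
// name, and child nodes. Real trees also carry states, values, and more.
const tree = {
  role: "document",
  name: "Login",
  children: [
    { role: "heading", name: "Sign in", children: [] },
    { role: "textbox", name: "Username", children: [] },
    { role: "textbox", name: "Password", children: [] },
    { role: "button", name: "Log in", children: [] },
  ],
};

// Depth-first search for the first node with the given role and name.
// The match uses meaning (role + accessible name), not CSS classes or
// pixel positions, so restyling the button does not break the lookup.
function findByRoleAndName(node, role, name) {
  if (node.role === role && node.name === name) return node;
  for (const child of node.children ?? []) {
    const found = findByRoleAndName(child, role, name);
    if (found) return found;
  }
  return null;
}

console.log(findByRoleAndName(tree, "button", "Log in")?.name); // Log in
```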

Two Operating Modes of Playwright MCP

Playwright MCP has two operating modes: "Snapshot Mode" and "Vision Mode." Let's look at the characteristics and use cases for each.

Snapshot Mode (Default)

Snapshot mode is the default operating mode of Playwright MCP. In this mode, it uses the browser's accessibility tree to identify and operate on elements based on the web page's structural information.

Features and Benefits:

  • Fast: Faster processing because it doesn't require screenshot generation or image processing
  • Lightweight: Consumes fewer resources
  • Stability: Resistant to changes in UI appearance
  • Structural Understanding: Enables operations based on the semantic structure of the page
  • Accuracy: Precise element identification

Snapshot mode is suitable for many common web operations (navigation, form input, data extraction, etc.).

# Start in Snapshot mode (default)
npx @playwright/mcp

Vision Mode

Vision mode works like traditional visual automation tools: it takes screenshots of the web page and performs coordinate-based operations on them.

Features and Use Cases:

  • Visual Element Manipulation: Manipulating visual elements that are difficult to represent structurally
  • Coordinate-based Operation: Allows operation based on X-Y coordinates
  • Compatibility with Visual AI Models: Suitable for integration with computer-vision-capable AI models
  • When Accessibility Information is Insufficient: Operation on sites where accessibility is not considered

Vision mode is helpful for operations on special UI components or websites with insufficient accessibility information.

# Start in Vision mode
npx @playwright/mcp --vision
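Coordinate-based operation ultimately comes down to choosing a point on the screenshot. The helper below is purely illustrative: the function name, the bounding-box format, and the assumption that a vision model reports a box for the target element are not part of Playwright MCP. It computes the center of the box and keeps the point inside the viewport.

```javascript
// Hypothetical helper: compute a click point (center of a bounding box,
// in screenshot pixels) and clamp it so it stays inside the viewport.
function clickPointForBox(box, viewport) {
  const clamp = (v, min, max) => Math.min(Math.max(v, min), max);
  const x = Math.round(box.x + box.width / 2);
  const y = Math.round(box.y + box.height / 2);
  return {
    x: clamp(x, 0, viewport.width - 1),
    y: clamp(y, 0, viewport.height - 1),
  };
}

const point = clickPointForBox(
  { x: 100, y: 40, width: 120, height: 30 },
  { width: 1280, height: 720 }
);
console.log(point); // { x: 160, y: 55 }
```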

Criteria for Selecting a Mode

You should decide which mode to choose based on the following criteria:

  • Snapshot Mode (Recommended):

    • General website operations
    • When performance is important
    • When prioritizing stability
    • Normal form input or data extraction
  • Vision Mode:

    • Sites lacking accessibility information
    • When visual confirmation is necessary
    • Operations on canvas or graphical elements
    • Operations requiring image recognition
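The criteria above can be written down as a small decision helper. The function and flag names are made up for this sketch; it simply encodes the rule of thumb "default to Snapshot mode, and switch to Vision mode only when the page or task needs pixels."

```javascript
// Hypothetical helper encoding the mode-selection criteria above.
function chooseMode({
  lacksAccessibilityInfo = false, // site exposes little accessibility info
  needsVisualCheck = false,       // the result must be confirmed visually
  usesCanvas = false,             // target is a canvas or graphical element
  needsImageRecognition = false,  // task depends on recognizing images
} = {}) {
  const needsPixels =
    lacksAccessibilityInfo || needsVisualCheck || usesCanvas || needsImageRecognition;
  // Snapshot mode is the faster, more stable default
  return needsPixels ? "vision" : "snapshot";
}

console.log(chooseMode({}));                   // snapshot
console.log(chooseMode({ usesCanvas: true })); // vision
```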

[Figure: The two operating modes of Playwright MCP]

Installation and Setup of Playwright MCP

From here, let's prepare to actually use Playwright MCP. I will explain it step-by-step so that even beginners won't get lost.

Required Environment

To use Playwright MCP, the following environment is required:

  1. Node.js: Playwright MCP operates in a Node.js environment
  2. Claude Desktop (Optional): If you want to integrate with AI

Installing Node.js

First, let's install Node.js. Download and install the LTS (Long Term Support) version from the official website.

  1. Access the Node.js Official Site
  2. Download the LTS version installer
  3. Run the installer and follow the instructions to install

Once the installation is complete, open a terminal (Command Prompt or PowerShell for Windows) and check the versions of Node.js and npm with the following commands:

node -v
npm -v

If the version numbers are displayed, the Node.js installation was successful.
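If you want a script to check the version for you, here is a minimal sketch. It assumes a minimum major version of 18 (an assumption here; confirm the current requirement in the Playwright MCP README). process.versions.node and process.version are standard Node.js properties; the function name is made up.

```javascript
// Assumed minimum Node.js major version; confirm against the README.
const MIN_NODE_MAJOR = 18;

// Accepts "v20.11.1" (node -v style) or "20.11.1" (process.versions.node)
function meetsMinimumNode(version, minMajor) {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

const ok = meetsMinimumNode(process.versions.node, MIN_NODE_MAJOR);
console.log(
  ok
    ? `Node.js ${process.version} meets the assumed minimum`
    : `Please upgrade Node.js (found ${process.version})`
);
```

Save it as, for example, check-node.js and run it with node check-node.js.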

Installing and Running Playwright MCP

Playwright MCP can be executed directly using the npx command. Permanent installation is not required.

Basic execution method:

npx @playwright/mcp

Running this command starts the Playwright MCP server. By default, it operates in Snapshot mode.

When running in Vision mode:

npx @playwright/mcp --vision

Other Options

Playwright MCP accepts several command-line options, including the following:

  • --port=<Port Number>: Start the server on a specific port
  • --headless: Headless mode (do not display the browser window)
  • --vision: Enable Vision mode

For example, to start the server in headless mode on port 8000:

npx @playwright/mcp --port=8000 --headless
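When you launch the server from a script or generate a configuration file, it can help to build the argument list in one place. This is a hypothetical helper that uses only the three options listed above and rejects a port outside the valid TCP range.

```javascript
// Hypothetical helper: build the argument list for "npx @playwright/mcp"
// from the three options described above.
function buildMcpArgs({ vision = false, headless = false, port } = {}) {
  const args = ["@playwright/mcp"];
  if (port !== undefined) {
    // Reject values that cannot be TCP port numbers
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
      throw new RangeError(`Invalid port: ${port}`);
    }
    args.push(`--port=${port}`);
  }
  if (headless) args.push("--headless");
  if (vision) args.push("--vision");
  return args;
}

// Prints: npx @playwright/mcp --port=8000 --headless
console.log(["npx", ...buildMcpArgs({ port: 8000, headless: true })].join(" "));
```

The returned array has the same shape as the args field in an MCP client configuration file.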

How to Integrate Claude Desktop with Playwright MCP

One of the attractions of Playwright MCP is the ease of integration with LLMs like Claude AI. Here, we will explain how to integrate it with Claude Desktop.

Installing and Configuring Claude Desktop

  1. Download and install the desktop app from the Claude Official Site.
  2. Register an account and log in.
  3. Enable Developer Mode.
    • Top-left menu → Help → Enable Developer Mode.

[Figure: Enabling Developer Mode in Claude Desktop]

Editing the Configuration File

Once you have enabled Developer Mode in Claude Desktop, the next step is to edit the configuration file:

  1. Open the "Developer" tab in Claude Desktop.
  2. Click "Edit Config" or "Get Started with Config."
  3. Open and edit claude_desktop_config.json.

Add the following content:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp"
      ]
    }
  }
}

If you use Vision mode, change the args as follows:

"args": [
  "@playwright/mcp",
  "--vision"
]

  4. Save the file and restart Claude Desktop.

Now, you are ready to use Playwright MCP from Claude Desktop.
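If claude_desktop_config.json already lists other MCP servers, add the playwright entry next to them instead of replacing the whole mcpServers object. A minimal sketch of that merge; the function name withPlaywrightServer and the "other" server entry are made up for illustration.

```javascript
// Return a new config with a "playwright" MCP server entry added (or
// replaced), keeping every other server and top-level key as they were.
function withPlaywrightServer(config, { vision = false } = {}) {
  const args = ["@playwright/mcp"];
  if (vision) args.push("--vision");
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      playwright: { command: "npx", args },
    },
  };
}

// Made-up existing config with one other server already registered
const existing = {
  mcpServers: {
    other: { command: "node", args: ["other-server.js"] },
  },
};

const updated = withPlaywrightServer(existing, { vision: true });
console.log(JSON.stringify(updated, null, 2));
```

To apply it to the real file, you could read it with fs.readFileSync, parse it with JSON.parse, pass the result through the function, and write it back with fs.writeFileSync using JSON.stringify(updated, null, 2). Keep a backup of the original file first.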

[Figure: Configuration for Claude Desktop and Playwright MCP integration]

Basic Usage of Playwright MCP

Once the Playwright MCP setup is complete, let's actually try using it. Here, we introduce the basic operation methods.

Key Tools and Commands

Playwright MCP offers different tools depending on the mode.

Key Tools in Snapshot Mode

In Snapshot mode (default), the following main tools are available:

  • browser_navigate: Navigation to a URL
  • browser_snapshot: Retrieval of accessibility snapshots
  • browser_click: Clicking an element
  • browser_type: Text input
  • browser_select_option: Selection from a dropdown menu
  • browser_go_back: Go back to the previous page in the browser
  • browser_go_forward: Go forward to the next page in the browser
  • browser_take_screenshot: Retrieval of a screenshot
  • browser_choose_file: File selection
  • browser_save_as_pdf: Save as PDF

Key Tools in Vision Mode

In Vision mode, the interaction tools work from screenshots and coordinates instead, for example:

  • browser_screenshot: Retrieval of a screenshot
  • browser_move_mouse: Mouse movement by specifying coordinates
  • browser_click: Clicking by specifying coordinates (behavior differs from Snapshot mode)

Basic Operation Examples

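// Conceptual tool calls: in practice, the MCP client (e.g. Claude) issues these
// Example of accessing Google
await browser_navigate({ url: "https://www.google.com" });

// Retrieve an accessibility snapshot of the current page;
// every element in it is tagged with a reference ID (ref)
const snapshot = await browser_snapshot();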
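A note on the ref values used below: browser_snapshot returns the snapshot as text in which each element is tagged with a reference such as [ref=e12], and the ref passed to browser_click or browser_type is copied from there (snapshot.searchBox.ref in these examples is shorthand for that lookup). A minimal sketch of the lookup, assuming snapshot lines shaped like - button "Search" [ref=e5]. The exact format varies between versions, so the pattern is an assumption, the helper findRef is hypothetical, and the sample text is made up.

```javascript
// Hypothetical helper: return the ref of the first element with the given
// role and accessible name, assuming lines like:  - <role> "<name>" ... [ref=<id>]
function findRef(snapshotText, role, name) {
  const linePattern = /^\s*-\s+(\S+)\s+"([^"]*)".*\[ref=([^\]]+)\]/;
  for (const line of snapshotText.split("\n")) {
    const m = line.match(linePattern);
    if (m && m[1] === role && m[2] === name) return m[3];
  }
  return null;
}

// Made-up snapshot text for illustration
const snapshotText = `
- generic [ref=e2]:
  - combobox "Search" [ref=e12]
  - button "Google Search" [ref=e15]
`;

console.log(findRef(snapshotText, "combobox", "Search"));      // e12
console.log(findRef(snapshotText, "button", "Google Search")); // e15
```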

Clicking Elements and Input

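// Find the search box in the snapshot and click it.
// element is a human-readable description; ref is the element's
// reference ID from the snapshot (written here as snapshot.searchBox.ref)
await browser_click({ 
  element: "Search box", 
  ref: snapshot.searchBox.ref 
});

// Type the text; submit: true presses Enter afterwards
await browser_type({ 
  element: "Search box", 
  ref: snapshot.searchBox.ref, 
  text: "Playwright MCP", 
  submit: true 
});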

Example of Form Input

// Example of a login form
// (snapshot.<field>.ref stands for that element's ref in the snapshot)
await browser_navigate({ url: "https://example.com/login" });
const snapshot = await browser_snapshot();

// Input username
await browser_click({ 
  element: "Username field", 
  ref: snapshot.usernameField.ref 
});
await browser_type({ 
  element: "Username field", 
  ref: snapshot.usernameField.ref, 
  text: "testuser", 
  submit: false 
});

// Input password
await browser_click({ 
  element: "Password field", 
  ref: snapshot.passwordField.ref 
});
await browser_type({ 
  element: "Password field", 
  ref: snapshot.passwordField.ref, 
  text: "password123", 
  submit: false 
});

// Click login button
await browser_click({ 
  element: "Login button", 
  ref: snapshot.loginButton.ref 
});

Example of Integration with Claude AI

When you integrate Claude Desktop with Playwright MCP, browser operations via natural language instructions become possible. For example:

Access Google, search for "Playwright MCP", and click on the first search result.

When you give an instruction like this, Claude AI actually operates the browser through Playwright MCP.

[Figure: Image of browser operation by Claude AI]

Practical Examples

Here, we will introduce more practical examples of using Playwright MCP.

Form Input Automation

For example, let's consider a case where you automate the task of entering the same data into multiple forms.

// Function for automatic form input
async function fillForm(url, formData) {
  await browser_navigate({ url: url });
  const snapshot = await browser_snapshot();
  
  // Input each field in the form
  for (const [fieldName, value] of Object.entries(formData)) {
    const field = findFieldByLabel(snapshot, fieldName);
    if (field) {
      await browser_click({ element: fieldName, ref: field.ref });
      await browser_type({ 
        element: fieldName, 
        ref: field.ref, 
        text: value, 
        submit: false 
      });
    }
  }
  
  // Find and click the submit button
  const submitButton = findSubmitButton(snapshot);
  if (submitButton) {
    await browser_click({ 
      element: "Submit button", 
      ref: submitButton.ref 
    });
  }
}

// Usage example
const userData = {
  "Name": "Taro Yamada",
  "Email Address": "yamada@example.com",
  "Phone Number": "03-1234-5678",
  "Comment": "This is a test of Playwright MCP."
};

await fillForm("https://example.com/contact", userData);
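findFieldByLabel and findSubmitButton are not defined in the example above. Below is one possible sketch, assuming snapshot is the snapshot text with lines shaped like - textbox "Email Address" [ref=e7]. The line format, the set of input roles, and the rule that a button named "Submit" or "Send" is the submit button are all assumptions to adapt to your pages.

```javascript
// Parse lines like  - textbox "Email Address" [ref=e7]  into { role, name, ref }
function parseSnapshotLines(snapshotText) {
  const linePattern = /^\s*-\s+(\S+)\s+"([^"]*)".*\[ref=([^\]]+)\]/;
  const nodes = [];
  for (const line of snapshotText.split("\n")) {
    const m = line.match(linePattern);
    if (m) nodes.push({ role: m[1], name: m[2], ref: m[3] });
  }
  return nodes;
}

// Roles that accept text input (an assumption; extend as needed)
const INPUT_ROLES = new Set(["textbox", "combobox", "searchbox", "spinbutton"]);

function findFieldByLabel(snapshotText, label) {
  return parseSnapshotLines(snapshotText).find(
    (n) => INPUT_ROLES.has(n.role) && n.name === label
  ) ?? null;
}

// Treat a button named "Submit" or "Send" (any case) as the submit button
function findSubmitButton(snapshotText) {
  return parseSnapshotLines(snapshotText).find(
    (n) => n.role === "button" && /^(submit|send)$/i.test(n.name.trim())
  ) ?? null;
}

// Made-up snapshot text for illustration
const sample = `
- textbox "Name" [ref=e3]
- textbox "Email Address" [ref=e4]
- button "Send" [ref=e9]
`;
console.log(findFieldByLabel(sample, "Email Address")?.ref); // e4
console.log(findSubmitButton(sample)?.ref);                 // e9
```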

Data Extraction and Scraping

This is an example of extracting specific data from a web page.

// Example of scraping product information
async function scrapeProductInfo(url) {
  await browser_navigate({ url: url });
  const snapshot = await browser_snapshot();
  
  // Array to store product information
  const products = [];
  
  // Find the product list
  const productList = findProductList(snapshot);
  if (productList && productList.children) {
    // Extract information for each product
    for (const product of productList.children) {
      const nameNode = findProductName(product);
      const priceNode = findProductPrice(product);
      const ratingNode = findProductRating(product);
      
      // Accessibility-tree nodes expose their text as the accessible name
      products.push({
        name: nameNode ? nameNode.name : "Unknown",
        price: priceNode ? priceNode.name : "Unknown",
        rating: ratingNode ? ratingNode.name : "Unknown"
      });
    }
  }
  
  return products;
}

// Usage example
const products = await scrapeProductInfo("https://example.com/products");
console.log(products);
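The findProduct helpers are likewise left undefined. As a sketch, assuming the snapshot has been turned into a simplified { role, name, children } tree, they could look like this. Which roles carry the product name and price is site-specific, and the "text" role and the sample page are made up; a rating lookup would follow the same pattern.

```javascript
// Depth-first search for the first node that satisfies predicate
function findFirst(node, predicate) {
  if (predicate(node)) return node;
  for (const child of node.children ?? []) {
    const hit = findFirst(child, predicate);
    if (hit) return hit;
  }
  return null;
}

// Hypothetical helpers; adapt the roles and patterns to the target site
const findProductList = (tree) => findFirst(tree, (n) => n.role === "list");
const findProductName = (item) =>
  findFirst(item, (n) => n.role === "link" || n.role === "heading");
// Treat a "text" node whose name looks like "$12.00" or "¥1,200" as the price
const findProductPrice = (item) =>
  findFirst(item, (n) => n.role === "text" && /^[$¥€£]\s?[\d.,]+$/.test(n.name));

// Made-up page for illustration
const page = {
  role: "document",
  name: "Products",
  children: [
    {
      role: "list",
      name: "",
      children: [
        { role: "listitem", name: "", children: [
          { role: "link", name: "Blue Mug" },
          { role: "text", name: "$12.00" },
        ] },
        { role: "listitem", name: "", children: [
          { role: "link", name: "Red Kettle" },
          { role: "text", name: "$40.50" },
        ] },
      ],
    },
  ],
};

const scraped = (findProductList(page)?.children ?? []).map((item) => ({
  name: findProductName(item)?.name ?? "Unknown",
  price: findProductPrice(item)?.name ?? "Unknown",
}));
console.log(scraped);
```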

Integration with AI Agents

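Here is a conceptual example of an AI agent that plans operation steps with an LLM such as Claude AI and executes them through Playwright MCP. Note that claudeAgent.process is a placeholder for whatever LLM client you use, not an actual SDK method.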

// Example of task processing by an AI agent
async function aiAssistedTask(taskDescription) {
  // Explain the task to the AI agent
  const agentResponse = await claudeAgent.process({
    task: taskDescription,
    tools: ["playwright-mcp"]
  });
  
  // Execute the operation steps generated by the AI
  for (const step of agentResponse.steps) {
    switch (step.action) {
      case "navigate":
        await browser_navigate({ url: step.url });
        break;
      case "click": {
        // Take a fresh snapshot and resolve the element described by the AI
        const snapshot = await browser_snapshot();
        const element = findElementByDescription(snapshot, step.elementDescription);
        if (element) {
          await browser_click({ 
            element: step.elementDescription, 
            ref: element.ref 
          });
        }
        break;
      }
      case "type":
        await browser_type({ 
          element: step.elementDescription, 
          ref: step.elementRef, 
          text: step.text, 
          submit: step.submit || false 
        });
        break;
      // Other operations...
    }
  }
  
  return await browser_snapshot();
}

// Usage example
const result = await aiAssistedTask(
  "Access Amazon, search for the latest iPhone, and retrieve the price of the first search result."
);
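Because the steps come from an LLM, it is sensible to check them before anything touches the browser. A minimal sketch of such a guard, assuming the step shape used in aiAssistedTask (action, url, elementDescription, elementRef, text). The function name validateStep, the allow-list, and the HTTP(S)-only rule are illustrative policy choices, not part of Playwright MCP.

```javascript
// Allowed actions and the string fields each one requires (illustrative policy)
const ALLOWED_ACTIONS = new Map([
  ["navigate", ["url"]],
  ["click", ["elementDescription"]],
  ["type", ["elementDescription", "elementRef", "text"]],
]);

// Return { ok: true } or { ok: false, reason } for one AI-generated step
function validateStep(step) {
  const required = ALLOWED_ACTIONS.get(step?.action);
  if (!required) {
    return { ok: false, reason: `Unknown action: ${step?.action}` };
  }
  // Every required field must be a non-empty string
  const missing = required.filter(
    (key) => typeof step[key] !== "string" || step[key] === ""
  );
  if (missing.length > 0) {
    return { ok: false, reason: `Missing fields for ${step.action}: ${missing.join(", ")}` };
  }
  // Only allow ordinary web URLs (no file:, javascript:, etc.)
  if (step.action === "navigate" && !/^https?:\/\//i.test(step.url)) {
    return { ok: false, reason: `Refusing non-HTTP(S) URL: ${step.url}` };
  }
  return { ok: true };
}

console.log(validateStep({ action: "navigate", url: "https://www.amazon.com" })); // { ok: true }
console.log(validateStep({ action: "delete_everything" }).reason); // Unknown action: delete_everything
```

In aiAssistedTask, you would call validateStep(step) at the top of the loop and stop (or ask the model to retry) when ok is false.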

Common Problems and Solutions

Here are some common problems you might encounter when using Playwright MCP, along with their solutions.

1. When Elements Cannot Be Found

Problem: There are cases where elements cannot be identified in Snapshot mode.

Solutions:

  • Check if the element is correctly represented in the accessibility tree.
  • Use a more specific description of the element.
  • Switch to Vision mode to identify the element visually.

// Specifying elements in more detail
const snapshot = await browser_snapshot();
console.log(JSON.stringify(snapshot, null, 2)); // Check the structure of the snapshot

// Identify elements by combining multiple attributes
const element = findElementByMultipleAttributes(snapshot, {
  role: "button",
  name: "Submit",
  // Other attributes...
});
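findElementByMultipleAttributes is not defined above either. One possible sketch, working on a simplified { role, name, children } tree (an assumption, as is every attribute other than role and name, such as disabled below). A criterion can be an exact value or a regular expression.

```javascript
// Return the first node (depth-first) whose properties match every entry in
// criteria. A criterion is either an exact value or a RegExp; use RegExps
// without the g flag, because RegExp.prototype.test is stateful with g.
function findElementByMultipleAttributes(node, criteria) {
  const matches = Object.entries(criteria).every(([key, expected]) =>
    expected instanceof RegExp
      ? expected.test(String(node[key] ?? ""))
      : node[key] === expected
  );
  if (matches) return node;
  for (const child of node.children ?? []) {
    const hit = findElementByMultipleAttributes(child, criteria);
    if (hit) return hit;
  }
  return null;
}

// Made-up tree: three elements named "Submit"
const tree = {
  role: "form",
  name: "Contact",
  children: [
    { role: "link", name: "Submit", children: [] },
    { role: "button", name: "Submit", disabled: true, children: [] },
    { role: "button", name: "Submit", disabled: false, children: [] },
  ],
};

// role + name alone would return the disabled button; a third attribute
// narrows the match to the enabled one
const el = findElementByMultipleAttributes(tree, { role: "button", name: "Submit", disabled: false });
console.log(el === tree.children[2]); // true
```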

2. Operations on Dynamically Changing Web Pages

Problem: On dynamically changing web pages, such as SPAs, elements may not be found.

Solutions:

  • Take a fresh snapshot after the page has changed (for example, after a click or navigation).
  • Add logic to wait until the element appears.

// Function to wait until an element is displayed
async function waitForElement(description, maxAttempts = 10, interval = 500) {
  for (let i = 0; i < maxAttempts; i++) {
    const snapshot = await browser_snapshot();
    const element = findElementByDescription(snapshot, description);
    
    if (element) {
      return element;
    }
    
    // Wait for a certain period of time
    await new Promise(resolve => setTimeout(resolve, interval));
  }
  
  throw new Error(`Element "${description}" not found after ${maxAttempts} attempts`);
}

// Usage example
const loginButton = await waitForElement("Login button");
await browser_click({ 
  element: "Login button", 
  ref: loginButton.ref 
});

3. Operations in iframes or Shadow DOM

Problem: You may be unable to access elements inside iframes or the Shadow DOM.

Solutions:

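  • Check whether the snapshot tool in your version offers options for iframe or Shadow DOM content.
  • If the element still does not appear in the snapshot, switch to Vision mode.

// Note: the option names below are illustrative, not confirmed parameters of
// browser_snapshot; check what your Playwright MCP version actually accepts.
// Retrieve a snapshot including iframes
const iframeSnapshot = await browser_snapshot({ includeIframes: true });

// Retrieve a snapshot including Shadow DOM
const shadowSnapshot = await browser_snapshot({ includeShadowDOM: true });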
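One way to confirm what a tool accepts: in MCP, a client can call tools/list, and each returned tool carries an inputSchema (a JSON Schema) describing its arguments. A sketch under that assumption; the function name toolAcceptsArgument is made up, and the sample response below is invented for illustration, not captured from Playwright MCP, so query your own server for the real one.

```javascript
// Check whether a tool's JSON Schema (from a tools/list result) declares an argument
function toolAcceptsArgument(toolsListResult, toolName, argName) {
  const tool = toolsListResult.tools.find((t) => t.name === toolName);
  if (!tool) throw new Error(`Tool not found: ${toolName}`);
  const properties = tool.inputSchema?.properties ?? {};
  return Object.prototype.hasOwnProperty.call(properties, argName);
}

// Made-up tools/list result fragment for illustration
const toolsList = {
  tools: [
    { name: "browser_snapshot", inputSchema: { type: "object", properties: {} } },
    {
      name: "browser_type",
      inputSchema: {
        type: "object",
        properties: {
          element: { type: "string" },
          ref: { type: "string" },
          text: { type: "string" },
          submit: { type: "boolean" },
        },
        required: ["element", "ref", "text"],
      },
    },
  ],
};

console.log(toolAcceptsArgument(toolsList, "browser_snapshot", "includeIframes")); // false
console.log(toolAcceptsArgument(toolsList, "browser_type", "submit"));            // true
```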

4. Performance Issues

Problem: Performance may degrade on large pages or with complex operations.

Solutions:

  • Minimize the number of times snapshots are acquired.
  • Perform only the necessary operations.
  • Use headless mode.

# Execution in headless mode (run in a terminal)
npx @playwright/mcp --headless

// Reuse of snapshots
const snapshot = await browser_snapshot();
// Use the same snapshot for multiple operations
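Reusing a snapshot is safe only until the page changes. A minimal sketch of that rule as a small cache; fetchSnapshot is any async function that returns a snapshot (in practice, a wrapper around browser_snapshot), and the SnapshotCache class itself is hypothetical.

```javascript
// Hypothetical cache: reuse one snapshot for several lookups and fetch a
// new one only after invalidate() is called (i.e. after the page changed).
class SnapshotCache {
  constructor(fetchSnapshot) {
    this.fetchSnapshot = fetchSnapshot; // async () => snapshot
    this.current = null;
  }

  // Return the cached snapshot, fetching it first if there is none
  async get() {
    if (this.current === null) {
      this.current = await this.fetchSnapshot();
    }
    return this.current;
  }

  // Call after any action that may change the page (click, type, navigate, ...)
  invalidate() {
    this.current = null;
  }
}

// Demo with a fake fetcher that counts how often it is called
let calls = 0;
const cache = new SnapshotCache(async () => `snapshot #${++calls}`);

(async () => {
  await cache.get();
  await cache.get();  // served from the cache, no new fetch
  cache.invalidate(); // e.g. right after browser_click
  console.log(await cache.get(), "| fetches:", calls); // snapshot #2 | fetches: 2
})();
```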

Summary - Possibilities and Future Prospects of Playwright MCP

In this article, we have covered everything from the basic concepts of Playwright MCP to its practical usage. Let's summarize the main points of Playwright MCP:

  • Innovative Approach: Structure-based operation utilizing the accessibility tree.
  • Two Operating Modes: Snapshot mode (default) and Vision mode.
  • Easy Introduction: Can be started easily with just Node.js and the npx command.
  • AI Integration: Advanced automation integrated with LLMs like Claude AI.

Playwright MCP is particularly effective in the following scenarios:

  1. Test Automation: E2E testing of web applications.
  2. Data Collection: Web scraping and information gathering.
  3. Business Automation: Automation of routine web operations.
  4. AI Agents: Autonomous task execution utilizing LLMs.

Future Prospects

While Playwright MCP is a relatively new tool, further development is expected:

  • More Advanced AI Integration: AI understanding and executing more complex tasks.
  • Improved Visual Understanding: Better accuracy in Vision mode.
  • Addition of New Features: More diverse browser operations and control functions.
  • Expansion of the Ecosystem: Integration with various LLMs and tools.

Final Thoughts

Playwright MCP is a tool that opens up a new era of browser automation. Its design, based on the premise of AI integration, hints at the future direction of web and test automation.

I encourage beginners to try out Playwright MCP while referring to this article. By adopting new technologies early, you will be able to build a more efficient and creative web development and testing environment.


I hope this article serves as a helpful reference for anyone interested in Playwright MCP. If you have any questions or feedback, please feel free to leave a comment!
