Vision Mode

By default, Playwright MCP uses accessibility snapshots for all interactions. Vision mode adds coordinate-based tools that work with screenshots, enabling interaction with elements not exposed in the accessibility tree.

Enabling vision mode

Add the vision capability:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--caps=vision"]
    }
  }
}

Additional tools

With vision mode, these coordinate-based tools become available:

| Tool | Description |
| --- | --- |
| browser_mouse_move_xy | Move mouse to x,y coordinates |
| browser_mouse_click_xy | Click at x,y (supports button, double-click, delay) |
| browser_mouse_drag_xy | Drag from start to end coordinates |
| browser_mouse_down | Press mouse button |
| browser_mouse_up | Release mouse button |
| browser_mouse_wheel | Scroll with mouse wheel |
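
As a rough illustration of how a client might invoke these tools, the sketch below builds MCP `tools/call` requests in Python. The argument names (`x`, `y`, `startX`, `startY`, `endX`, `endY`) are taken from the workflow examples on this page. The `tool_call` helper is hypothetical, and the real tools may accept or require additional arguments that this page does not list.

```python
import json
from itertools import count

# JSON-RPC request ids should be unique within a session.
_ids = count(1)

def tool_call(name: str, **arguments) -> dict:
    """Build an MCP `tools/call` request (hypothetical helper, not part of Playwright MCP)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

# Argument names follow the workflow examples on this page.
click = tool_call("browser_mouse_click_xy", x=100, y=150)
drag = tool_call("browser_mouse_drag_xy", startX=100, startY=150, endX=300, endY=250)

print(json.dumps(click, indent=2))
print(json.dumps(drag, indent=2))
```

In practice an MCP client library usually builds and sends these requests for you; the sketch only shows the shape of the payload.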

Workflow: interacting with a canvas app

You: Draw a rectangle on the canvas.

→ browser_take_screenshot
// LLM sees the canvas and identifies coordinates

→ browser_mouse_click_xy { x: 100, y: 150 }
→ browser_mouse_drag_xy { startX: 100, startY: 150, endX: 300, endY: 250 }
→ browser_take_screenshot
// LLM verifies the rectangle was drawn
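
The coordinates in the drag call above are the opposite corners of the rectangle. A minimal sketch of that arithmetic follows. The `drag_args_for_rect` helper and its `scale` parameter are hypothetical: `scale` covers the case where the screenshot is rendered at a different resolution than the viewport, which may or may not apply in your setup.

```python
def drag_args_for_rect(left: float, top: float, width: float, height: float,
                       scale: float = 1.0) -> dict:
    """Return browser_mouse_drag_xy arguments that sweep a rectangle's diagonal.

    `scale` is an assumed ratio of screenshot pixels to viewport pixels;
    leave it at 1.0 when the screenshot and viewport use the same coordinates.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return {
        "startX": round(left / scale),
        "startY": round(top / scale),
        "endX": round((left + width) / scale),
        "endY": round((top + height) / scale),
    }

# A 200x100 rectangle whose top-left corner is at (100, 150).
args = drag_args_for_rect(100, 150, 200, 100)
print(args)
```

With the default scale this reproduces the drag arguments used in the workflow: start at (100, 150), end at (300, 250).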

Workflow: clicking an icon without an accessible name

→ browser_snapshot
// The gear icon has no accessible name in the snapshot

→ browser_take_screenshot
// LLM sees the gear icon at approximately (850, 45)

→ browser_mouse_click_xy { x: 850, y: 45 }
→ browser_snapshot
// The settings panel is now open and its controls appear in the snapshot
- heading "Settings" [level=2]
- textbox "Display name" [ref=e12]
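
The fallback logic in this workflow can be sketched as follows: look for the target by name in the snapshot, and use a coordinate click only when no ref is found. The snapshot line format is inferred from the two example lines above, the helpers `find_ref` and `choose_action` are hypothetical, and `browser_click` with a `ref` argument is assumed here to be the snapshot-based click tool.

```python
import re
from typing import Optional, Tuple

# Matches snapshot lines such as: - textbox "Display name" [ref=e12]
_LINE = re.compile(r'^\s*-\s+(?P<role>\w+)\s+"(?P<name>[^"]*)"(?P<attrs>.*)$')
_REF = re.compile(r'\[ref=(?P<ref>[^\]]+)\]')

def find_ref(snapshot: str, name: str) -> Optional[str]:
    """Return the ref of the first element with this accessible name, if any."""
    for line in snapshot.splitlines():
        match = _LINE.match(line)
        if match and match.group("name") == name:
            ref = _REF.search(match.group("attrs"))
            if ref:
                return ref.group("ref")
    return None

def choose_action(snapshot: str, name: str,
                  fallback_xy: Tuple[int, int]) -> Tuple[str, dict]:
    """Prefer a ref-based click; fall back to coordinates from a screenshot."""
    ref = find_ref(snapshot, name)
    if ref is not None:
        # Assumed tool name and argument for the default snapshot-based click.
        return ("browser_click", {"ref": ref})
    x, y = fallback_xy
    return ("browser_mouse_click_xy", {"x": x, "y": y})

snapshot = '- heading "Settings" [level=2]\n- textbox "Display name" [ref=e12]'
print(choose_action(snapshot, "Display name", (0, 0)))
print(choose_action(snapshot, "Gear", (850, 45)))
```

Trying the snapshot first keeps the coordinate-based path as a fallback, which matches the guidance in the next section.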

When to use vision mode

| Scenario | Approach |
| --- | --- |
| Standard web pages | Use refs from snapshots (default) |
| Canvas / WebGL apps | Vision mode with screenshots |
| Map interactions | Vision mode for pan/zoom |
| Image editors | Vision mode for drawing |
| Charts / graphs | Vision mode to click data points |
| Custom widgets without ARIA | Vision mode as fallback |

For most web applications, the default snapshot-based approach is more reliable and token-efficient. Use vision mode only when the accessibility tree doesn't cover your use case.

Combining capabilities

Enable multiple capabilities by passing a comma-separated list to --caps:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--caps=vision,pdf,devtools"]
    }
  }
}

Or in the config file:

{
  "capabilities": ["core", "vision", "pdf", "devtools"]
}
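
If you generate client configuration from a script, the `--caps` flag can be assembled from a list. This is a minimal sketch: `mcp_server_config` is a hypothetical helper, and it only reproduces the `mcpServers` structure shown on this page.

```python
import json
from typing import Iterable

def mcp_server_config(caps: Iterable[str]) -> dict:
    """Build the client config with a comma-separated --caps flag (hypothetical helper)."""
    args = ["@playwright/mcp@latest"]
    caps = list(caps)
    if caps:
        # Capabilities are joined with commas, as in --caps=vision,pdf,devtools.
        args.append("--caps=" + ",".join(caps))
    return {"mcpServers": {"playwright": {"command": "npx", "args": args}}}

config = mcp_server_config(["vision", "pdf", "devtools"])
print(json.dumps(config, indent=2))
```

Passing an empty list omits the flag, which leaves the server on its default snapshot-based tools.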