Vision Mode
By default, Playwright MCP uses accessibility snapshots for all interactions. Vision mode adds coordinate-based tools that work with screenshots, enabling interaction with elements not exposed in the accessibility tree.
Enabling vision mode
Add the vision capability to the server arguments in your MCP client configuration:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--caps=vision"]
    }
  }
}
Additional tools
With vision mode, these coordinate-based tools become available:
| Tool | Description |
|---|---|
| browser_mouse_move_xy | Move mouse to x,y coordinates |
| browser_mouse_click_xy | Click at x,y (supports button, double-click, delay) |
| browser_mouse_drag_xy | Drag from start to end coordinates |
| browser_mouse_down | Press mouse button |
| browser_mouse_up | Release mouse button |
| browser_mouse_wheel | Scroll with mouse wheel |
Workflow: interacting with a canvas app
You: Draw a rectangle on the canvas.
→ browser_take_screenshot
// LLM sees the canvas and identifies coordinates
→ browser_mouse_click_xy { x: 100, y: 150 }
→ browser_mouse_drag_xy { startX: 100, startY: 150, endX: 300, endY: 250 }
→ browser_take_screenshot
// LLM verifies the rectangle was drawn
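The call sequence above can also be written down as plain data, which makes the coordinate arithmetic explicit. The following is a minimal sketch: the tool names and the argument names (x, y, startX, startY, endX, endY) are taken from the workflow above, while the ToolCall type and the rectangleDragPlan helper are hypothetical and not part of Playwright MCP.

```typescript
// Hypothetical shape for a tool call; not a Playwright MCP type.
interface ToolCall {
  name: string;
  args: Record<string, number>;
}

// Build the ordered calls for drawing a rectangle: screenshot, click the
// starting corner, drag to the opposite corner, then screenshot to verify.
function rectangleDragPlan(x: number, y: number, width: number, height: number): ToolCall[] {
  return [
    { name: "browser_take_screenshot", args: {} },
    { name: "browser_mouse_click_xy", args: { x, y } },
    {
      name: "browser_mouse_drag_xy",
      args: { startX: x, startY: y, endX: x + width, endY: y + height },
    },
    { name: "browser_take_screenshot", args: {} },
  ];
}

// A 200x100 rectangle starting at (100, 150) ends at (300, 250),
// matching the drag in the workflow above.
const plan = rectangleDragPlan(100, 150, 200, 100);
console.log(plan.map((call) => call.name).join(" -> "));
```

The end coordinates are derived from the start corner plus the size, so only one point has to be read off the screenshot.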
Workflow: clicking an icon without an accessible name
→ browser_snapshot
// The gear icon has no accessible name in the snapshot
→ browser_take_screenshot
// LLM sees the gear icon at approximately (850, 45)
→ browser_mouse_click_xy { x: 850, y: 45 }
→ browser_snapshot
// Settings panel is now open with proper accessibility
- heading "Settings" [level=2]
- textbox "Display name" [ref=e12]
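If the model reports a bounding box for the icon rather than a single point, clicking the center of the box is a reasonable choice. The sketch below assumes the box is expressed in the same pixel coordinate space that browser_mouse_click_xy expects. The Box type and the centerOf helper are hypothetical, and the box values are illustrative.

```typescript
// Hypothetical bounding box as a model might report it from a screenshot.
interface Box {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Return the integer center point of a box, suitable as x,y click arguments.
function centerOf(box: Box): { x: number; y: number } {
  return {
    x: Math.round(box.left + box.width / 2),
    y: Math.round(box.top + box.height / 2),
  };
}

// An illustrative 24x24 icon whose center is the point used in the workflow above.
const gearIcon: Box = { left: 838, top: 33, width: 24, height: 24 };
console.log(centerOf(gearIcon)); // { x: 850, y: 45 }
```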
When to use vision mode
| Scenario | Approach |
|---|---|
| Standard web pages | Use refs from snapshots (default) |
| Canvas / WebGL apps | Vision mode with screenshots |
| Map interactions | Vision mode for pan/zoom |
| Image editors | Vision mode for drawing |
| Charts / graphs | Vision mode to click data points |
| Custom widgets without ARIA | Vision mode as fallback |
For most web applications, the default snapshot-based approach is more reliable and token-efficient. Use vision mode only when the accessibility tree doesn't cover your use case.
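The fallback rule can be sketched as a small decision helper: prefer a ref when the snapshot exposes one, and use coordinates only otherwise. The Target type and the chooseApproach function are hypothetical names used for illustration.

```typescript
type Approach = "snapshot-ref" | "vision-xy";

// Hypothetical description of the element to interact with.
interface Target {
  ref?: string; // ref from the accessibility snapshot, e.g. "e12", if any
}

// Prefer the snapshot ref; fall back to coordinates when no ref exists.
function chooseApproach(target: Target): Approach {
  return target.ref ? "snapshot-ref" : "vision-xy";
}

console.log(chooseApproach({ ref: "e12" })); // snapshot-ref
console.log(chooseApproach({}));             // vision-xy
```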
Combining capabilities
To enable several capabilities at once, pass a comma-separated list to --caps:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--caps=vision,pdf,devtools"]
    }
  }
}
Or in the config file:
{
  "capabilities": ["core", "vision", "pdf", "devtools"]
}
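If the client configuration is generated by a script, the --caps argument can be assembled from a list. This is a minimal sketch: the package name and the --caps flag come from the examples above, while the Capability union (illustrative, not exhaustive) and the mcpServerEntry helper are assumptions.

```typescript
// Capability names taken from the examples above; not an exhaustive list.
type Capability = "vision" | "pdf" | "devtools";

// Build one server entry; the --caps flag is omitted when the list is empty.
function mcpServerEntry(caps: Capability[]): { command: string; args: string[] } {
  const args = ["@playwright/mcp@latest"];
  if (caps.length > 0) {
    args.push(`--caps=${caps.join(",")}`);
  }
  return { command: "npx", args };
}

const config = {
  mcpServers: { playwright: mcpServerEntry(["vision", "pdf", "devtools"]) },
};
console.log(JSON.stringify(config, null, 2));
```

The printed JSON has the same shape as the multi-capability configuration shown above.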