Vision Mode

By default, you interact with page elements using refs from accessibility snapshots. For elements that are not exposed in the accessibility tree (canvas apps, maps, custom widgets), use coordinate-based mouse commands, with screenshots as your visual reference.

Commands

| Command | Description |
| --- | --- |
| `mousemove <x> <y>` | Move mouse to pixel coordinates |
| `mousedown [button]` | Press mouse button (left, right, middle) |
| `mouseup [button]` | Release mouse button |
| `mousewheel <dx> <dy>` | Scroll (dx=horizontal, dy=vertical) |
| `screenshot` | Capture viewport for coordinate reference |
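
The workflows on this page use every command except `mousewheel`, so here is a minimal sketch of how it might be used. The `scroll_by` function and the `PW_CLI` override variable are hypothetical names introduced for illustration, not part of playwright-cli. The sketch also assumes that a positive `dy` scrolls down, following the usual wheel-delta convention; check this against your own setup.

```shell
# Illustrative override so the CLI can be swapped out or dry-run (e.g. PW_CLI=echo).
PW_CLI="${PW_CLI:-playwright-cli}"

# Hypothetical helper: scroll vertically in several smaller wheel steps.
scroll_by() {
  # $1 = total vertical delta in pixels, $2 = number of steps (default 1)
  local total="$1" steps="${2:-1}" i=0
  # Integer division: any remainder of total / steps is dropped.
  local step=$(( total / steps ))
  while [ "$i" -lt "$steps" ]; do
    "$PW_CLI" mousewheel 0 "$step" || return 1
    i=$(( i + 1 ))
  done
}
```

For example, `scroll_by 600 3` would issue `mousewheel 0 200` three times. Take a fresh screenshot afterwards, since coordinates read from an earlier screenshot no longer match the viewport.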

Workflow: interacting with a canvas app

# Take a screenshot to see the canvas
playwright-cli screenshot --filename=canvas.png

# Agent identifies coordinates from the screenshot
# Click at position (150, 300)
playwright-cli mousemove 150 300
playwright-cli mousedown
playwright-cli mouseup

# Drag from (100, 200) to (400, 200)
playwright-cli mousemove 100 200
playwright-cli mousedown
playwright-cli mousemove 400 200
playwright-cli mouseup

# Verify the result
playwright-cli screenshot --filename=after-drag.png
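
Some canvas apps may only register a drag if they receive several pointer-move events between press and release. As a hedged sketch built from the same commands, the hypothetical `drag_steps` helper below interpolates intermediate `mousemove` calls along a straight line. The function name, the `PW_CLI` override, and the default step count are assumptions for illustration, not part of playwright-cli.

```shell
# Illustrative override so the CLI can be swapped out or dry-run (e.g. PW_CLI=echo).
PW_CLI="${PW_CLI:-playwright-cli}"

# Hypothetical helper: drag from (x1, y1) to (x2, y2) through intermediate points.
drag_steps() {
  # $1,$2 = start x,y   $3,$4 = end x,y   $5 = number of move steps (default 4)
  local x1="$1" y1="$2" x2="$3" y2="$4" steps="${5:-4}" i=1
  "$PW_CLI" mousemove "$x1" "$y1" || return 1
  "$PW_CLI" mousedown || return 1
  while [ "$i" -le "$steps" ]; do
    # Integer linear interpolation; the final step lands exactly on the end point.
    "$PW_CLI" mousemove \
      $(( x1 + (x2 - x1) * i / steps )) \
      $(( y1 + (y2 - y1) * i / steps )) || return 1
    i=$(( i + 1 ))
  done
  "$PW_CLI" mouseup
}
```

With this sketch, `drag_steps 100 200 400 200 3` would move through (200, 200) and (300, 200) before releasing at (400, 200). If the app responds to a direct two-point drag, the plain sequence shown above is simpler.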

Workflow: clicking an icon without an accessible name

# Snapshot doesn't show the gear icon
playwright-cli snapshot
# (no gear icon in output)

# Take a screenshot; the agent sees the gear icon at approximately (850, 45)
playwright-cli screenshot

# Click it
playwright-cli mousemove 850 45
playwright-cli mousedown
playwright-cli mouseup

# Settings panel opens with proper accessibility
playwright-cli snapshot
# - heading "Settings" [level=2]
# - textbox "Display name" [ref=e12]

# Now use refs for the rest
playwright-cli fill e12 "New Name"
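
Because a coordinate click is always the same three commands, it can be wrapped in a small shell function. This is a sketch only: `click_at` and the `PW_CLI` override are hypothetical names, and the optional button argument relies on the `[button]` parameter listed in the Commands table.

```shell
# Illustrative override so the CLI can be swapped out or dry-run (e.g. PW_CLI=echo).
PW_CLI="${PW_CLI:-playwright-cli}"

# Hypothetical helper: move to a point, then press and release a button there.
click_at() {
  # $1,$2 = pixel coordinates   $3 = button: left, right, or middle
  # Defaulting to "left" is this sketch's choice; the button is always passed explicitly.
  local x="$1" y="$2" button="${3:-left}"
  "$PW_CLI" mousemove "$x" "$y" &&
    "$PW_CLI" mousedown "$button" &&
    "$PW_CLI" mouseup "$button"
}
```

The gear-icon click above would then read `click_at 850 45`, and a context-menu click at the same spot would be `click_at 850 45 right`.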

When to use vision mode

| Scenario | Approach |
| --- | --- |
| Standard web pages | Use refs from snapshots (default) |
| Canvas / WebGL apps | Vision mode with screenshots |
| Map interactions | Vision mode for pan/zoom |
| Image editors | Vision mode for drawing |
| Charts / graphs | Vision mode to click data points |
| Custom widgets without ARIA | Vision mode as fallback |

For most web applications, the default snapshot-based approach is more reliable and token-efficient. Use vision mode only when the accessibility tree doesn't cover your use case.
