
Keyboard & Mouse

Keyboard

browser_press_key

Press a key or key combination.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| key | string | yes | Key to press |

Common keys: Enter, Tab, Escape, Backspace, Delete, ArrowUp, ArrowDown, ArrowLeft, ArrowRight, Home, End, PageUp, PageDown

Key combinations: Control+a, Control+c, Control+v, Shift+Tab, Alt+F4

→ browser_press_key { key: "Enter" }      // submit form
→ browser_press_key { key: "Tab" }        // move to next field
→ browser_press_key { key: "Escape" }     // close modal
→ browser_press_key { key: "Control+a" }  // select all text
→ browser_press_key { key: "ArrowDown" }  // navigate dropdown
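
The combinations listed above follow a `Modifier+Key` pattern. As a rough illustration of that convention, here is a minimal Python sketch that splits such a string into its parts. `split_key_combo` and `KNOWN_MODIFIERS` are hypothetical names used only for this example; they are not part of the tool, and the server's own parsing may differ.

```python
# Hypothetical helper: illustrates the "Modifier+Key" convention only.
# The modifier set is limited to those that appear in the examples above.
KNOWN_MODIFIERS = {"Control", "Shift", "Alt"}


def split_key_combo(combo):
    """Return (modifiers, key) for a string such as 'Control+a'."""
    if not combo:
        raise ValueError("empty key")
    # A trailing '+' means the key itself is '+', e.g. 'Control++'.
    if combo == "+" or combo.endswith("++"):
        head, key = combo[:-2], "+"
        modifiers = head.split("+") if head else []
    else:
        *modifiers, key = combo.split("+")
    unknown = [m for m in modifiers if m not in KNOWN_MODIFIERS]
    if unknown or not key:
        raise ValueError(f"unrecognized combination: {combo!r}")
    return modifiers, key


print(split_key_combo("Control+a"))
print(split_key_combo("Enter"))
```

A check like this could be run on a key string before sending it, so that a typo such as `Ctrl+a` is caught on the client side rather than after the tool call.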

browser_type

Type text into an element. See Forms for details.

Mouse (Vision mode)

These tools are available when the vision capability is enabled (--caps=vision). They use pixel coordinates from screenshots rather than element refs from snapshots.

browser_mouse_move_xy

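Move the mouse pointer to the given coordinates.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| x | number | yes | X coordinate in pixels |
| y | number | yes | Y coordinate in pixels |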

browser_mouse_down / browser_mouse_up

Press or release the mouse button at the current position.

browser_mouse_wheel

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| deltaX | number | yes | Horizontal scroll (pixels) |
| deltaY | number | yes | Vertical scroll (pixels, positive = down) |
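
Long scrolls are sometimes easier to reason about as several smaller wheel events. The following Python sketch splits a vertical distance into a list of `deltaX`/`deltaY` argument sets for `browser_mouse_wheel`. `wheel_steps` and the default step of 300 pixels are assumptions made for illustration; how the arguments are actually sent depends on your MCP client.

```python
def wheel_steps(total_dy, step=300):
    """Split a vertical scroll into browser_mouse_wheel argument dicts.

    Positive total_dy scrolls down, matching the table above.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    direction = 1 if total_dy >= 0 else -1
    remaining = abs(total_dy)
    calls = []
    while remaining > 0:
        chunk = min(step, remaining)
        # deltaX is required by the tool, so pass 0 for a purely vertical scroll.
        calls.append({"deltaX": 0, "deltaY": direction * chunk})
        remaining -= chunk
    return calls


for args in wheel_steps(700):
    print("browser_mouse_wheel", args)
```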

browser_mouse_click_xy

Click at specific coordinates without needing to move first.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| x | number | yes | X coordinate |
| y | number | yes | Y coordinate |
| button | string | no | left (default), right, or middle |
| clickCount | number | no | Number of clicks (2 for double-click) |
| delay | number | no | Delay between mousedown and mouseup (ms) |

→ browser_mouse_click_xy { x: 150, y: 300 }
→ browser_mouse_click_xy { x: 150, y: 300, clickCount: 2 }  // double-click
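
In practice the coordinates usually come from a region identified in a screenshot. A minimal Python sketch, assuming the region is known as a bounding box in the same pixel space as the screenshot, that targets the centre of the box. `click_args_for_box` is a hypothetical helper, not part of the tool.

```python
def click_args_for_box(left, top, width, height, double=False):
    """Build browser_mouse_click_xy arguments for the centre of a box."""
    if width <= 0 or height <= 0:
        raise ValueError("box must have positive width and height")
    args = {
        "x": round(left + width / 2),
        "y": round(top + height / 2),
    }
    if double:
        # clickCount of 2 requests a double-click, per the table above.
        args["clickCount"] = 2
    return args


print(click_args_for_box(100, 280, 100, 40))
```

Aiming for the centre rather than a corner leaves some margin if the region was located imprecisely.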

browser_mouse_drag_xy

Drag from one position to another.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| startX | number | yes | Start X coordinate |
| startY | number | yes | Start Y coordinate |
| endX | number | yes | End X coordinate |
| endY | number | yes | End Y coordinate |

→ browser_mouse_drag_xy { startX: 100, startY: 200, endX: 400, endY: 200 }
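
Some drag targets only respond when they see intermediate pointer movement. In that case a drag can be spelled out with the lower-level tools described earlier: move, press, move in steps, release. The Python sketch below generates such a sequence. `drag_as_primitives` and the `steps` parameter are hypothetical, `browser_mouse_down` and `browser_mouse_up` are called without arguments because none are documented above, and whether `browser_mouse_drag_xy` behaves identically to this sequence is an assumption.

```python
def drag_as_primitives(start_x, start_y, end_x, end_y, steps=4):
    """Return (tool_name, arguments) pairs approximating a drag."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    calls = [
        ("browser_mouse_move_xy", {"x": start_x, "y": start_y}),
        ("browser_mouse_down", {}),
    ]
    # Linearly interpolate from the start point to the end point.
    for i in range(1, steps + 1):
        t = i / steps
        calls.append((
            "browser_mouse_move_xy",
            {
                "x": round(start_x + (end_x - start_x) * t),
                "y": round(start_y + (end_y - start_y) * t),
            },
        ))
    calls.append(("browser_mouse_up", {}))
    return calls


for tool, args in drag_as_primitives(100, 200, 400, 200, steps=3):
    print(tool, args)
```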

When to use mouse tools

| Scenario | Use |
| --- | --- |
| Clicking a button, link, or form element | browser_click with ref (default) |
| Canvas-based apps (drawing, maps) | Mouse tools with vision |
| Custom UI controls without accessibility | Mouse tools with vision |
| Drag interactions on pixel-precise targets | Mouse tools with vision |

For most web applications, refs from accessibility snapshots are more reliable than coordinates. Use mouse tools only when the accessibility tree doesn't expose the elements you need.