Use-case preset
Computer-use agent cost calculator
Browser and desktop automation with vision in long-running sessions.
A computer-use agent browses the web or controls a desktop UI autonomously: screenshots, DOM snapshots, and tool-call results accumulate in the context across many steps before any significant text is written back. The 70/30 input/output split reflects tool-result ingestion plus agent reasoning, with a 32k window to hold multi-turn state including encoded screenshots.
Sessions run unattended so best-effort latency applies, though per-step latency still matters for wall-clock completion time. cachedPromptPercent of 35 covers the reused tool spec, action schema, and system instructions that prefix every step. The dominant cost levers are step count and screenshot resolution: dropping from 1080p to 720p thumbnails cuts per-screenshot token count by roughly 55%. Vision-capable 70B+ models are required; smaller vision models degrade meaningfully on UI element grounding.