Use-case preset
Tool-calling-heavy workflow cost calculator
Agent with many parallel tool calls per turn; ops automation.
Operations-automation agents issue many tool calls per turn — querying APIs, reading configs, triggering webhooks — and accumulate tool-call results in context across multiple rounds. The 60/40 input/output split reflects that tool results fed back into the context make output token counts higher than typical chat. The 16k window accommodates a long tool spec plus several rounds of tool-result accumulation before the final answer.
Under-5s p95 latency applies to individual LLM steps; total workflow latency multiplies by the number of rounds. Models must support parallel function-calling — sequential tool calls kill throughput. Cache 30–50% of the prompt: the tool spec and system instructions are stable while user intent and tool results change. Cost lever: prune tool-result payloads aggressively before feeding them back; API responses often contain 10× more JSON than the agent actually needs.