Use-case preset

Web research agent cost calculator

Agentic loop issuing web searches; page content accumulates across turns.

In this agentic loop, the model issues web search tool calls, receives page excerpts, synthesizes findings, and iterates until the research question is resolved. A single task typically spans 5–15 turns, and page content accumulates in the context with each step.
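The loop described above can be sketched roughly as follows. This is a minimal illustration, not any particular framework's API: `run_research`, `decide`, `search`, and the `("search", query)` / `("answer", text)` action shape are all hypothetical names, and the default cap of 15 is simply the upper end of the typical 5–15 turn range.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical action shape: ("search", query) or ("answer", final_text).
Action = Tuple[str, str]


def run_research(
    question: str,
    decide: Callable[[List[str]], Action],
    search: Callable[[str], str],
    max_turns: int = 15,  # assumption: upper end of the 5-15 turn range
) -> Tuple[Optional[str], int]:
    """Iterate until the model answers or the turn cap is reached."""
    context: List[str] = [question]
    for turn in range(1, max_turns + 1):
        kind, payload = decide(context)
        if kind == "answer":
            return payload, turn
        # Page excerpts accumulate in the context across turns.
        context.append(search(payload))
    return None, max_turns  # cap reached without a final answer


# Stub model and search tool, for illustration only.
def fake_decide(context: List[str]) -> Action:
    # Answer once two excerpts have been gathered.
    return ("answer", "done") if len(context) >= 3 else ("search", "query")


def fake_search(query: str) -> str:
    return f"excerpt for {query}"


answer, turns = run_research("What changed?", fake_decide, fake_search)
```

In a real agent, `decide` would wrap a model call and `search` a web search tool; the stubs here only show the control flow and how the context grows.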

The 80/20 input/output split reflects a context that keeps growing: search results, prior reasoning, and tool specs compound with every turn. About 16,000 tokens of context are needed to hold 3–5 search result excerpts plus prior-turn summaries without truncating critical evidence. Latency is best-effort because agent runs are asynchronous tasks, not real-time interactions. `cachedPromptPercent` is set to about 40 because the tool schema and system instructions stay constant across all turns; only the search results are new at each step. The main cost lever is the iteration limit: an uncapped agent on a hard question can easily consume 5–10× the expected token spend, so add a hard turn limit.
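To make the arithmetic concrete, here is a rough sketch of how these settings could combine into a per-task estimate. It makes several assumptions: the 16,000 tokens are treated as the input size of every turn, output tokens are derived from the 80/20 split (about 4,000 per turn), and the prices and cache discount are placeholder values rather than any provider's real rates. The function and parameter names are hypothetical.

```python
from typing import Optional


def cost_per_turn(
    input_tokens: int = 16_000,           # context size from the preset text
    input_share: float = 0.80,            # 80/20 input/output split
    cached_prompt_percent: float = 40.0,  # share of input served from cache
    input_price_per_m: float = 0.50,      # hypothetical USD per 1M input tokens
    output_price_per_m: float = 1.50,     # hypothetical USD per 1M output tokens
    cached_discount: float = 0.50,        # hypothetical: cached tokens billed at 50%
) -> float:
    """Estimated USD cost of one turn under the assumptions above."""
    output_tokens = input_tokens * (1 - input_share) / input_share
    cached = input_tokens * cached_prompt_percent / 100
    fresh = input_tokens - cached
    return (
        fresh * input_price_per_m
        + cached * input_price_per_m * cached_discount
        + output_tokens * output_price_per_m
    ) / 1_000_000


def task_cost(turns: int, max_turns: Optional[int] = None, **kwargs: float) -> float:
    """Cost of a task; a hard turn limit caps the number of billed turns."""
    billed_turns = turns if max_turns is None else min(turns, max_turns)
    return billed_turns * cost_per_turn(**kwargs)


typical = task_cost(10)                # a mid-range task
runaway = task_cost(75)                # uncapped agent on a hard question
capped = task_cost(75, max_turns=15)   # same question with a hard limit
```

Because this simplified model is linear in the number of turns, the 75-turn run costs five times a 15-turn run, and `max_turns=15` holds it to the 15-turn figure. A real context grows turn by turn, so actual spend on long runs may climb faster than this flat per-turn estimate suggests.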

Recommended models

meta/llama-3.3-70b-instruct

Strong tool-use and multi-step reasoning; reliable at determining when to stop searching.

deepseek/deepseek-v3

Excellent reasoning and synthesis quality for complex research tasks at competitive pricing.

alibaba/qwen-3-72b-instruct

High-quality instruction following with good tool-use reliability across long agentic loops.

mistralai/mixtral-8x22b-instruct

Mixture-of-experts (MoE) architecture handles the varied content types in search results (code, prose, tables) without degrading.