Use-case preset

Tool-calling-heavy workflow cost calculator

Agent with many parallel tool calls per turn; ops automation.

Operations-automation agents issue many tool calls per turn (querying APIs, reading configs, triggering webhooks) and accumulate tool-call results in context across multiple rounds. The 60/40 input/output split reflects an output share higher than typical chat: the model emits a structured tool-call payload on every round, not just a final answer, while the tool results fed back into context count as input. The 16k window accommodates a long tool spec plus several rounds of tool-result accumulation before the final answer.
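As a rough illustration of how the split and window translate into per-step cost, here is a minimal Python sketch. It assumes "16k" means 16,000 tokens and that the 60/40 split applies to the total tokens of a step. The names `Preset`, `split_tokens`, and `step_cost` are hypothetical, and the prices passed in are placeholders rather than real rates for any model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    """Preset values from the description; the field names are illustrative."""
    context_window: int = 16_000  # assumption: "16k" read as 16,000 tokens
    input_share: float = 0.60     # 60% of tokens are input
    output_share: float = 0.40    # 40% of tokens are output


def split_tokens(total_tokens: int, preset: Preset = Preset()) -> tuple[int, int]:
    """Split a step's total tokens into (input, output) using the preset ratio."""
    if total_tokens > preset.context_window:
        raise ValueError("step exceeds the preset context window")
    input_tokens = round(total_tokens * preset.input_share)
    return input_tokens, total_tokens - input_tokens


def step_cost(total_tokens: int,
              input_price_per_m: float,
              output_price_per_m: float,
              preset: Preset = Preset()) -> float:
    """Cost of one LLM step, given placeholder prices per million tokens."""
    input_tokens, output_tokens = split_tokens(total_tokens, preset)
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)
```

For example, `split_tokens(10_000)` gives 6,000 input and 4,000 output tokens; multiplying `step_cost` by the number of rounds gives a first-order estimate for a whole workflow.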

The under-5s p95 latency target applies to each individual LLM step; total workflow latency scales roughly with the number of rounds, so a four-round workflow can approach 20s. Models must support parallel function-calling, because sequential tool calls cost one full LLM step per call and kill throughput. Cache 30–50% of the prompt: the tool spec and system instructions are stable while user intent and tool results change. Cost lever: prune tool-result payloads aggressively before feeding them back; API responses often contain 10× more JSON than the agent actually needs.
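The pruning lever can be sketched in Python as a small allow-list filter that runs on each tool result before it is appended to the context. This is one possible approach under stated assumptions, not a prescribed implementation: the function name `prune` and the dotted-path convention are hypothetical, and it assumes tool results arrive as nested JSON objects.

```python
from typing import Any


def prune(payload: dict[str, Any], keep_paths: list[str]) -> dict[str, Any]:
    """Return a copy of payload containing only the dotted paths in keep_paths.

    Paths that do not resolve in the payload are skipped silently.
    """
    pruned: dict[str, Any] = {}
    for path in keep_paths:
        keys = path.split(".")
        src: Any = payload
        for key in keys:
            if not isinstance(src, dict) or key not in src:
                break  # path is missing; skip it
            src = src[key]
        else:
            # Path resolved: rebuild the nesting in the output and copy the value.
            dst = pruned
            for key in keys[:-1]:
                dst = dst.setdefault(key, {})
            dst[keys[-1]] = src
    return pruned
```

A usage sketch: given a hypothetical API response `{"id": 7, "status": {"state": "ok", "history": [...]}, "debug": {...}}`, calling `prune(response, ["id", "status.state"])` keeps only the two fields the agent needs. Comparing `len(json.dumps(...))` before and after pruning gives a quick proxy for the tokens saved.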

Recommended models

meta/llama-3.3-70b-instruct

Reliable parallel tool-calling with strong JSON schema adherence at 70B scale.

alibaba/qwen-3-72b-instruct

Excellent function-calling accuracy; handles large tool specs and multi-round contexts well.

mistralai/mixtral-8x22b-instruct

Mixture-of-experts (MoE) architecture activates only a subset of parameters per token, providing fast inference on complex multi-tool workflows.

deepseek/deepseek-v3

Strong agentic reasoning with reliable structured output for tool-call payloads.

microsoft/phi-3.5-moe-instruct

MoE efficiency; competitive tool-calling at lower cost per step for simpler ops workflows.