Use-case preset

Code generation agent cost calculator

Generate multi-file code from a spec; long outputs, 5s latency budget.

A coding agent that accepts a spec and produces multi-file implementations: the prompt includes the spec, the existing file tree, and style guidelines; the output is substantial generated code. The 60/40 input/output split is weighted more heavily toward output than most workloads; expect 2–4k output tokens per invocation. The 32k context window is sized to hold the codebase context needed for coherent multi-file generation. The 5s p95 latency budget allows heavier models without interactive-tier pricing.
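As a rough illustration of the arithmetic behind these figures, the sketch below derives the input size implied by the 60/40 split and the 2–4k output range, then estimates a per-invocation cost. It assumes the split is measured in tokens. The prices are hypothetical placeholders rather than any listed model's rates, and the function names are invented for this example.

```python
# Hypothetical per-million-token prices in USD; replace with real rates.
INPUT_PRICE_PER_M = 1.00
OUTPUT_PRICE_PER_M = 3.00


def input_tokens_for_split(output_tokens: int, input_share: float = 0.60) -> int:
    """Input tokens implied by an input/output split (assumed to be by token count)."""
    output_share = 1.0 - input_share
    return round(output_tokens * input_share / output_share)


def invocation_cost(input_tokens: int, output_tokens: int,
                    input_price_per_m: float = INPUT_PRICE_PER_M,
                    output_price_per_m: float = OUTPUT_PRICE_PER_M) -> float:
    """Cost in USD of one invocation at the given per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


if __name__ == "__main__":
    # Low and high ends of the expected output range from the preset.
    for output_tokens in (2_000, 4_000):
        input_tokens = input_tokens_for_split(output_tokens)
        cost = invocation_cost(input_tokens, output_tokens)
        print(f"{input_tokens} in / {output_tokens} out -> ${cost:.4f}")
```

If the split is indeed by tokens, 2–4k output tokens correspond to roughly 3–6k input tokens per invocation, which sits well inside the 32k window.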

Caching the tool definitions and style guide (stable across invocations) saves roughly 35% of input cost. The main quality risk is context fragmentation: if you truncate the codebase to fit the context window, the model can generate interfaces that are inconsistent with the files it did not see. Prefer models with strong code benchmarks (HumanEval, SWE-bench) even at higher per-token cost, because generation errors compound across files and correction is expensive.
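To see how the roughly 35% saving feeds into a per-invocation estimate, here is a minimal sketch. The 35% figure comes from the text above. The breakdown into a cached share of the prompt and a cache-read discount is an assumption, used only to show one combination that would produce it. The function names and the $0.006 input cost are hypothetical.

```python
def caching_savings(cached_share: float, cache_discount: float) -> float:
    """Fraction of input cost saved when `cached_share` of the input tokens
    is billed at a `cache_discount` reduction (both values in [0, 1])."""
    if not (0.0 <= cached_share <= 1.0 and 0.0 <= cache_discount <= 1.0):
        raise ValueError("shares and discounts must be between 0 and 1")
    return cached_share * cache_discount


def input_cost_after_caching(input_cost: float, savings: float = 0.35) -> float:
    """Input cost after a fractional saving (0.35 is the figure quoted in the text)."""
    return input_cost * (1.0 - savings)


if __name__ == "__main__":
    # Assumed example: half the prompt is stable and cache reads cost 70% less.
    savings = caching_savings(cached_share=0.5, cache_discount=0.7)
    print(f"savings: {savings:.0%}")
    # Hypothetical uncached input cost of $0.006 per invocation.
    print(f"input cost: ${input_cost_after_caching(0.006, savings):.4f}")
```

The saving applies to input cost only. Output tokens, which make up 40% of this workload, are billed as before, so the reduction in total cost is smaller than 35%.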

Recommended models

deepseek/deepseek-coder-v2-instruct

Top-tier code generation benchmark scores; strong multi-file coherence at 32k context.

alibaba/qwen-2.5-coder-32b-instruct

Purpose-built code model with excellent instruction following for spec-to-code tasks.

mistralai/codestral-22b

Fast and accurate code generation; good balance of quality and cost for the 5s latency tier.

meta/llama-3.1-405b-instruct

Maximum reasoning capability for complex multi-file architectures when quality is paramount.

deepseek/deepseek-v3

Strong general code generation with competitive pricing and large context support.