Use-case preset
Bug fix suggestion cost calculator
Stacktrace + source files in; unified diff out; under 5s.
The prompt contains a stack trace, the offending file(s), and optionally the test that failed. The output is a unified diff or a patched code block. Input tokens dominate at 85% because pasting two or three source files pushes the context to 8k–14k tokens; the output is just the patch, typically 200–600 tokens. The 16k window fits most real-world bug reports without truncation.
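As a rough illustration of the window check described above, the sketch below tests whether a prompt plus its expected patch fits in the context window. The function name `fits_window` and the choice to treat "16k" as 16,000 tokens are assumptions made for this example, not part of the calculator.

```python
# Assumption: "16k" is treated as 16,000 tokens; some models use 16,384.
CONTEXT_WINDOW = 16_000


def fits_window(input_tokens: int, max_output_tokens: int,
                window: int = CONTEXT_WINDOW) -> bool:
    """Return True if the prompt and the reserved output budget fit together."""
    return input_tokens + max_output_tokens <= window


# Upper end of the preset: 14k tokens of context, 600-token patch.
print(fits_window(14_000, 600))
# A larger paste that leaves no room for the patch.
print(fits_window(15_800, 600))
```

Reserving the output budget up front matters because a prompt that fills the window exactly leaves the model no room to emit the diff.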
Under-5s p95 latency supports interactive developer workflows where the suggestion appears in the IDE within a few seconds. A code-specialist model (Qwen Coder, Codestral, StarCoder2) typically outperforms a general-purpose model at the same parameter count for this task. Cache 30–50% of the prompt: the system prompt and tool/linter spec are stable across requests. Cost lever: cap context to the minimum set of files needed; every extra file adds input tokens and gives the model more irrelevant code to be distracted by.
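The effect of caching and context size on cost can be sketched with a small per-request estimate. Everything priced here is a placeholder: the per-million-token prices, the cached-token discount, and the function name `cost_per_request` are hypothetical values chosen for illustration, so substitute your provider's actual rates.

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     cached_fraction: float,
                     price_in: float, price_out: float,
                     cached_discount: float) -> float:
    """Estimate the cost of one request in dollars.

    price_in and price_out are dollars per 1M tokens (hypothetical values).
    cached_discount is the multiplier applied to cached input tokens,
    e.g. 0.5 means cached tokens are billed at half the input price.
    """
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    total = (uncached * price_in
             + cached * price_in * cached_discount
             + output_tokens * price_out)
    return total / 1_000_000


# Placeholder prices: $1.00 per 1M input tokens, $2.00 per 1M output tokens.
no_cache = cost_per_request(10_000, 400, 0.0, 1.0, 2.0, 0.5)
with_cache = cost_per_request(10_000, 400, 0.4, 1.0, 2.0, 0.5)
print(f"no cache:   ${no_cache:.4f}")
print(f"40% cached: ${with_cache:.4f}")
```

Because input tokens make up most of the bill in this preset, both levers act on the same term: caching lowers the price of the stable part of the prompt, and trimming files lowers the token count of the rest.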