Use-case preset

Bug fix suggestion cost calculator

Stacktrace + source files in; unified diff out; under 5s.

The prompt contains a stacktrace, the offending file(s), and optionally the surrounding test that failed. The output is a unified diff or a patched code block. Input tokens dominate at 85%, because pasting two or three source files pushes the context to 8k–14k tokens, while the output is only the patch, typically 200–600 tokens. The 16k window fits most real-world bug reports without truncation.
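As a rough illustration of the window check described above, the sketch below estimates prompt size from character counts and tests whether the prompt plus a reserved output budget fits the window. The function name `fits_window`, the 4-characters-per-token heuristic, and the exact window size of 16,384 tokens are assumptions made for this example; a real tokenizer for the chosen model would give more accurate counts.

```python
import math


def fits_window(prompt_parts, max_output_tokens=600,
                window_tokens=16_384, chars_per_token=4.0):
    """Estimate input tokens and check the request against the context window.

    prompt_parts: list of strings (stacktrace, source files, optional test).
    chars_per_token: crude heuristic (assumption), not a real tokenizer.
    Returns (estimated_input_tokens, fits).
    """
    est_input = sum(math.ceil(len(part) / chars_per_token)
                    for part in prompt_parts)
    # Reserve room for the patch so the output is not cut off.
    total = est_input + max_output_tokens
    return est_input, total <= window_tokens


if __name__ == "__main__":
    stacktrace = "Traceback (most recent call last): ..." * 20
    source_file = "def handler(event):\n    return event['id']\n" * 400
    tokens, ok = fits_window([stacktrace, source_file])
    print(f"estimated input tokens: {tokens}, fits window: {ok}")
```

If the check fails, dropping the least relevant file is usually a better first step than truncating a file mid-function.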

A p95 latency under 5s supports interactive developer workflows, where the suggestion appears in the IDE within a few seconds. A code-specialist model (Qwen Coder, Codestral, StarCoder2) typically outperforms a general-purpose model of the same parameter count on this task. Cache 30–50% of the prompt: the system prompt and the tool/linter spec are stable across requests. Cost lever: cap the context to the minimum set of files needed, since every extra file adds input tokens and gives the model more irrelevant code to be distracted by.
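The effect of caching and of trimming context can be sketched with a small per-request cost function. The prices and the 50% discount on cached input tokens below are placeholder assumptions, not figures from any provider, and `request_cost` is a hypothetical helper written for this example.

```python
def request_cost(input_tokens, output_tokens, cached_fraction,
                 price_in_per_m, price_out_per_m, cached_discount=0.5):
    """Estimate the cost of one request in the same currency as the prices.

    cached_fraction: share of input tokens served from the prompt cache (0..1).
    cached_discount: price reduction applied to cached tokens (assumption).
    Prices are per one million tokens.
    """
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    # Cached tokens are billed at a reduced rate; uncached at the full rate.
    billable_input = uncached + cached * (1.0 - cached_discount)
    input_cost = billable_input * price_in_per_m / 1_000_000
    output_cost = output_tokens * price_out_per_m / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # Placeholder prices: 1.00 per 1M input tokens, 2.00 per 1M output tokens.
    no_cache = request_cost(10_000, 400, 0.0, 1.0, 2.0)
    with_cache = request_cost(10_000, 400, 0.4, 1.0, 2.0)
    fewer_files = request_cost(7_000, 400, 0.4, 1.0, 2.0)
    print(f"no cache:     {no_cache:.4f}")
    print(f"40% cached:   {with_cache:.4f}")
    print(f"one file cut: {fewer_files:.4f}")
```

Because input tokens make up most of the request, both levers act on the larger term: caching lowers the price of the stable prefix, and removing a file lowers the token count itself.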

Recommended models

alibaba/qwen-2.5-coder-32b-instruct

Top-tier code-specialist at 32B; strong patch generation and diff format adherence.

mistralai/codestral-22b

Code-focused 22B model with excellent bug localisation and fix accuracy.

deepseek/deepseek-coder-v2-instruct

Strong code reasoning; reliable at pinpointing root causes in multi-file stacktraces.

alibaba/qwen-2.5-coder-7b-instruct

7B code-specialist; fast and cheap for simpler single-file bug fixes.

bigcode/starcoder2-15b-instruct

Code-native 15B; solid diff output format compliance and file-level patch generation.