Model crosswalk
Side-by-side on price, capability and workload — three-way comparison.
Codestral 22B vs Qwen 2.5 Coder 7B Instruct vs StarCoder2 15B Instruct
Specs and cheapest providers
| Spec | Codestral 22B | Qwen 2.5 Coder 7B Instruct | StarCoder2 15B Instruct |
|---|---|---|---|
| Parameters | 22B | 7B | 15B |
| Context window | 33K tokens | 131K tokens | 16K tokens |
| License | mistral-research | qwen | bigcode-openrail-m |
| Released | 2024-05-29 | 2024-11-12 | 2024-09-06 |
| Cheapest provider | | | |
| Provider | — | — | — |
| Input / 1M tokens | — | — | — |
| Output / 1M tokens | — | — | — |
Benchmark comparison
No benchmark data available yet.
Editor's take
A direct comparison of three coding models at different parameter counts: two that are commercially usable and one with a clear production blocker.
Codestral 22B is Mistral AI's first code-specialist model, released May 2024. It covers 80-plus programming languages with a 32K context window (32,768 tokens, shown rounded to 33K in the table above), and on HumanEval it is competitive with DeepSeek Coder V2 Lite and Qwen 2.5 Coder 7B. The 22B parameter count puts it above typical autocomplete models in quality, but the Mistral Research License prohibits commercial production deployment without a direct agreement with Mistral. This is a genuine constraint: teams often benchmark Codestral favorably and then hit this wall during legal review.
Qwen 2.5 Coder 7B Instruct, released November 2024, posts HumanEval scores competitive with DeepSeek Coder 6.7B and StarCoder2 7B. Its 131K context window is generous for a 7-billion-parameter model. Hosted pricing typically runs below $0.20 per million tokens, and the Qwen license permits commercial deployment. For teams looking for an autocomplete backbone that can also take in meaningful multi-file context, this is where most evaluation processes land.
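To put the per-million-token figure in concrete terms, here is a minimal sketch of the cost arithmetic. The workload size is hypothetical, and applying the $0.20 ceiling to both input and output tokens is an assumption made for illustration; actual provider prices usually differ between the two.

```python
def cost_usd(input_tokens: int, output_tokens: int,
             usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Return the USD cost of a token volume at per-million-token prices."""
    total = input_tokens * usd_per_m_input + output_tokens * usd_per_m_output
    return total / 1_000_000


# Hypothetical monthly workload: 500M input tokens, 50M output tokens.
# Both prices use the $0.20 ceiling quoted above (an assumption, not a quote
# from any specific provider).
monthly = cost_usd(500_000_000, 50_000_000, 0.20, 0.20)
print(f"${monthly:.2f}")  # prints "$110.00"
```

Because the quoted figure is an upper bound, the result reads as a ceiling for that workload rather than an expected bill.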
StarCoder2 15B Instruct comes from the BigCode collaboration (Hugging Face and ServiceNow). At 15B parameters it sits between the other two in size, with a 16K context window. It trails both Codestral and Qwen 2.5 Coder on raw benchmarks, so it doesn't win on performance. Its differentiated case is training-data auditability: The Stack v2 training corpus is restricted to permissively licensed code, and that chain of custody matters to teams in regulated sectors. The BigCode OpenRAIL-M license permits commercial use, subject to its use-based restrictions.
Pick Codestral 22B for internal research or non-commercial tooling. Pick Qwen 2.5 Coder 7B for commercial production APIs at competitive token economics. Pick StarCoder2 15B when IP audit requirements demand verifiable training-data provenance.
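The three recommendations above can be written as a small decision helper. This is only a sketch: the `Requirements` fields and the `pick_model` function are hypothetical names, and checking the audit requirement first is an assumption about priority that the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    commercial_production: bool      # will the model serve a commercial product?
    needs_training_data_audit: bool  # must training-data provenance be verifiable?


def pick_model(req: Requirements) -> str:
    """Map the recommendations above onto two yes/no requirements."""
    # Assumption: an IP audit requirement outranks the other criteria.
    if req.needs_training_data_audit:
        return "StarCoder2 15B Instruct"
    if req.commercial_production:
        return "Qwen 2.5 Coder 7B Instruct"
    # Internal research or non-commercial tooling.
    return "Codestral 22B"


print(pick_model(Requirements(commercial_production=True,
                              needs_training_data_audit=False)))
```

A real evaluation would also weigh context window, latency and price, which this helper deliberately leaves out.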
Frequently asked questions
- How does Codestral 22B compare to Qwen 2.5 Coder 7B Instruct and StarCoder2 15B Instruct on price?
- The Specs and cheapest providers table lists input and output prices per 1M tokens from the cheapest available provider for each model. An entry of — means no provider pricing is listed yet.
- Which model is best for coding: Codestral 22B, Qwen 2.5 Coder 7B Instruct, or StarCoder2 15B Instruct?
- No benchmark data is available on this page yet; the Editor's take above summarizes how the three models compare on HumanEval. For production code tasks, also consider context window size and provider latency.
- What is the context window for Codestral 22B, Qwen 2.5 Coder 7B Instruct, and StarCoder2 15B Instruct?
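- The Context window row of the Specs and cheapest providers table above lists them: 33K tokens for Codestral 22B, 131K for Qwen 2.5 Coder 7B Instruct, and 16K for StarCoder2 15B Instruct.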