Model crosswalk
Side-by-side on price, capability and workload. Both columns use the cheapest provider for that model.
Codestral 22B vs StarCoder2 15B Instruct
Codestral 22B
22B params · 33K context · mistral-research
Cheapest provider: —
$/1M input: —
$/1M output: —
StarCoder2 15B Instruct
15B params · 16K context · bigcode-openrail-m
Cheapest provider: —
$/1M input: —
$/1M output: —
Specs and cheapest providers
| Spec | Codestral 22B | StarCoder2 15B Instruct |
|---|---|---|
| Parameters | 22B | 15B |
| Context window | 33K tokens🏆 | 16K tokens |
| License | mistral-research | bigcode-openrail-m |
| Released | 2024-05-29 | 2024-09-06 |
| Cheapest provider | | |
| Provider | — | — |
| Input / 1M tokens | — | — |
| Output / 1M tokens | — | — |
Benchmark comparison
No benchmark data available for either model yet.
Sample workload — 5M in + 2M out per month
using each model's cheapest provider
What changes at scale
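Output tokens dominate cost once a workload produces roughly three or more output tokens per input token (a 1:3 input/output ratio). When input volume exceeds output volume (below 1:1), input dominates and providers with cheaper input pricing win regardless of headline price. The exact crossover depends on each provider's input and output prices.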
1M in · 250K out: $0.00 · $0.00
5M in · 2M out: $0.00 · $0.00
20M in · 10M out: $0.00 · $0.00
100M in · 60M out: $0.00 · $0.00
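The arithmetic behind these rows can be sketched in a few lines of Python. This is a minimal illustration, not the site's calculator: the `Pricing` class, both function names, and the example prices are assumptions, since no provider pricing is listed for either model. It computes the monthly total and the share of that total that comes from output tokens for each workload size listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-provider prices in USD per 1M tokens (hypothetical structure)."""
    input_per_m: float
    output_per_m: float


def monthly_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Total monthly cost in USD for a given input/output token mix."""
    input_cost = (input_tokens / 1_000_000) * pricing.input_per_m
    output_cost = (output_tokens / 1_000_000) * pricing.output_per_m
    return input_cost + output_cost


def output_share(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Fraction of the total cost attributable to output tokens."""
    total = monthly_cost(input_tokens, output_tokens, pricing)
    if total == 0:
        return 0.0
    return (output_tokens / 1_000_000) * pricing.output_per_m / total


# Placeholder prices, NOT taken from this page (no provider pricing is listed).
example = Pricing(input_per_m=0.20, output_per_m=0.60)

# Workload sizes from the list above: (input tokens, output tokens) per month.
workloads = [
    (1_000_000, 250_000),
    (5_000_000, 2_000_000),
    (20_000_000, 10_000_000),
    (100_000_000, 60_000_000),
]

for tokens_in, tokens_out in workloads:
    cost = monthly_cost(tokens_in, tokens_out, example)
    share = output_share(tokens_in, tokens_out, example)
    print(f"{tokens_in:>11,} in / {tokens_out:>10,} out: ${cost:,.2f} ({share:.0%} from output)")
```

Swapping in a provider's real input and output prices shows where the crossover falls for a given mix: output dominates whenever output tokens times output price exceeds input tokens times input price.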
Capability vs price
[Scatter plot: benchmark score vs. $/1M output tokens]
Editor's take
Codestral 22B and StarCoder2 15B Instruct are close enough in size to make this a genuine architecture-and-training debate rather than a raw-parameter story. Codestral is Mistral's dedicated code model; StarCoder2 15B comes from the BigCode collaboration, trained on The Stack v2 with explicit permissive licensing (BigCode OpenRAIL-M). If open, redistributable weights are a hard requirement, StarCoder2's license is cleaner for many enterprise legal reviews.
On HumanEval, Codestral 22B scores roughly 81%, while StarCoder2 15B Instruct lands around 72–73%. That 8–9 point gap narrows on multi-language benchmarks, where StarCoder2's broad training corpus (over 600 programming languages) gives it solid coverage of niche languages like Fortran, COBOL, or Elixir that Codestral may handle less gracefully.
For polyglot codebases with a mix of mainstream and legacy languages — say, a financial system touching Python, Scala, and COBOL — [StarCoder2 15B Instruct](/models/bigcode--starcoder2-15b-instruct) is worth testing. The breadth of training data may outweigh the raw benchmark gap on your actual distribution of code.
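One way to reason about "your actual distribution of code" is to weight per-language results by each language's share of the codebase. The sketch below assumes you have measured per-language pass rates yourself; the function name, the generic `model_a`/`model_b` labels, and every number are hypothetical placeholders, not benchmark results for either model.

```python
def weighted_pass_rate(language_mix: dict[str, float], pass_rates: dict[str, float]) -> float:
    """Expected pass rate over a codebase, weighting each language by its share."""
    total_share = sum(language_mix.values())
    if total_share <= 0:
        raise ValueError("language_mix must have a positive total share")
    missing = set(language_mix) - set(pass_rates)
    if missing:
        raise KeyError(f"no pass rate for: {sorted(missing)}")
    weighted = sum(share * pass_rates[lang] for lang, share in language_mix.items())
    return weighted / total_share


# Hypothetical share of each language in the codebase.
mix = {"python": 0.5, "scala": 0.2, "cobol": 0.3}

# Hypothetical per-language pass rates from your own evaluation runs.
model_a = {"python": 0.80, "scala": 0.60, "cobol": 0.20}
model_b = {"python": 0.70, "scala": 0.60, "cobol": 0.50}

print(f"model_a: {weighted_pass_rate(mix, model_a):.2f}")
print(f"model_b: {weighted_pass_rate(mix, model_b):.2f}")
```

With placeholder inputs like these, a model that trails on the mainstream language can still come out ahead overall once legacy languages carry enough weight, which is why the comparison should be rerun with measurements from your own code.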
Codestral 22B is the better choice for Python-, JavaScript-, and TypeScript-heavy workloads where HumanEval-style accuracy matters and you can absorb slightly higher inference cost. Its 32K context window (shown as 33K in the specs table) is also roughly double StarCoder2 15B Instruct's 16K window, which helps with file-level refactoring tasks. Check provider pricing on [Codestral 22B's model page](/models/mistralai--codestral-22b).
**Pick StarCoder2 15B Instruct** for polyglot or license-sensitive deployments. **Pick Codestral 22B** for mainstream-language accuracy and longer context needs.