Model crosswalk
Side-by-side on price, capability and workload — three-way comparison.
Codestral 22b
vs
Phi 3 Medium 128k
vs
Starcoder2 15b Instruct
Codestral 22bA
Codestral 22b
Cheapest provider—
$/1M input—
$/1M output—
Phi 3 Medium 128kB
Phi 3 Medium 128k
Cheapest provider—
$/1M input—
$/1M output—
Starcoder2 15b InstructC
Starcoder2 15b Instruct
Cheapest provider—
$/1M input—
$/1M output—
Specs and cheapest providers
| Spec | Codestral 22b | Phi 3 Medium 128k | Starcoder2 15b Instruct |
|---|---|---|---|
| Parameters | — | — | — |
| Context window | — | — | — |
| License | — | — | — |
| Released | — | — | — |
| Cheapest provider | |||
| Provider | — | — | — |
| Input / 1M tokens | — | — | — |
| Output / 1M tokens | — | — | — |
Benchmark comparison
No benchmark data available yet.
Editor's take
Three models in the 14–22B parameter range, each optimized for a different priority: coding breadth, reasoning quality, and training-data integrity.
Codestral 22B is Mistral AI's code-specialist model, released May 2024 with 22 billion parameters and coverage for 80-plus programming languages across a 32K context window. It performs competitively on HumanEval against DeepSeek Coder V2 Lite and early Qwen Coder models. The production ceiling is the Mistral Research License, which bars commercial deployment without a direct Mistral agreement. For internal tooling or research, it is a reasonable evaluation target; for shipping to end users, the licensing conversation is mandatory.
Phi-3 Medium 128K is Microsoft's 14-billion-parameter model from May 2024, built on heavily filtered synthetic textbook-quality training data rather than raw web text. The bet on data quality over parameter count pays off on MMLU and GSM8K, where it closes some of the gap to 70B-class models. The 131K context window — nearly four times Codestral's — makes it viable for long-document review, extended code refactors, and retrieval-augmented pipelines that need large context passes. It ships under the MIT license, so commercial deployment has no royalty friction. Provider coverage skews toward Azure AI and a handful of open inference hosts.
StarCoder2 15B Instruct, from the BigCode collaboration at HuggingFace and ServiceNow, runs 15 billion parameters with a 16K context window released September 2024. Benchmark performance trails both Codestral and Phi-3 Medium on code tasks. The differentiated case for StarCoder2 is training-data auditability: its training corpus, The Stack v2, is restricted to permissively licensed source code. In regulated industries where a model's training provenance must be documented for compliance review, that traceability justifies the benchmark trade.
Pick Codestral 22B for non-commercial research deployments. Pick Phi-3 Medium 128K for MIT-licensed production deployments that span both long-context reasoning and code review. Pick StarCoder2 15B when training-data provenance is a documented compliance requirement.
Compare two at a time
Frequently asked questions
- How does Codestral 22b compare to Phi 3 Medium 128k and Starcoder2 15b Instruct on price?
- Use the table above to compare input and output prices per 1M tokens across the cheapest available providers for each model.
- Which model is best for coding: Codestral 22b, Phi 3 Medium 128k, or Starcoder2 15b Instruct?
- HumanEval and other code benchmarks are shown in the table. For production code tasks, also consider context window size and provider latency.
- What is the context window for Codestral 22b, Phi 3 Medium 128k, and Starcoder2 15b Instruct?
- Context window sizes are listed in the Specs row of the comparison table above.
Full model details