Codestral 22b vs Phi 3 Medium 128k vs Starcoder2 15b Instruct (2026) — 3-way comparison

Model crosswalk

Side-by-side on price, capability and workload — three-way comparison.

Specs and cheapest providers

Spec	Codestral 22b	Phi 3 Medium 128k	Starcoder2 15b Instruct
Parameters	22B	14B	15B
Context window	32K	128K	16K
License	Mistral AI Non-Production License (MNPL)	MIT	BigCode OpenRAIL-M
Released	May 2024	May 2024	April 2024
Cheapest provider
Provider	—	—	—
Input / 1M tokens	—	—	—
Output / 1M tokens	—	—	—
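
Because the price rows are quoted per 1M tokens, turning them into a per-request figure is simple arithmetic. The sketch below shows one way to do it. The function name and the example prices are hypothetical placeholders, not quotes for any model or provider on this page.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the USD cost of one request from per-1M-token prices."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Hypothetical prices for illustration only (USD per 1M tokens).
example_cost = request_cost_usd(
    input_tokens=12_000,
    output_tokens=1_500,
    input_price_per_m=0.50,
    output_price_per_m=1.50,
)
print(f"Estimated cost: ${example_cost:.5f}")
```

Output tokens are usually priced higher than input tokens, so generation-heavy workloads such as code completion can rank providers differently than input-heavy workloads such as long-document review.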

Benchmark comparison

No benchmark data available yet.

Editor's take

These three models sit in the 14B to 22B parameter range, and each is optimized for a different priority: coding breadth, reasoning quality, or training-data integrity.

Codestral 22B is Mistral AI's code-specialist model, released in May 2024 with 22 billion parameters, coverage of more than 80 programming languages, and a 32K context window. It performs competitively on HumanEval against DeepSeek Coder V2 Lite and early Qwen Coder models. The production ceiling is its license, the Mistral AI Non-Production License (MNPL), which bars commercial deployment without a separate agreement with Mistral. For internal tooling or research it is a reasonable evaluation target; for shipping to end users, the licensing conversation is mandatory.

Phi-3 Medium 128K is Microsoft's 14-billion-parameter model from May 2024, trained on heavily filtered web data and synthetic, textbook-quality data rather than raw web text. The bet on data quality over parameter count pays off on MMLU and GSM8K, where it closes some of the gap to 70B-class models. The 128K context window (about 131K tokens), four times Codestral's, makes it viable for long-document review, extended code refactors, and retrieval-augmented pipelines that need large context passes. It ships under the MIT license, so commercial deployment carries no royalty friction. Provider coverage skews toward Azure AI and a handful of open inference hosts.

StarCoder2 15B Instruct comes from the BigCode collaboration led by Hugging Face and ServiceNow. It has 15 billion parameters and a 16K context window; the StarCoder2 base models were released in February 2024, and the instruct variant followed in April 2024. Benchmark performance trails both Codestral and Phi-3 Medium on code tasks. The differentiated case for StarCoder2 is training-data auditability: its training corpus, The Stack v2, is restricted to permissively licensed source code. In regulated industries where a model's training provenance must be documented for compliance review, that traceability justifies the benchmark trade-off.

Pick Codestral 22B for non-commercial research deployments. Pick Phi-3 Medium 128K for MIT-licensed production deployments that span both long-context reasoning and code review. Pick StarCoder2 15B Instruct when training-data provenance is a documented compliance requirement.
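
The three recommendations above can be read as a small decision rule. The sketch below is one possible encoding, not a definitive selection policy. The function name, its parameters, and the order in which the conditions are checked are assumptions made for illustration.

```python
def pick_model(commercial_use: bool, needs_provenance_audit: bool) -> str:
    """Map the Editor's take recommendations to a model name.

    Assumption: a documented provenance requirement outranks the other
    criteria, and commercial use rules out the non-commercial option.
    """
    if needs_provenance_audit:
        # Training-data provenance is a documented compliance requirement.
        return "StarCoder2 15B Instruct"
    if commercial_use:
        # MIT-licensed production deployment.
        return "Phi-3 Medium 128K"
    # Non-commercial research deployment.
    return "Codestral 22B"


print(pick_model(commercial_use=True, needs_provenance_audit=False))
```

Real selection would also weigh benchmark results, latency, and provider availability, none of which this rule considers.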

Compare two at a time

Codestral 22b vs Phi 3 Medium 128k
Codestral 22b vs Starcoder2 15b Instruct
Phi 3 Medium 128k vs Starcoder2 15b Instruct

Frequently asked questions

How does Codestral 22b compare to Phi 3 Medium 128k and Starcoder2 15b Instruct on price?
Use the specs table above to compare input and output prices per 1M tokens at the cheapest available provider for each model.

Which model is best for coding: Codestral 22b, Phi 3 Medium 128k, or Starcoder2 15b Instruct?
HumanEval and other code benchmark results appear in the Benchmark comparison section when data is available. For production code tasks, also consider context window size and provider latency.

What is the context window for Codestral 22b, Phi 3 Medium 128k, and Starcoder2 15b Instruct?
Context window sizes are listed in the Context window row of the specs table above.
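
As a rough illustration of why context window size matters, the sketch below checks which models could hold a given prompt plus a reserved output budget. The window sizes follow the figures in the Editor's take (32K, 128K, 16K). Converting them to exact token counts with K = 1,024 is an assumption, as are the function name and the default output reserve.

```python
# Assumed token limits: 32K, 128K, and 16K with K = 1,024 tokens.
CONTEXT_WINDOW_TOKENS = {
    "Codestral 22B": 32_768,
    "Phi-3 Medium 128K": 131_072,
    "StarCoder2 15B Instruct": 16_384,
}


def models_that_fit(prompt_tokens: int, reserved_output_tokens: int = 1_024) -> list[str]:
    """Return the models whose window can hold the prompt plus the output reserve."""
    needed = prompt_tokens + reserved_output_tokens
    return [
        name
        for name, window in CONTEXT_WINDOW_TOKENS.items()
        if needed <= window
    ]


# A 20,000-token prompt exceeds the 16K window but fits the other two.
print(models_that_fit(20_000))
```

Token counts depend on each model's tokenizer, so the same text can come out at different lengths for the three models.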

Full model details

All providers for Codestral 22b →
All providers for Phi 3 Medium 128k →
All providers for Starcoder2 15b Instruct →