Model crosswalk
Three-way, side-by-side comparison on price, capability, and workload.
Phi-3 Medium 128K vs Phi-3 Mini 128K vs Phi-3.5 MoE Instruct
Specs and cheapest providers
| Spec | Phi-3 Medium 128K | Phi-3 Mini 128K | Phi-3.5 MoE Instruct |
|---|---|---|---|
| Parameters | 14B | 3.8B | 41.9B total (about 6.6B active) |
| Context window | 131K tokens | 131K tokens | 131K tokens |
| License | MIT | MIT | MIT |
| Released | 2024-05-21 | 2024-04-23 | 2024-08-20 |
| Cheapest provider | | | |
| Provider | — | — | — |
| Input / 1M tokens | — | — | — |
| Output / 1M tokens | — | — | — |
Benchmark comparison
No benchmark data available yet.
Editor's take
Phi-3 Mini 128K, Phi-3 Medium 128K, and Phi-3.5 MoE Instruct are all Microsoft models released in 2024 with 131K context windows. The unifying theme is Microsoft's "textbook-quality" synthetic training data strategy: the premise that heavy data curation at small scale can close the gap to larger models trained on noisier corpora. All three are MIT-licensed with no commercial restrictions.
Phi-3 Mini 128K packs 3.8 billion parameters with a 131K context window, which is unusual for a sub-4B model. On reasoning and QA benchmarks it outperforms several 7B-class competitors, making it a cost-effective choice for document extraction and classification tasks that would otherwise require a larger model. Where it falls short is multi-step reasoning and complex instruction chains: at 3.8B, expectations need calibrating.
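For long-context extraction work, a quick pre-check of whether a document is likely to fit in the window can save a failed request. The sketch below is only a rough estimate: it assumes the "131K" figure means 131,072 tokens (128 × 1024) and uses a hypothetical average of 4 characters per token for English text. It does not call the model's tokenizer, so counts for code or non-English text may differ considerably.

```python
import math

# Assumption: "131K context" on this page corresponds to 128 * 1024 tokens.
CONTEXT_WINDOW_TOKENS = 131_072

# Assumption: rough average for English prose, not a tokenizer measurement.
CHARS_PER_TOKEN = 4.0


def estimated_tokens(text: str) -> int:
    """Estimate the token count of `text` from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(text: str, reserved_for_output: int = 1024) -> bool:
    """Return True if the estimated prompt plus reserved output tokens fit in the window."""
    return estimated_tokens(text) + reserved_for_output <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    sample = "lorem ipsum " * 10_000  # 120,000 characters
    print(estimated_tokens(sample), fits_in_context(sample))
```

For anything close to the limit, counting tokens with the model's actual tokenizer is the safer check.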
Phi-3 Medium 128K scales to 14 billion parameters while retaining the same context window and MIT license. Its MMLU and GSM8K scores exceed those of most 14B peers, closing some of the gap to 70B-class models on reasoning-heavy tasks. Hosted coverage skews toward Azure AI and a smaller set of open providers, which can be a real constraint if you need the provider breadth of the Llama or Qwen ecosystems.
Phi-3.5 MoE Instruct introduces a mixture-of-experts architecture: 41.9B total parameters across 16 experts, but only about 6.6B active per forward pass. That active-parameter profile yields favorable inference economics, since per-token compute tracks the active parameters rather than the total, with reasoning benchmark scores closer to a dense 14B than to a 6B baseline. Azure AI is the primary hosting route. If MoE economics are relevant to your deployment cost model, this is worth benchmarking directly against Qwen MoE tiers.
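The active-versus-total trade-off can be made concrete with back-of-the-envelope arithmetic. The sketch below uses the parameter counts quoted above and two assumptions that are not from this page: the common rule of thumb of roughly 2 FLOPs per active parameter per token for a forward pass (ignoring attention cost), and 2 bytes per parameter for 16-bit weights. It is an illustration of the reasoning, not a measurement.

```python
# Parameter counts quoted in the text above.
MOE_TOTAL_PARAMS = 41.9e9
MOE_ACTIVE_PARAMS = 6.6e9
DENSE_14B_PARAMS = 14e9


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for each token."""
    return active / total


def forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost: ~2 FLOPs per active parameter (assumption)."""
    return 2 * active_params


def weight_memory_gb(total_params: float, bytes_per_param: int = 2) -> float:
    """Memory to hold all weights; 2 bytes per parameter assumes 16-bit weights."""
    return total_params * bytes_per_param / 1e9


if __name__ == "__main__":
    frac = active_fraction(MOE_ACTIVE_PARAMS, MOE_TOTAL_PARAMS)
    ratio = forward_flops_per_token(MOE_ACTIVE_PARAMS) / forward_flops_per_token(DENSE_14B_PARAMS)
    print(f"active fraction: {frac:.1%}")
    print(f"per-token compute vs dense 14B: {ratio:.2f}x")
    print(f"MoE weights: {weight_memory_gb(MOE_TOTAL_PARAMS):.1f} GB")
    print(f"dense 14B weights: {weight_memory_gb(DENSE_14B_PARAMS):.1f} GB")
```

Under these assumptions the MoE model spends less than half the per-token compute of a dense 14B, but all 41.9B parameters still have to be resident in memory, so the savings show up in throughput rather than in hardware footprint.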
Pick Phi-3 Mini for cost-floor long-context classification and extraction. Pick Phi-3 Medium for reasoning-intensive tasks at the 14B tier. Pick Phi-3.5 MoE when active-parameter efficiency matters and Azure AI is acceptable as the hosting provider.
Frequently asked questions
- How does Phi-3 Medium 128K compare to Phi-3 Mini 128K and Phi-3.5 MoE Instruct on price?
  - The specs table above has rows for input and output prices per 1M tokens from the cheapest available provider for each model. No provider pricing is listed for any of the three yet.
- Which model is best for coding: Phi-3 Medium 128K, Phi-3 Mini 128K, or Phi-3.5 MoE Instruct?
  - No code benchmark data (HumanEval or otherwise) is available for these models yet. For production code tasks, also consider context window size and provider latency.
- What is the context window for Phi-3 Medium 128K, Phi-3 Mini 128K, and Phi-3.5 MoE Instruct?
  - All three have a 131K-token context window, as listed in the Context window row of the specs table above.
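On the price question, per-1M-token prices translate into a per-request cost with simple arithmetic. The sketch below shows the calculation. The prices in the usage example are hypothetical placeholders, since this page lists no provider pricing for these models, and the function name is illustrative only.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Cost of one request given prices quoted in USD per 1M tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical prices for illustration only; not taken from any provider.
    cost = request_cost_usd(
        input_tokens=100_000,   # a long document as the prompt
        output_tokens=2_000,    # a short extraction result
        input_price_per_m=0.10,
        output_price_per_m=0.40,
    )
    print(f"${cost:.4f} per request")
```

For long-context extraction workloads the input side usually dominates, so the input price per 1M tokens is the figure to compare first once providers are listed.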