Model crosswalk

Side-by-side on price, capability and workload — three-way comparison.

Phi-3 Medium 128K
vs
Phi-3 Mini 128K
vs
Phi-3.5 MoE Instruct
A. Phi-3 Medium 128K

14B params · 131K context · MIT

Cheapest provider
$/1M input
$/1M output
B. Phi-3 Mini 128K

4B params · 131K context · MIT

Cheapest provider
$/1M input
$/1M output
C. Phi-3.5 MoE Instruct

42B params · 131K context · MIT

Cheapest provider
$/1M input
$/1M output
Specs and cheapest providers
Spec | Phi-3 Medium 128K | Phi-3 Mini 128K | Phi-3.5 MoE Instruct
Parameters | 14B | 4B | 42B
Context window | 131K tokens | 131K tokens | 131K tokens
License | MIT | MIT | MIT
Released | 2024-05-21 | 2024-04-23 | 2024-08-20
Cheapest provider
Provider
Input / 1M tokens
Output / 1M tokens
Benchmark comparison

No benchmark data available yet.

Editor's take
Phi-3 Mini 128K, Phi-3 Medium 128K, and Phi-3.5 MoE Instruct are all Microsoft models released in 2024 under the MIT license, which places no restrictions on commercial use, and all three carry a 131K context window. The unifying theme is Microsoft's textbook-quality synthetic training data strategy: the premise is that heavy data curation at small scale can close the gap to larger models trained on noisier corpora.

Phi-3 Mini 128K packs 3.8 billion parameters (rounded to 4B in the specs above) with a 131K context window, which is unusual for a sub-4B model. On reasoning and QA benchmarks it outperforms several 7B-class competitors, making it a cost-effective choice for document extraction and classification tasks that would otherwise require a larger model. Where it falls short is multi-step reasoning and complex instruction chains; at 3.8B, expectations need calibrating.

Phi-3 Medium 128K scales to 14 billion parameters while retaining the same context window and MIT license. Its MMLU and GSM8K scores exceed those of most 14B peers, closing some of the gap to 70B-class models on reasoning-heavy tasks. Hosted coverage skews toward Azure AI and a smaller set of open providers, which can be a real constraint if you need the breadth of the Llama or Qwen ecosystems.

Phi-3.5 MoE Instruct introduces a mixture-of-experts architecture: 41.9B total parameters across 16 experts, but only about 6.6B active per forward pass. That active-parameter profile yields favorable inference economics, with reasoning benchmark scores closer to a dense 14B model than to a 6B baseline. Azure AI is the primary hosting route. If MoE economics are relevant to your deployment cost model, this is worth a direct benchmark against Qwen MoE tiers.

Pick Phi-3 Mini for cost-floor long-context classification and extraction. Pick Phi-3 Medium for reasoning-intensive tasks at the 14B tier. Pick Phi-3.5 MoE when active-parameter efficiency matters and Azure AI is acceptable as the hosting provider.
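The active-parameter figures quoted for Phi-3.5 MoE Instruct can be turned into rough ratios. The sketch below uses only the parameter counts from the text (41.9B total, about 6.6B active, 14B for Phi-3 Medium); the function and constant names are illustrative. It assumes per-token compute scales roughly with active parameters, which is a simplification, and it says nothing about memory: all expert weights still have to be loaded to serve the model.

```python
# Parameter counts in billions, as quoted in the editor's take.
TOTAL_PARAMS_B = 41.9    # Phi-3.5 MoE Instruct, all 16 experts
ACTIVE_PARAMS_B = 6.6    # Phi-3.5 MoE Instruct, active per forward pass (approximate)
DENSE_MEDIUM_B = 14.0    # Phi-3 Medium 128K, dense


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters used per forward pass."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


# Share of the MoE model's own weights that are active per token.
moe_fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)

# Active MoE parameters relative to the dense 14B model (rough compute proxy only).
relative_to_medium = ACTIVE_PARAMS_B / DENSE_MEDIUM_B

print(f"Active share of Phi-3.5 MoE weights: {moe_fraction:.1%}")
print(f"Active params relative to Phi-3 Medium: {relative_to_medium:.1%}")
```

Under these figures, roughly one sixth of the MoE weights are active per token, and the active set is a little under half the size of Phi-3 Medium. Actual cost and latency depend on the provider's hardware and batching, so treat the ratios as a starting point for a benchmark, not a prediction.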
Frequently asked questions
How does Phi-3 Medium 128K compare to Phi-3 Mini 128K and Phi-3.5 MoE Instruct on price?
Use the table above to compare input and output prices per 1M tokens across the cheapest available providers for each model.
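Per-1M-token prices translate into a per-request cost with simple arithmetic. The sketch below is a minimal illustration: `request_cost_usd` is a hypothetical helper, and the prices in the example are placeholders, not quotes from any provider or for any of these models.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1m: float,
    output_price_per_1m: float,
) -> float:
    """Estimate the cost of one request from per-1M-token prices in USD."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * input_price_per_1m
    output_cost = output_tokens / 1_000_000 * output_price_per_1m
    return input_cost + output_cost


# Placeholder prices for illustration only (NOT real provider pricing):
# a 100K-token document in, a 2K-token summary out.
example_cost = request_cost_usd(
    input_tokens=100_000,
    output_tokens=2_000,
    input_price_per_1m=0.10,
    output_price_per_1m=0.30,
)
print(f"Estimated cost per request: ${example_cost:.4f}")
```

For long-context workloads the input term usually dominates, so the input price per 1M tokens tends to be the figure to compare first.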
Which model is best for coding: Phi-3 Medium 128K, Phi-3 Mini 128K, or Phi-3.5 MoE Instruct?
HumanEval and other code benchmarks appear in the benchmark comparison when available; no benchmark data is listed for these models yet. For production code tasks, also consider context window size and provider latency.
What is the context window for Phi-3 Medium 128K, Phi-3 Mini 128K, and Phi-3.5 MoE Instruct?
All three models have a 131K-token context window, as listed in the Context window row of the specs table above.