
Phi-3 Medium 128K vs Phi-3 Mini 128K vs Phi-3.5 MoE Instruct (2026): 3-way comparison

Model crosswalk

Side-by-side on price, capability and workload — three-way comparison.

Phi-3 Medium 128K
14B params · 131K context · MIT license
Cheapest provider: —
$/1M input: —
$/1M output: —

Phi-3 Mini 128K
3.8B params · 131K context · MIT license
Cheapest provider: —
$/1M input: —
$/1M output: —

Phi-3.5 MoE Instruct
41.9B params (6.6B active) · 131K context · MIT license
Cheapest provider: —
$/1M input: —
$/1M output: —

Specs and cheapest providers

Spec	Phi-3 Medium 128K	Phi-3 Mini 128K	Phi-3.5 MoE Instruct
Parameters	14B	3.8B	41.9B (6.6B active)
Context window	131K tokens	131K tokens	131K tokens
License	MIT	MIT	MIT
Released	2024-05-21	2024-04-23	2024-08-20
Cheapest provider	—	—	—
Input price / 1M tokens	—	—	—
Output price / 1M tokens	—	—	—
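Because every price cell is currently empty, the arithmetic behind a per-1M-token comparison is easiest to show with placeholders. The minimal Python sketch below assumes linear per-token billing with separate input and output rates; the `Pricing` class, the helper names, and the $0.10 / $0.30 rates are hypothetical and do not come from any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-1M-token prices in USD for one provider (values are placeholders)."""
    input_per_million: float
    output_per_million: float


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD, assuming linear per-token billing."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


def monthly_cost(pricing: Pricing, requests: int, input_tokens: int, output_tokens: int) -> float:
    """Scale a single request's cost to a monthly request volume."""
    return requests * request_cost(pricing, input_tokens, output_tokens)


if __name__ == "__main__":
    # Hypothetical rates: the table above lists no provider pricing yet.
    placeholder = Pricing(input_per_million=0.10, output_per_million=0.30)
    # Long-context extraction: 100K tokens in, 1K tokens out per request.
    per_request = request_cost(placeholder, 100_000, 1_000)
    print(f"per request: ${per_request:.4f}")
    print(f"per month at 10,000 requests: ${monthly_cost(placeholder, 10_000, 100_000, 1_000):.2f}")
```

With these placeholder numbers, input tokens account for about 97% of each request's cost, which is why the input rate tends to dominate for long-context extraction workloads.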

Benchmark comparison

No benchmark data available yet.

Editor's take

Phi-3 Mini 128K, Phi-3 Medium 128K, and Phi-3.5 MoE Instruct are all Microsoft models released in 2024, and all three carry a 131K-token context window and an MIT license with no commercial restrictions. The unifying theme is Microsoft's textbook-quality synthetic training data strategy: the premise is that heavy data curation at small scale can close the gap to larger models trained on noisier corpora.

Phi-3 Mini 128K packs 3.8 billion parameters with a 131K context window, which is genuinely unusual for a sub-4B model. On reasoning and QA benchmarks it outperforms several 7B-class competitors, making it a cost-effective choice for document extraction and classification tasks that would otherwise require a larger model. Where it falls short is multi-step reasoning and complex instruction chains; at 3.8B, expectations need calibrating.

Phi-3 Medium 128K scales to 14 billion parameters while retaining the same context window and MIT license. Its MMLU and GSM8K scores exceed those of most 14B peers, closing some of the gap to 70B-class models on reasoning-heavy tasks. Hosted coverage skews toward Azure AI and a smaller set of open providers, which can be a real constraint if you need the provider breadth of the Llama or Qwen ecosystems.

Phi-3.5 MoE Instruct introduces a mixture-of-experts architecture: 41.9B total parameters across 16 experts, with only about 6.6B active per token because the router sends each token to two of the 16 experts. Per-token compute therefore tracks the 6.6B active parameters, while all 41.9B must still be held in memory; in return, reasoning benchmark scores land closer to a dense 14B model than to a 6B baseline. Azure AI is the primary hosting route. If MoE economics matter to your deployment cost model, benchmark it directly against Qwen's MoE tiers.

Pick Phi-3 Mini for cost-floor long-context classification and extraction. Pick Phi-3 Medium for reasoning-intensive tasks at the 14B tier. Pick Phi-3.5 MoE when active-parameter efficiency matters and Azure AI is acceptable as the hosting provider.
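The routing and memory trade-off of the MoE model can be illustrated with a toy top-2 mixture-of-experts layer. This is a generic top-k router written in plain Python, not Microsoft's Phi-3.5 implementation (its gating and training details may differ); the toy experts, router scores, and helper names are hypothetical, and the 2-bytes-per-parameter figure assumes unquantized bf16/fp16 weights.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Sequence[float]], Vector]

# Parameter counts quoted in the editor's take.
TOTAL_PARAMS = 41.9e9
ACTIVE_PARAMS = 6.6e9
BYTES_PER_PARAM = 2  # assumption: unquantized bf16/fp16 weights


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int = 2) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts and renormalize their gate weights."""
    chosen = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))


def moe_forward(x: Sequence[float], experts: Sequence[Expert],
                router_logits: Sequence[float], k: int = 2) -> Vector:
    """Run only the routed experts and sum their gate-weighted outputs."""
    out = [0.0] * len(x)
    for idx, gate in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


def weight_memory_gb(params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Every expert must be resident, so memory scales with total parameters."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    # Toy stand-ins for the 16 experts: expert i scales its input by i.
    experts = [lambda v, s=i: [s * vi for vi in v] for i in range(16)]
    router_logits = [0.1 * i for i in range(16)]  # hypothetical router scores
    print("routed experts:", top_k_route(router_logits))
    print("output:", moe_forward([1.0, 2.0], experts, router_logits))
    print(f"active share of parameters: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
    print(f"weight memory at 2 bytes/param: {weight_memory_gb(TOTAL_PARAMS):.1f} GB")
```

Only two expert functions run per input, which is where the compute saving comes from, while `weight_memory_gb` shows that the full 41.9B parameters (about 84 GB at 2 bytes each) still have to be loaded.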

Compare two at a time

Phi-3 Medium 128K vs Phi-3 Mini 128K
Phi-3 Medium 128K vs Phi-3.5 MoE Instruct
Phi-3 Mini 128K vs Phi-3.5 MoE Instruct

Frequently asked questions

How does Phi-3 Medium 128K compare to Phi-3 Mini 128K and Phi-3.5 MoE Instruct on price?
The Specs table above lists input and output prices per 1M tokens from the cheapest available provider for each model. No provider pricing is listed yet for any of the three, so those cells currently show "—".

Which model is best for coding: Phi-3 Medium 128K, Phi-3 Mini 128K, or Phi-3.5 MoE Instruct?
No benchmark data, including HumanEval or other code benchmarks, is available on this page yet. For production code tasks, also weigh context window size and provider latency.

What is the context window for Phi-3 Medium 128K, Phi-3 Mini 128K, and Phi-3.5 MoE Instruct?
All three have a 131K-token (128K) context window, as listed in the Context window row of the Specs table above.
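A quick way to check whether a long document fits is to budget prompt and output tokens together. The sketch below is a minimal illustration assuming that prompt and generated tokens share one window, that token counts come from the model's own tokenizer, and that the hosting provider exposes the full window (some hosts may cap it lower, so check per provider); the helper names are hypothetical.

```python
# "128K" in the model names means 128 * 1024 tokens, shown as "131K" in the specs.
CONTEXT_WINDOW = 128 * 1024  # 131,072 tokens


def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: int = CONTEXT_WINDOW) -> bool:
    """Prompt and generated tokens share one window in a decoder-only model."""
    return prompt_tokens + max_output_tokens <= context_window


def max_prompt_tokens(max_output_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Largest prompt that still leaves room for the requested output."""
    return max(0, context_window - max_output_tokens)


if __name__ == "__main__":
    # Token counts are hypothetical; measure real ones with the model's tokenizer.
    print(fits_in_context(prompt_tokens=120_000, max_output_tokens=4_096))
    print(max_prompt_tokens(max_output_tokens=4_096))
```

Reserving the output budget first and deriving the prompt limit from it keeps generation from being cut off when a long input nearly fills the window.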

Full model details

All providers for Phi-3 Medium 128K →
All providers for Phi-3 Mini 128K →
All providers for Phi-3.5 MoE Instruct →