
Model crosswalk

Side-by-side on price, capability and workload — three-way comparison.

Phi-3 Medium 128K vs Phi-3 Mini 128K vs Qwen 3 32B Instruct

Phi-3 Medium 128K

14B params · 131K context · mit

Cheapest provider
$/1M input
$/1M output

Phi-3 Mini 128K

4B params · 131K context · mit

Cheapest provider
$/1M input
$/1M output

Qwen 3 32B Instruct

32B params · 131K context · qwen

Cheapest provider
$/1M input
$/1M output
Specs and cheapest providers
| Spec | Phi-3 Medium 128K | Phi-3 Mini 128K | Qwen 3 32B Instruct |
| --- | --- | --- | --- |
| Parameters | 14B | 4B | 32B |
| Context window | 131K tokens | 131K tokens | 131K tokens |
| License | mit | mit | qwen |
| Released | 2024-05-21 | 2024-04-23 | 2025-04-28 |
Cheapest provider
Provider
Input / 1M tokens
Output / 1M tokens
Benchmark comparison

No benchmark data available yet.

Editor's take
This comparison covers two Microsoft models and one Alibaba model, all with context windows around 128K tokens, spanning a 4B-to-32B parameter range. The real selection question is how much you pay for each capability jump.

Phi-3 Mini 128K, at 3.8 billion parameters (rounded to 4B in the specs above), is the entry point. Microsoft's curated synthetic training data earns it MMLU and GSM8K numbers that are competitive with 7B-class peers, which is the core value proposition. The MIT license and low inference cost make it an appealing workhorse for classification or structured extraction. Its limits show in creative generation, complex reasoning chains, and anything requiring broad world knowledge rather than reasoning depth.

Phi-3 Medium 128K adds roughly 10 billion parameters, for 14B total, using the same training philosophy. The MMLU and GSM8K uplift is real, and the gap to 70B-class models on reasoning-heavy tasks narrows meaningfully. It is still behind on open-ended generation and multilingual coverage, and its hosting skews toward Azure AI, which limits provider flexibility. The MIT license remains.

Qwen 3 32B Instruct stands apart on multilingual breadth. At 32B parameters it delivers strong CJK and Arabic performance that neither Phi-3 model approaches, covers code tasks competently, and holds a 131K context window under the Qwen commercial license. Benchmark scores land at roughly 85% of Qwen 3 72B across standard suites, and providers like Together AI, Fireworks, Groq, and DeepInfra all carry it.

Pick Phi-3 Mini 128K when cost minimization is the top priority and the workload is structured reasoning or QA. Pick Phi-3 Medium 128K for a meaningful capability step-up at a still-reasonable per-token rate. Pick Qwen 3 32B Instruct when multilingual coverage, broader benchmark performance, or provider flexibility matters.
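The picks above can be read as a simple decision rule. The sketch below is one possible encoding, not a definitive selector: the `Workload` fields, the `pick_model` name, and the order in which the conditions are checked are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a workload; field names are illustrative."""
    needs_multilingual: bool = False      # e.g. CJK or Arabic text
    needs_provider_choice: bool = False   # must be able to switch hosts
    cost_is_top_priority: bool = False
    structured_task: bool = False         # classification, extraction, QA


def pick_model(w: Workload) -> str:
    """Map a workload to one of the three models compared on this page.

    Assumption: multilingual or provider requirements outrank cost,
    because neither Phi-3 model is described as covering them.
    """
    if w.needs_multilingual or w.needs_provider_choice:
        return "Qwen 3 32B Instruct"
    if w.cost_is_top_priority and w.structured_task:
        return "Phi-3 Mini 128K"
    # Default: the middle step-up at a still-reasonable per-token rate.
    return "Phi-3 Medium 128K"


if __name__ == "__main__":
    print(pick_model(Workload(cost_is_top_priority=True, structured_task=True)))
```

A real selection would also weigh current prices and measured quality on your own data; the rule only mirrors the editorial guidance.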
Frequently asked questions
How does Phi-3 Medium 128K compare to Phi-3 Mini 128K and Qwen 3 32B Instruct on price?
Use the table above to compare input and output prices per 1M tokens across the cheapest available providers for each model.
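To turn per-1M-token prices into a cost for your own traffic, a minimal sketch follows. The function name and the prices in the example are placeholders, not real provider rates; substitute the input and output prices listed for the provider you choose.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Cost in USD given token counts and prices quoted per 1M tokens."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # Placeholder prices (NOT real rates): $0.10 input, $0.30 output per 1M tokens.
    cost = request_cost_usd(2_000_000, 500_000, 0.10, 0.30)
    print(f"Estimated cost: ${cost:.2f}")
```

Because input and output are priced separately, a workload with long prompts and short answers can rank the models differently from one that generates long outputs.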
Which model is best for coding: Phi-3 Medium 128K, Phi-3 Mini 128K, or Qwen 3 32B Instruct?
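No HumanEval or other code benchmark data is available for these models yet. Of the three, the editor's take singles out Qwen 3 32B Instruct as covering code tasks competently. For production code tasks, also consider context window size and provider latency.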
What is the context window for Phi-3 Medium 128K, Phi-3 Mini 128K, and Qwen 3 32B Instruct?
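All three models list a 131K-token context window; see the Context window row of the specs table above. The 128K in the Phi-3 names refers to the same window (128 × 1,024 = 131,072 tokens).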