Model crosswalk
Side-by-side on price, capability and workload — three-way comparison.
Llama 3.1 405B Instruct vs Phi-3 Medium 128K vs Qwen 3 72B Instruct
Llama 3.1 405B Instruct
405B params · 131K context · llama-3
Cheapest provider: deepinfra
$/1M input: $2700000.00
$/1M output: $8000000.00
Phi-3 Medium 128K
14B params · 131K context · mit
Cheapest provider: —
$/1M input: —
$/1M output: —
Qwen 3 72B Instruct
72B params · 131K context · qwen
Cheapest provider: fireworks-ai
$/1M input: $220000.00
$/1M output: $880000.00
Specs and cheapest providers
| Spec | Llama 3.1 405B Instruct | Phi-3 Medium 128K | Qwen 3 72B Instruct |
|---|---|---|---|
| Parameters | 405B | 14B | 72B |
| Context window | 131K tokens | 131K tokens | 131K tokens |
| License | llama-3 | mit | qwen |
| Released | 2024-07-23 | 2024-05-21 | 2025-04-28 |
| Cheapest provider | deepinfra | — | fireworks-ai |
| Input / 1M tokens | $2700000.00 | — | $220000.00🏆 |
| Output / 1M tokens | $8000000.00 | — | $880000.00🏆 |
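Per-token prices only become comparable once they are applied to a workload with a given mix of input and output tokens. The following is a minimal sketch of that arithmetic in Python. The `Pricing` class, the `workload_cost` and `cost_ratio` helpers, and the 3M-input / 1M-output workload are all hypothetical names and numbers chosen for illustration; the prices are copied as listed in the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Listed price per 1M tokens (hypothetical container, not a provider API)."""
    input_per_million: float
    output_per_million: float


def workload_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Total cost of a workload, in the same currency unit as the listed prices."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


def cost_ratio(a: Pricing, b: Pricing, input_tokens: int, output_tokens: int) -> float:
    """How many times more expensive `a` is than `b` for the same workload."""
    return workload_cost(a, input_tokens, output_tokens) / workload_cost(
        b, input_tokens, output_tokens
    )


# Values copied as listed in the table above (cheapest provider per model).
LLAMA_405B = Pricing(input_per_million=2_700_000.00, output_per_million=8_000_000.00)
QWEN_72B = Pricing(input_per_million=220_000.00, output_per_million=880_000.00)

# Assumed example workload: 3M input tokens and 1M output tokens.
ratio = cost_ratio(LLAMA_405B, QWEN_72B, 3_000_000, 1_000_000)
print(f"{ratio:.2f}")  # prints 10.45
```

Because the result is a ratio, it depends only on the relative prices and the input/output mix, so an output-heavy workload will land closer to the output-price gap and an input-heavy one closer to the input-price gap.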
Benchmark comparison
No benchmark data available yet.
Editor's take
This is a comparison across three different scale philosophies: massive dense weights, a small model trained efficiently on curated data, and a balanced mid-tier with broad multilingual coverage.
Llama 3.1 405B Instruct is Meta's largest openly licensed model, released in July 2024: 405 billion dense parameters, a 131K context window, and the Llama 3.1 Community License. It scores near the top of open-weights models on MMLU and complex reasoning evaluations. Running 405B is expensive: hosted pricing is significantly higher than for 70B-class alternatives, and multi-GPU requirements limit which providers carry it. It makes the most sense for workloads where raw capability matters more than cost per token, such as synthesis, complex document analysis, or agentic tasks that genuinely benefit from scale.
Phi-3 Medium 128K brings 14 billion parameters trained on Microsoft's curated synthetic corpus, delivering MMLU and GSM8K scores that approach those of several 70B models on reasoning-heavy benchmarks. It offers the same 131K context window as its two peers, and MIT licensing removes commercial friction. The tradeoff is coverage: it underperforms on open-ended generation, creative tasks, and anything requiring broad world knowledge over reasoning depth. Provider availability is more limited than for either peer; no hosted provider is listed for it in the table above.
Qwen 3 72B Instruct, released April 2025, occupies the middle ground well. At 72B parameters with a 131K context window, it covers MMLU, multilingual, and code benchmarks with fewer gaps than Phi-3 Medium, and its listed cheapest-provider prices are roughly one-twelfth of Llama 3.1 405B's on input and one-ninth on output. The Qwen license supports commercial deployment.
Pick Llama 3.1 405B when your task genuinely requires frontier-level open-weights capability. Pick Qwen 3 72B for strong all-around performance at reasonable 72B pricing. Pick Phi-3 Medium 128K when per-token cost is the constraint and the task is structured reasoning or QA.
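The rule of thumb above can be written down as a small lookup. This is only a toy encoding of the editor's guidance, not a benchmark-driven selector: the `Priority` enum, the `pick_model` function, and the `structured_task` flag are hypothetical names introduced here for illustration.

```python
from enum import Enum


class Priority(Enum):
    """What the workload optimizes for (hypothetical categories)."""
    MAX_CAPABILITY = "max_capability"
    BALANCED = "balanced"
    LOWEST_COST = "lowest_cost"


def pick_model(priority: Priority, structured_task: bool = False) -> str:
    """Map a workload priority to one of the three models compared here."""
    if priority is Priority.MAX_CAPABILITY:
        # Frontier-level open-weights capability outweighs cost per token.
        return "Llama 3.1 405B Instruct"
    if priority is Priority.LOWEST_COST and structured_task:
        # Cost-constrained structured reasoning or QA.
        return "Phi-3 Medium 128K"
    # Everything else falls back to the all-around mid-tier option.
    return "Qwen 3 72B Instruct"


print(pick_model(Priority.LOWEST_COST, structured_task=True))
```

A cost-constrained task that is not structured reasoning or QA falls through to Qwen 3 72B Instruct here, which reflects the note that Phi-3 Medium underperforms on open-ended generation.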
Frequently asked questions
- How does Llama 3.1 405B Instruct compare to Phi-3 Medium 128K and Qwen 3 72B Instruct on price?
  Compare the input and output prices per 1M tokens in the table above, which lists the cheapest available provider for each model. Of the two models with a listed provider, Qwen 3 72B Instruct (fireworks-ai) is cheaper than Llama 3.1 405B Instruct (deepinfra) on both input and output. No provider is listed for Phi-3 Medium 128K.
- Which model is best for coding: Llama 3.1 405B Instruct, Phi-3 Medium 128K, or Qwen 3 72B Instruct?
  No benchmark data, including HumanEval and other code benchmarks, is available for these models yet. For production code tasks, also consider context window size and provider latency.
- What is the context window for Llama 3.1 405B Instruct, Phi-3 Medium 128K, and Qwen 3 72B Instruct?
  All three models list a 131K-token context window, as shown in the Context window row of the specs table above.