
Model crosswalk

A three-way, side-by-side comparison on price, capability, and workload.

Llama 3.1 405B Instruct vs Phi-3 Medium 128K vs Qwen 3 72B Instruct
Llama 3.1 405B Instruct

405B params · 131K context · llama-3

Cheapest provider: deepinfra
$/1M input: $2700000.00
$/1M output: $8000000.00

Phi-3 Medium 128K

14B params · 131K context · mit

Cheapest provider: not listed
$/1M input: not listed
$/1M output: not listed

Qwen 3 72B Instruct

72B params · 131K context · qwen

Cheapest provider: fireworks-ai
$/1M input: $220000.00
$/1M output: $880000.00

Specs and cheapest providers
| Spec | Llama 3.1 405B Instruct | Phi-3 Medium 128K | Qwen 3 72B Instruct |
| --- | --- | --- | --- |
| Parameters | 405B | 14B | 72B |
| Context window | 131K tokens | 131K tokens | 131K tokens |
| License | llama-3 | mit | qwen |
| Released | 2024-07-23 | 2024-05-21 | 2025-04-28 |
| Cheapest provider | deepinfra | not listed | fireworks-ai |
| Input / 1M tokens | $2700000.00 | not listed | $220000.00 🏆 |
| Output / 1M tokens | $8000000.00 | not listed | $880000.00 🏆 |
Benchmark comparison

No benchmark data available yet.

Editor's take
This comparison spans three scale philosophies: massive dense weights, a small model trained on curated synthetic data, and a balanced mid-tier with broad multilingual coverage.

Llama 3.1 405B Instruct, released in July 2024, is Meta's largest openly licensed model: 405 billion dense parameters, a 131K context window, and the Llama 3 community license. It scores near the top of open-weights models on MMLU and complex reasoning evaluations. Running it is expensive: hosted pricing is significantly higher than for 70B-class alternatives, and its multi-GPU requirements limit which providers carry it. It makes the most sense for workloads where raw capability matters more than cost per token, such as synthesis, complex document analysis, or agentic tasks that genuinely benefit from scale.

Phi-3 Medium 128K has 14 billion parameters trained on Microsoft's curated synthetic corpus, and its MMLU and GSM8K scores approach those of several 70B models on reasoning-heavy benchmarks. It offers the same 131K context window, and MIT licensing removes commercial friction. The tradeoff is coverage: it underperforms on open-ended generation, creative tasks, and anything that requires broad world knowledge rather than reasoning depth. Provider availability is more limited than for either peer.

Qwen 3 72B Instruct, released in April 2025, occupies the middle ground well. At 72B parameters with a 131K context window, it covers MMLU, multilingual, and code benchmarks with fewer gaps than Phi-3 Medium while costing a fraction of 405B inference. The Qwen license supports commercial deployment.

Pick Llama 3.1 405B when your task genuinely requires frontier-level open-weights capability. Pick Qwen 3 72B for strong all-around performance at reasonable 72B pricing. Pick Phi-3 Medium 128K when per-token cost is the constraint and the task is structured reasoning or QA.
Frequently asked questions
How does Llama 3.1 405B Instruct compare to Phi-3 Medium 128K and Qwen 3 72B Instruct on price?
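Use the table above to compare input and output prices per 1M tokens across the cheapest available provider for each model. No provider pricing is listed for Phi-3 Medium 128K, so only Llama 3.1 405B Instruct and Qwen 3 72B Instruct can be compared directly; of those two, Qwen 3 72B Instruct is cheaper on both input and output.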
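As a rough sketch of how per-1M-token prices translate into a bill, a workload's cost can be estimated by scaling each price by the token count. This assumes cost is linear in tokens, with no caching discounts, minimum charges, or tiered rates. The `Pricing` class, the `workload_cost` function, and the example prices are hypothetical placeholders for illustration, not figures from the table or any provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-1M-token prices in USD (hypothetical structure, not a provider API)."""
    input_per_million: float
    output_per_million: float


def workload_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for the given input and output token counts."""
    return (
        input_tokens / 1_000_000 * pricing.input_per_million
        + output_tokens / 1_000_000 * pricing.output_per_million
    )


# Placeholder prices for illustration only; substitute a provider's actual rates.
example = Pricing(input_per_million=1.00, output_per_million=4.00)

# 2M input tokens and 500K output tokens: 2 * 1.00 + 0.5 * 4.00 = 4.00
print(f"${workload_cost(example, 2_000_000, 500_000):.2f}")
```

Because output tokens are usually priced higher than input tokens, the ratio of input to output in a workload can matter as much as the headline rates when comparing models.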
Which model is best for coding: Llama 3.1 405B Instruct, Phi-3 Medium 128K, or Qwen 3 72B Instruct?
HumanEval and other code benchmarks appear in the Benchmark comparison section when available; no benchmark data is listed for these models yet. For production code tasks, also consider context window size and provider latency.
What is the context window for Llama 3.1 405B Instruct, Phi-3 Medium 128K, and Qwen 3 72B Instruct?
All three models list a 131K-token context window; see the Context window row of the specs table above.
Full model details