Llama 3.1 405B Instruct vs Phi-3 Medium 128K vs Qwen 3 72B Instruct (2026): 3-way comparison

Model crosswalk

A three-way, side-by-side comparison on price, capability, and workload.

A: Llama 3.1 405B Instruct
405B params · 131K context · llama-3
Cheapest provider: deepinfra
$/1M input: $2700000.00
$/1M output: $8000000.00

B: Phi-3 Medium 128K
14B params · 131K context · mit
Cheapest provider: not listed
$/1M input: not listed
$/1M output: not listed

C: Qwen 3 72B Instruct
72B params · 131K context · qwen
Cheapest provider: fireworks-ai
$/1M input: $220000.00
$/1M output: $880000.00

Specs and cheapest providers

Spec	Llama 3.1 405B Instruct	Phi-3 Medium 128K	Qwen 3 72B Instruct
Parameters	405B	14B	72B
Context window	131K tokens	131K tokens	131K tokens
License	llama-3	mit	qwen
Released	2024-07-23	2024-05-21	2025-04-28
Cheapest provider	deepinfra	—	fireworks-ai
Input / 1M tokens	$2700000.00	—	$220000.00 (lowest)
Output / 1M tokens	$8000000.00	—	$880000.00 (lowest)
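
To make the price gap concrete, here is a minimal Python sketch that turns the listed per-1M-token prices into a cost ratio for a given workload. The figures are copied as they appear in the table above. The `Pricing` class, the function names, and the 3M-input / 1M-output token mix are illustrative assumptions, not part of any provider's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pricing:
    provider: str
    input_per_1m: float   # listed price per 1M input tokens
    output_per_1m: float  # listed price per 1M output tokens


# Figures copied as listed in the table; None means no provider is listed.
PRICES: dict[str, Optional[Pricing]] = {
    "Llama 3.1 405B Instruct": Pricing("deepinfra", 2_700_000.00, 8_000_000.00),
    "Phi-3 Medium 128K": None,
    "Qwen 3 72B Instruct": Pricing("fireworks-ai", 220_000.00, 880_000.00),
}


def workload_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of one workload at the listed per-1M-token prices."""
    return (input_tokens / 1_000_000) * p.input_per_1m + (
        output_tokens / 1_000_000
    ) * p.output_per_1m


def cost_ratio(a: Pricing, b: Pricing, input_tokens: int, output_tokens: int) -> float:
    """How many times more expensive a is than b for the same workload."""
    return workload_cost(a, input_tokens, output_tokens) / workload_cost(
        b, input_tokens, output_tokens
    )


llama = PRICES["Llama 3.1 405B Instruct"]
qwen = PRICES["Qwen 3 72B Instruct"]

# Hypothetical workload: 3M input tokens and 1M output tokens.
ratio = cost_ratio(llama, qwen, 3_000_000, 1_000_000)
print(f"Llama 3.1 405B costs about {ratio:.2f}x Qwen 3 72B for this mix")  # ≈ 10.45
```

Because the result is a ratio, it depends only on the relative prices and the input/output mix. At the listed prices it ranges from about 9.1x for output-only traffic to about 12.3x for input-only traffic. Phi-3 Medium 128K is left out because no provider pricing is listed for it.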

Benchmark comparison

No benchmark data available yet.

Editor's take

This comparison spans three scale philosophies: a very large dense model, a small model trained on carefully curated data, and a balanced mid-tier model with broad multilingual coverage.

Llama 3.1 405B Instruct is Meta's largest openly licensed model, released in July 2024, with 405 billion dense parameters, a 131K context window, and the Llama 3 community license. It scores near the top of open-weights models on MMLU and complex reasoning evaluations. Running it is expensive: hosted pricing is significantly higher than for 70B-class alternatives, and its multi-GPU requirements limit which providers carry it. It makes the most sense for workloads where raw capability matters more than cost per token, such as synthesis, complex document analysis, or agentic tasks that genuinely benefit from scale.

Phi-3 Medium 128K has 14 billion parameters trained on Microsoft's curated synthetic corpus, and its MMLU and GSM8K scores approach those of several 70B models on reasoning-heavy benchmarks. It offers the same 131K context window, and MIT licensing removes commercial friction. The tradeoff is coverage: it underperforms on open-ended generation, creative tasks, and anything that depends on broad world knowledge rather than reasoning depth. Provider availability is more limited than for either peer.

Qwen 3 72B Instruct, released in April 2025, occupies the middle ground well. At 72B parameters with a 131K context window, it covers MMLU, multilingual, and code benchmarks with fewer gaps than Phi-3 Medium while costing a fraction of 405B inference. The Qwen license supports commercial deployment.

Pick Llama 3.1 405B when your task genuinely requires frontier-level open-weights capability. Pick Qwen 3 72B for strong all-around performance at 72B-class pricing. Pick Phi-3 Medium 128K when per-token cost is the constraint and the task is structured reasoning or QA.
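
The three recommendations above can be read as a small decision rule. The Python sketch below is one possible encoding of it, not a definitive selector: the function name, the boolean flags, and the precedence order (capability first, then cost) are assumptions made for illustration.

```python
def pick_model(
    needs_frontier_capability: bool,
    cost_constrained: bool,
    structured_reasoning_or_qa: bool,
) -> str:
    """Map the editor's three recommendations onto simple boolean inputs."""
    # Assumption: a hard capability requirement outranks cost concerns.
    if needs_frontier_capability:
        return "Llama 3.1 405B Instruct"
    # The small model is suggested only when cost is the constraint
    # AND the task is structured reasoning or QA.
    if cost_constrained and structured_reasoning_or_qa:
        return "Phi-3 Medium 128K"
    # Otherwise fall back to the all-around mid-tier option.
    return "Qwen 3 72B Instruct"


print(pick_model(False, True, True))   # Phi-3 Medium 128K
print(pick_model(False, True, False))  # Qwen 3 72B Instruct
```

A real selection process would also weigh provider availability and latency, which this sketch leaves out.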

Compare two at a time

Llama 3.1 405B Instruct vs Phi-3 Medium 128K
Llama 3.1 405B Instruct vs Qwen 3 72B Instruct
Phi-3 Medium 128K vs Qwen 3 72B Instruct

Frequently asked questions

How does Llama 3.1 405B Instruct compare to Phi-3 Medium 128K and Qwen 3 72B Instruct on price?

The specs table above lists input and output prices per 1M tokens from the cheapest available provider for each model. Qwen 3 72B Instruct (fireworks-ai) is listed at a lower price than Llama 3.1 405B Instruct (deepinfra) for both input and output. No provider pricing is listed for Phi-3 Medium 128K.

Which model is best for coding: Llama 3.1 405B Instruct, Phi-3 Medium 128K, or Qwen 3 72B Instruct?

No benchmark data, including HumanEval and other code benchmarks, is available for these models on this page yet. For production code tasks, also consider context window size and provider latency.

What is the context window for Llama 3.1 405B Instruct, Phi-3 Medium 128K, and Qwen 3 72B Instruct?

All three models are listed with a 131K-token context window in the Context window row of the specs table above.

Full model details

All providers for Llama 3.1 405B Instruct →
All providers for Phi-3 Medium 128K →
All providers for Qwen 3 72B Instruct →