Model crosswalk
Side-by-side on price, capability and workload — three-way comparison.
Llama 3.3 70B Instruct vs Phi-3 Medium 128K vs Qwen 3 72B Instruct
A: Llama 3.3 70B Instruct
70B params · 131K context · llama-3
Cheapest provider: fireworks-ai
$/1M input: $220000.00
$/1M output: $880000.00
B: Phi-3 Medium 128K
14B params · 131K context · mit
Cheapest provider: not listed
$/1M input: not listed
$/1M output: not listed
C: Qwen 3 72B Instruct
72B params · 131K context · qwen
Cheapest provider: fireworks-ai
$/1M input: $220000.00
$/1M output: $880000.00
Specs and cheapest providers
| Spec | Llama 3.3 70B Instruct | Phi-3 Medium 128K | Qwen 3 72B Instruct |
|---|---|---|---|
| Parameters | 70B | 14B | 72B |
| Context window | 131K tokens | 131K tokens | 131K tokens |
| License | llama-3 | mit | qwen |
| Released | 2024-12-06 | 2024-05-21 | 2025-04-28 |
| Cheapest provider | | | |
| Provider | fireworks-ai | — | fireworks-ai |
| Input / 1M tokens | $220000.00 | — | $220000.00 |
| Output / 1M tokens | $880000.00 | — | $880000.00 |
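Because input and output tokens are priced separately per 1M tokens, comparing providers on a real workload takes one line of arithmetic. The sketch below is a minimal illustration rather than part of this page's tooling: the names `Pricing` and `workload_cost` are hypothetical, and the example prices are placeholders chosen for easy arithmetic, not values taken from the table. A model with no listed provider, such as Phi-3 Medium 128K here, is represented as `None`.

```python
from dataclasses import dataclass
from typing import Optional

# Prices on this page are quoted per 1M tokens.
TOKENS_PER_PRICE_UNIT = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """Hypothetical container for one provider's per-1M-token prices (USD)."""
    input_per_million: float
    output_per_million: float


def workload_cost(
    pricing: Optional[Pricing], input_tokens: int, output_tokens: int
) -> Optional[float]:
    """Return the estimated cost in USD, or None when no price is listed."""
    if pricing is None:
        return None
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    )
    return total / TOKENS_PER_PRICE_UNIT


# Placeholder prices (assumption): $1.00 input and $4.00 output per 1M tokens.
example = Pricing(input_per_million=1.00, output_per_million=4.00)
print(workload_cost(example, input_tokens=2_000_000, output_tokens=500_000))  # 4.0
print(workload_cost(None, input_tokens=2_000_000, output_tokens=500_000))  # None
```

Output-heavy workloads are dominated by the output price, so it is worth estimating with a realistic input-to-output token ratio rather than comparing input prices alone.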
Benchmark comparison
No benchmark data available yet.
Editor's take
These are three 128K-context models at different parameter counts (14B, 70B, and 72B) that cover the practical mid-tier of open-weights inference. Parameter count alone does not rank them, because training-data quality and architecture choices cut across raw scale.
Llama 3.3 70B Instruct is Meta's December 2024 70B refresh, targeting better instruction-following than Llama 3.1 70B at the same footprint. It keeps the familiar 131K context and Llama 3 community license, and is the recommended replacement for 3.1 70B in new deployments. Provider coverage is among the widest of any open model, which gives you real flexibility on cost and latency. On standard evals, it closes part of the gap to 405B-class models that Llama 3.1 70B left open.
Phi-3 Medium 128K, at 14 billion parameters, is the outlier on scale in this group. Microsoft's training-data approach produces MMLU and GSM8K scores that match some 70B competitors on reasoning tasks, at a substantially lower per-token cost. The gap shows in open-ended generation quality and broad knowledge coverage: GPQA-style science reasoning and MT-Bench conversational quality both reflect the smaller parameter budget. It is MIT-licensed, and Azure AI is its primary host.
Qwen 3 72B Instruct is Alibaba's April 2025 flagship 72B. It covers multilingual, code, and reasoning benchmarks more evenly than Phi-3 Medium, and it is priced competitively against Llama 3.3 70B across providers such as Together AI, Fireworks, and Groq.
Pick Llama 3.3 70B for the broadest provider choice and reliable instruction-following at 70B scale. Pick Qwen 3 72B when multilingual coverage or code tasks are part of the workload. Pick Phi-3 Medium 128K when per-token cost is a hard constraint and the task is structured reasoning.
Frequently asked questions
- How does Llama 3.3 70B Instruct compare to Phi-3 Medium 128K and Qwen 3 72B Instruct on price?
  - Use the table above to compare input and output prices per 1M tokens across the cheapest available providers for each model.
- Which model is best for coding: Llama 3.3 70B Instruct, Phi-3 Medium 128K, or Qwen 3 72B Instruct?
  - No benchmark data is available for these models yet, so HumanEval and other code benchmarks cannot be compared here. For production code tasks, also consider context window size and provider latency.
- What is the context window for Llama 3.3 70B Instruct, Phi-3 Medium 128K, and Qwen 3 72B Instruct?
  - All three models list a 131K-token context window; see the Context window row of the Specs table above.