Model crosswalk

A three-way, side-by-side comparison on price, capability, and workload.

Llama 3.3 70B Instruct vs Phi-3 Medium 128K vs Qwen 3 72B Instruct
Llama 3.3 70B Instruct (A)

70B params · 131K context · llama-3

Cheapest provider: fireworks-ai
$/1M input: $220000.00
$/1M output: $880000.00

Phi-3 Medium 128K (B)

14B params · 131K context · mit

Cheapest provider: not listed
$/1M input: not listed
$/1M output: not listed

Qwen 3 72B Instruct (C)

72B params · 131K context · qwen

Cheapest provider: fireworks-ai
$/1M input: $220000.00
$/1M output: $880000.00

Specs and cheapest providers
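
| Spec | Llama 3.3 70B Instruct | Phi-3 Medium 128K | Qwen 3 72B Instruct |
| --- | --- | --- | --- |
| Parameters | 70B | 14B | 72B |
| Context window | 131K tokens | 131K tokens | 131K tokens |
| License | llama-3 | mit | qwen |
| Released | 2024-12-06 | 2024-05-21 | 2025-04-28 |
| Cheapest provider | fireworks-ai | not listed | fireworks-ai |
| Input / 1M tokens (cheapest provider) | $220000.00 | not listed | $220000.00 |
| Output / 1M tokens (cheapest provider) | $880000.00 | not listed | $880000.00 |
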
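
To turn per-1M-token prices into a workload cost, multiply each token count by its rate and add the two. The sketch below assumes the rates exactly as listed above for fireworks-ai; the function name `workload_cost` and the example token counts are hypothetical, chosen only for illustration.

```python
def workload_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of a workload, given prices quoted per 1M tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Rates copied as listed on this page for fireworks-ai; substitute the
# provider's current rates before relying on the result.
LISTED_INPUT_PER_M = 220000.00
LISTED_OUTPUT_PER_M = 880000.00

# Hypothetical workload: 2M input tokens and 0.5M output tokens.
cost = workload_cost(2_000_000, 500_000, LISTED_INPUT_PER_M, LISTED_OUTPUT_PER_M)
print(f"{cost:.2f}")  # 880000.00
```

At the listed rates an output token costs four times as much as an input token, so in this example the 0.5M output tokens account for as much of the total as the 2M input tokens.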
Benchmark comparison

No benchmark data available yet.

Editor's take
Three 128K-context (131,072-token) models at different parameter counts, covering the practical mid-tier of open-weights inference. The spread of 14B, 70B, and 72B tells only part of the story, because training data quality and architecture choices cut across raw scale.

Llama 3.3 70B Instruct is Meta's December 2024 70B refresh, targeting better instruction-following than Llama 3.1 70B at the same footprint. It keeps the familiar 131K context and Llama 3 community license, and is already the recommended replacement for 3.1 70B in new deployments. Provider coverage is among the widest of any open model, giving you genuine flexibility on cost and latency. On standard evals, it closes part of the gap to 405B-class models that existed with 3.1.

Phi-3 Medium 128K, at 14 billion parameters, is the outlier on scale in this group. Microsoft's training data approach produces MMLU and GSM8K scores that match some 70B competitors on reasoning tasks, while costing substantially less per token. The gap shows in open-ended generation quality and broad knowledge coverage: GPQA-style science reasoning and MT-Bench conversational quality both reflect the smaller parameter budget. It ships under the MIT license, with Azure AI as its primary host.

Qwen 3 72B Instruct is Alibaba's April 2025 flagship 72B. It covers multilingual, code, and reasoning benchmarks more evenly than Phi-3 Medium while offering competitive pricing against Llama 3.3 70B across providers like Together AI, Fireworks, and Groq.

Pick Llama 3.3 70B for the broadest provider choice and reliable instruction-following at 70B scale. Pick Qwen 3 72B when multilingual coverage or code tasks are part of the workload. Pick Phi-3 Medium 128K when per-token cost is a hard constraint and the task is structured reasoning.
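
The three recommendations in the editor's take can be read as a small decision rule. The sketch below is one possible encoding, not a definitive selector: the `Workload` fields and the `pick_model` name are hypothetical, and the order in which the rules are checked (cost constraint first, then multilingual or code needs, then the default) is an assumption the text does not state.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical flags describing the workload; not part of the page.
    cost_is_hard_constraint: bool = False
    structured_reasoning: bool = False
    multilingual: bool = False
    code_tasks: bool = False


def pick_model(w: Workload) -> str:
    """Apply the editor's three recommendations in an assumed priority order."""
    # Per-token cost is a hard constraint and the task is structured reasoning.
    if w.cost_is_hard_constraint and w.structured_reasoning:
        return "Phi-3 Medium 128K"
    # Multilingual coverage or code tasks are part of the workload.
    if w.multilingual or w.code_tasks:
        return "Qwen 3 72B Instruct"
    # Default: broadest provider choice and reliable instruction-following.
    return "Llama 3.3 70B Instruct"


print(pick_model(Workload(multilingual=True)))  # Qwen 3 72B Instruct
```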
Frequently asked questions
How does Llama 3.3 70B Instruct compare to Phi-3 Medium 128K and Qwen 3 72B Instruct on price?
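The specs table above lists input and output prices per 1M tokens at the cheapest available provider for each model. Llama 3.3 70B Instruct and Qwen 3 72B Instruct are both listed at fireworks-ai with identical input and output prices; no provider pricing is listed for Phi-3 Medium 128K.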
Which model is best for coding: Llama 3.3 70B Instruct, Phi-3 Medium 128K, or Qwen 3 72B Instruct?
No benchmark data, including HumanEval and other code benchmarks, is available for these models on this page yet. For production code tasks, also consider context window size and provider latency.
What is the context window for Llama 3.3 70B Instruct, Phi-3 Medium 128K, and Qwen 3 72B Instruct?
All three models have a 131K-token context window, as listed in the Context window row of the specs table above.