
Model crosswalk

A three-way, side-by-side comparison of price, capability and workload.

DeepSeek R1 vs Llama 3.1 405B Instruct vs Qwen 3 72B Instruct
Specs and cheapest providers
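Spec | DeepSeek R1 | Llama 3.1 405B Instruct | Qwen 3 72B Instruct
Parameters |  | 405B | 72B
Context window | 131K | 131K |
License |  |  |
Released |  | July 2024 | April 2025
Cheapest provider
Provider |  |  |
Input (USD / 1M tokens) |  |  |
Output (USD / 1M tokens) |  |  |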
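Which provider is "cheapest" depends on the mix of input and output tokens a workload sends, so per-1M prices are easiest to compare as a cost per request. The following is a minimal Python sketch, not any provider's billing logic. The provider names and prices are hypothetical placeholders, because the page lists none. It also assumes that reasoning-trace tokens, such as DeepSeek R1's, are billed at the output rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """One provider's list price for a model (hypothetical values below)."""
    provider: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


def request_cost(p: Pricing, input_tokens: int, output_tokens: int,
                 reasoning_tokens: int = 0) -> float:
    """USD cost of one request.

    Assumption: reasoning-trace tokens are billed at the output rate.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * p.input_per_m + billed_output * p.output_per_m) / 1_000_000


def cheapest(offers: list[Pricing], input_tokens: int, output_tokens: int,
             reasoning_tokens: int = 0) -> Pricing:
    """Return the offer with the lowest cost for this token mix."""
    return min(offers, key=lambda p: request_cost(p, input_tokens, output_tokens, reasoning_tokens))


if __name__ == "__main__":
    offers = [
        Pricing("provider-a", input_per_m=0.50, output_per_m=2.00),
        Pricing("provider-b", input_per_m=0.80, output_per_m=1.20),
    ]
    # Same 2,000-token prompt and 500-token answer, without and with a reasoning trace.
    for reasoning in (0, 3_000):
        best = cheapest(offers, 2_000, 500, reasoning_tokens=reasoning)
        print(reasoning, best.provider, request_cost(best, 2_000, 500, reasoning))
```

With these made-up prices, provider-a is cheaper for a plain 2,000-token prompt with a 500-token answer. Provider-b wins once 3,000 reasoning tokens are billed as output. For that reason, output-heavy reasoning workloads are best compared on the output rate.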
Benchmark comparison

No benchmark data available yet.

Editor's take
This is a genuine reasoning-versus-scale comparison: three different architectures that land at different points on the benchmark-to-cost curve.

DeepSeek R1 is a reasoning-specialized mixture-of-experts (MoE) model trained with reinforcement learning to produce an explicit chain-of-thought trace before answering. On GPQA and competition math, it outperforms dense models several times its active-parameter count. The tradeoff is latency: reasoning traces add tokens, which increases time-to-first-response and total cost. Its context window is 131K tokens. Verify DeepSeek's license terms before commercial use. R1 is the right choice when you need defensible step-by-step reasoning: scientific question answering, formal proofs, or tasks where the reasoning chain must be auditable.

Llama 3.1 405B Instruct, released in July 2024, is Meta's largest dense open-weights model. It has 405 billion parameters, a 131K context window, and the broadest knowledge coverage in this group. Its MMLU scores were among the highest of any open model at launch. The cost is significant: serving 405B parameters requires multiple GPUs, so hosted pricing sits well above 70B-class tiers, and only providers with large-memory inference infrastructure offer it. For complex document synthesis, long-context analysis, or frontier reasoning without R1's explicit chain-of-thought style, this is the reference model.

Qwen 3 72B Instruct, released in April 2025, is the practical workhorse at a fraction of the 405B's serving cost. It performs solidly on MMLU, HumanEval, and multilingual benchmarks. Relative to R1, it lacks explicit reasoning-trace quality on hard GPQA-style tasks. Relative to the 405B, it lacks depth on knowledge-intensive questions.

Pick DeepSeek R1 when chain-of-thought reasoning quality and GPQA/math performance are the primary evaluation criteria. Pick Llama 3.1 405B when raw frontier capability and broad knowledge coverage justify the cost. Pick Qwen 3 72B for strong all-around performance at an accessible price point.
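The multi-GPU serving cost of the 405B model follows from simple arithmetic: weight memory is roughly the parameter count times the bytes stored per parameter. The following Python sketch makes several assumptions. It uses 2 bytes per parameter for BF16, 1 byte for FP8, and 80 GB accelerators. It counts weights only and ignores KV cache, activations and runtime overhead, so its GPU counts are lower bounds rather than deployment sizes. DeepSeek R1 is omitted because the page does not give its parameter count.

```python
import math

# Assumed storage cost per parameter; covers weights only.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0}


def weight_memory_gb(params_billion: float, dtype: str = "bf16") -> float:
    """Approximate weight memory in GB (1e9 bytes): billions of params x bytes per param."""
    return params_billion * BYTES_PER_PARAM[dtype]


def min_gpus_for_weights(params_billion: float, gpu_memory_gb: float = 80.0,
                         dtype: str = "bf16") -> int:
    """Lower bound on GPUs needed just to hold the weights.

    Ignores KV cache, activations and runtime overhead, so real deployments need more.
    """
    return math.ceil(weight_memory_gb(params_billion, dtype) / gpu_memory_gb)


if __name__ == "__main__":
    for name, params in [("Llama 3.1 405B Instruct", 405), ("Qwen 3 72B Instruct", 72)]:
        for dtype in ("bf16", "fp8"):
            gb = weight_memory_gb(params, dtype)
            gpus = min_gpus_for_weights(params, dtype=dtype)
            print(f"{name} [{dtype}]: ~{gb:.0f} GB of weights, at least {gpus} x 80 GB GPUs")
```

At BF16, the 405B weights alone need about 810 GB, or at least 11 such GPUs. A 72B model needs about 144 GB and 2 GPUs. FP8 halves the memory, which lowers the 405B bound to 6 GPUs.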
Frequently asked questions
How does DeepSeek R1 compare to Llama 3.1 405B Instruct and Qwen 3 72B Instruct on price?
Use the table above to compare input and output prices per 1M tokens across the cheapest available providers for each model.
Which model is best for coding: DeepSeek R1, Llama 3.1 405B Instruct, or Qwen 3 72B Instruct?
Code benchmarks such as HumanEval are not yet shown in the benchmark comparison above. The Editor's take notes that Qwen 3 72B Instruct performs solidly on HumanEval. For production code tasks, also consider context window size and provider latency.
What is the context window for DeepSeek R1, Llama 3.1 405B Instruct, and Qwen 3 72B Instruct?
Context window sizes are listed in the Context window row of the specs table above. The Editor's take gives 131K tokens for both DeepSeek R1 and Llama 3.1 405B Instruct.
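When checking whether a workload fits a context window, count the reasoning trace as well as the prompt and answer. The following is a minimal Python sketch with a hypothetical helper named `fits_in_context`. It assumes that "131K" means 131,072 tokens and that R1's trace counts against the same window as the answer. Qwen 3 72B Instruct is left out because the page does not state its window.

```python
# Window sizes from the Editor's take; "131K" is assumed to mean 131,072 tokens.
CONTEXT_WINDOW = {
    "DeepSeek R1": 131_072,
    "Llama 3.1 405B Instruct": 131_072,
}


def fits_in_context(model: str, prompt_tokens: int, max_output_tokens: int,
                    reasoning_budget: int = 0) -> bool:
    """Hypothetical check: prompt + reasoning trace + answer must fit the window.

    Assumes reasoning tokens count against the same window as the answer.
    Raises KeyError for models not listed above (e.g. Qwen 3 72B Instruct).
    """
    window = CONTEXT_WINDOW[model]
    return prompt_tokens + reasoning_budget + max_output_tokens <= window


if __name__ == "__main__":
    # A 120K-token prompt with an 8K answer fits; adding an 8K reasoning trace does not.
    print(fits_in_context("Llama 3.1 405B Instruct", 120_000, 8_000))
    print(fits_in_context("DeepSeek R1", 120_000, 8_000, reasoning_budget=8_000))
```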