Model crosswalk
Side-by-side on price, capability and workload. Both columns use the cheapest provider for that model.
DeepSeek R1 vs DeepSeek V3
Model A: DeepSeek R1
671B params · 131K context · mit
Cheapest provider: deepinfra
$/1M input: $400000.00
$/1M output: $2000000.00
Model B: DeepSeek V3
671B params · 131K context · deepseek
Cheapest provider: deepinfra
$/1M input: $200000.00
$/1M output: $850000.00
Specs and cheapest providers
| Spec | DeepSeek R1 | DeepSeek V3 |
|---|---|---|
| Parameters | 671B | 671B |
| Context window | 131K tokens | 131K tokens |
| License | mit | deepseek |
| Released | 2025-01-20 | 2024-12-26 |
| Cheapest provider | deepinfra | deepinfra |
| Input / 1M tokens (cheapest provider) | $400000.00 | $200000.00 🏆 |
| Output / 1M tokens (cheapest provider) | $2000000.00 | $850000.00 🏆 |
Benchmark comparison
No benchmark data available for either model yet.
Sample workload — 5M in + 2M out per month
using each model's cheapest provider
What changes at scale
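Output spend overtakes input spend once a workload produces more than (input price ÷ output price) output tokens per input token. At the prices listed above, that threshold is about 1 output token per 5 input tokens for DeepSeek R1 and about 1 per 4.25 for DeepSeek V3. Workloads more input-heavy than that are dominated by input cost, so the provider with the cheapest input rate wins regardless of headline price.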
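The point at which output spend overtakes input spend depends on each model's price ratio. The following is a minimal sketch, assuming the per-1M prices exactly as listed in the spec table; the helper names `breakeven_output_per_input` and `output_share` are hypothetical and only illustrate the arithmetic.

```python
# Prices in USD per 1M tokens, copied as listed in the spec table (cheapest provider).
PRICES = {
    "DeepSeek R1": {"input": 400000.00, "output": 2000000.00},
    "DeepSeek V3": {"input": 200000.00, "output": 850000.00},
}


def breakeven_output_per_input(input_price: float, output_price: float) -> float:
    """Output tokens per input token at which output spend equals input spend."""
    return input_price / output_price


def output_share(input_tokens: float, output_tokens: float,
                 input_price: float, output_price: float) -> float:
    """Fraction of total spend attributable to output tokens."""
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    return output_cost / (input_cost + output_cost)


if __name__ == "__main__":
    for model, p in PRICES.items():
        ratio = breakeven_output_per_input(p["input"], p["output"])
        share = output_share(5_000_000, 2_000_000, p["input"], p["output"])
        print(f"{model}: break-even at {ratio:.3f} output tokens per input token; "
              f"output is {share:.0%} of spend at 5M in + 2M out")
```

Any workload whose output-to-input token ratio sits above the break-even value spends more on output than on input for that model.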
| Monthly workload | DeepSeek R1 | DeepSeek V3 |
|---|---|---|
| 1M in · 250K out | $900000.00 | $412500.00 |
| 5M in · 2M out | $6000000.00 | $2700000.00 |
| 20M in · 10M out | $28000000.00 | $12500000.00 |
| 100M in · 60M out | $160000000.00 | $71000000.00 |
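The workload figures follow directly from the per-1M prices: total cost is input tokens times the input rate plus output tokens times the output rate. Here is a short sketch of that calculation, assuming the prices as listed in the spec table; `monthly_cost` and `WORKLOADS` are illustrative names, not part of any provider API.

```python
TOKENS_PER_PRICE_UNIT = 1_000_000  # prices are quoted per 1M tokens

# USD per 1M tokens, copied as listed in the spec table (cheapest provider).
PRICES = {
    "DeepSeek R1": {"input": 400000.00, "output": 2000000.00},
    "DeepSeek V3": {"input": 200000.00, "output": 850000.00},
}

# (input tokens, output tokens) per month for the workloads listed above.
WORKLOADS = [
    (1_000_000, 250_000),
    (5_000_000, 2_000_000),
    (20_000_000, 10_000_000),
    (100_000_000, 60_000_000),
]


def monthly_cost(input_tokens: int, output_tokens: int, prices: dict) -> float:
    """Total monthly spend in USD for a given token mix and price pair."""
    input_cost = input_tokens / TOKENS_PER_PRICE_UNIT * prices["input"]
    output_cost = output_tokens / TOKENS_PER_PRICE_UNIT * prices["output"]
    return input_cost + output_cost


if __name__ == "__main__":
    for tokens_in, tokens_out in WORKLOADS:
        costs = ", ".join(
            f"{model}: ${monthly_cost(tokens_in, tokens_out, p):.2f}"
            for model, p in PRICES.items()
        )
        print(f"{tokens_in:,} in / {tokens_out:,} out -> {costs}")
```

Run against the listed prices, this reproduces the four workload rows above; swapping in another provider's rates only requires editing `PRICES`.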
Capability vs price
[Scatter plot: benchmark score × $/1M output tokens]
Editor's take
Same lab, fundamentally different design objectives. DeepSeek R1 is a reasoning model trained with reinforcement learning: it generates extended chain-of-thought traces and excels on tasks where explicit step-by-step derivation beats retrieval. [DeepSeek V3](/models/deepseek--deepseek-v3) is a 671B-parameter Mixture-of-Experts model optimized for throughput, general capability, and instruction following at scale, with ~37B active parameters per token.
The cost picture separates them clearly. DeepSeek V3 on commodity providers typically runs $0.14–$0.28 per 1M input tokens. DeepSeek R1 carries a premium (expect $0.50–$1.50/1M depending on provider), and because it emits a long reasoning trace before its final answer, it also bills more output tokens per request. If you're running tens of millions of tokens per day, that delta compounds fast.
[DeepSeek R1](/models/deepseek--deepseek-r1) is the right call for tasks where accuracy on hard reasoning problems justifies the cost: competitive math, multi-step code proofs, formal verification assistance, or research workflows where you're willing to pay for a model that shows its work and catches its own errors. Its reported AIME 2024 and MATH-500 scores are significantly above V3's.
DeepSeek V3 wins for high-volume general workloads: long-form drafting, RAG pipelines over large document sets, classification at scale, or agentic loops where most steps don't require deep reasoning — just reliable instruction execution at low per-token cost.
Pick DeepSeek R1 if your task requires verified, step-by-step reasoning and budget allows. Pick DeepSeek V3 if throughput and cost efficiency matter more than reasoning depth.