Model crosswalk

Side-by-side on price, capability and workload. Both columns use the cheapest provider for that model.

DeepSeek V3 vs Mixtral 8x22B Instruct
DeepSeek V3
671B params · 131K context · deepseek
Cheapest provider: deepinfra
$/1M input: $200000.00
$/1M output: $850000.00

Mixtral 8x22B Instruct
141B params · 66K context · apache-2.0
Cheapest provider: deepinfra
$/1M input: $600000.00
$/1M output: $650000.00

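Specs and cheapest providers

| Spec | DeepSeek V3 | Mixtral 8x22B Instruct |
| --- | --- | --- |
| Parameters | 671B | 141B |
| Context window | 131K tokens 🏆 | 66K tokens |
| License | deepseek | apache-2.0 |
| Released | 2024-12-26 | 2024-04-17 |

| Cheapest provider | DeepSeek V3 | Mixtral 8x22B Instruct |
| --- | --- | --- |
| Provider | deepinfra | deepinfra |
| Input / 1M tokens | $200000.00 🏆 | $600000.00 |
| Output / 1M tokens | $850000.00 | $650000.00 🏆 |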

Benchmark comparison

No benchmark data available for either model yet.

Sample workload: 5M in + 2M out per month

Using each model's cheapest provider:

DeepSeek V3: $2700000.00 /mo
Mixtral 8x22B Instruct: $4300000.00 /mo
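
The monthly totals above follow from a simple linear formula: input volume times the input rate plus output volume times the output rate. The sketch below is a minimal illustration, assuming the per-1M-token prices exactly as they are listed on this page; the `Pricing` class and `monthly_cost` function are hypothetical names introduced here, not part of any provider API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-1M-token prices, copied as listed on this page (assumption)."""
    input_per_1m: float
    output_per_1m: float


def monthly_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Linear cost: each token count is scaled to millions, then priced."""
    input_cost = (input_tokens / 1_000_000) * pricing.input_per_1m
    output_cost = (output_tokens / 1_000_000) * pricing.output_per_1m
    return input_cost + output_cost


# Cheapest-provider prices as shown in the spec table.
DEEPSEEK_V3 = Pricing(input_per_1m=200000.00, output_per_1m=850000.00)
MIXTRAL_8X22B = Pricing(input_per_1m=600000.00, output_per_1m=650000.00)

if __name__ == "__main__":
    # Sample workload: 5M input tokens + 2M output tokens per month.
    for name, pricing in [("DeepSeek V3", DEEPSEEK_V3),
                          ("Mixtral 8x22B Instruct", MIXTRAL_8X22B)]:
        cost = monthly_cost(pricing, 5_000_000, 2_000_000)
        print(f"{name}: ${cost:.2f} /mo")
```

Swapping in a different token mix is enough to reproduce any row of the scale table from the same two listed rates.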

What changes at scale

Output tokens dominate cost above a 1:3 input/output ratio. Below 1:1, input dominates and cheaper-input providers win regardless of headline price.

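| Monthly volume | DeepSeek V3 | Mixtral 8x22B Instruct |
| --- | --- | --- |
| 1M in · 250K out | $412500.00 | $762500.00 |
| 5M in · 2M out | $2700000.00 | $4300000.00 |
| 20M in · 10M out | $12500000.00 | $18500000.00 |
| 100M in · 60M out | $71000000.00 | $99000000.00 |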
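
Whether input or output spend dominates depends on each model's own price ratio as well as on the token mix. The sketch below is one way to check this for a given workload, assuming the cheapest-provider prices as listed on this page; `output_share` and `break_even_output_ratio` are hypothetical helper names introduced for illustration.

```python
# Per-1M-token (input, output) prices as listed on this page (assumption).
PRICES = {
    "DeepSeek V3": (200000.00, 850000.00),
    "Mixtral 8x22B Instruct": (600000.00, 650000.00),
}


def output_share(in_price: float, out_price: float,
                 input_tokens: int, output_tokens: int) -> float:
    """Fraction of total spend that goes to output tokens (0.0 to 1.0)."""
    in_cost = input_tokens * in_price
    out_cost = output_tokens * out_price
    return out_cost / (in_cost + out_cost)


def break_even_output_ratio(in_price: float, out_price: float) -> float:
    """Output tokens per input token at which output spend equals input spend.

    Above this ratio output dominates the bill; below it input dominates.
    """
    return in_price / out_price


if __name__ == "__main__":
    for name, (in_price, out_price) in PRICES.items():
        ratio = break_even_output_ratio(in_price, out_price)
        share = output_share(in_price, out_price, 1_000_000, 250_000)
        print(f"{name}: break-even at {ratio:.2f} output tokens per input token; "
              f"output share at 1M in / 250K out = {share:.0%}")
```

Because the break-even point is just the input price divided by the output price, a model with a large gap between the two rates shifts to output-dominated spend at a much smaller output volume than a model with near-equal rates.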

Capability vs price

[Scatter plot: benchmark score vs. $/1M output tokens]
Editor's take
Both models are sparse Mixture-of-Experts (MoE) designs, but at very different scales. [Mixtral 8x22B Instruct](/models/mistralai--mixtral-8x22b-instruct) has 141B total parameters with ~39B active per token, routed across 8 experts. [DeepSeek V3](/models/deepseek--deepseek-v3) has 671B total parameters with ~37B active per token, using a finer-grained 256-expert routing scheme. Active parameter counts are roughly comparable, but DeepSeek V3's larger total capacity comes with substantially higher benchmark scores: roughly 10–15 points ahead on MMLU and HumanEval in most published evals.

Pricing splits along provider maturity. Mixtral 8x22B is a well-established model with broad provider support and competitive spot rates, often under $0.50/M input tokens. DeepSeek V3 rates vary more widely by provider, from ~$0.14/M at the cheapest to $1+/M at providers without dedicated MoE kernels.

Mixtral 8x22B Instruct holds a latency advantage on providers with mature vLLM deployments, because its smaller total weight footprint makes cold starts and autoscaling faster. For low-latency classification, routing, or short-form generation workloads where first-token latency matters, Mixtral can win despite its lower benchmark ceiling.

DeepSeek V3 is the choice for quality-sensitive tasks: complex code generation, multi-step reasoning, and long-document summarization, where the larger total capacity produces measurably better outputs. At the cheapest providers, it delivers significantly higher reasoning quality at comparable or lower cost per token.

Pick Mixtral 8x22B Instruct if you need mature provider tooling, low first-token latency, or consistent autoscaling. Pick DeepSeek V3 if output quality is the priority and you can tolerate more provider variability in exchange for a higher reasoning ceiling.