
Model crosswalk

Side-by-side on price, capability and workload. Both columns use the cheapest provider for that model.

DeepSeek V3.2 vs Mixtral 8x22B Instruct

A: DeepSeek V3.2

671B params · 131K context · deepseek

Cheapest provider: together-ai
$/1M input: $270000.00
$/1M output: $1100000.00

B: Mixtral 8x22B Instruct

141B params · 66K context · apache-2.0

Cheapest provider: deepinfra
$/1M input: $600000.00
$/1M output: $650000.00

Specs and cheapest providers

| Spec | DeepSeek V3.2 | Mixtral 8x22B Instruct |
| --- | --- | --- |
| Parameters | 671B | 141B |
| Context window | 131K tokens 🏆 | 66K tokens |
| License | deepseek | apache-2.0 |
| Released | 2025-05-07 | 2024-04-17 |
| Cheapest provider | together-ai | deepinfra |
| Input / 1M tokens | $270000.00 🏆 | $600000.00 |
| Output / 1M tokens | $1100000.00 | $650000.00 🏆 |


Benchmark comparison

No benchmark data available for either model yet.

Sample workload: 5M in + 2M out per month

Using each model's cheapest provider:

DeepSeek V3.2: $3550000.00 /mo
Mixtral 8x22B Instruct: $4300000.00 /mo
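
The monthly figures above follow from a simple linear formula: tokens in millions times the per-1M price, summed over input and output. A minimal sketch, assuming the prices are exactly as listed on this page; the function name `monthly_cost` and the `PRICES_PER_1M` table are illustrative, not part of any provider API:

```python
# Prices per 1M tokens, copied as listed on this page for each model's
# cheapest provider (assumption: the page's figures are taken at face value).
PRICES_PER_1M = {
    "DeepSeek V3.2": {"input": 270000.00, "output": 1100000.00},
    "Mixtral 8x22B Instruct": {"input": 600000.00, "output": 650000.00},
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Convert token counts to millions and multiply by the per-1M prices."""
    price = PRICES_PER_1M[model]
    input_cost = (input_tokens / 1_000_000) * price["input"]
    output_cost = (output_tokens / 1_000_000) * price["output"]
    return input_cost + output_cost


# Sample workload from this page: 5M input + 2M output tokens per month.
for model in PRICES_PER_1M:
    print(f"{model}: ${monthly_cost(model, 5_000_000, 2_000_000):.2f} /mo")
# DeepSeek V3.2: $3550000.00 /mo
# Mixtral 8x22B Instruct: $4300000.00 /mo
```

Swapping in a different provider only means changing the two price entries for that model; the token mix is passed per call.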

What changes at scale

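Which side of the bill dominates depends on the token mix and on each model's price spread. Output spend exceeds input spend once the output-to-input token ratio passes (input price ÷ output price): roughly 0.25 for DeepSeek V3.2 and 0.92 for Mixtral 8x22B Instruct at the prices listed here. Below that threshold input dominates, and providers with cheaper input win regardless of headline output price.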

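| Monthly volume | DeepSeek V3.2 | Mixtral 8x22B Instruct |
| --- | --- | --- |
| 1M in · 250K out | $545000.00 | $762500.00 |
| 5M in · 2M out | $3550000.00 | $4300000.00 |
| 20M in · 10M out | $16400000.00 | $18500000.00 |
| 100M in · 60M out | $93000000.00 | $99000000.00 |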
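
Two thresholds explain the rows above, and both depend only on price ratios, not on total volume. The sketch below is illustrative: the helper names are hypothetical and the prices are taken as listed on this page. It computes the output/input token ratio at which output spend overtakes input spend for one model, and the ratio at which the two models cost the same.

```python
def output_dominance_threshold(price_in: float, price_out: float) -> float:
    """Output/input token ratio above which output spend exceeds input spend."""
    return price_in / price_out


def crossover_ratio(a_in: float, a_out: float, b_in: float, b_out: float) -> float:
    """Output/input token ratio at which models A and B cost the same.

    Assumes A is cheaper on input and pricier on output than B, which holds
    for DeepSeek V3.2 (A) vs Mixtral 8x22B Instruct (B) at the listed prices.
    Below this ratio A is cheaper; above it B is cheaper.
    """
    return (b_in - a_in) / (a_out - b_out)


# (input, output) price per 1M tokens, as listed on this page.
DEEPSEEK = (270000.00, 1100000.00)
MIXTRAL = (600000.00, 650000.00)

print(round(output_dominance_threshold(*DEEPSEEK), 3))  # 0.245
print(round(output_dominance_threshold(*MIXTRAL), 3))   # 0.923
print(round(crossover_ratio(*DEEPSEEK, *MIXTRAL), 3))   # 0.733
```

The four workloads in the table have output/input ratios of 0.25, 0.4, 0.5 and 0.6, all below the 0.733 crossover, which is why DeepSeek V3.2 is the cheaper option in every row. A mix heavier than roughly 73 output tokens per 100 input tokens would flip the result toward Mixtral 8x22B Instruct under these prices.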

Capability vs price

[Scatter plot: benchmark score × $/1M output tokens]
Editor's take
Both models are sparse Mixture-of-Experts, but the generational gap is wide. [Mixtral 8x22B Instruct](/models/mistralai--mixtral-8x22b-instruct) uses 8 experts with ~39B active parameters from a 141B total pool, a 2023-era architecture that remains well-supported across providers. [DeepSeek V3.2](/models/deepseek--deepseek-v3.2) deploys 256 fine-grained experts with ~37B active from 671B total, plus architectural innovations like Multi-Head Latent Attention that improve inference efficiency at scale.

The benchmark delta is substantial: DeepSeek V3.2 outperforms Mixtral 8x22B by 15–20 points on MMLU, and the gap widens further on MATH and complex coding benchmarks. This is not a marginal quality difference: V3.2 operates at a meaningfully higher reasoning tier, competitive with frontier dense models, while Mixtral 8x22B occupies the 2023 performance bracket.

Pricing somewhat narrows the practical gap: Mixtral 8x22B has been on the market long enough that most providers offer it at under $0.50/M input tokens with mature, stable infrastructure. DeepSeek V3.2 pricing varies more by provider: cheapest tiers run comparably low, but not every provider has deployed optimized MoE kernels for V3.2's finer-grained routing.

Mixtral 8x22B Instruct remains useful for high-volume classification or extraction pipelines where its well-understood failure modes and stable provider integrations reduce operational risk. Teams with existing Mixtral tooling may prefer to defer migration.

Pick Mixtral 8x22B Instruct if operational stability, mature provider tooling, and predictable costs on existing infrastructure outweigh benchmark quality. Pick DeepSeek V3.2 if output quality on reasoning and code tasks is a requirement and your provider supports it with optimized kernels.
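
The sparsity gap described in the take can be made concrete by dividing active parameters by total parameters. This is a back-of-the-envelope sketch using only the approximate figures quoted above (~39B of 141B and ~37B of 671B); the helper name `active_fraction` is illustrative.

```python
def active_fraction(active_billions: float, total_billions: float) -> float:
    """Share of the total parameter pool that is active per token."""
    return active_billions / total_billions


# Approximate figures quoted in the editor's take.
print(f"Mixtral 8x22B Instruct: {active_fraction(39, 141):.1%}")  # 27.7%
print(f"DeepSeek V3.2: {active_fraction(37, 671):.1%}")           # 5.5%
```

Both models activate a similar number of parameters per token, but DeepSeek V3.2 draws them from a pool nearly five times larger, which is the sense in which its routing is finer-grained.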