
Model crosswalk

Side-by-side comparison on price, capability, and workload cost. Each column uses the cheapest provider for its model.

Mistral Large 2
vs
Mixtral 8x22b Instruct
A: Mistral Large 2

Cheapest provider
$/1M input
$/1M output

B: Mixtral 8x22b Instruct

Cheapest provider
$/1M input
$/1M output

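Specs and cheapest providers

| Spec | Mistral Large 2 | Mixtral 8x22b Instruct |
|---|---|---|
| Parameters | | |
| Context window | | |
| License | | |
| Released | | |
| Cheapest provider: Provider | | |
| Cheapest provider: Input / 1M tokens | | |
| Cheapest provider: Output / 1M tokens | | |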


Benchmark comparison

No benchmark data available for either model yet.

Sample workload — 5M in + 2M out per month

using each model's cheapest provider
Mistral Large 2
$0.00 /mo
Mixtral 8x22b Instruct
$0.00 /mo

What changes at scale

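Whether input or output dominates the bill depends on the token mix and on the provider's output-to-input price ratio: output spend exceeds input spend when output tokens × output price is greater than input tokens × input price. As a rough guide, output tokens dominate once a workload produces about three or more output tokens per input token. In input-heavy workloads (fewer output than input tokens), input spend often dominates unless output is priced at a steep multiple, so providers with cheaper input can win regardless of headline price.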

1M in · 250K out: $0.00 · $0.00
5M in · 2M out: $0.00 · $0.00
20M in · 10M out: $0.00 · $0.00
100M in · 60M out: $0.00 · $0.00
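
The arithmetic behind these tiers can be sketched as follows. This is a minimal illustration, not the page's actual calculator: the prices `PRICE_IN` and `PRICE_OUT` are hypothetical placeholders (this page lists no provider prices), and the function names are made up for the example. It shows how the output share of the bill shifts as the token mix changes.

```python
# Hypothetical prices in USD per 1M tokens. Placeholders only, not real quotes.
PRICE_IN = 2.00
PRICE_OUT = 6.00

# Workload tiers from the list above: (input tokens, output tokens) per month.
TIERS = [
    (1_000_000, 250_000),
    (5_000_000, 2_000_000),
    (20_000_000, 10_000_000),
    (100_000_000, 60_000_000),
]


def monthly_cost(tokens_in, tokens_out, price_in, price_out):
    """Return (input_cost, output_cost) in USD for one month."""
    input_cost = tokens_in / 1_000_000 * price_in
    output_cost = tokens_out / 1_000_000 * price_out
    return input_cost, output_cost


def output_share(tokens_in, tokens_out, price_in, price_out):
    """Fraction of the total bill that comes from output tokens."""
    input_cost, output_cost = monthly_cost(tokens_in, tokens_out, price_in, price_out)
    total = input_cost + output_cost
    return output_cost / total if total else 0.0


if __name__ == "__main__":
    for tokens_in, tokens_out in TIERS:
        cost_in, cost_out = monthly_cost(tokens_in, tokens_out, PRICE_IN, PRICE_OUT)
        share = output_share(tokens_in, tokens_out, PRICE_IN, PRICE_OUT)
        print(f"{tokens_in:>11,} in / {tokens_out:>10,} out: "
              f"${cost_in + cost_out:,.2f} total, output share {share:.0%}")
```

With these placeholder prices, output already accounts for more than half the bill at 5M in / 2M out even though input volume is larger, which is why the price ratio matters as much as the token ratio.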

Capability vs price

[Scatter plot: benchmark score vs. $/1M output tokens]
Calculate cost for your workload

Compare total monthly cost across providers for Mistral Large 2 and Mixtral 8x22b Instruct using your own input/output token mix.
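
A rough sketch of that comparison, assuming a simple list of per-provider prices. The provider names and prices below are hypothetical, chosen only to show that the cheapest provider can change with the input/output mix; `Offer`, `cost`, and `cheapest` are illustrative names, not part of any real API.

```python
from typing import NamedTuple


class Offer(NamedTuple):
    provider: str
    price_in: float   # USD per 1M input tokens
    price_out: float  # USD per 1M output tokens


# Hypothetical offers for a single model. Not real provider quotes.
OFFERS = [
    Offer("provider-a", 1.50, 7.00),  # cheaper input, pricier output
    Offer("provider-b", 2.50, 5.00),  # pricier input, cheaper output
]


def cost(offer, tokens_in, tokens_out):
    """Total monthly cost in USD for the given token mix."""
    return (tokens_in * offer.price_in + tokens_out * offer.price_out) / 1_000_000


def cheapest(offers, tokens_in, tokens_out):
    """Return the offer with the lowest total cost for this workload."""
    return min(offers, key=lambda o: cost(o, tokens_in, tokens_out))


if __name__ == "__main__":
    for tokens_in, tokens_out in [(10_000_000, 1_000_000), (1_000_000, 10_000_000)]:
        best = cheapest(OFFERS, tokens_in, tokens_out)
        print(f"{tokens_in:,} in / {tokens_out:,} out -> {best.provider} "
              f"(${cost(best, tokens_in, tokens_out):,.2f}/mo)")
```

Under these assumed prices, the input-heavy mix favours the provider with cheaper input and the output-heavy mix favours the one with cheaper output, so the "cheapest provider" is a property of the workload rather than of the model alone.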

Editor's take
Mistral Large 2 is a 123B dense transformer; Mixtral 8x22B Instruct is a sparse mixture-of-experts (MoE) model that activates ~39B of its 141B total parameters per token. That architectural gap drives most of the pricing and latency divergence you'll see across providers. Mixtral 8x22B typically prices 20–35% lower per million output tokens than Mistral Large 2, because MoE inference needs fewer FLOPs per generated token.

Latency flips the advantage. Mistral Large 2's dense forward pass is more predictable under load: time-to-first-token stays tighter at high concurrency. Mixtral 8x22B's expert routing adds variable overhead, which can show up as tail-latency spikes on shared GPU clusters.

**Where [Mixtral 8x22B Instruct](/models/mistralai--mixtral-8x22b-instruct) wins:** High-volume batch workloads (document summarization pipelines, offline classification, bulk translation) where you're cost-sensitive and latency SLAs are loose (>5 s P99 acceptable). The lower active-parameter count also means less compute per token on self-hosted deployments, although all 141B parameters must still be loaded into memory, so the weight footprint is no smaller and cold starts are not faster.

**Where [Mistral Large 2](/models/mistralai--mistral-large-2) wins:** Interactive applications, multi-turn agents, and structured-output tasks where P95 latency matters. Its dense architecture also shows stronger instruction-following consistency on long-context inputs (≥32K tokens), making it the safer choice for agentic loops that parse or generate tool calls.

Pick Mistral Large 2 if you need predictable latency and reliable instruction following in production agents. Pick Mixtral 8x22B Instruct if you're optimizing cost on throughput-heavy offline pipelines and can tolerate higher tail latency.
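
To make the dense-versus-MoE gap concrete, here is a back-of-the-envelope sketch using the parameter counts quoted above. It relies on two assumptions that are not from this page: the common rule of thumb of roughly 2 FLOPs per active parameter per generated token (which ignores attention over the context), and 16-bit weights with no quantization. The function names are illustrative.

```python
# Parameter counts from the text above, in billions.
MODELS = {
    "Mistral Large 2": {"total_b": 123, "active_b": 123},        # dense: all parameters active
    "Mixtral 8x22B Instruct": {"total_b": 141, "active_b": 39},  # sparse MoE: ~39B active per token
}

BYTES_PER_PARAM = 2  # assumption: 16-bit weights (FP16/BF16), no quantization


def forward_gflops_per_token(active_b):
    """Approximate forward-pass GFLOPs per token: ~2 FLOPs per active parameter."""
    return 2 * active_b


def weight_memory_gb(total_b):
    """Approximate weight footprint in GB (1e9 bytes): every parameter must be resident."""
    return total_b * BYTES_PER_PARAM


if __name__ == "__main__":
    for name, spec in MODELS.items():
        print(f"{name}: ~{forward_gflops_per_token(spec['active_b'])} GFLOPs/token, "
              f"~{weight_memory_gb(spec['total_b'])} GB of weights")
```

Under these assumptions the MoE model needs roughly a third of the per-token compute but slightly more memory for its weights, which is consistent with it being cheaper per token while not being lighter to host.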