
Model crosswalk

Side-by-side on price, capability and workload. Both columns use the cheapest provider for that model.

Llama 3.1 405B Instruct
vs
Mistral Large 2
A · Llama 3.1 405B Instruct

405B params · 131K context · llama-3

Cheapest provider: deepinfra
$/1M input: $2.70
$/1M output: $8.00

B · Mistral Large 2

123B params · 131K context · mistral-research

Cheapest provider: openrouter
$/1M input: $1.80
$/1M output: $5.40

Specs and cheapest providers

| Spec | Llama 3.1 405B Instruct | Mistral Large 2 |
|---|---|---|
| Parameters | 405B | 123B |
| Context window | 131K tokens | 131K tokens |
| License | llama-3 | mistral-research |
| Released | 2024-07-23 | 2024-07-24 |
| Cheapest provider | deepinfra | openrouter |
| Input / 1M tokens | $2.70 | $1.80 🏆 |
| Output / 1M tokens | $8.00 | $5.40 🏆 |


Benchmark comparison

No benchmark data available for either model yet.

Sample workload — 5M in + 2M out per month

Using each model's cheapest provider:

Llama 3.1 405B Instruct: $29.50 /mo
Mistral Large 2: $19.80 /mo
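
The monthly figures follow from a simple linear formula: tokens divided by one million, times the per-1M price, summed over input and output. A minimal sketch, assuming the per-1M prices are $2.70/$8.00 for Llama 3.1 405B Instruct and $1.80/$5.40 for Mistral Large 2; the `PRICES` table and the `monthly_cost` helper are illustrative names, not part of any provider API.

```python
# Assumed prices in USD per 1M tokens, taken from the cheapest-provider rows
# and read as plain dollars (e.g. $2.70 input, $8.00 output).
PRICES = {
    "Llama 3.1 405B Instruct": {"input": 2.70, "output": 8.00},
    "Mistral Large 2": {"input": 1.80, "output": 5.40},
}


def monthly_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Return monthly spend in USD for the given token volumes."""
    price = PRICES[model]
    cost = (tokens_in / 1_000_000) * price["input"]
    cost += (tokens_out / 1_000_000) * price["output"]
    return round(cost, 2)


if __name__ == "__main__":
    # Sample workload: 5M input tokens and 2M output tokens per month.
    for name in PRICES:
        print(f"{name}: ${monthly_cost(name, 5_000_000, 2_000_000):.2f} /mo")
```

Swapping in your own token volumes or a different provider's prices only requires changing the arguments or the `PRICES` entries.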

What changes at scale

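Both models price output at roughly 3x input, so output tokens dominate the bill once output volume exceeds about one third of input volume (an input:output ratio below roughly 3:1). With a more input-heavy mix than that, input dominates and the provider with the cheapest input price wins regardless of headline price.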

| Monthly volume | Llama 3.1 405B Instruct | Mistral Large 2 |
|---|---|---|
| 1M in · 250K out | $4.70 | $3.15 |
| 5M in · 2M out | $29.50 | $19.80 |
| 20M in · 10M out | $134.00 | $90.00 |
| 100M in · 60M out | $750.00 | $504.00 |
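
Whether input or output drives the bill depends on how the token mix compares with the output-to-input price ratio. The sketch below computes that break-even ratio and labels each workload row, again assuming per-1M prices of $2.70/$8.00 and $1.80/$5.40; `breakeven_ratio` and `dominant_side` are hypothetical helper names used only for illustration.

```python
# Assumed (input, output) prices in USD per 1M tokens, read as plain dollars.
PRICES = {
    "Llama 3.1 405B Instruct": (2.70, 8.00),
    "Mistral Large 2": (1.80, 5.40),
}

# (input tokens, output tokens) for the four workload rows.
WORKLOADS = [
    (1_000_000, 250_000),
    (5_000_000, 2_000_000),
    (20_000_000, 10_000_000),
    (100_000_000, 60_000_000),
]


def breakeven_ratio(price_in: float, price_out: float) -> float:
    """Input tokens per output token at which both sides cost the same."""
    return price_out / price_in


def dominant_side(price_in: float, price_out: float,
                  tokens_in: int, tokens_out: int) -> str:
    """Return 'input' or 'output', whichever contributes more to the bill."""
    input_cost = tokens_in * price_in
    output_cost = tokens_out * price_out
    return "input" if input_cost > output_cost else "output"


if __name__ == "__main__":
    for name, (p_in, p_out) in PRICES.items():
        ratio = breakeven_ratio(p_in, p_out)
        print(f"{name}: break-even near {ratio:.2f}:1 input:output")
        for t_in, t_out in WORKLOADS:
            side = dominant_side(p_in, p_out, t_in, t_out)
            print(f"  {t_in:>11,} in / {t_out:>10,} out -> {side} dominates")
```

Under these assumed prices, only the 1M in · 250K out row (a 4:1 mix) is input-dominated; the other three rows sit below the roughly 3:1 break-even and are output-dominated.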

Capability vs price

[Scatter plot: benchmark score vs. $/1M output tokens]

Editor's take
[Llama 3.1 405B Instruct](/models/meta--llama-3.1-405b-instruct) is a 405-billion-parameter dense transformer; [Mistral Large 2](/models/mistralai--mistral-large-2) sits at roughly 123B parameters. That size gap drives most of the cost story: 405B typically prices at $2–5/M input tokens across providers, while Mistral Large 2 runs $2–3/M, a difference whose absolute size grows with volume. On throughput, 405B delivers roughly 20–35 tok/s per request on A100 clusters; Mistral Large 2 reaches 40–70 tok/s, nearly double, because its smaller parameter count fits in a tighter GPU footprint.

**Where 405B wins:** Long-context reasoning tasks (multi-document synthesis, complex code generation spanning thousands of lines, multi-step agentic chains) benefit from the raw parameter depth. Teams running nightly batch jobs where latency matters less than answer quality tend to see measurable gains here.

**Where Mistral Large 2 wins:** Latency-sensitive APIs (chat, autocomplete, retrieval-augmented generation with short contexts) favor the smaller model. Sub-second p50 latency is achievable on Mistral Large 2 at practical concurrency; 405B often pushes p50 above 2–3 seconds under load. Mistral Large 2 also offers stronger multilingual support, covering dozens of natural languages (plus 80+ programming languages), which is relevant for European or APAC user bases.

**Bottom line:** Pick Llama 3.1 405B Instruct if your workload is batch-oriented, quality-critical, and tolerates higher per-token spend. Pick Mistral Large 2 if you need real-time response latency, multilingual coverage, or want to cut inference cost by 30–50% with acceptable quality trade-offs.
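
The savings range can be sanity-checked against the cheapest-provider prices. This is a rough sketch, assuming per-1M prices of $2.70/$8.00 for Llama 3.1 405B Instruct and $1.80/$5.40 for Mistral Large 2; `savings_pct` is a hypothetical helper, and other providers will give different numbers.

```python
def savings_pct(baseline: float, alternative: float) -> float:
    """Percentage saved by paying `alternative` instead of `baseline`."""
    return round((baseline - alternative) / baseline * 100, 1)


if __name__ == "__main__":
    # Assumed prices in USD per 1M tokens (Llama first, Mistral second).
    print("input price saving:  ", savings_pct(2.70, 1.80), "%")
    print("output price saving: ", savings_pct(8.00, 5.40), "%")
    # 5M in + 2M out per month at those prices: $29.50 vs $19.80.
    print("sample workload saving:", savings_pct(29.50, 19.80), "%")
```

Because both the input and output discounts are close to one third at these prices, the saving stays near 33% for any token mix, which is the low end of the quoted range; reaching the upper end would require a different provider pairing.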