Mixtral 8x22B Instruct — pricing, providers, and benchmarks

Parameters
141B
Context window
65,536 tokens (64K)
License
apache-2.0
Released
2024-04-17

Mistral's Mixtral 8x22B Instruct uses a sparse mixture-of-experts architecture (8 experts of 22B parameters each, 39B active per token) to deliver near-70B-class performance at substantially lower inference cost than dense models of comparable quality. Released in April 2024, it has proven remarkably durable in production deployments, particularly for European companies subject to EU AI Act sovereignty preferences, given Mistral's France-based provenance and Apache 2.0 license. Pricing in 2026 sits in the $0.60–$0.90 per 1M token range across providers. The context window is 64K (65,536 tokens), smaller than the Llama/Qwen 128K tier; that matters for long-document RAG but not for typical chat. It remains a strong choice for cost-sensitive English, French, and German workloads.

Provider pricing

Sorted by total monthly cost for 100M input + 10M output tokens.

| Provider | Input / 1M | Output / 1M | Monthly cost | Context |
|---|---|---|---|---|
| DeepInfra | $0.60 | $0.65 | $66.50 | 66k |
| Fireworks AI | $0.65 | $0.65 | $71.50 | 66k |
| OpenRouter | $0.65 | $0.65 | $71.50 | 66k |
| Together AI | $1.20 | $1.20 | $132.00 | 65k |
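The monthly-cost column follows directly from per-token pricing: tokens divided by one million, times the per-1M rate, summed over input and output. A minimal sketch, with the rates copied from the table above and the standard 100M-input / 10M-output workload assumed:

```python
# Per-1M-token rates ($input, $output) copied from the provider table above.
PROVIDERS = {
    "DeepInfra":    (0.60, 0.65),
    "Fireworks AI": (0.65, 0.65),
    "OpenRouter":   (0.65, 0.65),
    "Together AI":  (1.20, 1.20),
}

def monthly_cost(input_tokens: float, output_tokens: float,
                 input_rate: float, output_rate: float) -> float:
    """Total monthly cost in dollars for a given token volume."""
    return input_tokens / 1e6 * input_rate + output_tokens / 1e6 * output_rate

# Standard workload used throughout this page: 100M input, 10M output.
for name, (inp, out) in sorted(PROVIDERS.items(),
                               key=lambda kv: monthly_cost(100e6, 10e6, *kv[1])):
    print(f"{name}: ${monthly_cost(100e6, 10e6, inp, out):.2f}")
# DeepInfra: $66.50 ... Together AI: $132.00
```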

Frequently asked questions

How much does it cost to run Mixtral 8x22B Instruct for 100M tokens?

Running Mixtral 8x22B Instruct with 100M input and 10M output tokens per month costs approximately $66.50 on DeepInfra, the cheapest available provider as of the latest pricing data. Costs vary significantly depending on your input/output ratio and whether you use prompt caching.
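To see how the input/output ratio moves the total, compare the standard retrieval-heavy mix against a generation-heavy one at the same total volume. A sketch using DeepInfra's rates from the table above:

```python
def cost(in_tok_m: float, out_tok_m: float,
         in_rate: float = 0.60, out_rate: float = 0.65) -> float:
    """Cost in dollars; token volumes in millions, rates in $/1M (DeepInfra)."""
    return in_tok_m * in_rate + out_tok_m * out_rate

standard = cost(100, 10)  # retrieval-heavy: 100M in, 10M out -> $66.50
chatty   = cost(55, 55)   # same 110M total, half of it output -> $68.75
```

With input and output priced nearly identically here, the ratio barely matters; on providers with a large output premium, the same shift in mix moves the bill much more.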

What is the cheapest provider for Mixtral 8x22B Instruct?

DeepInfra currently offers Mixtral 8x22B Instruct at the lowest total cost for a standard workload. Prices change frequently — check the table above for the latest data.

What context window does Mixtral 8x22B Instruct support?

Mixtral 8x22B Instruct supports a context window of 65,536 tokens. Individual providers may cap this lower — see the pricing table for per-provider context limits.
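When provider caps differ, it helps to budget prompt size against the limit you will actually be routed to. A rough sketch using the common ~4-characters-per-token heuristic (an approximation for illustration, not Mixtral's actual tokenizer):

```python
CONTEXT_LIMIT = 65_536  # model maximum; some providers cap lower (see table)

def fits_in_context(prompt: str, max_output_tokens: int,
                    limit: int = CONTEXT_LIMIT) -> bool:
    """Rough check that prompt + reserved output budget fit in the window.

    Uses the ~4 chars/token heuristic; for exact counts, run the model's
    own tokenizer instead.
    """
    est_prompt_tokens = len(prompt) // 4 + 1
    return est_prompt_tokens + max_output_tokens <= limit

fits_in_context("hello " * 1000, max_output_tokens=1024)  # True: plenty of room
```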