Mixtral 8x22B Instruct — pricing, providers, and benchmarks

Parameters
141B
Context window
65,536 tokens (64K)
License
apache-2.0
Released
2024-04-17

Mistral's Mixtral 8x22B Instruct uses a sparse mixture-of-experts architecture (8 experts of 22B parameters each, 39B active per token) to deliver near-70B-class performance at substantially lower inference cost than dense models of comparable quality. Released in April 2024, it has proven remarkably durable in production deployments, particularly for European companies subject to EU AI Act sovereignty preferences, given Mistral's France-based provenance and Apache 2.0 license. Pricing in 2026 sits in the $0.60–$0.90 per 1M token range across providers. The context window is 64K (65,536 tokens), smaller than the Llama/Qwen 128K tier; that matters for long-document RAG but not for typical chat. It remains a strong choice for cost-sensitive English, French, and German workloads.

Provider pricing

Sorted by total monthly cost for 100M input + 10M output tokens.

| Provider | Input / 1M | Output / 1M | Monthly cost | Context |
|---|---|---|---|---|
| DeepInfra | $0.60 | $0.65 | $66.50 | 66k |
| Fireworks AI | $0.65 | $0.65 | $71.50 | 66k |
| OpenRouter | $0.65 | $0.65 | $71.50 | 66k |
| Together AI | $1.20 | $1.20 | $132.00 | 65k |
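The monthly-cost column follows directly from per-token pricing: tokens divided by one million, times the per-1M rate, summed over input and output. A minimal sketch, with the rates copied from the table above and the standard 100M-input / 10M-output workload assumed:

```python
# Per-1M-token rates ($input, $output) copied from the provider table above.
PROVIDERS = {
    "DeepInfra":    (0.60, 0.65),
    "Fireworks AI": (0.65, 0.65),
    "OpenRouter":   (0.65, 0.65),
    "Together AI":  (1.20, 1.20),
}

def monthly_cost(input_tokens: float, output_tokens: float,
                 input_rate: float, output_rate: float) -> float:
    """Total monthly cost in dollars for a given token volume."""
    return input_tokens / 1e6 * input_rate + output_tokens / 1e6 * output_rate

# Standard workload used throughout this page: 100M input, 10M output.
for name, (inp, out) in sorted(PROVIDERS.items(),
                               key=lambda kv: monthly_cost(100e6, 10e6, *kv[1])):
    print(f"{name}: ${monthly_cost(100e6, 10e6, inp, out):.2f}")
# DeepInfra: $66.50 ... Together AI: $132.00
```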

Frequently asked questions

How much does it cost to run Mixtral 8x22B Instruct for 100M tokens?

Running Mixtral 8x22B Instruct with 100M input and 10M output tokens per month costs approximately $66.50 on DeepInfra, the cheapest available provider as of the latest pricing data. Costs vary significantly depending on your input/output ratio and whether you use prompt caching.
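To see how the input/output ratio moves the total, compare the standard retrieval-heavy mix against a generation-heavy one at the same total volume. A sketch using DeepInfra's rates from the table above:

```python
def cost(in_tok_m: float, out_tok_m: float,
         in_rate: float = 0.60, out_rate: float = 0.65) -> float:
    """Cost in dollars; token volumes in millions, rates in $/1M (DeepInfra)."""
    return in_tok_m * in_rate + out_tok_m * out_rate

standard = cost(100, 10)  # retrieval-heavy: 100M in, 10M out -> $66.50
chatty   = cost(55, 55)   # same 110M total, half of it output -> $68.75
```

With input and output priced nearly identically here, the ratio barely matters; on providers with a large output premium, the same shift in mix moves the bill much more.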

What is the cheapest provider for Mixtral 8x22B Instruct?

DeepInfra currently offers Mixtral 8x22B Instruct at the lowest total cost for a standard workload. Prices change frequently — check the table above for the latest data.

What context window does Mixtral 8x22B Instruct support?

Mixtral 8x22B Instruct supports a context window of 65,536 tokens. Individual providers may cap this lower — see the pricing table for per-provider context limits.
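When provider caps differ, it helps to budget prompt size against the limit you will actually be routed to. A rough sketch using the common ~4-characters-per-token heuristic (an approximation for illustration, not Mixtral's actual tokenizer):

```python
CONTEXT_LIMIT = 65_536  # model maximum; some providers cap lower (see table)

def fits_in_context(prompt: str, max_output_tokens: int,
                    limit: int = CONTEXT_LIMIT) -> bool:
    """Rough check that prompt + reserved output budget fit in the window.

    Uses the ~4 chars/token heuristic; for exact counts, run the model's
    own tokenizer instead.
    """
    est_prompt_tokens = len(prompt) // 4 + 1
    return est_prompt_tokens + max_output_tokens <= limit

fits_in_context("hello " * 1000, max_output_tokens=1024)  # True: plenty of room
```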