
Model crosswalk

Side-by-side on price, capability and workload. Both columns use the cheapest provider for that model.

Mixtral 8x22B Instruct vs WizardLM-2 8x22B

A: Mixtral 8x22B Instruct
141B params · 66K context · apache-2.0
Cheapest provider: deepinfra
$/1M input: $600000.00
$/1M output: $650000.00

B: WizardLM-2 8x22B
141B params · 66K context · wizardlm-2-community
Cheapest provider: (not listed)
$/1M input: (not listed)
$/1M output: (not listed)

Specs and cheapest providers

| Spec | Mixtral 8x22B Instruct | WizardLM-2 8x22B |
| --- | --- | --- |
| Parameters | 141B | 141B |
| Context window | 66K tokens | 66K tokens |
| License | apache-2.0 | wizardlm-2-community |
| Released | 2024-04-17 | 2024-04-15 |
| Cheapest provider | deepinfra | (not listed) |
| Input / 1M tokens | $600000.00 | (not listed) |
| Output / 1M tokens | $650000.00 | (not listed) |


Benchmark comparison

No benchmark data available for either model yet.

Sample workload — 5M in + 2M out per month

Using each model's cheapest provider:

Mixtral 8x22B Instruct: $4300000.00 /mo
WizardLM-2 8x22B: $0.00 /mo (no provider pricing listed)
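
The monthly figure above follows directly from the per-million-token prices. Here is a minimal sketch of that arithmetic, assuming cost scales linearly with token count (no volume discounts, minimums, or caching). The function name `monthly_cost` is hypothetical and is not part of any provider's API.

```python
def monthly_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Monthly cost in dollars, assuming purely linear per-token pricing."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Prices as listed on this page for Mixtral 8x22B Instruct (deepinfra).
cost = monthly_cost(5_000_000, 2_000_000, 600000.00, 650000.00)
print(f"${cost:.2f} /mo")  # $4300000.00 /mo
```

The same function can be pointed at any other provider's prices once they are listed.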

What changes at scale

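Output tokens dominate cost once output volume reaches roughly three times input volume (a 1:3 input/output ratio). When output volume is below input volume (under 1:1), input dominates, and providers with cheaper input pricing win regardless of headline price.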

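| Monthly workload | Mixtral 8x22B Instruct | WizardLM-2 8x22B |
| --- | --- | --- |
| 1M in · 250K out | $762500.00 | $0.00 |
| 5M in · 2M out | $4300000.00 | $0.00 |
| 20M in · 10M out | $18500000.00 | $0.00 |
| 100M in · 60M out | $99000000.00 | $0.00 |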
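
To see which side of the bill dominates at each scale, the rows above can be split into input and output cost. This is a rough sketch that assumes linear pricing and uses the Mixtral 8x22B Instruct prices listed on this page. The names `cost_split` and `BREAK_EVEN_RATIO` are illustrative only.

```python
# Prices per 1M tokens as listed on this page for Mixtral 8x22B Instruct (deepinfra).
INPUT_PRICE_PER_M = 600000.00
OUTPUT_PRICE_PER_M = 650000.00

# (input tokens, output tokens) per month, taken from the rows above.
WORKLOADS = [
    (1_000_000, 250_000),
    (5_000_000, 2_000_000),
    (20_000_000, 10_000_000),
    (100_000_000, 60_000_000),
]


def cost_split(tokens_in, tokens_out):
    """Return (total cost, output share of total) under linear per-token pricing."""
    input_cost = tokens_in / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = tokens_out / 1_000_000 * OUTPUT_PRICE_PER_M
    total = input_cost + output_cost
    return total, output_cost / total


# Output/input token ratio at which output cost equals input cost.
BREAK_EVEN_RATIO = INPUT_PRICE_PER_M / OUTPUT_PRICE_PER_M

for tokens_in, tokens_out in WORKLOADS:
    total, output_share = cost_split(tokens_in, tokens_out)
    print(f"{tokens_in:>11,} in / {tokens_out:>10,} out: ${total:.2f}, output share {output_share:.0%}")

print(f"break-even output/input ratio: {BREAK_EVEN_RATIO:.2f}")
```

At these listed prices the break-even sits a little below 1:1 (about 0.92 output tokens per input token), so all four sample workloads are input-dominated.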

Capability vs price

[Scatter plot: benchmark score vs. $/1M output tokens]

Editor's take
[Mixtral 8x22B Instruct](/models/mistralai--mixtral-8x22b-instruct) and [WizardLM-2 8x22B](/models/microsoft/wizardlm-2-8x22b) share the same base architecture: both are MoE models derived from the same Mixtral 8x22B weights (141B total parameters, 39B active). The difference is in post-training. WizardLM-2 8x22B was fine-tuned by Microsoft using its Evol-Instruct pipeline, which emphasizes complex instruction following, step-by-step reasoning, and chat alignment. Pricing between them is nearly identical across providers because the base compute requirements are the same.

WizardLM-2 8x22B's Evol-Instruct training makes it measurably stronger on complex, multi-constraint instructions and extended reasoning chains. Independent evals consistently show it scoring higher on instruction-following benchmarks than Mixtral's own instruct variant, particularly on tasks requiring multiple nested conditions or careful constraint adherence.

**Where [Mixtral 8x22B Instruct](/models/mistralai--mixtral-8x22b-instruct) wins:** Broad general-purpose batch workloads where the post-training difference doesn't surface, such as translation, summarization, classification, and content generation pipelines. Mixtral 8x22B Instruct also has wider provider support, giving you more options for geographic routing and spot pricing.

**Where WizardLM-2 8x22B wins:** Complex instruction-following tasks, multi-step reasoning, and chat applications where users issue nuanced, multi-constraint prompts. The Evol-Instruct alignment produces noticeably cleaner outputs on hard instruction sets.

Pick Mixtral 8x22B Instruct for simple, high-volume inference with maximum provider flexibility. Pick WizardLM-2 8x22B when your prompts are complex and instruction adherence directly affects output usability, at no meaningful cost premium.