Qwen 3 72B Instruct vs WizardLM-2 8x22B (2026) — pricing, benchmarks, cheapest providers

Model crosswalk

Side-by-side comparison on price, capability, and workload cost. Each column uses the cheapest provider for its model.

Qwen 3 72B Instruct (A)

Cheapest provider: —

$/1M input: —

$/1M output: —

WizardLM-2 8x22B (B)

Cheapest provider: —

$/1M input: —

$/1M output: —

Specs and cheapest providers

Spec	Qwen 3 72b Instruct	Wizardlm 2 8x22b
Parameters	—	—
Context window	—	—
License	—	—
Released	—	—
Cheapest provider
Provider	—	—
Input / 1M tokens	—	—
Output / 1M tokens	—	—

#5 Qwen 3 72B Instruct in cheapest output

#10 Qwen 3 72B Instruct in fastest TTFT

#10 Qwen 3 72B Instruct in highest throughput

Benchmark comparison

No benchmark data available for either model yet.

Sample workload — 5M in + 2M out per month

using each model's cheapest provider

Qwen 3 72B Instruct

$0.00 /mo

WizardLM-2 8x22B

$0.00 /mo

What changes at scale

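Which side of the bill dominates depends on both the token mix and the price gap: output dominates once output tokens × output price exceeds input tokens × input price. With output priced at 3× input, for example, that happens when output volume passes one third of input volume. Below that point, input dominates and cheaper-input providers win regardless of headline price.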

1M in · 250K out: $0.00 · $0.00

5M in · 2M out: $0.00 · $0.00

20M in · 10M out: $0.00 · $0.00

100M in · 60M out: $0.00 · $0.00
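
The arithmetic behind these rows can be sketched in a few lines of Python. This is a minimal sketch, not the calculator this page links to: `Pricing`, `monthly_cost`, and `output_dominates` are hypothetical names, and the prices in `EXAMPLE` are placeholder values chosen for illustration, not quotes from any provider. Only the four token mixes come from the rows above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD (illustrative values, not real quotes)."""
    input_per_m: float
    output_per_m: float


def monthly_cost(pricing: Pricing, input_tokens: int, output_tokens: int):
    """Return (input_cost, output_cost, total_cost) in USD for one month."""
    input_cost = input_tokens / 1_000_000 * pricing.input_per_m
    output_cost = output_tokens / 1_000_000 * pricing.output_per_m
    return input_cost, output_cost, input_cost + output_cost


def output_dominates(pricing: Pricing, input_tokens: int, output_tokens: int) -> bool:
    """True when output tokens account for more of the bill than input tokens."""
    input_cost, output_cost, _ = monthly_cost(pricing, input_tokens, output_tokens)
    return output_cost > input_cost


# Token mixes taken from the rows above: (input tokens, output tokens) per month.
WORKLOADS = [
    (1_000_000, 250_000),
    (5_000_000, 2_000_000),
    (20_000_000, 10_000_000),
    (100_000_000, 60_000_000),
]

# Assumption: placeholder prices with output at 3x input, for illustration only.
EXAMPLE = Pricing(input_per_m=0.50, output_per_m=1.50)

for tokens_in, tokens_out in WORKLOADS:
    _, _, total = monthly_cost(EXAMPLE, tokens_in, tokens_out)
    side = "output" if output_dominates(EXAMPLE, tokens_in, tokens_out) else "input"
    print(f"{tokens_in:>11,} in / {tokens_out:>10,} out: ${total:,.2f} ({side} dominates)")
```

Under these placeholder prices the 1M in · 250K out mix is input-dominated and the other three are output-dominated. Swapping in a provider's real per-million prices gives the figures for an actual deployment.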

Capability vs price

[Scatter plot: benchmark score vs. $/1M output tokens]

Calculate cost for your workload

Compare total monthly cost across providers for Qwen 3 72B Instruct and WizardLM-2 8x22B using your own input/output token mix.

Open workload calculator →

Editor's take

[WizardLM-2 8x22B](/models/microsoft--wizardlm-2-8x22b) is a sparse mixture-of-experts (MoE) model with ~141B total parameters, of which only ~39B are active per forward pass. [Qwen 3 72B Instruct](/models/alibaba--qwen-3-72b-instruct) is dense: all 72B parameters are active for every token. This architectural difference has direct pricing implications. All ~141B MoE parameters must stay resident in memory even though only ~39B do work per token, so MoE models often cost more at providers that price on total memory allocation rather than active compute: WizardLM-2 8x22B typically runs $0.50–0.90/M tokens versus $0.40–0.70/M for Qwen 3 72B.

On reasoning benchmarks, Qwen 3 72B is the more recent model and generally scores higher than WizardLM-2 8x22B, at 86–88% on MMLU and ~90% on GSM8K.

WizardLM-2 was Microsoft's instruction-tuning research model. Its strength is complex instruction-following and human-preference tasks such as creative writing, nuanced Q&A, and long-form dialogue, where MoE routing can activate specialized expert pathways. It is also a reasonable choice where Microsoft's model provenance matters for enterprise procurement.

Qwen 3 72B Instruct is the better pick for math-heavy, code-adjacent, or structured reasoning pipelines where benchmark performance drives model selection. Its cost per token is lower than WizardLM-2 8x22B's at most providers, and it benefits from a more active maintenance track.

Pick Qwen 3 72B Instruct for reasoning and structured tasks. Pick WizardLM-2 8x22B if complex instruction alignment or creative generation quality justifies the higher serving cost.
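
The memory-versus-compute gap described above can be made concrete with a back-of-the-envelope calculation. This is a rough sketch under stated assumptions: weights stored at 16 bits (2 bytes) per parameter, and the common rule of thumb of about 2 FLOPs per active parameter per generated token. It ignores KV cache, activations, quantization, and serving overhead. The function names are hypothetical; the parameter counts are the approximate figures quoted in the text.

```python
BYTES_PER_PARAM = 2  # assumption: 16-bit weights (fp16/bf16), no quantization


def weight_memory_gb(total_params_billions: float) -> float:
    """Approximate weight memory in GB; every parameter must be resident."""
    return total_params_billions * BYTES_PER_PARAM


def forward_gflops_per_token(active_params_billions: float) -> float:
    """Approximate forward-pass GFLOPs per token (~2 FLOPs per active parameter)."""
    return 2 * active_params_billions


# Approximate (total, active) parameter counts in billions, as quoted in the text.
MODELS = {
    "WizardLM-2 8x22B": (141, 39),   # sparse MoE: total much larger than active
    "Qwen 3 72B Instruct": (72, 72),  # dense: total equals active
}

for name, (total_b, active_b) in MODELS.items():
    mem = weight_memory_gb(total_b)
    flops = forward_gflops_per_token(active_b)
    print(f"{name}: ~{mem:.0f} GB weights, ~{flops:.0f} GFLOPs per token")
```

Under these assumptions the MoE model needs roughly twice the weight memory of the dense model (about 282 GB versus 144 GB) while doing roughly half the compute per token (about 78 versus 144 GFLOPs), which is why a provider's pricing basis matters.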

Related comparisons

Qwen 3 72B Instruct vs DeepSeek V3.2 →

Qwen 3 72B Instruct vs DeepSeek R1 →

Qwen 3 72B Instruct vs Llama 3.1 405B Instruct →

WizardLM-2 8x22B vs Mixtral 8x22B Instruct →

Full model details

All providers for Qwen 3 72B Instruct →

All providers for WizardLM-2 8x22B →