Compare open-weights LLM inference
across 5 providers
Real pricing data, updated daily. Find the cheapest or fastest provider for your exact workload in seconds — no sign-up required.
Data last verified: May 17, 2026
Workload calculator
Enter your monthly token volumes and constraints. The calculator ranks every provider by cost and flags rate-limit or latency mismatches before you commit.
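The ranking logic amounts to: price each provider at its per-million-token input and output rates, drop providers that can't meet the workload's constraints, and sort by monthly cost. A minimal sketch, where the provider names, prices, and context limits are hypothetical placeholders rather than the site's live data:

```python
# Hypothetical provider data: per-million-token prices and max context window.
# These numbers are illustrative placeholders, not real pricing.
PROVIDERS = {
    "provider-a": {"input_per_m": 0.90, "output_per_m": 0.90, "max_ctx": 131_072},
    "provider-b": {"input_per_m": 0.55, "output_per_m": 2.19, "max_ctx": 65_536},
    "provider-c": {"input_per_m": 1.25, "output_per_m": 1.25, "max_ctx": 131_072},
}

def monthly_cost(p, input_tokens, output_tokens):
    """Dollar cost for one month's traffic at per-million-token rates."""
    return (input_tokens / 1e6) * p["input_per_m"] + \
           (output_tokens / 1e6) * p["output_per_m"]

def rank(input_tokens, output_tokens, min_ctx=0):
    """Rank providers by monthly cost, excluding any whose context
    window is smaller than the workload requires (a 'mismatch')."""
    fits = {n: p for n, p in PROVIDERS.items() if p["max_ctx"] >= min_ctx}
    return sorted(
        ((n, monthly_cost(p, input_tokens, output_tokens)) for n, p in fits.items()),
        key=lambda pair: pair[1],
    )

# Example workload: 50M input + 10M output tokens/month, needs 100k context.
for name, cost in rank(50e6, 10e6, min_ctx=100_000):
    print(f"{name}: ${cost:,.2f}/month")
```

Latency flags would work the same way: compare a provider's measured latency against the workload's ceiling and annotate rather than exclude.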
Top models by parameter count
Model                        Params   Context   License
DeepSeek V3                  671B     131,072   deepseek
DeepSeek V3.2                671B     131,072   deepseek
DeepSeek R1                  671B     131,072   mit
Arctic Instruct              480B       4,096   apache-2.0
Hermes 3 Llama 3.1 405B      405B     131,072   llama-3
Llama 3.1 405B Instruct      405B     131,072   llama-3
Nemotron-4 340B Instruct     340B       4,096   nvidia-open-model
DeepSeek Coder V2 Instruct   236B     131,072   deepseek
WizardLM-2 8x22B             141B      65,536   wizardlm-2-community
Mixtral 8x22B Instruct       141B      65,536   apache-2.0
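For a given workload, the usual first cut over this list is minimum context window: a 4,096-token model is a non-starter for long-document jobs regardless of its parameter count. A minimal sketch of that filter, using a subset of the models listed above:

```python
# Tuples mirror the table columns: (name, params in billions, context, license).
# Subset of the listed models, for illustration.
MODELS = [
    ("DeepSeek V3", 671, 131_072, "deepseek"),
    ("DeepSeek R1", 671, 131_072, "mit"),
    ("Arctic Instruct", 480, 4_096, "apache-2.0"),
    ("Llama 3.1 405B Instruct", 405, 131_072, "llama-3"),
    ("Mixtral 8x22B Instruct", 141, 65_536, "apache-2.0"),
]

def with_min_context(models, min_ctx):
    """Keep only models whose context window meets the workload's floor."""
    return [m for m in models if m[2] >= min_ctx]

for name, params, ctx, lic in with_min_context(MODELS, 65_536):
    print(f"{name}: {params}B params, {ctx:,} ctx ({lic})")
```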