LLM leaderboards
Objective rankings for every dimension that matters in production: input/output token pricing, blended monthly cost, time-to-first-token, throughput, context window, and benchmarks. Scraped nightly, no estimates.
Last updated May 2026.
Browse by dimension
9 surfaces
Cheapest LLM Input Price
Find the most cost-effective models for prompt-heavy workloads. Ranked by the lowest input token price across all providers, updated nightly from live scrapes.
Cheapest LLM Output Price
Output tokens dominate cost for generation-heavy use cases. This leaderboard ranks models by the lowest output token price across all providers.
Cheapest Blended LLM Cost
Blended cost for a workload of 100M input tokens and 10M output tokens per month — the most realistic cost-of-ownership comparison for most production applications.
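As a rough illustration of the arithmetic behind a blended figure, the sketch below combines per-million-token prices with the 100M input / 10M output workload described above. The function name and the example prices are hypothetical, and the leaderboard's exact formula is not stated on this page, so treat this as one plausible reading rather than the site's method.

```python
def blended_monthly_cost(
    input_price_per_m: float,
    output_price_per_m: float,
    input_tokens: int = 100_000_000,   # workload size taken from the page
    output_tokens: int = 10_000_000,   # workload size taken from the page
) -> float:
    """Return the monthly cost in USD, given prices in USD per 1M tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Hypothetical prices, not taken from any provider:
# $0.50 per 1M input tokens and $1.50 per 1M output tokens.
cost = blended_monthly_cost(0.50, 1.50)
print(f"${cost:.2f}/mo")  # 100 * 0.50 + 10 * 1.50 -> $65.00/mo
```

Because the workload is input-heavy (10:1), the input price contributes most of the total unless the output price is more than ten times the input price.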
Fastest LLM Time to First Token
Time to first token (TTFT) determines how quickly a response starts streaming to your users. Lower is better. Values are the best published TTFT per model across providers.
Highest LLM Throughput (tok/s)
Throughput (tokens per second) determines how fast a model generates output. Critical for batch workloads and applications where generation speed matters.
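To make the two latency metrics concrete, here is a minimal sketch that measures time to first token and throughput over any iterable of streamed tokens. The helper `measure_stream` and the `fake_stream` generator are hypothetical stand-ins for a real streaming client. Throughput is computed here as tokens generated after the first one divided by the time elapsed since the first token; providers may define it differently, and the published figures on this page are not produced by this code.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Return (ttft_seconds, tokens_per_second, token_count) for a stream."""
    start = clock()
    first_at = None
    count = 0
    for _ in tokens:
        if first_at is None:
            first_at = clock()  # moment the first token arrived
        count += 1
    end = clock()
    if first_at is None:
        raise ValueError("stream produced no tokens")
    ttft = first_at - start
    generation_time = end - first_at
    # Exclude the first token: its latency is already captured by TTFT.
    if count > 1 and generation_time > 0:
        throughput = (count - 1) / generation_time
    else:
        throughput = 0.0
    return ttft, throughput, count


def fake_stream(n: int = 5, delay: float = 0.01) -> Iterator[str]:
    """Hypothetical token stream that sleeps before yielding each token."""
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, tps, n = measure_stream(fake_stream())
    print(f"TTFT: {ttft * 1000:.1f} ms, throughput: {tps:.1f} tok/s over {n} tokens")
```

Passing the clock in as a parameter keeps the measurement logic testable with a deterministic fake clock.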
Longest LLM Context Window
Context window determines how much text a model can process in a single call — essential for document summarisation, long-form coding, and RAG pipelines.
Best LLM MMLU Score
MMLU (Massive Multitask Language Understanding) measures reasoning and knowledge across 57 subjects. Higher is better. Scores sourced from published model cards and papers.
Best LLM HumanEval Score
HumanEval measures code-generation ability: the percentage of coding problems solved correctly (pass@1). Higher is better. Sourced from published evals.
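For readers unfamiliar with pass@1, the sketch below shows the unbiased pass@k estimator introduced alongside HumanEval, which reduces to the fraction of correct samples when k = 1. The function names and the sample results are hypothetical, and this page does not say how many samples per problem each published score used, so this only illustrates the metric's definition.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Iterable[Tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems, given (n_samples, n_correct) pairs."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)


# Hypothetical run: four problems, one sample each, two solved.
score = benchmark_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 0)], k=1)
print(f"pass@1 = {score:.1%}")  # pass@1 = 50.0%
```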
Most Available LLM Providers
Provider count indicates ecosystem breadth and supply-side competition. A model offered by more providers gives you more fallback options when one provider has downtime or rate-limit bottlenecks.
Best overall: cheapest blended cost
Ranked by total monthly cost for a 100M input / 10M output workload.
| # | Model | Family | Blended monthly cost | Providers | Last updated |
|---|---|---|---|---|---|
| 01 | Gemma 2 9B IT | gemma | $0.06/mo | 3 | May 16 |
| 02 | Llama 3.1 8B Instruct | llama | $0.06/mo | 4 | May 16 |
| 03 | Mistral Small 3 | mistral | $0.13/mo | 1 | May 16 |
| 04 | Qwen 2.5 Coder 32B Instruct | qwen | $0.14/mo | 1 | May 16 |
| 05 | Qwen 3 32B Instruct | qwen | $0.20/mo | 2 | May 16 |
| 06 | Qwen 2.5 72B Instruct | qwen | $0.21/mo | 3 | May 16 |
| 07 | Mixtral 8x7B Instruct | mixtral | $0.22/mo | 2 | May 16 |
| 08 | Qwen 3 72B Instruct | qwen | $0.25/mo | 4 | May 17 |
| 09 | Llama 3.3 70B Instruct | llama | $0.26/mo | 5 | May 17 |
| 10 | Llama 3.1 70B Instruct | llama | $0.26/mo | 3 | May 16 |
| 11 | DeepSeek V3 | deepseek | $0.28/mo | 3 | May 16 |
| 12 | DeepSeek R1 Distill Llama 70B | deepseek | $0.34/mo | 3 | May 16 |
| 13 | DeepSeek V3.2 | deepseek | $0.38/mo | 1 | May 17 |
| 14 | DeepSeek R1 | deepseek | $0.60/mo | 3 | May 16 |
| 15 | Mixtral 8x22B Instruct | mixtral | $0.67/mo | 4 | May 17 |
| 16 | Mistral Large 2 | mistral | $2.34/mo | 1 | May 16 |
| 17 | Llama 3.1 405B Instruct | llama | $3.05/mo | 4 | May 17 |
| 18 | Command R+ | command-r | $3.50/mo | 1 | May 16 |