Local LLM leaderboard
Every open-weight model ranked across the metrics that matter in production: intelligence, blended price, output speed, latency, end-to-end response time, context window, and the hardware needed to run it locally. Sort by any column. Scraped nightly, with no estimates on pricing.
Last updated May 2026.
All models
5 models
Intelligence Index and hardware tier are derived estimates — see the methodology docs. Hardware VRAM is an estimate at the best available quantization.
Browse by single dimension
9 surfaces
Cheapest LLM Input Price
Find the most cost-effective models for prompt-heavy workloads. Ranked by the lowest input token price across all providers, updated nightly from live scrapes.
View rankings →
Cheapest LLM Output Price
Output tokens dominate cost for generation-heavy use cases. This leaderboard ranks models by the lowest output token price across all providers.
View rankings →
Cheapest Blended LLM Cost
Blended cost is the monthly bill for a workload of 100M input tokens and 10M output tokens, the most realistic cost-of-ownership comparison for most production applications.
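As a rough illustration of the blended figure, here is a minimal sketch of the arithmetic. It assumes prices are quoted in USD per 1M tokens, which this page does not state; the function name and the example prices are hypothetical.

```python
def blended_monthly_cost(input_price_per_m, output_price_per_m,
                         input_tokens=100_000_000, output_tokens=10_000_000):
    """Monthly cost for a fixed workload.

    Assumption: both prices are USD per 1,000,000 tokens.
    Defaults match the 100M input / 10M output workload described above.
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Hypothetical prices: $1.00 per 1M input tokens, $2.00 per 1M output tokens.
print(blended_monthly_cost(1.0, 2.0))  # 120.0
```

With this 10:1 input-to-output ratio, the input price carries most of the weight unless output tokens cost more than ten times as much as input tokens.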
View rankings →
Fastest LLM Time to First Token
Time to first token (TTFT) determines how quickly a response starts streaming to your users. Lower is better. Values are the lowest published TTFT per model across providers.
View rankings →
Highest LLM Throughput (tok/s)
Throughput (tokens per second) determines how fast a model generates output. It is critical for batch workloads and for applications where generation speed matters.
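TTFT and throughput combine into a rough end-to-end response time. The sketch below uses a common approximation (TTFT plus generation time at a constant rate); it is an assumption for illustration, not necessarily how this leaderboard computes its own end-to-end column, and the function name is hypothetical.

```python
def estimate_response_time(ttft_s, tokens_per_s, output_tokens):
    """Approximate seconds until the full response has been generated.

    Assumption: generation runs at a constant rate once the first token
    arrives, so total time = TTFT + output_tokens / throughput.
    """
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + output_tokens / tokens_per_s


# Hypothetical model: 0.5 s TTFT, 100 tok/s, 500-token answer.
print(estimate_response_time(0.5, 100.0, 500))  # 5.5
```

For short answers TTFT dominates the total, while for long generations throughput does, which is why the two metrics are ranked separately.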
View rankings →
Longest LLM Context Window
Context window determines how much text a model can process in a single call — essential for document summarisation, long-form coding, and RAG pipelines.
View rankings →
Best LLM MMLU Score
MMLU (Massive Multitask Language Understanding) measures reasoning and knowledge across 57 subjects. Higher is better. Scores sourced from published model cards and papers.
View rankings →
Best LLM HumanEval Score
HumanEval measures code-generation ability: the percentage of coding problems solved correctly (pass@1). Higher is better. Sourced from published evals.
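For readers unfamiliar with pass@1, the sketch below shows the standard unbiased pass@k estimator commonly used with HumanEval, where n samples are drawn per problem and c of them pass the unit tests. Published scores may be computed with different sampling settings, so treat this as an illustration of the metric; the function names are hypothetical.

```python
from math import comb


def pass_at_k(n, c, k=1):
    """Unbiased pass@k for one problem: 1 - C(n - c, k) / C(n, k).

    n: samples generated, c: samples that passed, k: attempts allowed.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(per_problem, k=1):
    """Mean pass@k over (n, c) pairs, expressed as a percentage."""
    scores = [pass_at_k(n, c, k) for n, c in per_problem]
    return 100.0 * sum(scores) / len(scores)


# Two hypothetical problems, one sample each: one solved, one not.
print(benchmark_score([(1, 1), (1, 0)]))  # 50.0
```

With k=1 the estimator reduces to c / n per problem, so pass@1 is simply the average fraction of first attempts that pass.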
View rankings →
Most Available LLM Providers
Provider count indicates ecosystem breadth and supply-side competition. Models available on more providers are less likely to suffer downtime or rate-limit bottlenecks.
View rankings →