Cheapest LLM output pricing

May 27, 2026

Output tokens dominate cost for generation-heavy use cases. This leaderboard ranks models by the lowest output token price across all providers.

Stale entries

14+ days old

These models haven't had a confirmed pricing scrape in the last 14 days.

#	Model	Family	Cheapest output	Providers	Last updated
01	Llama 3.1 8B Instruct (stale)	llama	$0.05/M output	1	May 27
02	Llama 3.1 70B Instruct (stale)	llama	$0.40/M output	1	May 27
03	Qwen 2.5 72B Instruct (stale)	qwen	$0.40/M output	1	May 27
04	DeepSeek R1 Distill Llama 70B (stale)	deepseek	$0.80/M output	1	May 27
05	DeepSeek V3 (stale)	deepseek	$0.89/M output	1	May 27

Related leaderboards

Cheapest LLM Input Price
Cheapest Blended LLM Cost
Fastest LLM Time to First Token
Highest LLM Throughput (tok/s)
Longest LLM Context Window
Best LLM MMLU Score
Best LLM HumanEval Score
Most Available LLM Providers

Frequently asked questions

What does this leaderboard measure?

This leaderboard ranks models by their output token price in USD per million tokens, based on daily scrapes of public provider pricing pages. Output tokens are everything the model generates in response — the completion text, any tool-call payloads, and chain-of-thought tokens if the model emits them. The cheapest price shown reflects the lowest rate available across all providers currently hosting the model. The last-verified date is displayed next to every row; treat rows older than a few days as approximate until the next scrape confirms them.
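As a rough illustration of the ranking rule described above, the sketch below takes the lowest output price per model across providers and sorts by it. The `Quote` type, the model and provider names, and the prices are all hypothetical placeholders, and the alphabetical tie-break is an assumption; this is not the leaderboard's actual pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """One provider's advertised price for one model (hypothetical shape)."""
    model: str
    provider: str
    output_usd_per_m: float  # USD per 1M output tokens


def rank_by_cheapest_output(quotes):
    """Return (model, cheapest price, provider count) rows, cheapest first."""
    by_model = {}
    for q in quotes:
        by_model.setdefault(q.model, []).append(q)
    rows = [
        (model, min(q.output_usd_per_m for q in qs), len({q.provider for q in qs}))
        for model, qs in by_model.items()
    ]
    # Sort by price; break ties alphabetically by model name (an assumption).
    return sorted(rows, key=lambda r: (r[1], r[0]))


# Hypothetical quotes, not scraped data.
quotes = [
    Quote("model-a", "provider-x", 0.60),
    Quote("model-a", "provider-y", 0.40),
    Quote("model-b", "provider-x", 0.05),
]

ranking = rank_by_cheapest_output(quotes)
for rank, (model, price, n_providers) in enumerate(ranking, start=1):
    print(f"{rank:02d}\t{model}\t${price:.2f}/M output\t{n_providers}")
```

Here `model-a` is ranked on its cheaper `provider-y` quote, while its provider count still reflects both hosts.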

Why is output cost the dominant cost for most chat workloads?

Most chat and agent workloads produce more output than their prompt volume might suggest. A user sends a one-sentence message; the model may return several paragraphs. At a 5:1 output-to-input token ratio, a change in the output price moves the cost of that exchange 5× as much as the same change in the input price. Compound this across thousands of requests and the output line dominates the bill. Even in input-heavy workloads such as RAG, where long context chunks inflate prompt size, output can remain the more expensive line item once you factor in the response-length distribution, so it is worth computing the split for your own traffic.
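The arithmetic behind the 5:1 example can be checked with a short sketch. The token counts and the two prices below are hypothetical (the prices are deliberately set equal so that only the token ratio drives the result); substitute your own figures to see how the split moves.

```python
def cost_split(input_tokens, output_tokens, input_usd_per_m, output_usd_per_m):
    """Return (input cost, output cost, output share of total) in USD."""
    input_cost = input_tokens / 1_000_000 * input_usd_per_m
    output_cost = output_tokens / 1_000_000 * output_usd_per_m
    total = input_cost + output_cost
    share = output_cost / total if total else 0.0
    return input_cost, output_cost, share


# Hypothetical exchange: 100 prompt tokens, 500 completion tokens (5:1),
# with input and output both priced at $0.40 per 1M tokens.
input_cost, output_cost, output_share = cost_split(100, 500, 0.40, 0.40)
print(f"input ${input_cost:.6f}  output ${output_cost:.6f}  "
      f"output share {output_share:.1%}")
```

With equal prices, the output share equals the output fraction of total tokens, so a prompt-heavy exchange priced this way would show the opposite split.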

Are these prices for FP16, FP8, or INT4 quantized inference?

Providers don't always disclose the quantization level behind a given price point, so the leaderboard captures the price as advertised and doesn't annotate quantization unless the provider explicitly states it. In practice, most sub-$0.50/M output prices reflect quantized (FP8 or INT4) serving. Full-precision FP16 serving at comparable throughput costs more and is rarely offered at the same price. If quantization fidelity matters for your application, check the provider's documentation or run a quality benchmark on your specific prompts before committing to the cheapest option.

Can I expect these prices to hold for high-volume usage?

At moderate volumes, advertised per-token rates typically hold. At very high volumes — tens of billions of tokens per month — most providers will negotiate enterprise contracts that differ from the public page. Additionally, some providers implement rate limits that may force you to use a more expensive tier. Prices can also change silently; the leaderboard's daily scrape and 14-day freshness window are designed to catch these changes quickly, but there's always a potential lag between a provider updating their page and the scraper running. See the methodology note in the main index for how we handle large price swings.
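The 14-day freshness window can be applied to your own copy of the data with a check like the one below. This is a minimal sketch: the row names and dates are hypothetical, and treating a row that is exactly 14 days old as stale is an assumption based on the "14+ days old" label.

```python
from datetime import date, timedelta

FRESHNESS_WINDOW = timedelta(days=14)


def is_stale(last_verified, today, window=FRESHNESS_WINDOW):
    """True when the last confirmed scrape is at least `window` old."""
    return (today - last_verified) >= window


# Hypothetical last-verified dates, not taken from the leaderboard.
today = date(2026, 5, 27)
rows = {
    "model-a": date(2026, 5, 26),  # 1 day old
    "model-b": date(2026, 5, 13),  # exactly 14 days old
    "model-c": date(2026, 4, 30),  # 27 days old
}

stale = sorted(name for name, seen in rows.items() if is_stale(seen, today))
print(stale)
```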