Output tokens dominate cost for generation-heavy use cases. This leaderboard ranks models by the lowest output token price across all providers.
| # | Model | Family | Cheapest output | Providers | Last updated |
|---|---|---|---|---|---|
| 04 | Qwen 2.5 Coder 32B Instruct | qwen | $0.0025/M output | 1 | May 16 |
| 05 | | qwen | $0.0029/M output | 4 | May 17 |
| 06 | Mistral Small 3 | mistral | $0.0030/M output | 1 | May 16 |
| 07 | Qwen 2.5 72B Instruct | qwen | $0.0035/M output | 3 | May 16 |
| 08 | Llama 3.3 70B Instruct | llama | $0.0040/M output | 5 | May 17 |
| 09 | Llama 3.1 70B Instruct | llama | $0.0040/M output | 3 | May 16 |
| 10 | Qwen 3 32B Instruct | qwen | $0.0055/M output | 2 | May 16 |
| 11 | DeepSeek R1 Distill Llama 70B | deepseek | $0.0055/M output | 3 | May 16 |
| 12 | Mixtral 8x22B Instruct | mixtral | $0.0065/M output | 4 | May 17 |
| 13 | DeepSeek V3 | deepseek | $0.0085/M output | 3 | May 16 |
| 14 | DeepSeek V3.2 | deepseek | $0.0110/M output | 1 | May 17 |
| 15 | DeepSeek R1 | deepseek | $0.0200/M output | 3 | May 16 |
| 16 | Llama 3.1 405B Instruct | llama | $0.0350/M output | 4 | May 17 |
| 17 | Mistral Large 2 | mistral | $0.0540/M output | 1 | May 16 |
| 18 | Command R+ | command-r | $0.1000/M output | 1 | May 16 |
This leaderboard ranks models by output token price in USD per million tokens, based on daily scrapes of public provider pricing pages. Output tokens are everything the model generates in response: the completion text, any tool-call payloads, and chain-of-thought tokens if the model emits them. The cheapest price shown is the lowest rate across all providers currently hosting the model. Each row's Last updated column shows when that price was last verified; treat rows older than a few days as approximate until the next scrape confirms them.
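The ranking rule described above (take the lowest output price across providers for each model, then sort ascending) can be sketched in a few lines of Python. This is an illustration only, not the leaderboard's actual pipeline: the model names, provider names, prices, the `rank_by_cheapest_output` helper, and the tie-breaking rule are all hypothetical.

```python
# Hypothetical scraped quotes: (model, provider, output price in USD per million tokens).
# Names and prices are invented for illustration; they are not leaderboard data.
quotes = [
    ("model-a", "provider-x", 0.40),
    ("model-a", "provider-y", 0.35),
    ("model-b", "provider-x", 0.20),
    ("model-c", "provider-z", 0.90),
    ("model-c", "provider-y", 0.60),
]


def rank_by_cheapest_output(quotes):
    """Return (model, cheapest_price, provider_count) rows, cheapest first."""
    by_model = {}
    for model, provider, price in quotes:
        # Keep one price per provider; a later quote overwrites an earlier one.
        by_model.setdefault(model, {})[provider] = price
    rows = [
        (model, min(prices.values()), len(prices))
        for model, prices in by_model.items()
    ]
    # Sort by price; breaking ties by model name is an assumption made here.
    return sorted(rows, key=lambda row: (row[1], row[0]))


leaderboard = rank_by_cheapest_output(quotes)
for rank, (model, price, providers) in enumerate(leaderboard, start=1):
    print(f"{rank:02d} | {model} | ${price:.4f}/M output | {providers}")
```

The provider count is derived from the same quotes as the minimum, which mirrors the Providers column sitting next to the Cheapest output column.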
Most chat and agent workloads produce more output than the size of their prompts suggests. A user sends a one-sentence message; the model may return several paragraphs. At a 5:1 output-to-input token ratio, a change in the output price moves the cost of that exchange five times as much as the same change in the input price. Compound this across thousands of requests and the output line dominates the bill. Even in input-heavy workloads like RAG, where long context chunks inflate prompt size, output usually remains the more expensive line item once you factor in the response length distribution.
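To make the arithmetic concrete, here is a minimal sketch that splits a request's cost into input and output parts. The `request_cost` helper and the token counts and prices passed to it are hypothetical placeholders, not figures from the leaderboard.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return (total_cost_usd, output_share) for one request.

    Prices are in USD per million tokens.
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    total = input_cost + output_cost
    # Guard against a zero-cost request to avoid dividing by zero.
    share = output_cost / total if total else 0.0
    return total, share


# Hypothetical exchange at a 5:1 output-to-input ratio with equal per-token prices.
total, share = request_cost(
    input_tokens=100,
    output_tokens=500,
    input_price_per_m=1.0,
    output_price_per_m=1.0,
)
print(f"total: ${total:.6f}, output share: {share:.0%}")
```

Substituting your own token counts and a provider's actual input and output prices shows how much of the bill the output line accounts for in a given workload.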
Providers don't always disclose the quantization level behind a given price point, so the leaderboard captures the price as advertised and doesn't annotate quantization unless the provider explicitly states it. In practice, most sub-$0.50/M output prices reflect quantized (FP8 or INT4) serving. Full-precision FP16 serving at comparable throughput costs more and is rarely offered at the same price. If quantization fidelity matters for your application, check the provider's documentation or run a quality benchmark on your specific prompts before committing to the cheapest option.
At moderate volumes, advertised per-token rates typically hold. At very high volumes — tens of billions of tokens per month — most providers will negotiate enterprise contracts that differ from the public page. Additionally, some providers implement rate limits that may force you to use a more expensive tier. Prices can also change silently; the leaderboard's daily scrape and 14-day freshness window are designed to catch these changes quickly, but there's always a potential lag between a provider updating their page and the scraper running. See the methodology note in the main index for how we handle large price swings.
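A consumer of the leaderboard data might apply the 14-day freshness window as a simple date check before trusting a row. The sketch below is an assumption about how such a check could look, not the leaderboard's own logic: the `is_fresh` helper, the inclusive boundary, and the example dates (including the year) are hypothetical.

```python
from datetime import date

# The 14-day window is named in the text; how it is applied here is assumed.
FRESHNESS_WINDOW_DAYS = 14


def is_fresh(last_verified, today, window_days=FRESHNESS_WINDOW_DAYS):
    """Return True if the row was verified within the last `window_days` days."""
    age_days = (today - last_verified).days
    # A negative age means the date is in the future; treat that as not fresh.
    return 0 <= age_days <= window_days


# Hypothetical dates for illustration.
today = date(2025, 5, 20)
print(is_fresh(date(2025, 5, 16), today))  # verified 4 days before `today`
print(is_fresh(date(2025, 5, 1), today))   # verified 19 days before `today`
```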