Find the most cost-effective models for prompt-heavy workloads. Ranked by the lowest input token price across all providers, updated nightly from live scrapes.
| # | Model | Family | Cheapest input | Providers | Last updated |
|---|---|---|---|---|---|
| 04 | Qwen 2.5 Coder 32B Instruct | qwen | $0.0012/M input | 1 | May 16 |
| 05 | Qwen 3 32B Instruct | qwen | $0.0014/M input | 2 | May 16 |
| 06 | Qwen 2.5 72B Instruct | qwen | $0.0018/M input | 3 | May 16 |
| 07 | DeepSeek V3 | deepseek | $0.0020/M input | 3 | May 16 |
| 08 | Mixtral 8x7B Instruct | mixtral | $0.0020/M input | 2 | May 16 |
| 09 | Llama 3.3 70B Instruct | llama | $0.0022/M input | 5 | May 17 |
| 10 | Llama 3.1 70B Instruct | llama | $0.0022/M input | 3 | May 16 |
| 11 | Qwen 3 72B Instruct | qwen | $0.0022/M input | 4 | May 17 |
| 12 | DeepSeek V3.2 | deepseek | $0.0027/M input | 1 | May 17 |
| 13 | DeepSeek R1 Distill Llama 70B | deepseek | $0.0028/M input | 3 | May 16 |
| 14 | DeepSeek R1 | deepseek | $0.0040/M input | 3 | May 16 |
| 15 | Mixtral 8x22B Instruct | mixtral | $0.0060/M input | 4 | May 17 |
| 16 | Mistral Large 2 | mistral | $0.0180/M input | 1 | May 16 |
| 17 | Command R+ | command-r | $0.0250/M input | 1 | May 16 |
| 18 | Llama 3.1 405B Instruct | llama | $0.0270/M input | 4 | May 17 |
This leaderboard ranks models by their input token price in USD per million tokens, sourced from public provider pricing pages and updated daily. Input tokens are the units of text you send to the model: your prompt, system instructions, and any injected context. Each row shows the lowest available price across all providers hosting that model, so the price you actually pay for the same model can be higher depending on which provider you use. The last-verified snapshot date is shown in each row.
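The ranking rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual code: the model names, provider names, and prices are made up, and breaking ties by model name is an assumption, since the page does not say how equal prices are ordered.

```python
from collections import defaultdict

# Hypothetical (model, provider, USD per million input tokens) quotes.
quotes = [
    ("model-a", "provider-x", 0.0030),
    ("model-a", "provider-y", 0.0022),
    ("model-b", "provider-x", 0.0018),
    ("model-b", "provider-z", 0.0025),
    ("model-c", "provider-y", 0.0022),
]

def rank_by_cheapest_input(quotes):
    """Return (model, cheapest_price, provider_count) rows, cheapest first."""
    prices_by_model = defaultdict(list)
    for model, _provider, price in quotes:
        prices_by_model[model].append(price)
    rows = [
        (model, min(prices), len(prices))
        for model, prices in prices_by_model.items()
    ]
    # Assumption: ties on price are broken by model name for a stable order.
    return sorted(rows, key=lambda row: (row[1], row[0]))

ranking = rank_by_cheapest_input(quotes)
for position, (model, price, providers) in enumerate(ranking, start=1):
    print(f"{position:02d} {model} ${price:.4f}/M input ({providers} providers)")
```

Here `model-a` is listed at its cheapest quote even though one of its two providers charges more, which is why the leaderboard price is a floor rather than a guarantee.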
Providers price input tokens lower than output tokens because processing a prompt is computationally cheaper than sampling the response. During the prefill phase, the GPU can process your entire prompt in parallel, which is fast and memory-efficient. Output token generation is sequential (each token depends on the previous one), so the same hardware produces fewer output tokens per second. That asymmetry in compute cost is reflected in the pricing split. Most providers charge 2–5× more per output token than per input token, though models optimized for long-context retrieval sometimes narrow that gap.
Input price alone is not enough to choose a model for most workloads. If your output-to-input ratio is high (for example, you're generating long summaries or code from short prompts), the output rate dominates your bill regardless of how cheap the input is. The cheapest-blended leaderboard uses a workload of 100M input + 10M output tokens per month to give a more realistic cost picture. Always run your actual ratio through the calculator before making a decision based on input price alone. Some providers also offer tiered pricing or cached-token discounts that can change the math significantly at scale.
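To see how the output-to-input ratio can flip a comparison, here is a small sketch of the blended-cost arithmetic. The two price pairs are hypothetical and chosen only to sit inside the 2–5× output multiplier range mentioned above; the default volumes match the 100M input + 10M output workload. It ignores tiered pricing and cached-token discounts.

```python
def monthly_cost_usd(input_price_per_m, output_price_per_m,
                     input_tokens_m=100.0, output_tokens_m=10.0):
    """Monthly bill in USD; prices are per million tokens, volumes in millions."""
    return input_price_per_m * input_tokens_m + output_price_per_m * output_tokens_m

# Hypothetical models: A has the cheaper input, B has the cheaper output.
model_a = {"input": 0.0020, "output": 0.0100}  # output is 5x input
model_b = {"input": 0.0030, "output": 0.0060}  # output is 2x input

# Prompt-heavy workload (100M in, 10M out): the cheaper input wins.
a_prompt_heavy = monthly_cost_usd(model_a["input"], model_a["output"])
b_prompt_heavy = monthly_cost_usd(model_b["input"], model_b["output"])

# Output-heavy workload (10M in, 100M out): the cheaper output wins.
a_output_heavy = monthly_cost_usd(model_a["input"], model_a["output"], 10.0, 100.0)
b_output_heavy = monthly_cost_usd(model_b["input"], model_b["output"], 10.0, 100.0)

print(f"prompt-heavy: A=${a_prompt_heavy:.2f}  B=${b_prompt_heavy:.2f}")
print(f"output-heavy: A=${a_output_heavy:.2f}  B=${b_output_heavy:.2f}")
```

With these made-up prices, model A is cheaper on the prompt-heavy workload and model B is cheaper on the output-heavy one, even though A has the lower input price in both cases.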
Prices are scraped daily from each provider's public pricing page. Because providers often change prices without announcement, the scraper runs on a 24-hour cycle and each row shows the date of the last successful scrape. A price advances to the leaderboard only if the parser confidence is ≥ 0.8 and the value hasn't shifted more than 50% from the prior snapshot without a confirming second scrape. If a provider's page hasn't been successfully scraped within the 14-day freshness window, that provider's row is removed from the active ranking and flagged as stale.
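The acceptance and freshness rules above could look roughly like the following. This is a sketch under stated assumptions rather than the scraper's real logic: the function and constant names are hypothetical, the shift is measured relative to the prior price, and a price with no prior snapshot is accepted on confidence alone.

```python
from datetime import date, timedelta

MIN_CONFIDENCE = 0.8                      # parser confidence threshold
MAX_UNCONFIRMED_SHIFT = 0.5               # 50% change vs. prior snapshot
FRESHNESS_WINDOW = timedelta(days=14)     # stale after 14 days without a scrape

def accept_price(new_price, prior_price, confidence, confirmed_by_second_scrape=False):
    """Decide whether a scraped price may advance to the leaderboard."""
    if confidence < MIN_CONFIDENCE:
        return False
    if not prior_price:
        # Assumption: with no usable prior snapshot there is nothing to compare.
        return True
    # Assumption: shift is the absolute change relative to the prior price.
    shift = abs(new_price - prior_price) / prior_price
    return shift <= MAX_UNCONFIRMED_SHIFT or confirmed_by_second_scrape

def is_stale(last_successful_scrape, today):
    """True if the last successful scrape falls outside the freshness window."""
    return today - last_successful_scrape > FRESHNESS_WINDOW

# A 10% move passes; a 100% jump waits for a confirming second scrape.
print(accept_price(0.0022, 0.0020, confidence=0.9))
print(accept_price(0.0040, 0.0020, confidence=0.9))
print(accept_price(0.0040, 0.0020, confidence=0.9, confirmed_by_second_scrape=True))
```

Holding back large unconfirmed jumps trades a day of latency for protection against parser errors, such as reading an output price as an input price.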