Cheapest LLM input pricing

May 27, 2026

Find the most cost-effective models for prompt-heavy workloads. Ranked by the lowest input token price across all providers, updated nightly from live scrapes.

Stale entries

14+ days old

These models haven't had a confirmed pricing scrape in the last 14 days.

#	Model	Family	Cheapest input (USD per 1M tokens)	Providers	Last updated
01	Llama 3.1 8B Instruct (stale)	llama	$0.0200	1	May 27
02	DeepSeek V3 (stale)	deepseek	$0.32	1	May 27
03	Qwen 2.5 72B Instruct (stale)	qwen	$0.36	1	May 27
04	Llama 3.1 70B Instruct (stale)	llama	$0.40	1	May 27
05	DeepSeek R1 Distill Llama 70B (stale)	deepseek	$0.70	1	May 27

Related leaderboards

Cheapest LLM Output Price
Cheapest Blended LLM Cost
Fastest LLM Time to First Token
Highest LLM Throughput (tok/s)
Longest LLM Context Window
Best LLM MMLU Score
Best LLM HumanEval Score
Most Available LLM Providers

Frequently asked questions

What does the "cheapest input" leaderboard rank?

This leaderboard ranks models by their input token price in USD per million tokens, sourced from public provider pricing pages and updated daily. Input tokens are the tokenized form of everything you send to the model: your prompt, system instructions, and any injected context. Each model is ranked by the lowest input price found across all providers that host it, so the same model can cost more than its ranked price if you use a different provider. The last-verified snapshot date is shown next to each row.
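The ranking rule described above can be sketched in a few lines. This is a minimal illustration, not the site's actual code: the `Offer` type, the function name, and the model, provider, and price values in the usage example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One provider's listing for one model (hypothetical structure)."""
    model: str
    provider: str
    input_usd_per_million: float


def rank_by_cheapest_input(offers):
    """Keep the cheapest offer per model, then sort models by that price."""
    cheapest = {}
    for offer in offers:
        best = cheapest.get(offer.model)
        if best is None or offer.input_usd_per_million < best.input_usd_per_million:
            cheapest[offer.model] = offer
    return sorted(cheapest.values(), key=lambda o: o.input_usd_per_million)


if __name__ == "__main__":
    # Hypothetical offers, for illustration only.
    offers = [
        Offer("model-a", "provider-x", 0.50),
        Offer("model-a", "provider-y", 0.30),
        Offer("model-b", "provider-x", 0.40),
    ]
    for rank, offer in enumerate(rank_by_cheapest_input(offers), start=1):
        print(f"{rank:02d} {offer.model} ${offer.input_usd_per_million:.2f}/M via {offer.provider}")
```

In this sketch, model-a ranks on its provider-y price even though provider-x charges more for the same model, which is why the price you actually pay depends on the provider you pick.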

Why is input pricing usually lower than output pricing?

Providers price input tokens lower because processing a prompt is computationally cheaper than generating the response. During the prefill phase, the GPU can process your entire prompt in parallel, which is fast and makes efficient use of the hardware. Output token generation is sequential, because each token depends on the previous one, so the same hardware produces fewer output tokens per second. That asymmetry in compute cost is reflected in the pricing split. Most providers charge 2–5× more per output token than per input token, though models optimized for long-context retrieval sometimes narrow that gap.

Does the cheapest input price always mean the lowest total cost?

Not for most workloads. If your output-to-input ratio is high — for example, you're generating long summaries or code from short prompts — the output rate dominates your bill regardless of how cheap the input is. The cheapest-blended leaderboard uses a 100M input + 10M output tokens/month workload to give a more realistic cost picture. Always run your actual ratio through the calculator before making a decision based on input price alone. Some providers also offer tiered pricing or cached-token discounts that can change the math significantly at scale.
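To see how the output rate can outweigh a cheap input rate, here is a small worked sketch. The default workload matches the 100M input + 10M output tokens/month figure above. The function name and both models' prices are hypothetical and are not taken from the leaderboard.

```python
def monthly_cost_usd(input_price_per_m, output_price_per_m,
                     input_tokens=100_000_000, output_tokens=10_000_000):
    """Monthly cost in USD, given prices in USD per million tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical prices (USD per 1M tokens): (input, output).
    models = {
        "cheap-input-model": (0.20, 2.00),
        "cheap-output-model": (0.40, 0.80),
    }
    workloads = {
        "input-heavy (100M in, 10M out)": (100_000_000, 10_000_000),
        "output-heavy (10M in, 100M out)": (10_000_000, 100_000_000),
    }
    for label, (tokens_in, tokens_out) in workloads.items():
        print(label)
        for name, (price_in, price_out) in models.items():
            cost = monthly_cost_usd(price_in, price_out, tokens_in, tokens_out)
            print(f"  {name}: ${cost:.2f}")
```

With these made-up prices, the model with the cheaper input rate wins on the input-heavy workload and loses on the output-heavy one, so the ranking flips with the ratio.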

How often is this leaderboard updated?

Prices are scraped daily from each provider's public pricing page. Because providers often change prices without announcing it, the scraper runs on a 24-hour cycle and each row shows the date of the last successful scrape. A scraped price is published to the leaderboard only if the parser confidence is ≥ 0.8 and the value has not shifted more than 50% from the prior snapshot; a larger shift is held back until a second scrape confirms it. If a provider's page hasn't been successfully scraped within the 14-day freshness window, that provider's row is removed from the active ranking and flagged as stale.
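The acceptance and freshness rules above could be expressed roughly as follows. The thresholds (0.8 confidence, 50% shift, 14 days) come from the text. Everything else is an assumption made for illustration: the function names, measuring the shift relative to the prior price, accepting a price when there is no prior snapshot, and treating exactly 14 days as still fresh.

```python
from datetime import date, timedelta

MIN_CONFIDENCE = 0.8                     # minimum parser confidence
MAX_UNCONFIRMED_SHIFT = 0.5              # 50% change relative to the prior snapshot
FRESHNESS_WINDOW = timedelta(days=14)    # freshness window for a scrape


def accept_price(new_price, prior_price, confidence, confirmed_by_second_scrape=False):
    """Return True if a scraped price may be published."""
    if confidence < MIN_CONFIDENCE:
        return False
    # Assumption: with no usable prior snapshot there is nothing to compare against.
    if prior_price is None or prior_price <= 0:
        return True
    shift = abs(new_price - prior_price) / prior_price
    # Large shifts need a confirming second scrape.
    return shift <= MAX_UNCONFIRMED_SHIFT or confirmed_by_second_scrape


def is_stale(last_successful_scrape, today):
    """Return True if the last successful scrape is outside the freshness window."""
    # Assumption: a scrape exactly 14 days old still counts as fresh.
    return today - last_successful_scrape > FRESHNESS_WINDOW


if __name__ == "__main__":
    print(accept_price(0.30, 0.32, confidence=0.9))
    print(accept_price(0.10, 0.32, confidence=0.9))
    print(accept_price(0.10, 0.32, confidence=0.9, confirmed_by_second_scrape=True))
    print(is_stale(date(2026, 5, 12), today=date(2026, 5, 27)))
```

Holding large shifts for confirmation trades a day of latency on genuine price cuts for protection against a parser misreading a page, for example picking up an output price as the input price.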