
Use-case preset

News article summarization cost calculator

Article → 3-sentence summary; high-volume batch workload.

Each request sends a news article (600–6,000 words; typically 1k–5k tokens) and asks for a 3-sentence summary. Input tokens dominate at roughly 95% of the total; output is only 60–90 tokens. An 8k context window covers the typical 1k–5k-token range, long feature pieces included, without splitting. This is a pure batch workload, with thousands of articles processed overnight or from a queue, so latency is irrelevant and you optimise exclusively for cost per token.
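As a quick check on the input share, the arithmetic can be sketched as below. This is a minimal illustration using only the token ranges quoted above; the function name `input_share` is made up for this example.

```python
def input_share(input_tokens: int, output_tokens: int) -> float:
    """Return the fraction of a request's total tokens that are input."""
    return input_tokens / (input_tokens + output_tokens)


# Extremes of the ranges in the text: 1k-5k input tokens, 60-90 output tokens.
low = input_share(1_000, 90)   # shortest typical article, longest summary
high = input_share(5_000, 60)  # longest typical article, shortest summary

print(f"input share ranges from {low:.1%} to {high:.1%}")
```

Across the typical range the input share stays above 90%, which is why the input price matters far more than the output price for this workload.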

At these ratios, input price drives the budget. An 8B-class model is 3–10× cheaper per input token than a 70B model, and most downstream consumers cannot tell its summaries apart from the larger model's. Cache hit rate is low (0–10%) because every article is a new document; the only cacheable prefix is a stable system-prompt preamble. The main quality risk is hallucinated facts in the summary, so run a spot-check pass before shipping to production.
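A rough batch-cost estimate along these lines might look like the sketch below. All prices are hypothetical placeholders (USD per million tokens), not quotes from any provider, and `Pricing` and `batch_cost` are illustrative names. The sketch also assumes the cache hit rate applies as a flat fraction of input tokens, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Prices in USD per million tokens (hypothetical values only)."""
    input_per_mtok: float
    cached_input_per_mtok: float
    output_per_mtok: float


def batch_cost(n_articles: int, avg_input_tokens: int, avg_output_tokens: int,
               pricing: Pricing, cache_hit_rate: float = 0.0) -> float:
    """Estimate the total USD cost of summarising a batch of articles."""
    total_in = n_articles * avg_input_tokens
    cached = total_in * cache_hit_rate   # input tokens billed at the cached rate
    uncached = total_in - cached         # input tokens billed at the full rate
    total_out = n_articles * avg_output_tokens
    return (uncached * pricing.input_per_mtok
            + cached * pricing.cached_input_per_mtok
            + total_out * pricing.output_per_mtok) / 1_000_000


# Hypothetical price points for a small and a large model.
small = Pricing(input_per_mtok=0.10, cached_input_per_mtok=0.05, output_per_mtok=0.10)
large = Pricing(input_per_mtok=0.60, cached_input_per_mtok=0.30, output_per_mtok=0.60)

for name, pricing in (("small", small), ("large", large)):
    cost = batch_cost(10_000, 3_000, 75, pricing, cache_hit_rate=0.05)
    print(f"{name}: ${cost:.2f} per 10,000 articles")
```

Because cached tokens are a small share of the input here, changing `cache_hit_rate` within 0–10% moves the total only slightly; the per-token input price is the lever that matters.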

Recommended models

Low input cost dominates at 95% input ratio; 8B quality is sufficient for factual summarisation.
Competitive summarisation quality at 8B–9B cost; handles 8k-token articles cleanly.
Strong factual summarisation at 8B scale; low per-token price suits high-volume batch.
Economical 7B option with reliable 3-sentence summary instruction adherence.
Enterprise-grade 8B with consistent extractive summarisation; cost-efficient for batch.