Use-case preset
News article summarization cost calculator
Article → 3-sentence summary; high-volume batch workload.
Each request sends a news article (600–6000 words, typically 1k–5k tokens) and asks for a 3-sentence summary. Input tokens dominate at roughly 95% of the total; output is 60–90 tokens. An 8k context window covers the typical range without splitting, though pieces near the 6000-word upper end can approach the limit depending on the tokenizer. This is a pure batch workload, with thousands of articles processed overnight or from a queue, so latency is irrelevant and you optimise exclusively on cost-per-token.
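As a quick check on the input share, the sketch below computes the fraction of tokens that are input at both ends of the typical range quoted above. The function name and the pairing of token counts are illustrative assumptions, not part of the calculator itself.

```python
def input_share(input_tokens: int, output_tokens: int) -> float:
    """Return the fraction of a request's total tokens that are input."""
    total = input_tokens + output_tokens
    if total <= 0:
        raise ValueError("token counts must sum to a positive number")
    return input_tokens / total


# Short article with a long summary vs. long article with a short summary.
low = input_share(1_000, 90)
high = input_share(5_000, 60)
print(f"{low:.1%} to {high:.1%}")  # prints "91.7% to 98.8%"
```

The 95% figure sits inside this band, so it is best read as a midpoint rather than a fixed ratio.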
At these ratios, input price is the budget driver. An 8B-class model typically costs 3–10× less per input token than a 70B model and, for short news summaries, produces output that is indistinguishable to most downstream consumers. Cache hit rate is low (0–10%) because every article is a new document; a stable system-prompt preamble is the only cacheable prefix. The main quality risk is hallucinated facts in the summary, so run a spot-check pass against the source articles before shipping to production.
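The arithmetic behind the preset can be sketched as follows. This is a minimal illustration, not the calculator's actual implementation: the `Pricing` class, the `batch_cost` function, and every price below are hypothetical placeholders (chosen so the larger model is 6× the smaller one, inside the 3–10× range above), and the cache hit rate is assumed to mean the fraction of input tokens billed at a discounted cached rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per million tokens. All values used below are placeholders, not real quotes."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def batch_cost(
    n_articles: int,
    input_tokens: int,
    output_tokens: int,
    pricing: Pricing,
    cache_hit_rate: float = 0.0,
) -> float:
    """Estimate the total USD cost of summarizing n_articles.

    cache_hit_rate is assumed to be the fraction of input tokens
    served from cache and billed at the cached-input price.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    per_request = (
        fresh * pricing.input_per_m
        + cached * pricing.cached_input_per_m
        + output_tokens * pricing.output_per_m
    ) / 1_000_000
    return n_articles * per_request


# Hypothetical price points for a small and a large model.
SMALL = Pricing(input_per_m=0.10, cached_input_per_m=0.05, output_per_m=0.20)
LARGE = Pricing(input_per_m=0.60, cached_input_per_m=0.30, output_per_m=1.20)

# 10,000 articles, 3,000 input tokens and 75 output tokens each, 5% cache hits.
small_cost = batch_cost(10_000, 3_000, 75, SMALL, cache_hit_rate=0.05)
large_cost = batch_cost(10_000, 3_000, 75, LARGE, cache_hit_rate=0.05)
print(f"small: ${small_cost:.3f}  large: ${large_cost:.3f}")
```

With these placeholder prices the output term is about 5% of each request's cost and caching changes the total by a few percent at most, so the input price sets the bill.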