Use-case preset

News article summarization cost calculator

Article → 3-sentence summary; high-volume batch workload.

Each request sends a news article (600–6000 words, typically 1k–5k tokens) and asks for a 3-sentence summary. Input tokens dominate, at roughly 95% of the total; output is only 60–90 tokens. An 8k-token context window covers typical articles without splitting, but the longest feature pieces (around 6000 words) approach that limit once the system prompt and output are added, so check length before sending. This is a pure batch workload, with thousands of articles processed overnight or from a queue, so latency is irrelevant and you optimise exclusively on cost per token.
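The sizing arithmetic above can be checked with a short sketch. It assumes the common rule of thumb of about 1.33 tokens per English word (real counts depend on each model's tokenizer), a hypothetical 150-token system prompt, and the 90-token upper bound on output. All constant and function names are illustrative, not part of any calculator API.

```python
# Sizing sketch for the article-summarisation preset.
# ASSUMPTION: ~1.33 tokens per English word (a rule of thumb; real counts
# depend on the model's tokenizer).
TOKENS_PER_WORD = 1.33
CONTEXT_WINDOW = 8192        # the 8k-token window discussed above
SYSTEM_PROMPT_TOKENS = 150   # hypothetical preamble size
MAX_OUTPUT_TOKENS = 90       # upper end of a 3-sentence summary


def estimate_tokens(words: int) -> int:
    """Approximate token count for an English article of `words` words."""
    return round(words * TOKENS_PER_WORD)


def fits_context(article_words: int) -> bool:
    """True if preamble + article + summary fit in the context window."""
    needed = SYSTEM_PROMPT_TOKENS + estimate_tokens(article_words) + MAX_OUTPUT_TOKENS
    return needed <= CONTEXT_WINDOW


def input_share(input_tokens: int, output_tokens: int) -> float:
    """Fraction of a request's tokens that are input."""
    return input_tokens / (input_tokens + output_tokens)


if __name__ == "__main__":
    for words in (600, 3000, 6000):
        print(words, estimate_tokens(words), fits_context(words))
    print(f"{input_share(1000, 90):.3f}", f"{input_share(5000, 60):.3f}")
```

Under these assumptions a 600-word article needs about 1,040 tokens in total, while a 6000-word feature needs about 8,220 and slightly overflows an 8,192-token window. Across the 1k–5k input range, the input share runs from about 92% to 99%.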

At these ratios, input price drives the budget. An 8B-class model typically costs 3–10× less per input token than a 70B model, and for a 3-sentence news summary its output is usually good enough for most downstream consumers. Cache hit rate is low (0–10%) because every article is a new document; a stable system-prompt preamble is the only cacheable prefix. The main quality risk is hallucinated facts (names, figures, or dates that do not appear in the article), so run a spot-check pass before shipping to production.
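A minimal cost sketch makes the trade-off concrete. It assumes the cache hit rate is the fraction of input tokens billed at a cached rate, which, when only the preamble is reusable, is bounded by preamble size over total input size. Every price below is a hypothetical placeholder: the "70B" row is simply 5× the "8B" row, a ratio picked from inside the 3–10× range quoted above, not a real provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per 1M tokens. All values used below are HYPOTHETICAL placeholders."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def max_cache_hit_rate(system_tokens: int, article_tokens: int) -> float:
    """Upper bound on the cached share of input tokens when only the
    system-prompt preamble is reused across requests."""
    return system_tokens / (system_tokens + article_tokens)


def batch_cost(p: Pricing, n_requests: int, input_tokens: int,
               output_tokens: int, cache_hit_rate: float) -> float:
    """Total USD for a batch. `cache_hit_rate` is the fraction of input
    tokens billed at the cached rate (an assumed definition)."""
    total_in = n_requests * input_tokens
    cached_in = total_in * cache_hit_rate
    uncached_in = total_in - cached_in
    total_out = n_requests * output_tokens
    return (uncached_in * p.input_per_m
            + cached_in * p.cached_input_per_m
            + total_out * p.output_per_m) / 1_000_000


if __name__ == "__main__":
    # HYPOTHETICAL prices: the 70B row is 5x the 8B row.
    small = Pricing(input_per_m=0.10, cached_input_per_m=0.05, output_per_m=0.10)
    large = Pricing(input_per_m=0.50, cached_input_per_m=0.25, output_per_m=0.50)

    system, article, out = 200, 3000, 75
    hit = max_cache_hit_rate(system, article)  # 0.0625 for these sizes
    for name, p in (("8B", small), ("70B", large)):
        cost = batch_cost(p, 10_000, system + article, out, hit)
        print(f"{name}: ${cost:.3f} per 10k articles")
```

With these placeholder numbers, caching a 200-token preamble saves $0.10 on a $3.275 bill for 10,000 articles (about 3%), while the model choice changes the bill 5×. That gap is why input price, not cache behaviour, is the lever worth pulling here.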

Recommended models

meta/llama-3.1-8b-instruct

Low input price is what matters at a 95% input ratio; 8B quality is sufficient for factual summarisation.

google/gemma-2-9b-it

Competitive summarisation quality at 8B–9B pricing; typical articles fit cleanly within its 8k-token context window.

alibaba/qwen-3-8b-instruct

Strong factual summarisation at 8B scale; low per-token price suits high-volume batch.

mistralai/mistral-7b-instruct-v0.3

Economical 7B option that reliably follows the 3-sentence summary instruction.

ibm/granite-3.1-8b-instruct

Enterprise-grade 8B with consistent extractive summarisation; cost-efficient for batch.
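Whichever model you pick, a cheap automated pass can route suspect outputs to the human spot-check recommended above. The sketch below is hypothetical, not a feature of any listed model or provider. It checks the two risks this preset names: instruction adherence (exactly three sentences) and hallucinated facts, using "every number in the summary appears verbatim in the article" as a crude proxy. The regex sentence splitter is naive (abbreviations like "U.S." miscount, and "1200" vs "1,200" is flagged as a mismatch), and the 2% sample rate is an arbitrary assumption.

```python
import random
import re

# Naive sentence terminator: ., ! or ? followed by whitespace or end of text.
# Abbreviations such as "U.S. economy" will be over-counted.
SENTENCE_END = re.compile(r"[.!?](?=\s|$)")
# Integers and decimals, with optional separators: 42, 1,200, 3.5
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def sentence_count(summary: str) -> int:
    return len(SENTENCE_END.findall(summary.strip()))


def unsupported_numbers(summary: str, article: str) -> list[str]:
    """Numbers in the summary that never appear verbatim in the article."""
    return sorted(set(NUMBER.findall(summary)) - set(NUMBER.findall(article)))


def qa_flags(summary: str, article: str) -> list[str]:
    """Reasons to route this summary to human review (empty list = pass)."""
    flags = []
    n = sentence_count(summary)
    if n != 3:
        flags.append(f"expected 3 sentences, got {n}")
    missing = unsupported_numbers(summary, article)
    if missing:
        flags.append("numbers not in source: " + ", ".join(missing))
    return flags


def spot_check_sample(ids: list[str], rate: float = 0.02, seed: int = 0) -> list[str]:
    """Reproducible random sample of article IDs for manual review."""
    rng = random.Random(seed)
    k = min(len(ids), max(1, round(len(ids) * rate)))
    return rng.sample(ids, k)


if __name__ == "__main__":
    article = "The council approved 1,200 new homes on Tuesday. Construction starts in 2025."
    summary = "The council approved 1,500 homes. Work begins in 2025."
    print(qa_flags(summary, article))
    # ['expected 3 sentences, got 2', 'numbers not in source: 1,500']
```

Flagged summaries plus the random sample go to a reviewer. The number check is a filter for obvious errors, not a substitute for reading the summaries.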