Use-case preset

Multi-document summarization cost calculator

Synthesize a single summary across many documents, such as board packs or eDiscovery batches.

This workload feeds multiple source documents (board packs, eDiscovery batches, research dossiers) into a single prompt and returns a unified summary or answer. The 95/5 input/output token ratio is extreme: you're paying mostly for reading, not writing. The context size is set to 64k tokens so that a full set of source documents fits without chunking.

Batch mode is the right choice because latency is irrelevant; these jobs run overnight or as async tasks. The cached-prompt share is low (5–10%) since each job brings a unique document set. The main cost driver is raw input volume: at 64k tokens of context per job, 1,000 jobs/day is 64M input tokens per day. Prioritize models with the best cost per input token. Output quality matters more than speed, so favor larger models; output makes up only 5% of tokens, so it stays a small part of spend even where output tokens are priced higher than input tokens.
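The volume arithmetic for this preset can be sketched in a few lines. This is a minimal illustration, not the calculator's actual implementation: the `Workload` class, the function names, and every price and discount below are hypothetical placeholders, and the sketch assumes the full 64k context counts as input tokens.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    jobs_per_day: int
    input_tokens_per_job: int
    output_share: float   # fraction of all tokens that are output (5% here)
    cached_share: float   # fraction of input tokens served from cache


def daily_tokens(w: Workload) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) per day."""
    input_tokens = w.jobs_per_day * w.input_tokens_per_job
    # Derive output volume from the input/output ratio, e.g. 95/5.
    output_tokens = round(input_tokens * w.output_share / (1 - w.output_share))
    return input_tokens, output_tokens


def daily_cost(w: Workload, input_price: float, output_price: float,
               cached_price: float, batch_discount: float) -> float:
    """Daily spend in dollars; all prices are dollars per million tokens."""
    input_tokens, output_tokens = daily_tokens(w)
    cached = input_tokens * w.cached_share
    uncached = input_tokens - cached
    input_cost = (uncached * input_price + cached * cached_price) / 1e6
    output_cost = output_tokens * output_price / 1e6
    # Assumes the batch discount applies uniformly to input and output.
    return (input_cost + output_cost) * (1 - batch_discount)


preset = Workload(jobs_per_day=1_000, input_tokens_per_job=64_000,
                  output_share=0.05, cached_share=0.05)
input_tokens, output_tokens = daily_tokens(preset)

# Placeholder prices, NOT real quotes for any model or provider.
cost = daily_cost(preset, input_price=1.00, output_price=3.00,
                  cached_price=0.50, batch_discount=0.50)

print(f"input tokens/day:  {input_tokens:,}")
print(f"output tokens/day: {output_tokens:,}")
print(f"estimated daily cost: ${cost:,.2f}")
```

Swapping in a provider's published per-million-token prices gives a like-for-like comparison across candidate models. Whenever output is priced above input, output takes a larger share of spend than its 5% share of tokens, so it is worth checking both prices rather than the input price alone.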

Recommended models

meta/llama-3.1-405b-instruct

Best-in-class synthesis quality on complex multi-document tasks; because output is only 5% of tokens, the larger model's output pricing adds little to total spend.

deepseek/deepseek-v3

Low input cost at 64k context makes it the budget pick for high-volume batch summarization.

alibaba/qwen-2.5-72b-instruct

Strong long-context coherence with competitive input pricing.

mistralai/mixtral-8x22b-instruct

MoE architecture keeps latency reasonable on large contexts without sacrificing quality.