Use-case preset
Academic paper Q&A cost calculator
Q&A over a full paper or research corpus; 64k context window.
The full text of a paper or a small research corpus (20k–60k tokens) sits in context alongside a user question. Output is a 200–600 token answer that cites sections by name. The 90/10 input/output split reflects that the document dominates the prompt; the 64k context window handles even long multi-paper sets without chunking.
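The sizing above can be sanity-checked with a small budget calculation. This is a minimal sketch, not part of the calculator itself: the function name `fits_context`, the reading of "64k" as 64,000 tokens, and the assumption that the window must hold input and output together are all illustrative choices.

```python
# Hypothetical helper: checks whether a document plus question plus answer
# fits in the context window. Assumes the window covers input AND output,
# and that "64k" means 64,000 tokens (some providers use 65,536).
def fits_context(doc_tokens: int, question_tokens: int,
                 max_output_tokens: int, window: int = 64_000) -> bool:
    total = doc_tokens + question_tokens + max_output_tokens
    return total <= window


# Largest document in the stated range, a 200-token question, a 600-token answer.
print(fits_context(60_000, 200, 600))
# A document that fills the window by itself leaves no room for the answer.
print(fits_context(64_000, 200, 600))
```

Under these assumptions the upper end of the 20k–60k range leaves a few thousand tokens of headroom for the system prompt and question.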
Best-effort latency is appropriate: researchers accept 10–20s response times for deep document Q&A. The document is stable across a session, so cached prompt hit rates of 40–60% are achievable: the entire paper loads once, and subsequent questions that arrive before the cache expires hit the cache. This is the workload where context-caching economics change the bill most dramatically: with a 5-minute cache TTL and 5 questions per session, effective input cost drops by ~50%. Use a model with strong long-context fidelity; small models frequently hallucinate citations when the document exceeds 32k tokens.
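The relationship between cache hit rate and input cost can be sketched as a blended price. This is an illustrative model only: the function names are hypothetical, and the cached-read price of 10% of the normal input price is an assumption, not a figure from this preset. Actual cache pricing, including any surcharge for cache writes, varies by provider.

```python
# Hypothetical model: a fraction `hit_rate` of input tokens is billed at a
# discounted cached price; the rest is billed at the full input price.
def effective_input_multiplier(hit_rate: float, cached_price_ratio: float) -> float:
    """Return effective input cost as a fraction of the uncached cost."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    if not 0.0 <= cached_price_ratio <= 1.0:
        raise ValueError("cached_price_ratio must be between 0 and 1")
    return (1.0 - hit_rate) + hit_rate * cached_price_ratio


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float, output_price_per_mtok: float,
                 hit_rate: float, cached_price_ratio: float) -> float:
    """Cost of one request; prices are per million tokens (placeholder units)."""
    multiplier = effective_input_multiplier(hit_rate, cached_price_ratio)
    input_cost = input_tokens / 1_000_000 * input_price_per_mtok * multiplier
    output_cost = output_tokens / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost


# ASSUMPTION: cached tokens cost 10% of the normal input price.
CACHED_PRICE_RATIO = 0.10
for hit_rate in (0.4, 0.5, 0.6):
    savings = 1.0 - effective_input_multiplier(hit_rate, CACHED_PRICE_RATIO)
    print(f"hit rate {hit_rate:.0%}: input cost drops by {savings:.0%}")
```

With the assumed 10% cached price, hit rates of 40–60% give input savings of roughly 36–54%, which brackets the ~50% figure above. A less generous cached price shrinks the savings proportionally.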