Use-case preset
Document Q&A — large corpus cost calculator
Q&A over a corpus of several hundred pages or more, with prompts dominated by retrieved chunks.
Document Q&A over a large corpus (legal precedents, engineering wikis, compliance manuals) feeds 25–30k tokens of retrieved passages into each call, alongside a user question of 200–500 tokens, and returns a concise answer. The extreme 95/5 input-to-output token ratio and the 32k-token context reflect that retrieved chunks, not generated prose, dominate every request.
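As a rough illustration of what those numbers imply, the sketch below splits a context window into input and output budgets. It assumes "32k" means 32,000 tokens and that "95/5" is the input/output share of that window; `split_context` is a hypothetical helper, not part of the calculator.

```python
def split_context(context_tokens: int, input_percent: int) -> tuple[int, int]:
    """Split a context window into (input, output) token budgets."""
    input_budget = context_tokens * input_percent // 100
    return input_budget, context_tokens - input_budget


# Assumptions for illustration: "32k" is read as 32,000 tokens and
# "95/5" as the input/output share of that window.
input_budget, output_budget = split_context(32_000, 95)  # (30400, 1600)

# Share of the input budget taken by retrieved passages at each end
# of the 25-30k range quoted in the preset description.
for retrieved in (25_000, 30_000):
    share = retrieved / input_budget
    print(f"{retrieved:,} retrieved tokens = {share:.0%} of the input budget")
```

Under this reading, 30k tokens of retrieved passages leave only about 400 input tokens for the question and system prompt, so the upper end of the range sits right at the edge of the window.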
Latency is best-effort because users accept a wait of a few seconds for an answer drawn from a thousand-page corpus. The high cachedPromptPercent (60) captures the stable system prompt and boilerplate chunk templates that appear in every request. Cost control here lives almost entirely on the input side: tighter retrieval (top-3 instead of top-10 chunks) can cut per-query spend by roughly 3×. Long-context faithfulness matters more than raw benchmark scores, so prioritize recall accuracy over general reasoning ratings when choosing a model.
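The input-side arithmetic can be sketched as follows. Every number here is a placeholder chosen for illustration: the per-million-token prices, the 2,800-token chunk size, the 850 tokens of fixed overhead (system prompt plus question), and the 300-token answer are assumptions, not values taken from the calculator. Only the cached share of 60 comes from the preset, and holding it constant across retrieval depths is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in dollars per million tokens."""
    input_per_mtok: float
    cached_input_per_mtok: float
    output_per_mtok: float


def query_cost(input_tokens: int, output_tokens: int,
               cached_prompt_percent: float, pricing: Pricing) -> float:
    """Cost of one request when a share of the input is billed at the cached rate."""
    cached = input_tokens * cached_prompt_percent / 100
    fresh = input_tokens - cached
    total = (fresh * pricing.input_per_mtok
             + cached * pricing.cached_input_per_mtok
             + output_tokens * pricing.output_per_mtok)
    return total / 1_000_000


# Placeholder assumptions, not real vendor prices or measured sizes.
PRICING = Pricing(input_per_mtok=3.0, cached_input_per_mtok=0.30, output_per_mtok=15.0)
CHUNK_TOKENS = 2_800      # assumed average size of one retrieved chunk
OVERHEAD_TOKENS = 850     # assumed system prompt plus user question
OUTPUT_TOKENS = 300       # assumed concise answer


def cost_for_top_k(k: int, cached_prompt_percent: float = 60) -> float:
    """Per-query cost when the k best chunks are placed in the prompt."""
    input_tokens = k * CHUNK_TOKENS + OVERHEAD_TOKENS
    return query_cost(input_tokens, OUTPUT_TOKENS, cached_prompt_percent, PRICING)


top10 = cost_for_top_k(10)
top3 = cost_for_top_k(3)
print(f"top-10: ${top10:.4f}  top-3: ${top3:.4f}  ratio: {top10 / top3:.2f}x")
```

With these placeholder numbers the ratio lands between 2.5× and 3×, because the answer tokens and the fixed overhead do not shrink with retrieval depth. The cheaper output is relative to input, the closer the saving gets to the full reduction in chunk count.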