Use-case preset
Internal team copilot cost calculator
Internal bot answering employees with company-specific context.
An internal team copilot answers employee questions by pulling from a company knowledge base (Confluence docs, Slack history, runbooks) and returning grounded, concise replies. A typical turn combines a user query of a few hundred tokens with retrieved context chunks totalling 6–7k tokens and produces a short answer of under 200 tokens, which is why the preset uses an input-heavy 80/20 input/output split and an 8k context window.
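The per-turn token budget described above can be sanity-checked with a short sketch. The specific counts (300 query tokens, 6,500 context tokens, 200 answer tokens) are illustrative picks from the ranges in the text, the 8,192-token window is an assumption about what "8k" means, and `fits_window` is a hypothetical helper rather than part of the calculator.

```python
def fits_window(query_tokens: int, context_tokens: int,
                answer_tokens: int, window: int = 8192) -> tuple[bool, int]:
    """Return (fits, headroom) for one turn.

    headroom is the number of unused tokens in the window; it goes
    negative when the turn would overflow.
    """
    used = query_tokens + context_tokens + answer_tokens
    return used <= window, window - used


# Illustrative turn: short query, mid-range retrieved context, short answer.
fits, headroom = fits_window(query_tokens=300, context_tokens=6500, answer_tokens=200)
print(fits, headroom)
```

Under these assumptions the turn fits with some headroom to spare, but retrieval near the top of the 6–7k range plus a longer query can get close to the limit, so the retrieved context is the number to cap first.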
The `under_5s_p95` latency target keeps responses fast enough for conversational use without paying for sub-second inference. System-prompt caching (`cachedPromptPercent` set to 40) captures the repeated persona, tool spec, and company-context preamble, which together form the largest static block of each request. A 70B-class model balances instruction-following quality against inference cost; dropping to a distilled or quantized variant adds roughly 15% to the error rate on policy questions but cuts per-query cost by roughly 4×.
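To see how the cached share feeds into per-query cost, here is a minimal sketch. Everything numeric in it is an assumption for illustration: the per-million-token prices, the 50% discount on cached input tokens, and the 6,800-token input size are hypothetical, and `cost_per_query` is not the calculator's actual formula.

```python
def cost_per_query(input_tokens: int, output_tokens: int,
                   cached_prompt_percent: float,
                   input_price_per_m: float, output_price_per_m: float,
                   cached_discount: float) -> float:
    """Estimate the cost of one query in the same currency as the prices.

    cached_prompt_percent: share of input tokens served from the prompt cache (0-100).
    cached_discount: fraction knocked off the input price for cached tokens (0-1).
    """
    cached = input_tokens * cached_prompt_percent / 100
    uncached = input_tokens - cached
    # Cached tokens are billed at a reduced rate; uncached tokens at full rate.
    billable_input = uncached + cached * (1 - cached_discount)
    input_cost = billable_input * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Hypothetical prices: 1.00 per million input tokens, 2.00 per million output tokens.
with_cache = cost_per_query(6800, 200, 40, 1.0, 2.0, cached_discount=0.5)
without_cache = cost_per_query(6800, 200, 0, 1.0, 2.0, cached_discount=0.5)
print(f"with cache: {with_cache:.5f}, without cache: {without_cache:.5f}")
```

Because input tokens dominate this workload, the cached share and the cache discount move the total far more than the output price does. The roughly 4× saving from a smaller model could be modelled by scaling both prices, but weighing it against the higher error rate needs a baseline error figure that the text does not give.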