Use-case preset
Voice assistant backend cost calculator
Voice-pipeline LLM step (STT→LLM→TTS); sub-1s latency.
The LLM step in a voice assistant pipeline sits between STT (which outputs a transcript) and TTS (which needs a response string). The input is typically 100–300 tokens of transcript plus a short system prompt; the output is 50–150 tokens of spoken-language reply. At this preset's 60/40 input/output split and 2k context window, even the upper end of those ranges (about 450 tokens before the system prompt) leaves ample headroom.
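As a rough check on the token budget, the sketch below adds up a worst-case turn and reports how much of the context window is left. The 200-token system prompt and the reading of "2k" as 2,048 tokens are assumptions for illustration, and `context_headroom` is a hypothetical helper, not part of the calculator.

```python
# Assumption: "2k context" is read as 2,048 tokens.
CONTEXT_WINDOW = 2048


def context_headroom(
    transcript_tokens: int,
    system_tokens: int,
    output_tokens: int,
    context_window: int = CONTEXT_WINDOW,
) -> int:
    """Return the tokens left in the context window after one turn.

    A negative result means the turn does not fit.
    """
    used = transcript_tokens + system_tokens + output_tokens
    return context_window - used


# Upper end of the ranges above: 300-token transcript, 150-token reply.
# The 200-token system prompt is an assumed value, not a figure from the preset.
worst_case = context_headroom(transcript_tokens=300, system_tokens=200, output_tokens=150)
print(worst_case)  # 1398
```

The remaining headroom is what conversation history can use before older turns have to be trimmed.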
Latency is the dominant constraint here. End-to-end voice round-trip budgets are 800–1200 ms, so the LLM step must keep p95 latency under 500 ms to leave room for STT and TTS. That pushes you toward smaller, faster models in the 7B–8B class, deployed on providers with low time to first token (TTFT). Cached prompts (system prompt plus conversation history) cover roughly 40% of input tokens; keep the system prompt stable to maximise cache hits. Cost matters less than TTFT: paying 20% more for a faster cold start is almost always the right trade.
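The two calculations in this paragraph can be sketched as follows: the latency left for the LLM step once STT and TTS are subtracted from the round-trip budget, and a per-turn cost that discounts the cached share of input tokens. All numeric inputs in the usage lines (STT and TTS timings, per-million-token prices, the 50% cache discount) are placeholder assumptions, not real provider figures; only the 40% cached share comes from the text above.

```python
def llm_latency_budget_ms(round_trip_ms: float, stt_ms: float, tts_ms: float) -> float:
    """Time left for the LLM step after STT and TTS take their share."""
    return round_trip_ms - stt_ms - tts_ms


def cost_per_turn(
    input_tokens: int,
    output_tokens: int,
    price_in_per_m: float,
    price_out_per_m: float,
    cached_fraction: float = 0.40,   # share of input tokens served from cache (from the preset text)
    cached_discount: float = 0.50,   # assumed discount on cached tokens; varies by provider
) -> float:
    """Cost of one turn in the same currency as the per-million-token prices."""
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    # Cached tokens are billed at a reduced rate, uncached at the full rate.
    billable_input = uncached + cached * (1.0 - cached_discount)
    input_cost = billable_input * price_in_per_m / 1_000_000
    output_cost = output_tokens * price_out_per_m / 1_000_000
    return input_cost + output_cost


# Placeholder timings: a 1000 ms round trip with 250 ms each for STT and TTS.
budget = llm_latency_budget_ms(round_trip_ms=1000, stt_ms=250, tts_ms=250)

# Placeholder prices: 0.20 per million tokens for both input and output,
# with a 60/40 input/output split (300 in, 200 out).
base = cost_per_turn(300, 200, price_in_per_m=0.20, price_out_per_m=0.20)
faster = base * 1.20  # the "20% more for a faster provider" case

print(f"LLM budget: {budget:.0f} ms")
print(f"cost per turn: {base:.6f} vs {faster:.6f} on the faster provider")
```

At per-turn costs this small, the 20% premium is a fraction of a cent per thousand turns under these placeholder prices, which is why TTFT tends to dominate the decision.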