Use-case preset
Real-time translation cost calculator
Live translation of short utterances with strict sub-1s latency.
Live translation of short utterances — speech-to-text output or typed messages — where the source text and translated result are roughly the same token length. Typical in conferencing, customer support, and live captioning pipelines.
The 50/50 input/output ratio is a reasonable default: the source utterance plus brief system instructions roughly match the length of the translated output. A 4k context window is generous for utterance-level translation, since most utterances are under 100 tokens. The interactive latency target (sub-1s) is required for live use. In practice, delays above about 800 ms already start to break conversational flow, so leave headroom below the 1 s ceiling. `cachedPromptPercent` is ~30: the language-pair instructions and few-shot examples stay the same across a session, but each utterance is new.

Cost scales linearly with throughput. At high volume (e.g., 10k utterances/hour), a compact 7B multilingual model is usually the better choice over a 70B model, because BLEU scores on short segments tend to be comparable.
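To make the cost proportionality concrete, here is a minimal Python sketch of how the preset's parameters might combine into an hourly estimate. It is not the calculator's actual implementation. The names `Preset`, `Pricing`, and `hourly_cost`, the 200-tokens-per-utterance figure, and all per-token prices are hypothetical. It also assumes that `cachedPromptPercent` means the share of input tokens billed at a discounted cached rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    """Parameters from the real-time translation preset."""
    input_share: float = 0.5             # 50/50 input/output token ratio
    cached_prompt_percent: float = 30.0  # assumed: share of input tokens served from the prompt cache
    context_window: int = 4096           # 4k context window


@dataclass(frozen=True)
class Pricing:
    """Hypothetical USD prices per million tokens; replace with real provider rates."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def hourly_cost(utterances_per_hour: int, tokens_per_utterance: int,
                preset: Preset, pricing: Pricing) -> float:
    """Estimate hourly cost in USD. tokens_per_utterance counts input plus output."""
    if tokens_per_utterance > preset.context_window:
        raise ValueError("request does not fit in the context window")
    total = utterances_per_hour * tokens_per_utterance
    input_tokens = total * preset.input_share
    output_tokens = total - input_tokens
    # Cached input tokens are billed at the (hypothetical) discounted rate.
    cached = input_tokens * preset.cached_prompt_percent / 100
    uncached = input_tokens - cached
    return (uncached * pricing.input_per_m
            + cached * pricing.cached_input_per_m
            + output_tokens * pricing.output_per_m) / 1_000_000


if __name__ == "__main__":
    preset = Preset()
    # Hypothetical prices for illustration only, not real provider rates.
    small = Pricing(input_per_m=0.10, cached_input_per_m=0.05, output_per_m=0.20)  # 7B-class
    large = Pricing(input_per_m=0.60, cached_input_per_m=0.30, output_per_m=0.90)  # 70B-class
    for name, pricing in [("7B", small), ("70B", large)]:
        cost = hourly_cost(10_000, 200, preset, pricing)
        print(f"{name}: ${cost:.3f}/hour at 10k utterances/hour")
```

Because every term is linear in `utterances_per_hour`, doubling the volume doubles the cost. At high throughput, the per-token price gap between model sizes therefore dominates the estimate.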