Use-case preset

Real-time translation cost calculator

Live translation of short utterances with strict sub-1s latency.

Live translation of short utterances (speech-to-text output or typed messages) where the source text and the translated result are roughly the same token length. Typical in conferencing, customer support, and live captioning pipelines.

The 50/50 input/output ratio fits this workload: the source utterance plus brief system instructions is about as long as the translated output. A 4k context window is generous for utterance-level translation, since most inputs are under 100 tokens. The interactive latency target (sub-1s) is required for live use, and responses slower than about 800 ms already start to break conversational flow. `cachedPromptPercent` is ~30: language-pair instructions and few-shot examples are stable across a session, but each utterance is new. Cost scales linearly with throughput, so at high volume (e.g., 10k utterances/hour) a compact 7B-8B multilingual model is usually the better choice over a 70B-class model, often with comparable BLEU scores on short segments.
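The arithmetic behind this preset can be sketched in a few lines. This is a minimal illustration, not the calculator's actual implementation: the `Pricing` class, the `hourly_cost` function, and every price below are hypothetical placeholders, and the 100-token input and output sizes are assumed from the "under 100 tokens" and 50/50 figures above.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical per-million-token prices in USD (placeholders, not real quotes)."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def hourly_cost(utterances_per_hour, input_tokens, output_tokens,
                cached_prompt_percent, pricing):
    """Estimate USD per hour for a stream of short translation requests."""
    # Split the prompt into the cached share (stable instructions and
    # few-shot examples) and the fresh share (the new utterance).
    cached = input_tokens * cached_prompt_percent / 100.0
    fresh = input_tokens - cached
    per_utterance = (
        fresh * pricing.input_per_m
        + cached * pricing.cached_input_per_m
        + output_tokens * pricing.output_per_m
    ) / 1_000_000
    # Cost grows linearly with the number of utterances.
    return utterances_per_hour * per_utterance


# Assumed example: 10k utterances/hour, 100 tokens in, 100 tokens out, 30% cached.
small = Pricing(input_per_m=0.10, cached_input_per_m=0.05, output_per_m=0.10)
cost = hourly_cost(10_000, 100, 100, 30, small)
print(f"{cost:.3f} USD/hour")
```

With these placeholder prices the estimate comes to about 0.185 USD per hour. Doubling the utterance rate doubles the cost, which is why per-token price dominates model choice at high volume.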

Recommended models

alibaba/qwen-2.5-72b-instruct

Top-tier multilingual capability across 29+ languages; reliable quality on short utterances.

alibaba/qwen-3-8b-instruct

Fast 8B multilingual model; translation quality close to the 72B model on short segments at roughly 8× lower cost.

meta/llama-3.1-8b-instruct

Low latency and decent multilingual coverage for the major language pairs used in most deployments.

mistralai/mistral-small-3

Strong European language pairs with fast inference; good fit for EU-facing live translation.