
Use-case preset

Real-time translation cost calculator

Live translation of short utterances with strict sub-1s latency.

Live translation of short utterances — speech-to-text output or typed messages — where the source text and translated result are roughly the same token length. Typical in conferencing, customer support, and live captioning pipelines.

The 50/50 input/output ratio reflects the workload: the source utterance plus brief system instructions is about as long as the translated output. A 4k context window is generous for utterance-level translation, since most inputs are under 100 tokens. The interactive latency target (sub-1s) is a hard requirement for live use, and in practice anything over about 800 ms already breaks conversational flow. `cachedPromptPercent` is ~30: language-pair instructions and a handful of few-shot examples are stable across a session, but each utterance is new. Cost scales in proportion to throughput, so at high volume (e.g., 10k utterances/hour) a compact 7B-class multilingual model is usually the better choice over a 70B-class one, with comparable BLEU scores on short segments.
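As a rough illustration of how these preset values feed a cost estimate, the sketch below computes an hourly cost from throughput, token counts, and the cached share of the prompt. It is a minimal sketch, not the calculator's actual formula: the `Pricing` class, the `hourly_cost` function, and every price shown are hypothetical placeholders, and it assumes cached input tokens are billed at a separate, lower rate.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical USD prices per million tokens (placeholders, not real quotes)."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def hourly_cost(
    utterances_per_hour: int,
    input_tokens: int,
    output_tokens: int,
    cached_prompt_percent: float,
    pricing: Pricing,
) -> float:
    """Estimate USD per hour for utterance-level translation.

    Assumes every request has the same token counts and that the cached
    share of the input is billed at the cached rate.
    """
    cached_tokens = input_tokens * cached_prompt_percent / 100.0
    fresh_tokens = input_tokens - cached_tokens
    per_request = (
        fresh_tokens * pricing.input_per_m
        + cached_tokens * pricing.cached_input_per_m
        + output_tokens * pricing.output_per_m
    ) / 1_000_000
    return utterances_per_hour * per_request


# Preset values from the text: 50/50 ratio, ~30% of the prompt cached.
# 100 input tokens is an assumed upper bound for a short utterance.
small = Pricing(input_per_m=0.10, cached_input_per_m=0.05, output_per_m=0.10)
large = Pricing(input_per_m=0.80, cached_input_per_m=0.40, output_per_m=0.80)

small_cost = hourly_cost(10_000, 100, 100, 30, small)
large_cost = hourly_cost(10_000, 100, 100, 30, large)

print(f"small model: ${small_cost:.3f}/hour")
print(f"large model: ${large_cost:.3f}/hour")
```

Because every term is linear in the per-token price, the ratio between the two hourly costs equals the ratio between the price lists (8× with these placeholder numbers), and doubling the utterance rate doubles the bill.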

Recommended models

Top-tier multilingual capability across 29+ languages; reliable quality on short utterances.
Fast 8B multilingual model; near-identical translation quality to 72B on short segments at 8× lower cost.
Low latency, decent multilingual coverage for the major language pairs in most deployments.
Strong European language pairs with fast inference; good fit for EU-facing live translation.