Use-case preset

Email drafting cost calculator

Draft email replies from a short prompt or thread snippet.

An email drafting assistant that takes a 1–2 paragraph prompt or thread snippet and returns a polished reply. Input and output are roughly equal in length, hence the 50/50 ratio. The 2k context window is intentionally tight — email threads are short, and padding with irrelevant history increases cost without improving quality.

Latency is best-effort; users submitting an email draft can tolerate a few seconds. Because every email carries a different prompt and thread, caching is low value (15%). The cost lever is model size: email drafting is a fluency task, not a reasoning task, so a well-tuned 7–8B model typically matches a 70B model in user satisfaction at one-fifth the cost. Run an A/B on your user base before committing to a larger model. The real cost driver at scale is call volume, not context length.

Recommended models

mistralai/mistral-7b-instruct-v0.3

Fast, cheap, and fluent — well-suited for professional writing tasks at short context lengths.

meta/llama-3.1-8b-instruct

Strong writing quality for its size; cost-efficient at the 2k context and 50/50 ratio.

alibaba/qwen-3-8b-instruct

Excellent fluency with good multilingual coverage for diverse user bases.

google/gemma-2-9b-it

Smooth, natural prose generation; competitive token pricing for high call volumes.