Use-case preset

Early-stage startup (100M tokens/mo) cost calculator

Real production usage at 100M tokens/mo; cost-per-quality and rate limits matter.

At 100M tokens/month with a 70/30 input/output split and 8k average context, a mid-tier model priced at $0.9/M input and $0.9/M output tokens costs roughly $90/mo (70M × $0.9 + 30M × $0.9) before any prompt-caching discount. Because input and output are priced the same here, the split does not change the total. That's a sustainable R&D budget before revenue kicks in. A p95 latency target under 5 s keeps the product feeling responsive without requiring the most expensive real-time infrastructure.
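The headline figure is straightforward arithmetic. The Python sketch below assumes flat per-million-token pricing. The cache_discount parameter is a hypothetical knob: providers price cached input tokens differently, so the sketch treats the discount as an input rather than a known value.

```python
def monthly_cost(
    total_tokens: float,
    input_share: float,
    input_price_per_m: float,
    output_price_per_m: float,
    cached_share: float = 0.0,
    cache_discount: float = 0.0,
) -> float:
    """Estimate monthly spend in dollars.

    cached_share: fraction of input tokens served from a prompt cache.
    cache_discount: fraction taken off the input price for cached tokens
    (hypothetical; varies by provider).
    """
    input_m = total_tokens * input_share / 1e6          # input tokens, in millions
    output_m = total_tokens * (1 - input_share) / 1e6   # output tokens, in millions
    cached_m = input_m * cached_share
    uncached_m = input_m - cached_m
    input_cost = (uncached_m + cached_m * (1 - cache_discount)) * input_price_per_m
    output_cost = output_m * output_price_per_m
    return input_cost + output_cost


# 100M tokens, 70/30 split, $0.9/M in and out, no caching discount.
print(f"${monthly_cost(100e6, 0.7, 0.9, 0.9):.2f}")
# Same volume with 40% of input cached at an assumed 50% discount.
print(f"${monthly_cost(100e6, 0.7, 0.9, 0.9, cached_share=0.4, cache_discount=0.5):.2f}")
```

The second call shows how a cache discount lowers the bill. The actual saving depends entirely on the provider's cached-token pricing.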

Rate-limit headroom is the hidden constraint: many providers cap new accounts at 10–50 requests per minute (RPM), which becomes a bottleneck before cost does. A 40% cached-prompt rate reflects a stable system prompt paired with variable user messages. Prioritize models that offer a free tier or low-commitment monthly plans so you can iterate without locking in. Cost-per-quality ratio matters more than raw cost: don't over-optimize for the cheapest option before you've shipped.
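Whether a 10–50 RPM cap actually binds depends on peak traffic, not the monthly average. The sketch below makes that explicit. It assumes each request sends about 8,000 input tokens, spreads the load over a 30-day month, and uses a hypothetical peak_to_average factor to model bursty traffic. None of these values come from a specific provider.

```python
def required_rpm(
    input_tokens_per_month: float,
    avg_context_tokens: float,
    peak_to_average: float,
    days_per_month: float = 30.0,
) -> tuple[float, float]:
    """Return (average RPM, peak RPM) implied by monthly input volume.

    peak_to_average is a hypothetical burst factor: how many times busier
    the busiest minute is than the monthly average.
    """
    requests_per_month = input_tokens_per_month / avg_context_tokens
    minutes_per_month = days_per_month * 24 * 60
    avg_rpm = requests_per_month / minutes_per_month
    return avg_rpm, avg_rpm * peak_to_average


def fits_under_cap(peak_rpm: float, rpm_cap: float) -> bool:
    """True if the estimated peak stays within the provider's RPM cap."""
    return peak_rpm <= rpm_cap


# 70M input tokens/month (70% of 100M) at ~8k tokens per request.
avg, peak = required_rpm(70e6, 8000, peak_to_average=50)
print(f"average {avg:.2f} RPM, peak {peak:.1f} RPM")
print("fits a 10 RPM cap:", fits_under_cap(peak, 10))
print("fits a 50 RPM cap:", fits_under_cap(peak, 50))
```

Under these assumptions the average load is well below 1 RPM, so the cap only matters when traffic arrives in bursts. Measuring your real peak-to-average ratio is the input that decides this.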

Recommended models

meta/llama-3.3-70b-instruct

Best quality-to-cost ratio at this scale; widely available with generous rate limits on most providers.

alibaba/qwen-3-32b-instruct

Lower cost than 70B-class models with competitive quality; good fit when budget is the primary constraint.

mistralai/mistral-small-3

Fast and inexpensive; rate limits are rarely a bottleneck at this tier.

deepseek/deepseek-v3

Strong reasoning at low cost; good option for logic-heavy startup use cases.

google/gemma-2-27b-it

Open-weights and compact enough at 27B parameters to self-host if provider rate limits become a recurring issue.
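To compare candidates on cost per quality rather than raw cost, divide each model's estimated monthly spend by a quality score from your own evaluation set. The sketch below uses made-up placeholder names and numbers (model_a, model_b, model_c). They are not benchmark results for any model listed above. The optional min_quality floor is an assumption added so that a cheap but unusable model cannot win on ratio alone.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    monthly_cost: float  # dollars, e.g. from your own cost estimate
    quality: float       # score from your own eval set, higher is better


def rank_by_cost_per_quality(
    candidates: list[Candidate], min_quality: float = 0.0
) -> list[Candidate]:
    """Sort cheapest-per-quality-point first, dropping models below the floor."""
    eligible = [c for c in candidates if c.quality > 0 and c.quality >= min_quality]
    return sorted(eligible, key=lambda c: c.monthly_cost / c.quality)


# Placeholder numbers for illustration only; substitute your own measurements.
candidates = [
    Candidate("model_a", monthly_cost=90.0, quality=0.80),
    Candidate("model_b", monthly_cost=45.0, quality=0.60),
    Candidate("model_c", monthly_cost=30.0, quality=0.30),
]
for c in rank_by_cost_per_quality(candidates):
    print(f"{c.name}: ${c.monthly_cost / c.quality:.2f} per quality point")
```

With these placeholders the cheapest model (model_c) does not rank first, which is the point of optimizing cost per quality instead of raw cost.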