Use-case preset

Social media content cost calculator

Generate tweets/threads/posts from a brief; batch workload.

The prompt is a short creative brief — topic, tone, target platform, optional key message — typically 100–300 tokens. Output is a tweet, thread, or LinkedIn post: 150–400 tokens. The 40/60 input/output split reflects that output dominates; the 1k context window is generous for this use case. This runs as a batch workload: a content team queues up 50–200 briefs overnight and reviews the results in the morning.
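As a rough check on the figures above, the sketch below computes nightly token volume and the input/output split from the stated ranges. Using range midpoints and reading "1k" as 1,000 tokens are assumptions made for illustration, and the function names are hypothetical.

```python
# Token and batch ranges come from the preset description above.
INPUT_TOKENS = (100, 300)    # creative brief
OUTPUT_TOKENS = (150, 400)   # tweet, thread, or LinkedIn post
BRIEFS_PER_NIGHT = (50, 200)
CONTEXT_WINDOW = 1000        # assumption: "1k" read as 1,000 tokens


def midpoint(rng):
    """Midpoint of a (low, high) range; a simplifying assumption."""
    return (rng[0] + rng[1]) / 2


def batch_tokens(n_briefs, input_tokens, output_tokens):
    """Return (total_input, total_output) tokens for one batch."""
    return n_briefs * input_tokens, n_briefs * output_tokens


def input_share(input_tokens, output_tokens):
    """Fraction of all tokens in a request that are input tokens."""
    return input_tokens / (input_tokens + output_tokens)


mid_in = midpoint(INPUT_TOKENS)
mid_out = midpoint(OUTPUT_TOKENS)

# Worst-case single request: largest brief plus longest post.
worst_case = INPUT_TOKENS[1] + OUTPUT_TOKENS[1]

total_in, total_out = batch_tokens(BRIEFS_PER_NIGHT[1], mid_in, mid_out)
print(f"input share at midpoints: {input_share(mid_in, mid_out):.0%}")
print(f"worst-case request: {worst_case} of {CONTEXT_WINDOW} tokens")
print(f"largest nightly batch: {total_in:.0f} input, {total_out:.0f} output tokens")
```

At the midpoints the input share comes out close to the 40/60 split, and even the largest brief paired with the longest post stays inside the context window.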

Output tokens are usually priced higher than input tokens on most providers, but posts are short, so the absolute cost per post stays low even at high volume. Expect a cache hit rate of 0–15%: the system-prompt boilerplate (brand voice, prohibited phrases) is stable and cacheable, but each brief is unique, which limits cache hits. The main quality risk is generic, repetitive output; a small fine-tuned model with brand examples in the context often beats a large general model. Experiment with 7B–14B models before defaulting to 70B.
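A minimal sketch of the per-post cost arithmetic, including the effect of a cache hit rate in the 0–15% range. The per-million-token prices and the 50% discount on cached input are hypothetical placeholders, not any provider's actual rates; substitute the figures for the model you are evaluating.

```python
def cost_per_post(input_tokens, output_tokens,
                  input_price, output_price,
                  cache_hit_rate=0.0, cached_discount=0.5):
    """Estimated USD cost of one generated post.

    Prices are USD per million tokens. `cache_hit_rate` is the fraction of
    input tokens served from cache; cached tokens are billed at
    input_price * (1 - cached_discount). The discount is an assumption.
    """
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    input_cost = (uncached * input_price
                  + cached * input_price * (1 - cached_discount)) / 1_000_000
    output_cost = output_tokens * output_price / 1_000_000
    return input_cost + output_cost


# Hypothetical prices, chosen only to make the arithmetic concrete.
INPUT_PRICE = 0.10   # USD per 1M input tokens (placeholder)
OUTPUT_PRICE = 0.20  # USD per 1M output tokens (placeholder)

no_cache = cost_per_post(200, 275, INPUT_PRICE, OUTPUT_PRICE)
with_cache = cost_per_post(200, 275, INPUT_PRICE, OUTPUT_PRICE,
                           cache_hit_rate=0.15)

briefs = 200  # upper end of the nightly batch
print(f"per post, no cache:   ${no_cache:.6f}")
print(f"per post, 15% cache:  ${with_cache:.6f}")
print(f"batch of {briefs}, no cache:  ${no_cache * briefs:.4f}")
print(f"batch of {briefs}, 15% cache: ${with_cache * briefs:.4f}")
```

Under these placeholder prices, caching changes the total only slightly, because cached tokens are a small fraction of an already small input; model choice and output length are likely to matter more.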

Recommended models

alibaba/qwen-3-14b-instruct

14B sweet spot for creative generation; varied output style at low batch cost.

meta/llama-3.1-8b-instruct

Fast and cheap for high-volume batch; adequate creative quality for short-form posts.

mistralai/mistral-7b-instruct-v0.3

Low output-token cost; reliable instruction following for platform-specific format constraints.

google/gemma-2-9b-it

Strong conversational and creative output at 9B scale; good tone control.

nous/hermes-3-llama-3.1-70b

70B creative capability for premium content tiers that require higher brand fidelity.