
Model crosswalk

Side-by-side on price, capability and workload. Both columns use the cheapest provider for that model.

Llama 3.2 1b Instruct vs Llama 3.2 3b Instruct
Specs and cheapest providers
Spec | Llama 3.2 1b Instruct | Llama 3.2 3b Instruct
Parameters
Context window
License
Released
Cheapest provider
Provider
Input / 1M tokens
Output / 1M tokens
Benchmark comparison

No benchmark data available for either model yet.

Sample workload — 5M in + 2M out per month

using each model's cheapest provider
Llama 3.2 1b Instruct
$0.00 /mo
Llama 3.2 3b Instruct
$0.00 /mo
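
The sample workload figure is a simple weighted sum of input and output volume. Here is a minimal sketch of that arithmetic. The `Price` class, the `monthly_cost` helper and the per-token prices are all hypothetical placeholders for illustration (loosely in the range quoted in the Editor's take), not provider quotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per 1M tokens. The values used below are placeholders, not quotes."""
    input_per_m: float
    output_per_m: float


def monthly_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Monthly cost in USD for the given monthly token volumes."""
    input_cost = (input_tokens / 1_000_000) * price.input_per_m
    output_cost = (output_tokens / 1_000_000) * price.output_per_m
    return input_cost + output_cost


# Hypothetical prices, for illustration only.
PRICES = {
    "Llama 3.2 1b Instruct": Price(input_per_m=0.04, output_per_m=0.06),
    "Llama 3.2 3b Instruct": Price(input_per_m=0.06, output_per_m=0.10),
}

# Sample workload from this page: 5M input + 2M output tokens per month.
for name, price in PRICES.items():
    cost = monthly_cost(price, input_tokens=5_000_000, output_tokens=2_000_000)
    print(f"{name}: ${cost:.2f} /mo")
```

With these placeholder prices the sample workload comes to about $0.32 and $0.50 per month. Swap in real provider prices to get actual figures.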

What changes at scale

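Output tokens dominate cost once a workload generates about three output tokens for every input token (a 1:3 input/output ratio). When a workload consumes more input than output (past 1:1 on the input side), input typically dominates and providers with cheaper input win regardless of headline price. Between those two points, the split depends on each provider's output-to-input price ratio.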

1M in · 250K out: $0.00 · $0.00
5M in · 2M out: $0.00 · $0.00
20M in · 10M out: $0.00 · $0.00
100M in · 60M out: $0.00 · $0.00
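
To see which side of the bill dominates at each tier, compare output spend against input spend directly. The sketch below does that for the four tiers listed above. The helper names and the prices are assumptions made for illustration and do not come from any provider.

```python
# Workload tiers from the list above: (input tokens, output tokens) per month.
TIERS = [
    (1_000_000, 250_000),
    (5_000_000, 2_000_000),
    (20_000_000, 10_000_000),
    (100_000_000, 60_000_000),
]


def output_share(input_tokens, output_tokens, input_price, output_price):
    """Fraction of total spend that comes from output tokens (0.0 to 1.0)."""
    input_cost = input_tokens / 1e6 * input_price
    output_cost = output_tokens / 1e6 * output_price
    total = input_cost + output_cost
    return output_cost / total if total else 0.0


def output_dominates(input_tokens, output_tokens, input_price, output_price):
    """True when output spend is strictly larger than input spend."""
    return output_tokens * output_price > input_tokens * input_price


# Hypothetical prices in USD per 1M tokens, for illustration only.
IN_PRICE, OUT_PRICE = 0.06, 0.10

for tokens_in, tokens_out in TIERS:
    share = output_share(tokens_in, tokens_out, IN_PRICE, OUT_PRICE)
    print(f"{tokens_in:>11,} in / {tokens_out:>10,} out: output is {share:.0%} of spend")
```

The same check works for any token mix: output dominates only when output volume times output price exceeds input volume times input price, so a high output price can shift the balance even for input-heavy workloads.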

Capability vs price

[Scatter plot: benchmark score × $/1M output tokens]
Editor's take
## Llama 3.2 1B Instruct vs Llama 3.2 3B Instruct

Both models target edge and on-device deployment, but they occupy distinct operating points. [Llama 3.2 1B Instruct](/models/meta--llama-3.2-1b-instruct) costs roughly $0.04–$0.06/1M tokens on hosted providers; [Llama 3.2 3B Instruct](/models/meta--llama-3.2-3b-instruct) runs $0.06–$0.10/1M tokens, a 1.5–2× premium for roughly 15–20% higher accuracy on instruction-following benchmarks (IFEval, MT-Bench). The 1B fits comfortably in ~2 GB of RAM at 4-bit quantization; the 3B needs ~2.5 GB, which matters on constrained devices. Throughput on shared cloud inference is fast for both: expect 200–400 tokens/sec for the 1B and 150–300 tokens/sec for the 3B, so latency is rarely an issue for real-time text applications.

**Where 1B wins:** Keyword extraction, intent classification, simple slot-filling, and on-device inference on phones or other devices with tight memory budgets. When you need sub-100 ms responses and the task is structured enough that a smaller model doesn't struggle, the 1B delivers at the lowest cost.

**Where 3B wins:** Multi-turn chat, short summarization, lightweight code completion, and any task requiring coherent paragraph-length output. The 3B noticeably reduces repetition and hallucination on free-form generation compared to the 1B, which justifies the modest price increase.

Pick the 1B if you're optimizing for inference cost or device memory and your task is classification or structured extraction. Pick the 3B if output quality degrades visibly with the 1B and you can tolerate a cost increase of up to 2×.
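
For a rough sanity check on the memory figures in the take above, weight storage can be estimated as parameter count times bits per weight. This is a minimal sketch under stated assumptions: `weight_memory_gb` is a hypothetical helper, the parameter counts are the nominal 1B and 3B from the model names rather than exact counts, and the result covers weights only. KV cache, activations and runtime overhead add to it, so total RAM in practice is higher than these numbers.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Nominal sizes taken from the model names (an assumption, not exact counts).
NOMINAL_PARAMS = {
    "Llama 3.2 1B Instruct": 1e9,
    "Llama 3.2 3B Instruct": 3e9,
}

for name, n_params in NOMINAL_PARAMS.items():
    for bits in (16, 8, 4):
        gb = weight_memory_gb(n_params, bits)
        print(f"{name} @ {bits}-bit: ~{gb:.2f} GB (weights only)")
```

Treat the output as a lower bound when sizing a device: the gap between it and observed RAM use depends on context length and the inference runtime.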