0 providers50 models

Model crosswalk

Side-by-side on price, capability and workload. Both columns use the cheapest provider for that model.

Granite 3.1 2B Instruct
vs
Llama 3.2 3B Instruct
Granite 3.1 2B InstructA

Granite 3.1 2B Instruct

2B params · 131K context · apache-2.0

Cheapest provider
$/1M input
$/1M output
Llama 3.2 3B InstructB

Llama 3.2 3B Instruct

3B params · 131K context · llama-3

Cheapest provider
$/1M input
$/1M output
Specs and cheapest providers
SpecGranite 3.1 2B InstructLlama 3.2 3B Instruct
Parameters2B3B
Context window131K tokens131K tokens
Licenseapache-2.0llama-3
Released2024-12-192024-09-25
Cheapest provider
Provider
Input / 1M tokens
Output / 1M tokens

Add a third model to compare

Benchmark comparison

No benchmark data available for either model yet.

Sample workload — 5M in + 2M out per month

using each model's cheapest provider
Granite 3.1 2B Instruct
$0.00 /mo
Llama 3.2 3B Instruct
$0.00 /mo

What changes at scale

Output tokens dominate cost above a 1:3 input/output ratio. Below 1:1, input dominates and cheaper-input providers win regardless of headline price.

1M in · 250K out$0.00 · $0.00
5M in · 2M out$0.00 · $0.00
20M in · 10M out$0.00 · $0.00
100M in · 60M out$0.00 · $0.00

Capability vs price

scatter
// scatter: benchmark × $/1M out
Calculate cost for your workload

Compare total monthly cost across providers for Granite 3.1 2B Instruct and Llama 3.2 3B Instruct using your own input/output token mix.

Open workload calculator →
Editor's take
At the sub-4B tier, every fraction of a cent matters. [Granite 3.1 2B Instruct](/models/ibm--granite-3.1-2b-instruct) and Llama 3.2 3B Instruct both price out under $0.08/1M tokens at competitive providers, but Llama 3.2 3B typically runs $0.01–0.02/1M tokens cheaper given the volume of provider competition behind Meta models. Granite 3.1 2B has a 1B-parameter weight advantage for tighter memory budgets, fitting in ~4 GB VRAM at INT4 vs ~7 GB for Llama 3.2 3B. Granite 3.1 2B Instruct was built with enterprise compliance pipelines in mind — IBM tuned it specifically for code understanding, log analysis, and retrieval-augmented generation tasks where a small, auditable model is required. On coding classification tasks (e.g., tagging support tickets by error type, routing CI/CD alerts), Granite 3.1 2B holds accuracy within 2–3% of larger models while staying well under $0.05/1M tokens. Its Apache 2.0 license also simplifies on-prem deployment approval. [Llama 3.2 3B Instruct](/models/meta--llama-3.2-3b-instruct) wins on general-purpose instruction following. Trained on a broader corpus, it scores higher on open-domain QA and summarization benchmarks, and its wider provider ecosystem means you get more flexibility on latency SLAs. For mobile or edge inference scenarios, Llama 3.2 3B has broader quantized-model support across GGUF and MLX runtimes. **Pick Granite 3.1 2B Instruct** if you're running enterprise log/code classification tasks, need Apache 2.0 licensing, or are constrained to 4 GB VRAM. **Pick Llama 3.2 3B Instruct** if you want the cheapest general-purpose small model with maximum provider choice.
Related comparisons
Full model details