Hardware match · 16 GB usable

Best open-weight LLMs for CPU only (no GPU)

With ~16 GB available for model weights, none of the 5 open-weight models ranked here runs locally on a CPU-only machine (no GPU), even at the best available quantization. All 5 need more memory and are listed below with the cheapest provider to rent them instead. VRAM figures are estimates. Updated May 2026.

Ranked for CPU only (no GPU)

5 models

Runs on your hardware

No models fit this VRAM budget. Try a larger rig, or rent one below.

Needs more — rent it

These exceed your VRAM. Blended price + provider count show the cheapest way to rent them in the cloud.

#	Model		Blended price			Context	Memory needed	Providers
01	Qwen 2.5 72B Instruct (qwen · hosted only · stale)	—	$327272.76	—	—	131k	≈172.8 GB	1
02	DeepSeek R1 Distill Llama 70B (deepseek · hosted only · stale)	—	$636363.71	—	—	131k	≈168 GB	1
03	DeepSeek V3 (deepseek · hosted only · stale)	—	$290909.17	—	—	131k	≈1610.4 GB	1
04	Llama 3.1 70B Instruct (llama · hosted only · stale)	—	$363636.40	—	—	131k	≈168 GB	1
05	Llama 3.1 8B Instruct (llama · hosted only · stale)	—	$1818.19	—	—	131k	≈19.2 GB	1

VRAM ≈ parameters × bytes-per-param × 1.2 overhead, at the best available quantization — see the methodology docs. Estimates, not a guarantee.
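
As a rough illustration of the formula above, the sketch below recomputes the "needs ≈" figures from the table. The parameter counts (in billions) are assumptions read from the model names, with 671B assumed for DeepSeek V3, and the 2 bytes per parameter (16-bit weights) is inferred only because it reproduces the listed figures; the page itself does not state which precision was used. The function and variable names are hypothetical.

```python
# Sketch of: memory ≈ parameters × bytes-per-param × 1.2 overhead.
# Assumption: parameter counts are in billions, so the result is in GB.

OVERHEAD = 1.2          # overhead factor quoted in the formula
BUDGET_GB = 16.0        # usable memory for this rig (from the page header)

# Assumed parameter counts in billions (taken from model names; 671 for DeepSeek V3).
MODELS = {
    "Qwen 2.5 72B Instruct": 72,
    "DeepSeek R1 Distill Llama 70B": 70,
    "DeepSeek V3": 671,
    "Llama 3.1 70B Instruct": 70,
    "Llama 3.1 8B Instruct": 8,
}


def estimate_memory_gb(params_billion: float, bytes_per_param: float,
                       overhead: float = OVERHEAD) -> float:
    """Estimated memory in GB for the weights plus overhead, rounded to 0.1 GB."""
    return round(params_billion * bytes_per_param * overhead, 1)


def fits(params_billion: float, bytes_per_param: float,
         budget_gb: float = BUDGET_GB) -> bool:
    """True if the estimate is within the memory budget."""
    return estimate_memory_gb(params_billion, bytes_per_param) <= budget_gb


if __name__ == "__main__":
    for name, params in MODELS.items():
        need = estimate_memory_gb(params, bytes_per_param=2.0)  # assumed 16-bit weights
        verdict = "fits" if fits(params, 2.0) else "exceeds budget"
        print(f"{name}: needs ≈{need} GB ({verdict})")
```

Under these assumptions even the smallest entry, the 8B model, comes out at ≈19.2 GB, which is above the 16 GB budget and matches the empty "Runs on your hardware" list.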

Other hardware

RTX 3060 (12GB)

RTX 4060 Ti (16GB)

RTX 4090 (24GB)

RTX 3090 (24GB)

2× RTX 3090 (48GB)

RTX A6000 (48GB)

MacBook Pro M3 Max (64GB)

Mac Studio M2 Ultra (192GB)
