Choosing an open-source model to run locally
Every model in this catalog runs entirely on your own hardware — no API keys, no per-token billing and no data leaving your machine. That makes local models a good fit for private workloads, offline environments and high-volume tasks where a metered cloud API would get expensive. The trade-off is that you pick the hardware, so the right model depends as much on your machine as on the task.
The two numbers that matter most are parameter count and VRAM required. Parameter count is a rough proxy for capability — larger models reason and write better, but need more memory and run slower. VRAM required tells you whether a model fits on your GPU at all; a quantized build lowers that number, at a small cost to quality. Use the sidebar filters to narrow the list to models your machine can actually run before comparing anything else.
Match the model to the task
A model tuned for code completion behaves differently from one tuned for chat or vision, even at the same size. The Tasks filter groups models by what they were trained to do — text generation, image-to-text, text-to-image and more — so start there, then sort within the task by size or how recently the weights were updated.
Best models for general chat and reasoning
These general-purpose models balance answer quality against hardware cost. Each one runs comfortably on a recent laptop or a mid-range GPU.
| Model | Parameters | VRAM required | Best for |
|---|---|---|---|
Qwen 3 8B | 8B | ~8 GB | Everyday chat on a laptop |
Llama 3.1 8B | 8B | ~8 GB | Balanced reasoning and writing |
Mistral Small 24B | 24B | ~16 GB | Stronger reasoning, mid-range GPU |
Qwen 3 32B | 32B | ~24 GB | Highest quality, desktop GPU |
Best models for low-end hardware
If you're working on a CPU-only machine or a GPU with limited memory, these compact and quantized models stay responsive without a discrete graphics card.
| Model | Parameters | VRAM required | Best for |
|---|---|---|---|
Qwen 3 1.7B | 1.7B | CPU only | Fast replies on any laptop |
Llama 3.2 3B | 3B | < 8 GB | Lightweight assistant tasks |
Mistral 7B (Q4) | 7B | ~6 GB | Quantized build for older GPUs |
These picks are a starting point, not a ranking — the right model is the largest one that runs smoothly on your hardware for the task you care about. Use the filters above to explore the full catalog.

