Overview
gemma-4-26B-A4B-it is an instruction-tuned model from Google, part of the Gemma 4 family. Its name encodes the architecture: "26B" is the total parameter count and "A4B" the active count. It is a Mixture-of-Experts (MoE) model with 26.5B total parameters, of which only about 4B are active per token, because a router sends each token through a small subset of its 128 fine-grained experts. In practice, per-token compute is close to that of a 4B dense model, while the full 26.5B parameters remain available as capacity.
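As a rough illustration of the routing idea described above, the following sketch picks the highest-scoring experts for one token and normalizes their mixing weights. It is a generic top-k gating example, not the model's actual router: the value of TOP_K and the random logits are assumptions made for illustration, since the text only states that there are 128 experts.

```python
import math
import random

NUM_EXPERTS = 128  # expert count stated in the overview
TOP_K = 8          # hypothetical: the number of experts used per token is not stated


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    # Indices of the top_k largest router scores.
    chosen = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:top_k]
    # Softmax over only the chosen scores, so the mixing weights sum to 1.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Random scores stand in for the output of a learned router layer.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routing = route_token(logits)
print(f"{len(routing)} of {NUM_EXPERTS} experts run for this token")
```

Only the selected experts do any work for that token, which is why per-token compute tracks the active parameter count rather than the total.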
On atomic.chat the model runs through Atomic Chat, a free open-source desktop app that loads open-weight LLMs directly on your machine. Nothing is sent to a server. Once the weights are downloaded, gemma-4-26B-A4B-it works fully offline on your own hardware, so prompts and files stay local.
What it is good at
The model is multimodal and tuned for reasoning, with a 262144-token context window that handles long documents and codebases in a single pass.
- Code — generation, completion, and correction across languages, plus agentic tool-use workflows that call functions and parse structured output.
- Reasoning and thinking — a configurable thinking mode where the model writes an internal reasoning pass before its final answer, which helps on multi-step math and logic problems.
- Vision and multimodal input — it reads images and short video alongside text, so you can ask questions about a screenshot, a chart, or a diagram locally.
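To make the tool-use bullet above more concrete, here is a minimal sketch of the application side of such a workflow: parsing a structured tool call and dispatching it to a local function. The JSON shape (`name` plus `arguments`), the `get_weather` tool, and the sample output string are all hypothetical; the format the model actually emits depends on the runtime and chat template in use.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local tool; a real one would query an API or a file."""
    return f"Weather lookup for {city} is not implemented in this sketch."


# Registry of functions the model is allowed to call (illustrative only).
TOOLS = {"get_weather": get_weather}


def dispatch_tool_call(model_output: str) -> str:
    """Parse a JSON tool call and run the matching registered function."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return "error: model output was not valid JSON"
    if not isinstance(call, dict):
        return "error: tool call must be a JSON object"

    name = call.get("name")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in TOOLS:
        return f"error: unknown tool {name!r}"
    if not isinstance(args, dict):
        return "error: arguments must be a JSON object"

    try:
        return TOOLS[name](**args)
    except TypeError as exc:
        # Raised when the argument names do not match the function signature.
        return f"error: bad arguments for {name}: {exc}"


# Assumed example of what a model might emit when asked about the weather.
sample_output = '{"name": "get_weather", "arguments": {"city": "Lisbon"}}'
result = dispatch_tool_call(sample_output)
print(result)
```

Returning error strings instead of raising lets the application feed the failure back to the model as the tool result, so it can retry with corrected arguments.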
Running it locally
Although the model has 26.5B total parameters, quantization keeps the weights compact. A Q4_K_M build of gemma-4-26B-A4B-it fits in roughly 18GB of VRAM, which a single 24GB consumer GPU or an Apple Silicon Mac with enough unified memory can hold. The KV cache grows with conversation length, up to the 262144-token context limit, so leave memory headroom beyond the weights themselves.
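The headroom point can be estimated with the standard KV cache formula for full attention: two vectors (key and value) per layer, per KV head, per token. The sketch below applies it at a few context lengths. The layer count, KV head count, and head dimension are hypothetical placeholders, not the model's published configuration, and the formula assumes every layer caches the full context; architectures with sliding-window layers or a quantized cache need less.

```python
def kv_cache_bytes(tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Size of a full-attention KV cache, with 2-byte (fp16) values by default."""
    # Factor of 2: one key vector and one value vector per layer, head, and token.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical architecture values, chosen only to make the arithmetic concrete.
HYPOTHETICAL_ARCH = dict(n_layers=30, n_kv_heads=8, head_dim=128)

WEIGHTS_GB = 18  # approximate Q4_K_M footprint quoted above
GPU_GB = 24      # the consumer GPU size mentioned above

for tokens in (8_192, 65_536, 262_144):
    cache_gb = kv_cache_bytes(tokens, **HYPOTHETICAL_ARCH) / 1e9
    total_gb = WEIGHTS_GB + cache_gb
    verdict = "fits" if total_gb <= GPU_GB else "does not fit"
    print(f"{tokens:>7} tokens: cache {cache_gb:5.1f} GB, "
          f"total {total_gb:5.1f} GB, {verdict} in {GPU_GB} GB")
```

Because the cache scales linearly with token count, the context length you can actually use on a given machine is set by the memory left over after the weights, not by the model's maximum window.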
To fetch the weights with the Hugging Face CLI:
huggingface-cli download google/gemma-4-26B-A4B-it
The model has day-one support in Transformers and vLLM, as well as llama.cpp, MLX, and Ollama. In Atomic Chat you can load it with one click, without writing any setup code.
License
gemma-4-26B-A4B-it is released under the Apache 2.0 license (apache-2.0). This permits free use, modification, redistribution, and commercial deployment, including running the weights privately on your own hardware and fine-tuning them for your own projects.
