gemma-4-E2B-it

Updated
25.06.2026
Thinking
Embedding
Vision
Audio
Reasoning
Code
Multilingual

huggingface-cli download google/gemma-4-E2B-it
from transformers import AutoModel
model = AutoModel.from_pretrained("google/gemma-4-E2B-it")

More models

NameSize / UsageContextInput
gemma-4-31B-it
256KText, Image, Audio
gemma-4-26B-A4B-it
256KText, Image, Audio
gemma-4-12B-it
125KText, Image, Audio
diffusiongemma-26B-A4B-it
256KText, Image, Audio
gemma-3-270m
32KText
gemma-3-1b-it
32KText

At a glance

  • License: Apache 2.0
  • Context length: 125K tokens
  • Languages: Multilingual
  • Minimum hardware: ~3 GB VRAM
  • Strengths: reasoning and on-device inference

Overview

gemma-4-E2B-it is an instruction-tuned model from Google, built on the open-weight Gemma 4 family. The "E" stands for effective parameters: E2B has about 2.3B effective parameters and 5.1B parameters once embeddings are counted. It is a compact, multimodal model designed for edge and on-device use, with a 125K-token context window.

Because the weights are open and the model is small, gemma-4-E2B-it runs fully on your own hardware through Atomic Chat. Nothing leaves your machine, so prompts and files stay private, and the model keeps working without a network connection once downloaded.

What it is good at

The "-it" suffix means it follows instructions, and the model carries the Gemma 4 capability set across text, image, and audio. Three things it handles well:

  • Vision and audio — it reads images and accepts native audio input, so you can ask questions about a screenshot or transcribe and understand speech locally.
  • Reasoning — a built-in thinking mode lets it work through a problem step by step before answering, which helps on multi-step questions and structured tasks.
  • Code and multilingual chat — it writes and explains code and holds conversations across many languages, useful for quick edits and drafting without sending anything to a server.

Running it locally

At 5.1B parameters, gemma-4-E2B-it is one of the lighter models you can run at home. A 4-bit quantized build uses roughly 5GB of RAM, and the model can run on a CPU rather than needing a dedicated GPU, so a machine with about 4GB of free RAM can load it. The 125K context window lets you feed in long documents or chat history.

huggingface-cli download google/gemma-4-E2B-it

You can load the weights through Transformers or vLLM, or skip the setup entirely and open gemma-4-E2B-it in Atomic Chat with one click.

License

gemma-4-E2B-it is released under the apache-2.0 license. That permits commercial use, modification, and redistribution, so you can run it locally, fine-tune it, and ship it inside your own projects without a subscription or per-token fee.

Desktop
macOS
(M1 or better)
Download
Windows
(x64)
Download
Linux
(x86_64)
Download

Frequently asked questions

It is an instruction-tuned model from Google in the Gemma 4 family. The E2B variant is a compact multimodal model with about 2.3B effective parameters (5.1B including embeddings) and a 125K-token context window. It handles text, images, and audio, and is built to run on edge and consumer hardware.

A 4-bit quantized build of gemma-4-E2B-it uses around 5GB of RAM, and Google lists a minimum of roughly 4GB of free RAM to load it. It runs on a CPU, so a dedicated GPU is not required, though a GPU speeds up generation. The model was engineered for offline mobile and IoT devices, including boards like the Jetson Orin Nano.

Yes. The weights are released under the apache-2.0 license and are free to download and run locally, with no subscription or per-token charge. The license also allows commercial use, modification, and redistribution.

Yes. Once you download the weights, gemma-4-E2B-it runs fully offline with no network connection. Running it through Atomic Chat keeps every prompt and file on your own device, so your data is never sent to a server.

It is a good fit for private, on-device tasks: reading images and audio, step-by-step reasoning through its built-in thinking mode, writing and explaining code, and multilingual chat. Its small footprint makes it practical for laptops, phones, and small edge devices where larger models will not fit.