gemma-4-E2B-it

Updated

Thinking

Embedding

Vision

Audio

Reasoning

Code

Multilingual

Run

huggingface-cli download google/gemma-4-E2B-it

from transformers import AutoModel
model = AutoModel.from_pretrained("google/gemma-4-E2B-it")

More models

View all

Name	Context	Input
gemma-4-31B-it	256K	Text, Image, Audio
gemma-4-26B-A4B-it	256K	Text, Image, Audio
gemma-4-12B-it	125K	Text, Image, Audio
diffusiongemma-26B-A4B-it	256K	Text, Image, Audio
gemma-3-270m	32K	Text
gemma-3-1b-it	32K	Text

At a glance

License: Apache 2.0
Context length: 125K tokens
Languages: Multilingual
Minimum hardware: ~3 GB VRAM
Strengths: reasoning and on-device inference

Overview

gemma-4-E2B-it is an instruction-tuned model from Google, built on the open-weight Gemma 4 family. The "E" stands for effective parameters: E2B has about 2.3B effective parameters and 5.1B parameters once embeddings are counted. It is a compact, multimodal model designed for edge and on-device use, with a 125K-token context window.

Because the weights are open and the model is small, gemma-4-E2B-it runs fully on your own hardware through Atomic Chat. Nothing leaves your machine, so prompts and files stay private, and the model keeps working without a network connection once downloaded.

What it is good at

The "-it" suffix means it follows instructions, and the model carries the Gemma 4 capability set across text, image, and audio. Three things it handles well:

Vision and audio — it reads images and accepts native audio input, so you can ask questions about a screenshot or transcribe and understand speech locally.
Reasoning — a built-in thinking mode lets it work through a problem step by step before answering, which helps on multi-step questions and structured tasks.
Code and multilingual chat — it writes and explains code and holds conversations across many languages, useful for quick edits and drafting without sending anything to a server.

Running it locally

At 5.1B parameters, gemma-4-E2B-it is one of the lighter models you can run at home. A 4-bit quantized build uses roughly 5GB of RAM, and the model can run on a CPU rather than needing a dedicated GPU, so a machine with about 4GB of free RAM can load it. The 125K context window lets you feed in long documents or chat history.

huggingface-cli download google/gemma-4-E2B-it

You can load the weights through Transformers or vLLM, or skip the setup entirely and open gemma-4-E2B-it in Atomic Chat with one click.

License

gemma-4-E2B-it is released under the apache-2.0 license. That permits commercial use, modification, and redistribution, so you can run it locally, fine-tune it, and ship it inside your own projects without a subscription or per-token fee.

Desktop

macOS

(M1 or better)

Download

Windows

(x64)

Download

Linux

(x86_64)

Download

Frequently asked questions

It is an instruction-tuned model from Google in the Gemma 4 family. The E2B variant is a compact multimodal model with about 2.3B effective parameters (5.1B including embeddings) and a 125K-token context window. It handles text, images, and audio, and is built to run on edge and consumer hardware.

A 4-bit quantized build of gemma-4-E2B-it uses around 5GB of RAM, and Google lists a minimum of roughly 4GB of free RAM to load it. It runs on a CPU, so a dedicated GPU is not required, though a GPU speeds up generation. The model was engineered for offline mobile and IoT devices, including boards like the Jetson Orin Nano.

Yes. The weights are released under the apache-2.0 license and are free to download and run locally, with no subscription or per-token charge. The license also allows commercial use, modification, and redistribution.

Yes. Once you download the weights, gemma-4-E2B-it runs fully offline with no network connection. Running it through Atomic Chat keeps every prompt and file on your own device, so your data is never sent to a server.

It is a good fit for private, on-device tasks: reading images and audio, step-by-step reasoning through its built-in thinking mode, writing and explaining code, and multilingual chat. Its small footprint makes it practical for laptops, phones, and small edge devices where larger models will not fit.