gemma-3-1b-it

Updated
25.06.2026
Reasoning
Code
Multilingual

huggingface-cli download google/gemma-3-1b-it
from transformers import AutoModel
model = AutoModel.from_pretrained("google/gemma-3-1b-it")

More models

NameSize / UsageContextInput
gemma-4-31B-it
256KText, Image, Audio
gemma-4-26B-A4B-it
256KText, Image, Audio
gemma-4-12B-it
125KText, Image, Audio
diffusiongemma-26B-A4B-it
256KText, Image, Audio
gemma-4-E2B-it
125KText, Image, Audio
gemma-3-270m
32KText

At a glance

  • License: Gemma
  • Context length: 32K tokens
  • Languages: Multilingual
  • Minimum hardware: ~3 GB VRAM
  • Strengths: tiny on-device chat and summarization

Overview

gemma-3-1b-it is the 1B-parameter, instruction-tuned model from Google's Gemma 3 family, built on the same research behind the Gemini models. It uses a dense transformer architecture (the dense and gemma3 tags) rather than a mixture-of-experts design, so every parameter is active on each token. The edge tag signals what it was built for: running on a phone, laptop, or single-board computer instead of a datacenter.

In Atomic Chat the model runs fully on your own hardware. Weights live on your disk, prompts never leave the machine, and once the download finishes you can use gemma-3-1b-it with no internet connection and no API key. The "-it" suffix marks the instruction-tuned variant, so it follows chat-style prompts out of the box.

What it is good at

At 1B parameters gemma-3-1b-it trades raw depth for speed and a tiny footprint, which suits quick local tasks rather than heavy long-form work. Its capability tags point to three areas:

  • Reasoning — short question answering, summarization, and step-by-step instruction following on a 32K-token context window.
  • Code — generating snippets, explaining functions, and drafting shell or regex one-liners for fast local help inside an editor or terminal workflow.
  • Multilingual — the Gemma 3 training covers a wide range of languages, so the model handles translation and non-English prompts beyond plain English.

Running it locally

The 1B size is the lightest Gemma 3 model. It needs only about 2GB of RAM and the quantized GGUF builds are roughly 529MB on disk, so it runs on modest laptops and even on CPU when no GPU is present. The context window holds 32K tokens.

huggingface-cli download google/gemma-3-1b-it

You can load it through Hugging Face Transformers or serve it with vLLM, or skip the setup entirely and install it in Atomic Chat with one click, which handles the download and runs inference on-device.

License

gemma-3-1b-it is released under the Gemma license. It permits free use, modification, and redistribution, including commercial use, provided you follow Google's Gemma Terms of Use and the prohibited-use policy. Review the terms on the model's Hugging Face page before deploying it in a product.

Desktop
macOS
(M1 or better)
Download
Windows
(x64)
Download
Linux
(x86_64)
Download

Frequently asked questions

gemma-3-1b-it is Google's 1B-parameter, instruction-tuned model from the Gemma 3 family. It is a dense transformer built for edge and on-device use, with a 32K-token context window. The "-it" means it is tuned to follow chat-style instructions out of the box.

The 1B size is the lightest Gemma 3 model and needs roughly 2GB of RAM. Quantized GGUF builds are around 529MB on disk, so it runs on ordinary laptops and even on CPU when no dedicated GPU is available. A GPU speeds it up but is not required.

Yes. The weights are open and free to download under the Gemma license, which allows commercial use under Google's terms. Running it yourself in Atomic Chat means there are no per-prompt fees, API limits, or subscriptions.

Yes. After the initial download the model runs entirely on your machine with no internet connection. Prompts and outputs stay local, which keeps your data private since nothing is sent to a server.

It fits fast, lightweight local tasks: short question answering, summarization, instruction following, small code snippets, and multilingual prompts. The 1B size favors speed and a tiny footprint over the depth of larger models, so it is a strong pick for quick on-device help rather than long, complex reasoning.