gemma-3-270m

Updated: 25.06.2026
Reasoning, Multilingual

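# Download the weights (shell)
huggingface-cli download google/gemma-3-270m

# Load the model (Python)
# AutoModel loads the bare transformer; use AutoModelForCausalLM for text generation.
from transformers import AutoModel
model = AutoModel.from_pretrained("google/gemma-3-270m")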

More models

Name                        Context  Input
gemma-4-31B-it              256K     Text, Image, Audio
gemma-4-26B-A4B-it          256K     Text, Image, Audio
gemma-4-12B-it              125K     Text, Image, Audio
diffusiongemma-26B-A4B-it   256K     Text, Image, Audio
gemma-4-E2B-it              125K     Text, Image, Audio
gemma-3-1b-it               32K      Text

At a glance

  • License: Gemma
  • Context length: 32K tokens
  • Languages: Multilingual
  • Minimum hardware: ~2 GB VRAM
  • Strengths: ultra-light tasks and fine-tuning base

Overview

gemma-3-270m is the smallest member of Google's Gemma 3 family: a text-only model with just 0.27B parameters. It uses a dense transformer architecture and ships as a fine-tune base, meant to be adapted to a narrow task rather than used as a general-purpose chatbot out of the box. Most of those parameters sit in the embeddings for a large 256k-token vocabulary (roughly 170M), leaving a compact transformer core of about 100M parameters, which is why it runs fast.
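
To see why the vocabulary dominates, a rough count of embedding parameters is vocabulary size times hidden width. The sketch below takes the "256k" vocabulary as 256 × 1024 tokens and assumes a hidden width of 640. That width is an assumption not stated on this page, so check the model's config.json before relying on the numbers.

```python
# Rough parameter split for gemma-3-270m.
# VOCAB_SIZE is the "256k" vocabulary from the text (256 * 1024).
# HIDDEN_SIZE is an ASSUMPTION, not stated on this page; check config.json.
VOCAB_SIZE = 256 * 1024
HIDDEN_SIZE = 640
TOTAL_PARAMS = 270_000_000  # "0.27B" from the text


def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """One hidden_size-wide vector per vocabulary entry."""
    return vocab_size * hidden_size


embed = embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
core = TOTAL_PARAMS - embed          # everything that is not the embedding table
share = embed / TOTAL_PARAMS

print(f"embedding params: {embed / 1e6:.0f}M")
print(f"remaining (transformer core): {core / 1e6:.0f}M")
print(f"embedding share of total: {share:.0%}")
```

Under these assumptions the embedding table accounts for well over half of the model, which matches the description of a large vocabulary wrapped around a compact core.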

The point of a model this small is that it lives on your own hardware. In Atomic Chat you load gemma-3-270m and it runs fully on-device, so prompts and outputs never leave your machine and you can keep working with no network. That makes it a practical pick for private, offline use where a 7B or 70B model would be overkill or too heavy to run.

What it is good at

gemma-3-270m carries reasoning and multilingual capabilities, and at this size it shines on focused, repeatable jobs rather than open-ended conversation:

  • Structured extraction — pulling entities, fields, or sentiment out of raw text and returning clean structured output, the kind of task Google highlights for this model.
  • Multilingual text handling — the 256k vocabulary covers rare tokens and many languages, so it can route, classify, or rewrite text across languages on-device.
  • Fine-tuned task agents — as a fine-tune base it is cheap to specialize for one job (query routing, tagging, compliance checks) and then run that specialized version locally.
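
As an illustration of the structured-extraction use case, here is a minimal sketch of the glue code around the model: one helper builds a prompt that asks for JSON, another parses whatever comes back. The helper names `build_extraction_prompt` and `parse_extraction` are hypothetical, the prompt wording is only one possible choice, and the model call itself is left out; the sample output is hand-written, not a real generation.

```python
import json


def build_extraction_prompt(text: str, fields: list[str]) -> str:
    """Ask for a single JSON object containing exactly the requested keys."""
    field_list = ", ".join(fields)
    return (
        "Extract the following fields from the text and reply with one JSON "
        f"object using exactly these keys: {field_list}.\n"
        "Use null for anything that is not present.\n\n"
        f"Text: {text}\nJSON:"
    )


def parse_extraction(raw_output: str, fields: list[str]) -> dict:
    """Parse the span from the first '{' to the last '}' and keep known keys."""
    empty = {field: None for field in fields}
    start = raw_output.find("{")
    end = raw_output.rfind("}")
    if start == -1 or end <= start:
        return empty
    try:
        data = json.loads(raw_output[start : end + 1])
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    # Missing keys become None; unexpected keys are dropped.
    return {field: data.get(field) for field in fields}


fields = ["name", "city", "sentiment"]
prompt = build_extraction_prompt("Ana from Lisbon loved the service.", fields)

# Hand-written stand-in for model output (an assumption, not a real generation).
fake_output = 'Sure: {"name": "Ana", "city": "Lisbon", "sentiment": "positive"}'
print(parse_extraction(fake_output, fields))
```

Falling back to a dictionary of `None` values keeps downstream code simple when a small model returns malformed output, which is worth planning for at this size.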

Running it locally

With 0.27B parameters and a 32K context window, gemma-3-270m is one of the lightest LLMs you can run. Quantized to INT4 it needs under 200 MB of memory, which means it fits on laptops, desktops, and even phones without a dedicated GPU. Pull the weights from Hugging Face:

huggingface-cli download google/gemma-3-270m
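
The sub-200 MB figure is consistent with a back-of-the-envelope estimate of weight storage: parameters times bits per parameter, divided by eight. The sketch below covers the weights only and ignores the KV cache, activations, and any quantization metadata, so real usage will be somewhat higher. The function name is hypothetical.

```python
def weight_memory_mb(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bits_per_param / 8 / 1e6


PARAMS = 0.27e9  # "0.27B parameters" from the text

for label, bits in [("FP32", 32), ("BF16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: ~{weight_memory_mb(PARAMS, bits):.0f} MB")
```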

You can load it through Transformers or vLLM for scripting and serving, or open it in Atomic Chat with one click and start chatting on-device with no setup beyond the download.
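
For the Transformers route, a minimal sketch might look like the following. It assumes `transformers` and `torch` are installed and that your Hugging Face account can access the repository (Gemma repositories may require accepting the license and logging in first). The function name `generate_text` and the example prompt are illustrative, and the first call downloads the weights if they are not already cached.

```python
MODEL_ID = "google/gemma-3-270m"  # repository name from the download command


def generate_text(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model with Transformers and return the generated continuation."""
    # Imported lazily so this file can be imported without transformers/torch.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (loads the model on every call; cache it for repeated use):
# print(generate_text("Sentiment (positive/negative): I love this phone.\nLabel:"))
```

Because this checkpoint is a fine-tune base, raw completions like this are mainly useful as a smoke test; task quality is expected to come from fine-tuning.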

License

gemma-3-270m is released under the Gemma license. It provides open weights and permits responsible commercial use, including fine-tuning and deploying your adapted version in your own projects, subject to Google's usage terms.

Desktop

  • macOS (M1 or better)
  • Windows (x64)
  • Linux (x86_64)

Frequently asked questions

What is gemma-3-270m?

gemma-3-270m is the smallest model in Google's Gemma 3 family, a text-only transformer with 0.27B parameters and a 32K context window. It is a fine-tune base built for narrow, efficient tasks like extraction, classification, and routing rather than broad chat. You can run it fully on-device through Atomic Chat.

What hardware do I need to run gemma-3-270m?

Very little. Quantized to INT4, gemma-3-270m uses under 200 MB of memory and runs without a dedicated GPU, so a standard laptop, desktop, or even a phone can handle it. For local fine-tuning rather than inference, you'd want an NVIDIA GPU with around 8 GB of VRAM, but plain inference in Atomic Chat works on modest CPUs.

Is gemma-3-270m free to use?

Yes. The weights are open and free to download from Hugging Face under the Gemma license, which permits responsible commercial use including fine-tuning and deploying your own version. Running it locally in Atomic Chat costs nothing beyond your own hardware and electricity.

Can I run gemma-3-270m offline?

Yes. Once you download the weights, gemma-3-270m runs entirely on your machine with no network connection. In Atomic Chat your prompts and outputs stay on-device, which keeps the model usable on a plane, in the field, or anywhere private data shouldn't leave your computer.

What is gemma-3-270m best at?

It is strongest on focused, high-volume tasks: entity extraction, sentiment analysis, query routing, and turning unstructured text into structured output. As a fine-tune base it is cheap to specialize for one job and then run that version locally. For long open-ended conversation or deep reasoning, a larger model is a better fit.