Overview
SmolLM2-135M-Instruct is the smallest model in Hugging Face's SmolLM2 family, which also includes 360M and 1.7B versions. It has 135 million parameters and uses a Llama-style transformer decoder. Hugging Face pretrained the base model on 2 trillion tokens drawn from FineWeb-Edu, DCLM, and The Stack, then produced this instruct variant through supervised fine-tuning followed by Direct Preference Optimization on the UltraFeedback dataset. The SmolLM2 work was published in early 2025 (arXiv:2502.02737).
What it's good at
The model is built for lightweight, on-device language tasks: short chat, instruction following, text rewriting, and summarization. Compared with the earlier SmolLM-135M-Instruct it improved sharply on instruction following, lifting IFEval from 17.2 to 29.9 and MT-Bench from 16.8 to 19.8, with gains on HellaSwag, ARC, and BBH as well. Its knowledge and math remain limited at this size, with GSM8K around 1.4, so it works best on simple, well-scoped prompts rather than open-ended reasoning. Function calling is reserved for the 1.7B variant, not this one.
Running locally
At 135M parameters the model needs roughly 720 MB in its default precision and runs on CPU, which makes it practical for laptops, phones, and other resource-constrained or offline devices. You can load it directly with Hugging Face Transformers, chat with it through the TRL CLI, or run it in the browser via Transformers.js. Quantized GGUF builds from the community (for example via llama.cpp or Ollama) shrink the footprint further for edge deployment.
License
SmolLM2-135M-Instruct is released under the Apache 2.0 license. That allows commercial use, modification, and redistribution, provided the license text and attribution notices are kept. Hugging Face also released the SFT dataset and the fine-tuning recipe, so the training process can be reproduced and extended.
