VibeThinker-3B

Updated
24.06.2026
Tools
Thinking
Reasoning
Code

huggingface-cli download WeiboAI/VibeThinker-3B
from transformers import AutoModel
model = AutoModel.from_pretrained("WeiboAI/VibeThinker-3B")

More models

No items found.
NameSize / UsageContextInput

At a glance

  • License: Mit
  • Context length: 128K tokens
  • Languages: en
  • Minimum hardware: ~2 GB VRAM
  • Strengths: reasoning, coding and on-device inference

Overview

VibeThinker-3B is a 3.1B-parameter reasoning model from WeiboAI, the AI team behind Weibo. It is a dense model post-trained on top of Qwen2.5-Coder-3B, and it carries a 131,072-token context window. The training pipeline, called the Spectrum-to-Signal Principle, pushes a compact model toward the kind of step-by-step math and code reasoning usually reserved for far larger systems.

The point of a model this small is where it runs. VibeThinker-3B is open-weight under an MIT license, so Atomic Chat loads it on your own machine and keeps every prompt and response on-device. Nothing leaves your computer, it works with no internet connection, and there is no API bill or rate limit.

What it is good at

VibeThinker-3B was tuned for verifiable reasoning: problems where an answer can be checked. Its capabilities cover thinking, reasoning, code, and tool calling.

  • Competition math — the model thinks through multi-step problems and shows its work, the area where it posted strong AIME and HMMT scores in WeiboAI's reported benchmarks.
  • Code generation and debugging — built on a Qwen2.5-Coder base, it writes functions, traces logic errors, and explains what a snippet does.
  • Tool calling in agents — it supports structured tool calls, so it can drive local agent loops that hit a calculator, a search function, or your own scripts.

Running it locally

At 3.1B parameters the model is light. A 4-bit GGUF quantization runs in roughly 2 GB of memory, full bf16 weights need about 6 GB, and a 16 GB laptop or an Apple Silicon Mac handles it comfortably. The 131,072-token context is available, though a long context grows the KV cache and wants more headroom.

huggingface-cli download WeiboAI/VibeThinker-3B

You can serve the raw weights through Transformers or vLLM, or skip the setup entirely and load VibeThinker-3B in Atomic Chat with one click, which pulls a quantized build and runs it offline.

License

VibeThinker-3B is released under the MIT license. That permits commercial use, modification, fine-tuning, and redistribution, with the only requirement being that you keep the original copyright and license notice.

Desktop
macOS
(M1 or better)
Download
Windows
(x64)
Download
Linux
(x86_64)
Download

Frequently asked questions

VibeThinker-3B is a 3.1B-parameter reasoning model from WeiboAI, post-trained on Qwen2.5-Coder-3B. It is aimed at verifiable tasks like competition math, STEM problems, and code, and it ships open-weight under an MIT license. Despite its small size, WeiboAI reports it competing with much larger models on math benchmarks such as AIME and HMMT.

It is one of the lighter models to run. A 4-bit GGUF quantization needs roughly 2 GB of memory, and the full bf16 weights need about 6 GB. A laptop with 16 GB of RAM, any modern GPU with 6 GB or more of VRAM, or an Apple Silicon Mac runs it without trouble.

Yes. The weights are released under the MIT license, which allows free use including commercial projects, modification, and redistribution. Running it locally in Atomic Chat means there is no subscription, no API key, and no per-token cost.

Yes. Once the model is downloaded it runs fully on your own machine with no internet connection. In Atomic Chat every prompt and response stays on-device, so nothing is sent to an external server.

It is strongest on verifiable reasoning: competition-style math, STEM problems, and coding, where an answer can be checked. It also supports structured tool calling for local agent workflows. For broad open-domain knowledge and general chat, a larger general-purpose model is usually a better fit.