Qwen3-30B-A3B

Updated
25.06.2026
Tools
Thinking
Reasoning
Code
Multilingual

huggingface-cli download Qwen/Qwen3-30B-A3B
from transformers import AutoModel
model = AutoModel.from_pretrained("Qwen/Qwen3-30B-A3B")

More models

NameSize / UsageContextInput
Qwen3.6-35B-A3B
256KText, Image
Qwen3.6-27B
256KText, Image
WebWorld-8B
Web agents, multimodal reasoning40KText, Image
MiniCPM-V 4.6
5213 GB421KText, Image
anima
421 GB31KText
Qwen3-Coder-30B-A3B-Instruct
256KText
Qwen3-14B
128KText
Qwen3-32B
128KText

At a glance

  • License: Apache 2.0
  • Context length: 128K tokens
  • Languages: Multilingual
  • Minimum hardware: ~62 GB VRAM
  • Strengths: fast MoE reasoning with low active params

Overview

Qwen3-30B-A3B is a Mixture-of-Experts (MoE) language model from Qwen, the AI lab at Alibaba. It holds 30.5B total parameters but routes only about 3B of them per token, so it reasons with the breadth of a large model while running at the speed of a much smaller one. It carries a 128K context window and supports tool calling, multilingual text, and a switchable thinking mode for step-by-step reasoning.

In Atomic Chat the appeal is that all of this happens on your own machine. The weights load locally, nothing leaves your computer, and the model answers with no internet connection once it is downloaded. Your prompts and files stay on-device.

What it is good at

The model is built around reasoning, agentic use, and language coverage, which maps to a few concrete jobs:

  • Coding and debugging — writes functions, explains stack traces, and refactors across files, with the thinking mode helping it work through multi-step logic before answering.
  • Tool and agent workflows — the tools capability lets it call functions and structured APIs, so it can drive local automations or act as the brain of an agent loop.
  • Multilingual drafting and translation — Qwen3 covers over 100 languages, which makes it useful for translating, summarizing foreign-language text, and writing content in languages other than English.

Running it locally

With 30.5B total parameters, the model needs the full weight set in memory even though only 3B are active per token. A 4-bit quantization (Q4_K_M) lands around 17 GB, which fits a 24 GB GPU such as an RTX 4090, and people also run it on Apple Silicon with 32 GB or more of unified memory. The 128K context lets you feed it long documents or large code files in one pass.

huggingface-cli download Qwen/Qwen3-30B-A3B

You can load the weights with Transformers or serve them with vLLM. In Atomic Chat the model is a one-click download and run, so you skip the manual setup and start chatting offline.

License

Qwen3-30B-A3B is released under the apache-2.0 license. That permits commercial use, modification, and redistribution, so you can run it in a product or fine-tune it on your own data without a separate usage fee.

Desktop
macOS
(M1 or better)
Download
Windows
(x64)
Download
Linux
(x86_64)
Download

Frequently asked questions

Qwen3-30B-A3B is an open-weight Mixture-of-Experts model from Qwen with 30.5B total parameters and roughly 3B active per token. It supports a 128K context window, tool calling, multilingual text, and a switchable thinking mode for step-by-step reasoning. The A3B in the name refers to the 3B active parameters that give it small-model speed.

Because it is a MoE model, all 30.5B parameters load into memory even though only 3B run per token. A 4-bit quant (Q4_K_M) needs around 17 GB, which fits a 24 GB GPU like the RTX 4090, and many people run it with at least 32 GB of system or unified memory on Apple Silicon. Higher precision needs more.

Yes. It is released under the Apache 2.0 license, so the weights are free to download from Hugging Face and free to run. The license also allows commercial use and modification, so there is no per-token or API fee when you run it yourself.

It runs fully offline once the weights are downloaded. In Atomic Chat the model loads on your own hardware and answers with no internet connection, so your prompts and files stay on-device. That makes it a fit for private or air-gapped work.

Qwen3-30B-A3B ships with thinking mode enabled by default. You can switch it per turn by adding /think or /no_think to your prompt, or set the enable_thinking parameter to False through the chat template or API. Turning it off gives faster, shorter replies for simple chat, while leaving it on helps with reasoning, math, and code.