Qwen3-30B-A3B

Capabilities: Tools, Thinking, Reasoning, Code, Multilingual

Run

huggingface-cli download Qwen/Qwen3-30B-A3B

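from transformers import AutoModelForCausalLM, AutoTokenizer
# AutoModelForCausalLM attaches the language-modeling head that generate() needs
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-30B-A3B", torch_dtype="auto", device_map="auto")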

More models

Name	Size / Usage	Context	Input
Qwen3.6-35B-A3B		256K	Text, Image
Qwen3.6-27B		256K	Text, Image
WebWorld-8B	Web agents, multimodal reasoning	40K	Text, Image
MiniCPM-V 4.6	5213 GB	421K	Text, Image
anima	421 GB	31K	Text
Qwen3-Coder-30B-A3B-Instruct		256K	Text
Qwen3-14B		128K	Text
Qwen3-32B		128K	Text

At a glance

License: Apache 2.0
Context length: 128K tokens (32K native, extended with YaRN)
Languages: Multilingual
Minimum hardware: ~62 GB VRAM for unquantized 16-bit weights; ~17 GB with 4-bit quantization (Q4_K_M)
Strengths: fast MoE reasoning with low active params

Overview

Qwen3-30B-A3B is a Mixture-of-Experts (MoE) language model from Qwen, the AI lab at Alibaba. It holds 30.5B total parameters but routes only about 3B of them per token, so it reasons with the breadth of a large model while running at the speed of a much smaller one. It carries a 128K context window and supports tool calling, multilingual text, and a switchable thinking mode for step-by-step reasoning.
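
To make "routes only about 3B of them per token" concrete, here is a minimal sketch of top-k expert routing in plain Python. The expert count (8) and k (2) are toy values chosen for readability, not the model's real configuration, and route_token is a hypothetical helper rather than anything from the Qwen codebase.

```python
import math

def softmax(scores):
    """Convert raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs. Only these experts
    would run for the token; the rest stay idle.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# Toy example: 8 experts, 2 active per token (illustrative values only).
scores = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
chosen = route_token(scores, k=2)
active_fraction = len(chosen) / len(scores)
print(chosen, active_fraction)
```

Every expert still has to be resident in memory, because the router can pick a different subset for each token. What shrinks is the compute per token, not the storage.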

In Atomic Chat the appeal is that all of this happens on your own machine. The weights load locally, nothing leaves your computer, and the model answers with no internet connection once it is downloaded. Your prompts and files stay on-device.

What it is good at

The model is built around reasoning, agentic use, and language coverage, which maps to a few concrete jobs:

Coding and debugging — writes functions, explains stack traces, and refactors across files, with the thinking mode helping it work through multi-step logic before answering.
Tool and agent workflows — the tools capability lets it call functions and structured APIs, so it can drive local automations or act as the brain of an agent loop.
Multilingual drafting and translation — Qwen3 covers over 100 languages, which makes it useful for translating, summarizing foreign-language text, and writing content in languages other than English.
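
The tool workflow above can be sketched as a small dispatch step: the model emits a structured call and local code runs the matching function. The JSON shape used here (a "name" plus an "arguments" object) is an assumption modeled on common function-calling formats; the exact wire format depends on the chat template and the server you use. get_weather, TOOLS, and dispatch_tool_call are hypothetical names for illustration.

```python
import json

def get_weather(city: str) -> str:
    """Stand-in local tool; a real one would query a service or a file."""
    return f"Sunny in {city}"

# Registry mapping tool names to local Python callables.
TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(raw_call: str) -> str:
    """Parse one JSON tool call and run the matching local function."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))

# Example payload, shaped like what a model might emit (assumed format).
model_output = '{"name": "get_weather", "arguments": {"city": "Lisbon"}}'
result = dispatch_tool_call(model_output)
print(result)
```

In an agent loop, the returned string would be appended to the conversation as a tool message so the model can use it in its next turn.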

Running it locally

With 30.5B total parameters, the model needs the full weight set in memory even though only 3B are active per token. A 4-bit quantization (Q4_K_M) lands around 17 GB, which fits a 24 GB GPU such as an RTX 4090, and people also run it on Apple Silicon with 32 GB or more of unified memory. The 128K context lets you feed it long documents or large code files in one pass.
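
The memory figures follow from simple arithmetic: parameter count times bits per weight. The sketch below is a rough estimate only. The 4.5 bits-per-weight average for a 4-bit K-quant is an assumption (real files mix precisions across tensors), and the estimate ignores the KV cache and activations, which grow with context length.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 30.5e9  # total parameter count stated above

bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)   # unquantized 16-bit weights
q4_gb = weight_memory_gb(TOTAL_PARAMS, 4.5)    # assumed average for a 4-bit K-quant

print(f"16-bit: ~{bf16_gb:.0f} GB, 4-bit: ~{q4_gb:.0f} GB")
```

Under these assumptions the 4-bit estimate lands near the 17 GB quoted above, and the 16-bit estimate comes to about 61 GB, which is why unquantized weights do not fit on a single 24 GB card.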

huggingface-cli download Qwen/Qwen3-30B-A3B

You can load the weights with Transformers or serve them with vLLM. In Atomic Chat the model is a one-click download and run, so you skip the manual setup and start chatting offline.

License

Qwen3-30B-A3B is released under the Apache 2.0 license. That permits commercial use, modification, and redistribution, so you can run it in a product or fine-tune it on your own data without a separate usage fee.

Desktop

macOS (M1 or better)
Windows (x64)
Linux (x86_64)

Frequently asked questions

Qwen3-30B-A3B is an open-weight Mixture-of-Experts model from Qwen with 30.5B total parameters and roughly 3B active per token. It supports a 128K context window, tool calling, multilingual text, and a switchable thinking mode for step-by-step reasoning. The A3B in the name refers to the 3B active parameters that give it small-model speed.

Because it is an MoE model, all 30.5B parameters load into memory even though only 3B run per token. A 4-bit quant (Q4_K_M) needs around 17 GB, which fits a 24 GB GPU like the RTX 4090, and many people run it from at least 32 GB of system RAM or of unified memory on Apple Silicon. Higher precision needs more: unquantized 16-bit weights take roughly 62 GB.

Qwen3-30B-A3B is free to use. It is released under the Apache 2.0 license, so the weights are free to download from Hugging Face and free to run. The license also allows commercial use and modification, and there is no per-token or API fee when you run it yourself.

It runs fully offline once the weights are downloaded. In Atomic Chat the model loads on your own hardware and answers with no internet connection, so your prompts and files stay on-device. That makes it a fit for private or air-gapped work.

Qwen3-30B-A3B ships with thinking mode enabled by default. You can switch it per turn by adding /think or /no_think to your prompt, or set the enable_thinking parameter to False through the chat template or API. Turning it off gives faster, shorter replies for simple chat, while leaving it on helps with reasoning, math, and code.
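
As a minimal sketch of the per-turn switch, the helper below appends /think or /no_think to a prompt, and a second helper separates the reasoning from the final answer. It assumes the reasoning arrives wrapped in <think>...</think> tags ahead of the answer; with_thinking_switch and split_thinking are hypothetical names, and the sample reply is hand-written, not real model output.

```python
import re

def with_thinking_switch(prompt: str, thinking: bool) -> str:
    """Append the soft switch that turns thinking on or off for one turn."""
    return f"{prompt} {'/think' if thinking else '/no_think'}"

def split_thinking(reply: str):
    """Separate a <think>...</think> block from the final answer.

    Returns (reasoning, answer); reasoning is empty if no block is present.
    """
    match = re.search(r"<think>(.*?)</think>", reply, flags=re.DOTALL)
    if match is None:
        return "", reply.strip()
    return match.group(1).strip(), reply[match.end():].strip()

prompt = with_thinking_switch("What is 17 * 3?", thinking=False)
# Hand-written sample reply, not real model output.
sample = "<think>\n17 * 3 = 51\n</think>\n\nThe answer is 51."
reasoning, answer = split_thinking(sample)
print(prompt, reasoning, answer, sep=" | ")
```

Splitting the reply this way lets a chat interface show only the answer by default and keep the reasoning available on request.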