Qwen3-Coder-30B-A3B-Instruct

Updated
25.06.2026
Tools
Reasoning
Code
Multilingual

huggingface-cli download Qwen/Qwen3-Coder-30B-A3B-Instruct
from transformers import AutoModel
model = AutoModel.from_pretrained("Qwen/Qwen3-Coder-30B-A3B-Instruct")

More models

NameSize / UsageContextInput
Qwen3.6-35B-A3B
256KText, Image
Qwen3.6-27B
256KText, Image
WebWorld-8B
Web agents, multimodal reasoning40KText, Image
MiniCPM-V 4.6
5213 GB421KText, Image
anima
421 GB31KText
Qwen3-30B-A3B
128KText
Qwen3-14B
128KText
Qwen3-32B
128KText

At a glance

  • License: Apache 2.0
  • Context length: 256K tokens
  • Languages: Multilingual
  • Minimum hardware: ~62 GB VRAM
  • Strengths: agentic coding and long-context code

Overview

Qwen3-Coder-30B-A3B-Instruct is a coding-focused large language model built by Qwen (Alibaba's model team). It uses a Mixture-of-Experts (MoE) architecture with 30.5B total parameters, but only about 3.3B are active on any given token because the router picks 8 of its 128 experts per forward pass. That design gives it the knowledge of a 30B model while running at the speed and memory cost closer to a much smaller dense one.

In Atomic Chat the model runs fully on your own machine. Weights load locally, inference happens on your CPU or GPU, and nothing about your prompts or code leaves the device. You can keep working with it offline once the download finishes, which suits private repositories and sensitive work where sending code to a cloud API is not an option.

What it is good at

This is an agentic coding model, instruction-tuned for writing, editing, and reasoning over code across many languages. Its real strengths line up with the capabilities it ships with.

  • Agentic coding — it handles multi-step coding tasks and tool calls, with a function-call format designed to drive agent loops in setups like Qwen Code and CLINE.
  • Long-context work — a 256K native window lets it read across a whole repository, follow imports, and answer questions that span many files instead of one snippet at a time.
  • Multilingual code reasoning — it works across mainstream programming languages and natural languages, so it can explain a bug, refactor a function, or write tests in the language you ask in.

Running it locally

The model is 30.5B parameters with a 256K context length. Because the MoE design keeps active parameters near 3.3B, a 4-bit quant (Q4_K_M) fits in roughly 18-22GB of VRAM, and people report 12-15 tokens per second on a modern CPU with 32GB of RAM at that quant. Full FP16 weights need around 67GB, so most local users run a quantized build.

huggingface-cli download Qwen/Qwen3-Coder-30B-A3B-Instruct

From there you can load it with Hugging Face Transformers or serve it through vLLM, or skip the setup and open it in Atomic Chat, which downloads and loads the model with one click.

License

Qwen3-Coder-30B-A3B-Instruct is released under the Apache 2.0 license. That permits commercial use, modification, and redistribution, including in closed-source products, as long as you keep the license and attribution notices.

Desktop
macOS
(M1 or better)
Download
Windows
(x64)
Download
Linux
(x86_64)
Download

Frequently asked questions

It is a coding-focused large language model from Qwen (Alibaba) built on a Mixture-of-Experts design, with 30.5B total parameters and about 3.3B active per token. It is instruction-tuned for writing code, editing code, and running agentic coding workflows, and it supports a 256K-token context. In Atomic Chat it runs locally on your own hardware.

At a 4-bit quant (Q4_K_M) the model fits in roughly 18-22GB of VRAM, so a 24GB card like an RTX 4090 runs it comfortably. A Q6_K build wants about 27GB, and full FP16 weights need around 67GB. You can also run it on CPU with 32GB of system RAM, where users report 12-15 tokens per second at 4-bit.

Yes. The model is released under the Apache 2.0 license, which is free for personal and commercial use. You can download the weights from Hugging Face at no cost and run them locally in Atomic Chat without an API key or subscription.

Yes. Once you download the weights, the model runs entirely on your machine with no internet connection required. Prompts and code stay on the device, which is why it works well for private repositories and sensitive projects. Atomic Chat handles the download and then loads the model fully on-device.

It is built for agentic coding, so it is strong at multi-step coding tasks, tool calls, and code editing across many programming languages. The 256K context window lets it reason over large codebases rather than single snippets. Note that it runs in non-thinking mode and does not emit separate reasoning blocks in its output.