Qwen3.6-35B-A3B

Updated: 24.06.2026
Tools
Thinking
Embedding
Vision
Reasoning
Code
Multilingual

huggingface-cli download Qwen/Qwen3.6-35B-A3B

# Load the checkpoint with Hugging Face Transformers
from transformers import AutoModel
model = AutoModel.from_pretrained("Qwen/Qwen3.6-35B-A3B")

At a glance

  • License: Apache 2.0
  • Context length: 256K tokens
  • Languages: Multilingual
  • Minimum hardware: ~21 GB VRAM
  • Strengths: reasoning, coding and on-device inference

Overview

Qwen3.6-35B-A3B is an open-weight language model from Qwen, the AI team at Alibaba. The "A3B" in the name is the giveaway to its architecture: it is a Mixture-of-Experts (MoE) model with 36B total parameters but only about 3B active per token. The router picks a small set of expert networks for each token, so you get the knowledge capacity of a large model while paying the compute cost of a much smaller one. It handles text and images, and ships under Apache-2.0.
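To make the routing idea concrete, here is a minimal toy sketch of top-k expert routing in plain Python. It is not Qwen's implementation: the expert count, the top-k value, the scalar "experts", and the choice to renormalize the selected weights are all assumptions made for illustration. It only shows why most experts do no work for a given token.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k):
    """Pick the top_k experts for one token.

    Returns (expert_index, weight) pairs, with the weights renormalized
    so they sum to 1 (an assumption; real models differ in this detail).
    """
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x, experts, router_logits, top_k):
    """Run only the selected experts and mix their outputs by router weight."""
    output = 0.0
    for index, weight in route_token(router_logits, top_k):
        output += weight * experts[index](x)
    return output


if __name__ == "__main__":
    NUM_EXPERTS = 8  # hypothetical, not the model's real expert count
    TOP_K = 2        # hypothetical
    # Toy experts: each one just scales its input by a different factor.
    experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]
    random.seed(0)
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    print("selected experts:", route_token(logits, TOP_K))
    print("mixed output:", moe_forward(1.0, experts, logits, TOP_K))
```

In this toy setup only 2 of the 8 experts run per token, which mirrors the total-versus-active parameter split described above.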

The model is built to run on your own machine. With Atomic Chat, the weights and every prompt stay on-device, so the model works offline once downloaded and nothing is sent to a server. That makes it a fit for private notes, internal documents, or any work you would rather not push to a hosted API.

What it is good at

Qwen3.6-35B-A3B carries the full Qwen capability set, including a toggleable thinking mode, vision, tool calling, and embeddings. A few concrete uses:

  • Long-document work: the 262,144-token context window holds a large codebase, a long contract, or hundreds of pages of notes in a single prompt for summarizing, searching, or cross-referencing.
  • Agentic and coding tasks: native tool calling and a reasoning/thinking mode let it plan multi-step jobs, call functions, and write or debug code without a hosted backend.
  • Multimodal and multilingual input: vision support reads screenshots, charts, and scanned pages, and its multilingual training handles prompts and output across many languages.
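For the tool-calling use case, the application side usually comes down to two pieces: a schema that describes each function to the model, and a dispatcher that runs whatever call the model asks for. The sketch below assumes an OpenAI-style JSON function schema and a tool call delivered as a JSON object with "name" and "arguments" keys. Both are assumptions about your runtime, and get_weather is a hypothetical tool that returns canned data so the example runs offline.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local tool; returns canned text instead of calling an API."""
    return f"Sunny in {city}"


# Registry of callable tools, keyed by the name the model will use.
TOOLS = {"get_weather": get_weather}

# Schema passed to the runtime so the model knows the tool exists.
# The OpenAI-style layout is an assumption; check what your runtime expects.
TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def dispatch(tool_call_json: str) -> str:
    """Parse one tool call emitted by the model and run the matching function."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


if __name__ == "__main__":
    # Example of what a model-emitted call might look like (assumed format).
    emitted = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
    print(dispatch(emitted))
```

The result of dispatch would then be appended to the conversation as a tool message so the model can use it in its next turn.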

Running it locally

The model is 36B parameters total. At Q4_K_M quantization the weights are roughly 21 GB, so a 24 GB GPU (RTX 3090 or 4090), a 32 GB RTX 5090, or a Mac with 32 GB or more unified memory runs it comfortably. The full 262,144-token context grows the KV cache by tens of GB, so trim the context if you are tight on memory. Pull the weights from Hugging Face:

huggingface-cli download Qwen/Qwen3.6-35B-A3B

You can serve it with Transformers or vLLM, or skip the setup entirely and load it in Atomic Chat with one click, which downloads a quantized build and runs it on-device.
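The memory figures in this section can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is a rough estimate, not a measurement. The bits-per-weight value for Q4_K_M is an approximate average, and the layer count, KV-head count, and head dimension are placeholder values rather than this model's real configuration. Read the real numbers from the checkpoint's config.json before relying on the result.

```python
def weights_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


def kv_cache_gb(tokens: int, layers: int, kv_heads: int, head_dim: int,
                bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GB for a standard attention stack.

    The leading factor 2 accounts for storing one key and one value
    tensor per layer; bytes_per_value=2 assumes a 16-bit cache.
    """
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token_bytes * tokens / 1e9


if __name__ == "__main__":
    TOTAL_PARAMS = 36e9       # total parameter count quoted in this section
    Q4_K_M_BITS = 4.7         # assumed average bits per weight, approximate

    # Placeholder architecture values, NOT the model's real config.
    LAYERS, KV_HEADS, HEAD_DIM = 48, 4, 128

    print(f"Q4_K_M weights: ~{weights_gb(TOTAL_PARAMS, Q4_K_M_BITS):.1f} GB")
    for context in (262_144, 32_768):
        size = kv_cache_gb(context, LAYERS, KV_HEADS, HEAD_DIM)
        print(f"KV cache at {context:>7} tokens: ~{size:.1f} GB")
```

With these placeholder values the cache scales linearly with context length, so cutting the context from 262,144 to 32,768 tokens shrinks it by a factor of eight.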

License

Qwen3.6-35B-A3B is released under the Apache-2.0 license. That permits commercial use, modification, and redistribution, including fine-tuning the weights and shipping the model inside your own products, as long as you keep the license and attribution notices.

Frequently asked questions

What is Qwen3.6-35B-A3B?

It is an open-weight Mixture-of-Experts language model from Qwen (Alibaba's AI team), with 36B total parameters and about 3B active per token. It supports text, vision, tool calling, a thinking mode, and a 262,144-token context window, all under the Apache-2.0 license. In Atomic Chat it runs fully on your own hardware.

What hardware do I need to run Qwen3.6-35B-A3B?

At Q4_K_M quantization the weights are about 21 GB, so a 24 GB GPU such as an RTX 3090 or 4090, a 32 GB RTX 5090, or a Mac with 32 GB or more unified memory, runs it well. The Q8_0 build is closer to 37 GB and needs 48 GB-class hardware. 16 GB cards are not enough even at Q4 unless you use aggressive offloading.

Is Qwen3.6-35B-A3B free to use?

Yes. The model is released under the Apache-2.0 license, which allows free use, modification, and commercial deployment. Running it locally through Atomic Chat means there are no API fees or per-token costs, only your own hardware and electricity.

Can I run Qwen3.6-35B-A3B offline?

Yes. Once the weights are downloaded, the model runs entirely on-device with no internet connection required. Every prompt and response stays on your machine, so nothing is sent to Alibaba or any other server, which is the main reason to run it in Atomic Chat.

How do I install and run Qwen3.6-35B-A3B?

The simplest route is Atomic Chat: open the app, find Qwen3.6-35B-A3B in the model list, and click to download and load a quantized build. If you prefer the command line, pull the weights with huggingface-cli download Qwen/Qwen3.6-35B-A3B and serve them through Transformers, vLLM, or llama.cpp.