Name

Size / Usage

Context

Input

Qwen3.6-27B

256K

Text, Image

WebWorld-8B

Web agents, multimodal reasoning

40K

Text, Image

MiniCPM-V 4.6

5213 GB

421K

Text, Image

anima

421 GB

31K

Text

Qwen3-Coder-30B-A3B-Instruct

256K

Text

Qwen3-30B-A3B

128K

Text

Qwen3-14B

128K

Text

Qwen3-32B

128K

Text

Overview

Qwen3.6-35B-A3B is an open-weight language model from Qwen, the AI team at Alibaba. The "A3B" in the name is the giveaway to its architecture: it is a Mixture-of-Experts (MoE) model with 36B total parameters but only about 3B active per token. The router picks a small set of expert networks for each token, so you get the knowledge capacity of a large model while paying the compute cost of a much smaller one. It handles text and images, and ships under Apache-2.0.

The model is built to run on your own machine. With Atomic Chat, the weights and every prompt stay on-device, so the model works offline once downloaded and nothing is sent to a server. That makes it a fit for private notes, internal documents, or any work you would rather not push to a hosted API.

What it is good at

Qwen3.6-35B-A3B carries the full Qwen capability set, including a toggleable thinking mode, vision, tool calling, and embeddings. A few concrete uses:

Long-document work — the 262,144-token context window holds a large codebase, a long contract, or hundreds of pages of notes in a single prompt for summarizing, searching, or cross-referencing.
Agentic and coding tasks — native tool calling and a reasoning/thinking mode let it plan multi-step jobs, call functions, and write or debug code without a hosted backend.
Multimodal and multilingual input — vision support reads screenshots, charts, and scanned pages, and its multilingual training handles prompts and output across many languages.

Running it locally

The model is 36B parameters total. At Q4_K_M quantization the weights are roughly 21 GB, so a 24 GB GPU (RTX 3090, 4090, or 5090) or a Mac with 32 GB or more unified memory runs it comfortably. The full 262,144-token context grows the KV cache by tens of GB, so trim the context if you are tight on memory. Pull the weights from Hugging Face:

huggingface-cli download Qwen/Qwen3.6-35B-A3B

You can serve it with Transformers or vLLM, or skip the setup entirely and load it in Atomic Chat with one click, which downloads a quantized build and runs it on-device.

License

Qwen3.6-35B-A3B is released under the Apache-2.0 license. That permits commercial use, modification, and redistribution, including fine-tuning the weights and shipping the model inside your own products, as long as you keep the license and attribution notices.

Frequently asked questions

It is an open-weight Mixture-of-Experts language model from Qwen (Alibaba's AI team), with 36B total parameters and about 3B active per token. It supports text, vision, tool calling, a thinking mode, and a 262,144-token context window, all under the Apache-2.0 license. In Atomic Chat it runs fully on your own hardware.

At Q4_K_M quantization the weights are about 21 GB, so a 24 GB GPU such as an RTX 3090, 4090, or 5090, or a Mac with 32 GB or more unified memory, runs it well. The Q8_0 build is closer to 37 GB and needs 48 GB-class hardware. 16 GB cards are not enough even at Q4 unless you use aggressive offloading.

Yes. The model is released under the Apache-2.0 license, which allows free use, modification, and commercial deployment. Running it locally through Atomic Chat means there are no API fees or per-token costs, only your own hardware and electricity.

Yes. Once the weights are downloaded, the model runs entirely on-device with no internet connection required. Every prompt and response stays on your machine, so nothing is sent to Alibaba or any other server, which is the main reason to run it in Atomic Chat.

The simplest route is Atomic Chat: open the app, find Qwen3.6-35B-A3B in the model list, and click to download and load a quantized build. If you prefer the command line, pull the weights with huggingface-cli download Qwen/Qwen3.6-35B-A3B and serve them through Transformers, vLLM, or llama.cpp.

Qwen3.6-35B-A3B

More models

At a glance

Overview

What it is good at

Running it locally

License

Frequently asked questions