Overview

Qwen3.6-27B is a 27.8B-parameter dense model from Qwen, the model team at Alibaba Cloud. The Hugging Face tags mark it as image-text-to-text and part of the qwen3_5 line, so it is natively multimodal: it reads images and text in one pass rather than bolting vision on afterward. It ships with a 262,144-token context window, and Qwen documents extending that toward roughly a million tokens with the right serving setup.

Because it is a dense model instead of a Mixture-of-Experts router, it loads and runs without MoE routing overhead, which makes it predictable on a single GPU. With Atomic Chat you can run all 27.8B parameters on your own machine, fully offline. Prompts, code, and documents stay on your hardware, and nothing is sent to an API.

What it is good at

The capability set covers reasoning, code, vision, tool calling, and embeddings, so a single local copy handles a wide range of work:

Agentic coding — Qwen reports Qwen3.6-27B scoring 77.2 on SWE-bench Verified and 59.3 on Terminal-Bench 2.0, ahead of the older Qwen3.5-397B-A17B flagship, which makes it suited to repository-level edits and frontend work.
Vision and document understanding — the built-in vision encoder reads screenshots, scanned pages with OCR, and long video, so you can ask questions about an image or a PDF without an external service.
Tool calling with thinking — the model supports structured tool calls and a thinking mode, letting it plan multi-step tasks and invoke functions inside an agent loop.

Running it locally

At 27.8B parameters, a 16 GB GPU such as an RTX 4080 or 16 GB of Apple unified memory runs the Q4_K_M quantization at around 16.8 GB. A 24 GB card like the RTX 3090 or 4090 gives headroom for Q6_K and longer slices of the 262,144-token context. Pull the original weights from Hugging Face:

huggingface-cli download Qwen/Qwen3.6-27B

From there you can serve it with Transformers or vLLM, load a GGUF build in LM Studio, or use the one-click download in Atomic Chat to get it running without touching the terminal.

License

Qwen3.6-27B is released under the Apache-2.0 license. That permits commercial use, modification, redistribution, and private deployment with no royalties, so you can build products on top of the model and run it on your own hardware without a usage fee.

Frequently asked questions

Qwen3.6-27B is a 27.8B-parameter dense, multimodal language model from Qwen (Alibaba Cloud). It handles text, images, and video, supports reasoning and tool calling, and carries a 262,144-token context window. The weights are open under Apache-2.0, so it can run locally in Atomic Chat without an API.

The Q4_K_M quantization needs about 16.8 GB, so a 16 GB GPU like an RTX 4080 or 16 GB of Apple unified memory is the practical floor. A 24 GB card such as an RTX 3090 or RTX 4090 gives room for the higher-quality Q6_K build and longer context. Lower quants like Q3_K_M drop the requirement to around 12 GB if memory is tight.

Yes. Qwen3.6-27B is published under the Apache-2.0 license, which allows free use including commercial deployment, with no royalties or usage fees. You can download the weights from Hugging Face and run them on your own hardware at no cost.

Yes. Once the weights are downloaded, the model runs entirely on your machine with no network connection. In Atomic Chat your prompts, code, and files never leave your device, which suits private or air-gapped work.

Its strongest area is agentic coding, where Qwen reports 77.2 on SWE-bench Verified, ahead of the larger Qwen3.5-397B-A17B model. It is also natively multimodal, reading images, OCR documents, and long video, and it supports tool calling plus a thinking mode for multi-step agent tasks.

Qwen3.6-27B

More models

At a glance

Overview

What it is good at

Running it locally

License

Frequently asked questions