GLM-5.2

Updated

Thinking

Reasoning

Code

huggingface-cli download zai-org/GLM-5.2

from transformers import AutoModel
model = AutoModel.from_pretrained("zai-org/GLM-5.2")

More models

View all

No items found.

Name	Size / Usage	Context	Input

At a glance

License: Mit
Context length: 1M tokens
Languages: en, zh
Minimum hardware: ~421 GB VRAM
Strengths: reasoning, coding and on-device inference

Overview

GLM-5.2 is a 753.3B-parameter open-weight language model from zai-org (Z.ai / Zhipu AI). It uses a mixture-of-experts design, tagged glm_moe_dsa, so only a fraction of those parameters fire on any given token. The model handles English and Chinese, carries a 1,048,576-token context window, and is built for thinking, reasoning, and code.

The local angle is the point of running it through Atomic Chat. The weights are public on Hugging Face, so once they are on your machine the model runs on your own hardware, offline, with nothing sent to an external API. Your prompts and code stay on the device.

What it is good at

GLM-5.2 leans toward agentic coding and long-horizon work. These are the tasks it fits.

Code generation and review — it writes implementations from a spec and reasons across multiple files, and ranks near the top of open-weight coding benchmarks like SWE-bench Pro and Terminal-Bench.
Long-document reasoning — the 1,048,576-token window holds an entire codebase or a long technical document in one pass, so you can ask questions across the whole thing.
Step-by-step problem solving — the thinking capability lets it work through multi-step logic and agentic tasks before committing to an answer.

Running it locally

At 753.3B parameters this is a heavy model. Full BF16 weights need hundreds of gigabytes, so most local users run a quantized build: a 2-bit dynamic quant lands around 241 GB and fits a 256 GB unified-memory Mac or a 24 GB GPU paired with 256 GB of system RAM and MoE offloading. The full 1,048,576-token context is available if your memory can hold the KV cache.

huggingface-cli download zai-org/GLM-5.2

From there you can serve it with vLLM or run a GGUF quant through llama.cpp, or load it in Atomic Chat with one click and start a private offline session.

License

GLM-5.2 ships under the MIT license. That permits commercial use, modification, redistribution, and fine-tuning the weights on your own code or domain data, as long as the license notice stays attached.

Desktop

macOS

(M1 or better)

Download

Windows

(x64)

Download

Linux

(x86_64)

Download

Frequently asked questions

GLM-5.2 is a 753.3B-parameter open-weight model from zai-org (Z.ai), built on a mixture-of-experts architecture and released under the MIT license. It targets coding, reasoning, and agentic tasks, supports English and Chinese, and has a 1,048,576-token context window. The weights are public on Hugging Face, so you can run it yourself instead of through a hosted API.

The full BF16 weights need hundreds of gigabytes of memory, so most people run a quantized build. A 2-bit dynamic quant is around 241 GB and fits a 256 GB unified-memory Mac, or a single 24 GB GPU combined with 256 GB of system RAM using MoE offloading. On consumer hardware with low-bit quants, expect roughly 3 to 9 tokens per second.

Yes. GLM-5.2 is released under the MIT license, which makes the weights free to download, run, modify, and use commercially. Running it locally through Atomic Chat has no per-token cost since the model executes on your own machine. You only pay for the hardware and electricity.

Yes. Once the weights are downloaded, GLM-5.2 runs entirely on your device with no internet connection required. Nothing is sent to an external server, so your prompts and code stay local. Atomic Chat loads the model on-device for this kind of private, offline use.

Its strongest area is coding and agentic engineering, where it ranks among the top open-weight models on benchmarks like SWE-bench Pro and Terminal-Bench. The 1,048,576-token context also makes it good for reasoning over a full codebase or a long technical document. Its thinking capability suits multi-step problems that need the model to work through logic before answering.