Overview
Qwen2.5-72B-Instruct is the largest instruction-tuned model in Alibaba Cloud's Qwen2.5 series, released in September 2024 by the Qwen team. It has 72.7 billion parameters (70.0B excluding embeddings) spread across 80 transformer layers, and uses a Qwen2 architecture with RoPE position embeddings, SwiGLU activations, RMSNorm, grouped-query attention (64 query heads, 8 key/value heads), and QKV bias. The model is the chat-tuned variant of the Qwen2.5-72B base model and is meant for direct deployment as an assistant.
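To make the effect of grouped-query attention concrete, the following back-of-the-envelope sketch estimates the key/value cache size using the layer and head counts above. The per-head dimension of 128 and the 2-byte cache precision are assumptions for illustration and are not stated in this overview; 32,768 tokens is used as the example sequence length.

```python
# Figures taken from the overview above.
NUM_LAYERS = 80
NUM_QUERY_HEADS = 64
NUM_KV_HEADS = 8

# Assumptions for illustration only (not stated in the text above).
HEAD_DIM = 128        # assumed per-head dimension
BYTES_PER_VALUE = 2   # assumed 16-bit (BF16/FP16) cache entries


def kv_cache_bytes(tokens: int, kv_heads: int) -> int:
    """Bytes needed to cache keys and values for `tokens` positions."""
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * NUM_LAYERS * kv_heads * HEAD_DIM * BYTES_PER_VALUE * tokens


def to_gib(n_bytes: int) -> float:
    """Convert bytes to binary gigabytes (GiB)."""
    return n_bytes / 2**30


# Cache size with 8 key/value heads (grouped-query attention) versus a
# hypothetical layout where every one of the 64 query heads had its own
# key/value head.
gqa = kv_cache_bytes(32_768, NUM_KV_HEADS)
mha = kv_cache_bytes(32_768, NUM_QUERY_HEADS)

print(f"grouped-query cache: {to_gib(gqa):.1f} GiB")
print(f"one KV head per query head: {to_gib(mha):.1f} GiB")
print(f"reduction factor: {mha // gqa}x")
```

Under these assumptions, sharing each key/value head among eight query heads cuts the cache for a single 32,768-token sequence by a factor of eight.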
What it's good at
Compared with Qwen2, this release has significantly more knowledge and stronger coding and mathematics ability, which the Qwen team credits to specialized expert models in those domains. It follows instructions more reliably, generates long texts of more than 8K tokens, understands structured data such as tables, and produces structured output such as JSON. It also supports function calling through a tool-call chat template, which suits agentic and API-driven workflows. Multilingual coverage spans more than 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Russian, Japanese, Korean, and Arabic.
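As an illustration of consuming structured tool-call output, here is a minimal parsing sketch. It assumes the model wraps each call in `<tool_call>` ... `</tool_call>` tags around a JSON object with `name` and `arguments` keys; that format is an assumption here and should be checked against the chat template shipped with the model. The `get_weather` tool and the sample reply are hypothetical.

```python
import json
import re

# Assumed wrapper tags; verify against the model's chat template.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in a model reply."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of raising
        if isinstance(payload, dict) and "name" in payload:
            calls.append(
                {
                    "name": payload["name"],
                    "arguments": payload.get("arguments", {}),
                }
            )
    return calls


# Hypothetical model reply containing one call to a hypothetical tool.
sample = (
    "Let me check.\n"
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
    "</tool_call>"
)

for call in extract_tool_calls(sample):
    print(call["name"], call["arguments"])
```

Skipping malformed JSON rather than raising is a design choice for this sketch; a production agent loop might instead return the error to the model and ask it to retry.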
Running locally
The weights run with Hugging Face transformers (4.37.0 or newer). At full BF16 precision the 72B weights need around 145 GB (72.7 billion parameters at 2 bytes each), so unquantized inference typically uses two or more high-memory GPUs; vLLM with tensor parallelism is the recommended serving path. With 4-bit quantization (GPTQ, AWQ, or GGUF via llama.cpp and Ollama) memory drops to roughly 45-48 GB, which fits a single 48 GB card only with little room left for the KV cache, so plan for a reduced context length on such a card. The default context is 32,768 tokens; reaching the full 131,072-token window requires enabling YaRN rope scaling, which Qwen suggests turning on only for genuinely long inputs because the static scaling applied in vLLM can hurt quality on shorter texts.
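The sizing arithmetic and the long-context setting can be sketched in a few lines. The first helper computes raw weight storage from the parameter count and bit width; it ignores quantization metadata, layers kept at higher precision, and runtime buffers, which presumably account for the gap between its 4-bit result and the 45-48 GB quoted above. The second helper adds a `rope_scaling` entry to a config dictionary; the key names and the factor of 4.0 (4 x 32,768 = 131,072) are assumptions about the `config.json` layout and should be checked against Qwen's published instructions.

```python
import copy

PARAMS = 72.7e9          # total parameter count from the overview
DEFAULT_CONTEXT = 32_768  # default context length in tokens


def weight_gb(bits_per_param: float) -> float:
    """Raw weight storage in decimal GB, ignoring all runtime overhead."""
    return PARAMS * bits_per_param / 8 / 1e9


def with_yarn(config: dict, factor: float = 4.0) -> dict:
    """Return a copy of `config` with YaRN rope scaling enabled.

    The key names below are assumed, not confirmed by this document.
    """
    patched = copy.deepcopy(config)
    patched["rope_scaling"] = {
        "type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": DEFAULT_CONTEXT,
    }
    return patched


print(f"BF16 weights:  {weight_gb(16):.1f} GB")
print(f"4-bit weights: {weight_gb(4):.2f} GB (before overhead)")

# Hypothetical minimal config; a real config.json has many more fields.
base_config = {"max_position_embeddings": DEFAULT_CONTEXT}
long_config = with_yarn(base_config)
scaling = long_config["rope_scaling"]
print(int(scaling["factor"] * scaling["original_max_position_embeddings"]))
```

Returning a patched copy keeps the default configuration untouched, which matches the advice to enable YaRN only when long inputs are actually expected.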
License
Qwen2.5-72B-Instruct is distributed under the Qwen License rather than the Apache 2.0 license used for most other Qwen2.5 sizes. The weights are free to download and the license permits commercial use, but products with more than 100 million monthly active users must obtain a separate license from Alibaba Cloud. Review the license text before shipping the model in a large-scale product.


