DeepSeek-R1-0528

Updated: 27.06.2026
Tags: Thinking, Reasoning, Code, Tools

A 671B-parameter MoE reasoning model from DeepSeek with 37B active params, MIT-licensed, strong at math, code, and long chain-of-thought.

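# Shell: install the library and download the weights (the full FP8 checkpoint is around 700 GB)
pip install -U transformers
huggingface-cli download deepseek-ai/DeepSeek-R1-0528

# Shell: query an OpenAI-compatible server (for example vLLM or SGLang) that is already listening on localhost:8000
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1-0528",
    "messages": [{"role": "user", "content": "Prove 1+2+...+n = n(n+1)/2."}],
    "temperature": 0.6
  }'

# Python: load the weights directly with transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-0528"
# trust_remote_code=True allows model code shipped in the repository to run
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

// TypeScript: call the same local server through the openai package
import OpenAI from "openai";

// A local server does not check the key, but the client requires a non-empty value
const client = new OpenAI({ baseURL: "http://localhost:8000/v1", apiKey: "local" });
const res = await client.chat.completions.create({
  model: "deepseek-ai/DeepSeek-R1-0528",
  messages: [{ role: "user", content: "Solve: integral of x^2 dx." }],
  temperature: 0.6,
});
console.log(res.choices[0].message.content);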

More models

Name                             Size / Usage                  Context  Input
DeepSeek-V4-Pro                                                1M       Text
DeepSeek-V4-Flash                                              1M       Text
DeepSeek-V3-0324                 General chat, coding, agents  128K     Text
DeepSeek-Coder-V2-Lite-Instruct  Code generation, completion   128K     Text
DeepSeek-V3.2                                                  128K     Text
DeepSeek-R1                                                    128K     Text

At a glance

  • License: MIT
  • Context length: 128K tokens
  • Architecture: DeepSeek-V3 MoE, 671B total / 37B active
  • Minimum memory: ~700 GB for the full FP8 weights; ~150-250 GB with 1.5-2 bit quantization
  • Strengths: math, coding, long-form reasoning, function calling

Overview

DeepSeek-R1-0528 is the May 28, 2025 refresh of DeepSeek AI's R1 reasoning model. It is built on the DeepSeek-V3 mixture-of-experts architecture, with 671B total parameters and 37B activated per token, and ships under the MIT License. It is not a new base model: the release is a post-training upgrade that applies more compute and algorithmic optimizations, bringing reasoning quality close to proprietary systems like OpenAI o3 and Gemini 2.5 Pro.

What it's good at

The model is tuned for hard reasoning. On AIME 2025 it scores 87.5% pass@1, up from 70% for the original R1, and its Codeforces rating climbs from 1530 to 1930. Coding benchmarks improve too: LiveCodeBench rises to 73.3 and SWE-bench Verified to 57.6. The 0528 update also reduces hallucination, supports system prompts, and adds stronger function calling, which makes it more usable for agentic and tool-driven workflows. These gains come with longer thinking: the model averages about 23K tokens per AIME question, versus 12K before.
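Since the update adds function calling and system prompts, a tool-enabled request to an OpenAI-compatible endpoint might look like the sketch below. It only builds the JSON body and sends nothing. The `get_weather` tool, its schema, and the helper name `build_tool_request` are hypothetical, and the `tools` field follows the OpenAI chat-completions convention, which the serving stack is assumed to accept.

```python
import json

MODEL_ID = "deepseek-ai/DeepSeek-R1-0528"


def build_tool_request(question: str) -> dict:
    """Build a chat-completions body that offers one hypothetical tool."""
    # Hypothetical tool definition, written in the OpenAI "tools" format.
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": MODEL_ID,
        "messages": [
            # The 0528 update accepts a system prompt.
            {"role": "system", "content": "Call a tool when you need live data."},
            {"role": "user", "content": question},
        ],
        "tools": [weather_tool],
        "temperature": 0.6,
    }


if __name__ == "__main__":
    body = build_tool_request("What is the weather in Hangzhou?")
    print(json.dumps(body, indent=2))
```

The printed body can be posted to the same `/v1/chat/completions` route used in the quick-start snippets, provided the server was started with tool calling enabled.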

Running locally

This is a large model. The native FP8 weights are around 700 GB, so the full version expects a multi-GPU server or a high-RAM host. Community dynamic GGUF quants from teams like Unsloth bring it down to roughly 150-250 GB at 1.5 to 2 bits per weight, which lets it run on CPU with enough system memory or on a workstation with several cards. You can serve it with vLLM or SGLang, or run quantized builds through llama.cpp and Ollama. For single-GPU setups, DeepSeek also released a distilled DeepSeek-R1-0528-Qwen3-8B, which has the footprint of an ordinary 8B model.
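The sizes above can be sanity-checked with a back-of-the-envelope estimate: parameter count times bits per weight, divided by eight. The sketch below assumes uniform quantization and counts weights only, so it ignores the KV cache, activations, and file overhead; the function name `weight_memory_gb` is illustrative.

```python
TOTAL_PARAMS = 671e9  # total parameter count quoted for the model


def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return total_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for label, bits in [("FP8", 8.0), ("2-bit", 2.0), ("1.5-bit", 1.5)]:
        print(f"{label:>8}: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")
```

The uniform low-bit estimates come out below the quoted 150-250 GB range. That is plausibly because dynamic quants keep some tensors at higher precision, so treat these figures as a lower bound rather than a sizing guide.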

License

DeepSeek-R1-0528 is released under the MIT License. That permits commercial use, modification, and redistribution, and DeepSeek explicitly allows distillation of the model's outputs to train other models. There is no separate acceptable-use addendum beyond standard MIT terms.

Frequently asked questions

What is DeepSeek-R1-0528?

DeepSeek-R1-0528 is a May 28, 2025 update to DeepSeek AI's R1 reasoning model. It uses the DeepSeek-V3 mixture-of-experts architecture with 671B total parameters and 37B active per token. The update deepened the model's chain-of-thought reasoning, raising AIME 2025 accuracy from 70% to 87.5%, and added support for system prompts and function calling.

What hardware do I need to run DeepSeek-R1-0528?

The full 671B model in its native FP8 weights needs roughly 700 GB of memory, so it targets multi-GPU servers or high-memory hosts. Quantized GGUF builds shrink this considerably: community 1.5-2 bit dynamic quants from Unsloth bring it into the 150-250 GB range, runnable on a CPU with enough system RAM or on a workstation with several GPUs. For a single consumer card, the distilled DeepSeek-R1-0528-Qwen3-8B is the practical option.

Is DeepSeek-R1-0528 free to use, including commercially?

Yes. The model weights are published on Hugging Face under the MIT License, which permits commercial use, modification, and redistribution. You can download and self-host it at no cost, or call it through DeepSeek's OpenAI-compatible API and third-party providers. DeepSeek also explicitly allows distillation, that is, training other models on its outputs.

What is the context length of DeepSeek-R1-0528?

DeepSeek-R1-0528 supports a 128K-token context window, inherited from the DeepSeek-V3 architecture it is built on. Because it is a reasoning model that produces long chains of thought, it can spend a large share of that budget on internal thinking. DeepSeek caps maximum generation length at 64K tokens in its own evaluations.
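To see how thinking eats into the window, the sketch below subtracts a reserved output allowance from the context size. It assumes "128K" and "64K" mean 128,000 and 64,000 tokens (a deployment may use slightly different exact limits), and `prompt_budget` is an illustrative helper, not part of any library.

```python
CONTEXT_WINDOW = 128_000  # assumption: "128K" read as 128,000 tokens
MAX_GENERATION = 64_000   # assumption: "64K" read as 64,000 tokens


def prompt_budget(reserved_output: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the prompt after reserving room for thinking plus the answer."""
    if not 0 <= reserved_output <= context_window:
        raise ValueError("reserved_output must be between 0 and the context window")
    return context_window - reserved_output


if __name__ == "__main__":
    # Reserve the full evaluation cap, then a typical AIME-sized response.
    print(prompt_budget(MAX_GENERATION))
    print(prompt_budget(23_000))
```

Reserving the full 64K cap leaves half the window for the prompt, while a typical 23K-token reasoning trace leaves far more, so the right reservation depends on how hard the expected questions are.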

How does DeepSeek-R1-0528 differ from the original DeepSeek-R1?

DeepSeek-R1-0528 is a post-training refresh of the same base model, so it keeps the 671B parameter count but reasons at greater length. On AIME 2025 it jumps from 70% to 87.5%, on LiveCodeBench from 63.5 to 73.3, and its Codeforces rating climbs from 1530 to 1930. The trade-off is longer outputs: it averages about 23K tokens per AIME question, versus 12K before. It also lowers hallucination and improves function calling.
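The before-and-after figures are easier to compare as absolute and relative changes. The short sketch below does that arithmetic using only the numbers quoted in this answer; the `improvement` helper is illustrative.

```python
# (before, after) pairs copied from the comparison above.
SCORES = {
    "AIME 2025 pass@1 (%)": (70.0, 87.5),
    "LiveCodeBench": (63.5, 73.3),
    "Codeforces rating": (1530, 1930),
    "Avg. tokens per AIME question (K)": (12, 23),
}


def improvement(before: float, after: float) -> tuple[float, float]:
    """Return (absolute change, relative change in percent), rounded to one decimal."""
    delta = after - before
    return round(delta, 1), round(100 * delta / before, 1)


if __name__ == "__main__":
    for name, (before, after) in SCORES.items():
        delta, pct = improvement(before, after)
        print(f"{name}: {before} -> {after} ({delta:+}, {pct:+}%)")
```

Read this way, the accuracy gain on AIME is about a quarter in relative terms, while the average response length nearly doubles.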