DeepSeek-R1-0528

Tags: Thinking, Reasoning, Code, Tools

Run

A 671B-parameter mixture-of-experts (MoE) reasoning model from DeepSeek with 37B parameters active per token. It is MIT-licensed and strong at math, code, and long chain-of-thought reasoning.

pip install -U transformers
huggingface-cli download deepseek-ai/DeepSeek-R1-0528

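# Assumes an OpenAI-compatible server (for example vLLM or SGLang) is already serving the model on localhost:8000.
curl http://localhost:8000/v1/chat/completions \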
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1-0528",
    "messages": [{"role": "user", "content": "Prove 1+2+...+n = n(n+1)/2."}],
    "temperature": 0.6
  }'

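from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "deepseek-ai/DeepSeek-R1-0528"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Loads the full checkpoint, which needs several hundred GB of memory.
# trust_remote_code=True runs the model code shipped in the repository.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)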

import OpenAI from "openai";
const client = new OpenAI({ baseURL: "http://localhost:8000/v1", apiKey: "local" });
const res = await client.chat.completions.create({
  model: "deepseek-ai/DeepSeek-R1-0528",
  messages: [{ role: "user", content: "Solve: integral of x^2 dx." }],
  temperature: 0.6,
});
console.log(res.choices[0].message.content);
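
For Python callers, the same request can be sketched with only the standard library. This is a minimal sketch, assuming the same OpenAI-compatible server at http://localhost:8000/v1 as the curl example. The helper names build_request and ask are hypothetical, and the 600-second timeout is an arbitrary choice meant to leave room for long reasoning.

```python
import json
import urllib.request

# Assumed values, copied from the curl example: a local OpenAI-compatible
# server on port 8000 and the Hugging Face model ID.
BASE_URL = "http://localhost:8000/v1"
MODEL = "deepseek-ai/DeepSeek-R1-0528"


def build_request(prompt: str, temperature: float = 0.6) -> urllib.request.Request:
    """Build (but do not send) a chat-completions POST request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, timeout: float = 600.0) -> str:
    """Send the request and return the assistant message text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling ask("Solve: integral of x^2 dx.") returns the message content as a string once a server is running. Keeping build_request separate makes the payload easy to inspect without any network access.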

More models

Name	Size / Usage	Context	Input
DeepSeek-V4-Pro		1M	Text
DeepSeek-V4-Flash		1M	Text
DeepSeek-V3-0324	General chat, coding, agents	128K	Text
DeepSeek-Coder-V2-Lite-Instruct	Code generation, completion	128K	Text
DeepSeek-V3.2		128K	Text
DeepSeek-R1		128K	Text

At a glance

License: MIT
Context length: 128K tokens
Architecture: DeepSeek-V3 MoE, 671B total / 37B active
Minimum memory: ~700 GB for the full FP8 weights; ~150-250 GB for 1.5-2 bit quants
Strengths: math, coding, long-form reasoning, function calling

Overview

DeepSeek-R1-0528 is the May 28, 2025 refresh of DeepSeek AI's R1 reasoning model. It is built on the DeepSeek-V3 mixture-of-experts architecture, with 671B total parameters and 37B activated per token, and ships under the MIT License. This release is not a new base model but a post-training upgrade: it uses more compute and tuned algorithms during post-training to push reasoning quality close to proprietary systems like OpenAI o3 and Gemini 2.5 Pro.

What it's good at

The model is tuned for hard reasoning. On AIME 2025 it scores 87.5% pass@1, up from 70% for the original R1, and its Codeforces rating climbs from 1530 to 1930. Coding benchmarks improve too: LiveCodeBench rises to 73.3 and SWE-bench Verified to 57.6. The 0528 update also reduces hallucination, supports system prompts, and adds stronger function calling, which makes it more usable for agentic and tool-driven workflows. The gains come from longer thinking: the model averages about 23K tokens per AIME question, versus 12K before.
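
To put the accuracy gain next to its token cost, the figures above can be turned into absolute and relative changes. This is a small sketch that uses only numbers quoted on this page. The token counts are the page's rounded "about 12K" and "about 23K" values, so the result is approximate, and the gain helper is a hypothetical name.

```python
# (before, after) pairs quoted on this page for original R1 vs. R1-0528.
SCORES = {
    "AIME 2025 pass@1 (%)": (70.0, 87.5),
    "Codeforces rating": (1530, 1930),
    "Avg. tokens per AIME question": (12_000, 23_000),  # rounded figures
}


def gain(before, after):
    """Return (absolute change, relative change in percent)."""
    delta = after - before
    return delta, 100.0 * delta / before


for name, (before, after) in SCORES.items():
    delta, pct = gain(before, after)
    print(f"{name}: {before} -> {after} ({delta:+g}, {pct:+.1f}%)")
```

Read this way, a 17.5-point AIME gain (25% relative) comes with roughly 92% more tokens per question, which matters when estimating latency and cost.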

Running locally

This is a large model. The native FP8 weights are around 700 GB, so the full version needs a multi-GPU server or a high-RAM host. Community dynamic GGUF quants from teams like Unsloth bring it down to roughly 150-250 GB at 1.5-2 bit, which lets it run on a CPU with enough system memory or on a workstation with several GPUs. You can serve it with vLLM or SGLang, or run quantized builds through llama.cpp and Ollama. For single-GPU setups, DeepSeek also released a distilled DeepSeek-R1-0528-Qwen3-8B, which has the footprint of an 8B model.
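
The sizes above can be sanity-checked with a back-of-the-envelope estimate: weight storage is roughly parameters times bits per weight, divided by 8. This is a rough sketch that assumes a uniform bit width and ignores KV cache and runtime overhead. The weight_gb helper is a hypothetical name.

```python
TOTAL_PARAMS = 671e9   # total parameters, from this page
ACTIVE_PARAMS = 37e9   # parameters activated per token, from this page


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for label, bits in [("FP8", 8), ("2-bit", 2), ("1.5-bit", 1.5)]:
    print(f"{label:>7}: ~{weight_gb(TOTAL_PARAMS, bits):.0f} GB")

# Only a small share of the weights is used per token, but the full set
# still has to be stored somewhere, so memory follows the total count.
print(f"active share per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

The uniform estimate gives about 671 GB at FP8 and about 126-168 GB at 1.5-2 bit. That is a lower bound, which is consistent with the larger 150-250 GB range quoted for dynamic quants, since those likely keep some tensors at higher precision.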

License

DeepSeek-R1-0528 is released under the MIT License. That permits commercial use, modification, and redistribution, and DeepSeek explicitly allows distillation of the model's outputs to train other models. There is no separate acceptable-use addendum beyond standard MIT terms.

Desktop

macOS (M1 or better)
Windows (x64)
Linux (x86_64)

Frequently asked questions

What is DeepSeek-R1-0528?

DeepSeek-R1-0528 is a May 28, 2025 update to DeepSeek AI's R1 reasoning model. It uses the DeepSeek-V3 mixture-of-experts architecture with 671B total parameters and 37B active per token. The update deepened the model's chain-of-thought reasoning, raising AIME 2025 accuracy from 70% to 87.5%, and added support for system prompts and function calling.

What hardware do I need to run DeepSeek-R1-0528?

The full 671B model in its native FP8 weights needs roughly 700 GB of memory, so it targets multi-GPU servers or high-memory hosts. Quantized GGUF builds shrink this considerably: community 1.5-2 bit dynamic quants from Unsloth bring it into the 150-250 GB range, runnable on a CPU with enough system RAM or on a workstation with several GPUs. For a single consumer card, the distilled DeepSeek-R1-0528-Qwen3-8B is the practical option.

Is DeepSeek-R1-0528 free for commercial use?

Yes. The model weights are published on Hugging Face under the MIT License, which permits commercial use, modification, and redistribution. You can download and self-host it at no cost, or call it through DeepSeek's OpenAI-compatible API and third-party providers. DeepSeek also explicitly allows distillation, that is, training other models on its outputs.

What is the context length of DeepSeek-R1-0528?

DeepSeek-R1-0528 supports a 128K-token context window, inherited from the DeepSeek-V3 architecture it is built on. Because it is a reasoning model that produces long chains of thought, it can spend a large share of that budget on internal thinking. DeepSeek caps maximum generation length at 64K tokens in its own evaluations.
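
The interaction between prompt length and thinking budget can be sketched as simple arithmetic. This is an illustrative sketch, assuming "128K" and "64K" mean 128,000 and 64,000 tokens and treating the 64K evaluation cap as a generation limit. A real deployment may use different limits, and generation_budget is a hypothetical helper.

```python
CONTEXT_WINDOW = 128_000   # "128K" from this page; exact value is an assumption
MAX_GENERATION = 64_000    # "64K" cap from DeepSeek's evaluations; same assumption


def generation_budget(prompt_tokens: int,
                      context_window: int = CONTEXT_WINDOW,
                      max_generation: int = MAX_GENERATION) -> int:
    """Tokens left for thinking plus answer once the prompt is accounted for."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = context_window - prompt_tokens
    return max(0, min(remaining, max_generation))


for prompt_tokens in (2_000, 100_000):
    print(prompt_tokens, "->", generation_budget(prompt_tokens))
```

Under these assumptions a short prompt leaves the full 64,000 tokens for output, while a 100,000-token prompt leaves only 28,000, which is not far above the roughly 23K tokens the model averages on an AIME question.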

How does DeepSeek-R1-0528 differ from the original DeepSeek-R1?

DeepSeek-R1-0528 is a post-training refresh of the same base model, so it keeps the 671B parameter count but reasons for longer. On AIME 2025 it jumps from 70% to 87.5%, on LiveCodeBench from 63.5 to 73.3, and its Codeforces rating climbs from 1530 to 1930. The trade-off is longer outputs: it averages about 23K tokens per AIME question versus 12K before. It also lowers hallucination and improves function calling.