DeepSeek-V3-0324

Updated: 27.06.2026
Tags: Reasoning, Code, Multilingual, Tools

A 671B-parameter Mixture-of-Experts LLM from DeepSeek-AI (37B active) with a 128K context, strong coding and improved function calling.

Install and download (shell)

pip install -U transformers
huggingface-cli download deepseek-ai/DeepSeek-V3-0324

Query an OpenAI-compatible server on localhost:8000 (curl)

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-ai/DeepSeek-V3-0324", "messages": [{"role": "user", "content": "Hello"}]}'

Load with Transformers (Python)

from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-V3-0324", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3-0324", trust_remote_code=True)

Query an OpenAI-compatible server on localhost:8000 (JavaScript)

import OpenAI from "openai";
const client = new OpenAI({ baseURL: "http://localhost:8000/v1", apiKey: "local" });
const res = await client.chat.completions.create({
  model: "deepseek-ai/DeepSeek-V3-0324",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(res.choices[0].message.content);
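For completeness, the same chat request can be sketched in Python using only the standard library. This is a minimal sketch that assumes, like the curl example, that an OpenAI-compatible server is already listening on localhost:8000; the helper names `build_chat_request` and `send` are illustrative and not part of any library.

```python
import json
import urllib.request

# Assumption: an OpenAI-compatible server is listening here, as in the curl example.
BASE_URL = "http://localhost:8000/v1"
MODEL = "deepseek-ai/DeepSeek-V3-0324"


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {"model": MODEL, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `print(send(build_chat_request("Hello")))` would print the model's reply. Keeping request construction separate from sending makes the payload easy to inspect without a live server.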

More models

Name                             Size / Usage                 Context  Input
DeepSeek-V4-Pro                  -                            1M       Text
DeepSeek-V4-Flash                -                            1M       Text
DeepSeek-R1-0528                 Reasoning, math, coding      128K     Text
DeepSeek-Coder-V2-Lite-Instruct  Code generation, completion  128K     Text
DeepSeek-V3.2                    -                            128K     Text
DeepSeek-R1                      -                            128K     Text

At a glance

  • License: MIT
  • Context length: 128K tokens
  • Architecture: 671B Mixture-of-Experts, ~37B active per token
  • Minimum hardware: ~700 GB VRAM at FP8; ~130 GB with 1.58-bit quantization
  • Strengths: coding, reasoning, function calling, long-context chat

Overview

DeepSeek-V3-0324 is an open-weight large language model released by DeepSeek-AI in March 2025. It is a Mixture-of-Experts (MoE) model with 671 billion total parameters, of which roughly 37 billion are activated for any given token. The 0324 tag marks a post-training refresh of the original DeepSeek-V3 rather than a new architecture, and it keeps the Multi-head Latent Attention design that compresses the key-value cache to keep inference memory manageable.
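To make the "671 billion total, roughly 37 billion active" idea concrete, here is a toy sketch of sparse expert routing. It is not DeepSeek's router: the expert scores, the choice of k, and the softmax weighting are all hypothetical, chosen only to show that a token is processed by a small subset of experts. The two parameter counts are the only values taken from the text.

```python
import math

TOTAL_PARAMS_B = 671   # total parameters in billions (from the text)
ACTIVE_PARAMS_B = 37   # approximate parameters used per token (from the text)


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters that take part in one token's forward pass."""
    return active_b / total_b


def top_k_route(scores: list[float], k: int) -> dict[int, float]:
    """Toy router: keep the k highest-scoring experts and softmax their scores.

    Returns a mapping from expert index to mixing weight. Experts that are
    not selected do no work for this token, which is where the savings come from.
    """
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in chosen)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: value / total for i, value in exps.items()}


if __name__ == "__main__":
    print(f"active share: {active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%}")
    # Hypothetical scores for four experts; only two are used for this token.
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

The active share works out to about 5.5% of the parameters per token, which is why compute per token is far lower than the total parameter count suggests, even though all weights must still be held in memory.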

What it's good at

Compared with the first DeepSeek-V3 release, this checkpoint posts higher scores on knowledge and reasoning benchmarks, with MMLU-Pro moving from 75.9 to 81.2 and GPQA from 59.1 to 68.4. DeepSeek also reports better front-end and general code generation, steadier multi-turn rewriting, and more dependable function calling, which makes it a practical backbone for coding assistants and tool-using agents. Its 128K token context window handles long documents and large codebases in a single request. It is a general instruction and chat model, not a dedicated chain-of-thought model like DeepSeek-R1.
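As an illustration of the function-calling workflow, the sketch below builds a chat request that advertises one tool in the OpenAI-style `tools` format and then executes any tool calls found in an assistant message. It assumes the serving endpoint accepts that format, which depends on how the server is configured. The `get_weather` tool, the `REGISTRY` mapping, and both helper functions are hypothetical names invented for this example.

```python
import json


# Hypothetical tool for illustration; not part of the model or any library.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


# Tool description in the OpenAI-style JSON schema format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_weather": get_weather}


def build_payload(question: str) -> dict:
    """Request body that sends the question together with the tool list."""
    return {
        "model": "deepseek-ai/DeepSeek-V3-0324",
        "messages": [{"role": "user", "content": question}],
        "tools": TOOLS,
    }


def run_tool_calls(message: dict) -> list[dict]:
    """Execute each tool call in an assistant message and build tool-role replies."""
    replies = []
    for call in message.get("tool_calls") or []:
        function = REGISTRY[call["function"]["name"]]
        arguments = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": function(**arguments),
        })
    return replies
```

In a full loop, the replies from `run_tool_calls` would be appended to the message list and sent back so the model can write its final answer. A production version would also validate arguments and handle unknown tool names rather than raising `KeyError`.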

Running locally

The native weights ship in FP8 and occupy around 700 GB, so running them unquantized needs a multi-GPU server with at least that much combined VRAM. Community quantizations narrow this gap: 4-bit GGUF builds land near 400 GB, and aggressive 1.58-bit dynamic quants from Unsloth drop to roughly 130 GB, which can run on high-RAM workstations or smaller clusters at lower throughput. Common serving paths include vLLM and SGLang for GPU deployment and llama.cpp for quantized CPU or mixed CPU/GPU setups.
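The sizes quoted above follow roughly from parameter count times bits per weight. The sketch below does that arithmetic under a simplifying assumption: every one of the 671 billion parameters is stored at the same bit width, with no extra scales or metadata. Real checkpoints and GGUF builds mix precisions, so the published sizes differ from these idealized numbers.

```python
TOTAL_PARAMS = 671e9  # total parameter count (from the text)


def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Idealized weight size in GB, assuming a uniform bit width for all parameters."""
    bytes_total = params * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    for label, bits in [("FP8", 8), ("4-bit", 4), ("1.58-bit", 1.58)]:
        print(f"{label:>8}: {weight_footprint_gb(TOTAL_PARAMS, bits):.1f} GB")
```

The idealized results are about 671 GB for FP8, 335.5 GB for 4-bit, and 132.5 GB for 1.58-bit. The FP8 and 1.58-bit figures sit close to the roughly 700 GB and 130 GB quoted above. The 4-bit figure is well below the quoted 400 GB, which is consistent with 4-bit formats storing scaling data and keeping some tensors at higher precision, so their effective bits per weight exceed 4.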

License

The repository and the model weights are released under the MIT License. That permits commercial use, modification, and redistribution with minimal conditions, so teams can self-host or fine-tune the model without separate licensing terms.

Frequently asked questions

What is DeepSeek-V3-0324?
DeepSeek-V3-0324 is an open-weight large language model released in March 2025 by DeepSeek-AI. It is a Mixture-of-Experts (MoE) model with 671B total parameters, of which about 37B are activated per token. The 0324 checkpoint is a post-training update of DeepSeek-V3 that improves reasoning, coding, and function calling over the original release.

What hardware do I need to run DeepSeek-V3-0324?
The full FP8 weights are around 700 GB, so running the model unquantized requires a multi-GPU server with roughly 700 GB or more of combined VRAM (for example 8 to 16 high-memory data-center GPUs). Community 4-bit and 1.58-bit GGUF quantizations from projects like Unsloth bring the footprint down to roughly 130 to 400 GB, which can run across high-RAM workstations or smaller GPU clusters at reduced speed. This is not a model that fits on a single consumer card.
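A rough GPU count can be estimated by dividing the weight footprint by the usable memory per card. The sketch below is a back-of-the-envelope helper, not a sizing tool: the 20% headroom reserved for KV cache and activations is an assumed value, and the 80 GB and 141 GB capacities are example figures chosen for illustration.

```python
import math


def min_gpus(weights_gb: float, gpu_memory_gb: float, headroom: float = 0.2) -> int:
    """Smallest number of GPUs whose usable memory covers the weights.

    `headroom` is the assumed fraction of each card reserved for KV cache,
    activations, and runtime overhead.
    """
    usable_gb = gpu_memory_gb * (1 - headroom)
    return math.ceil(weights_gb / usable_gb)


if __name__ == "__main__":
    for capacity in (80, 141):  # example per-GPU capacities in GB
        print(f"{capacity} GB cards: at least {min_gpus(700, capacity)} GPUs")
```

Under these assumptions the estimate is 11 cards at 80 GB and 7 cards at 141 GB. Practical deployments may round up further, for example to 8 or 16, since serving frameworks often prefer GPU counts that split the model evenly.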

Is DeepSeek-V3-0324 free to use commercially?
Yes. The repository and the model weights are released under the MIT License, which permits commercial use, modification, and redistribution. You can download the weights from Hugging Face at no cost, or access the model through DeepSeek's API and third-party inference providers, which charge per token.

What is the context length of DeepSeek-V3-0324?
DeepSeek-V3-0324 supports a 128K token context window. That capacity lets it work over long documents, large codebases, and extended multi-turn conversations within a single request.
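Before sending a large request, it can help to check whether the input is likely to fit. The sketch below is a rough pre-flight check built on two assumptions that are not from the source: "128K" is read as 128 × 1024 tokens, and text averages about four characters per token. The reserve for the model's reply is likewise an illustrative default.

```python
import math

CONTEXT_TOKENS = 128 * 1024  # assumption: "128K" interpreted as 131,072 tokens
CHARS_PER_TOKEN = 4          # assumption: rough average; varies by language and content


def estimate_tokens(text: str) -> int:
    """Crude token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(documents: list[str], reserve_for_output: int = 4096) -> bool:
    """True if the estimated prompt plus reserved output tokens fit in the window."""
    prompt_tokens = sum(estimate_tokens(doc) for doc in documents)
    return prompt_tokens + reserve_for_output <= CONTEXT_TOKENS
```

For exact counts, the tokenizer loaded in the Transformers snippet near the top of the page can replace the heuristic, for example `len(tokenizer(text)["input_ids"])`.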

How does DeepSeek-V3-0324 differ from the original DeepSeek-V3?
DeepSeek-V3-0324 is a refreshed post-training checkpoint built on the same architecture as DeepSeek-V3. DeepSeek reports clear benchmark gains, including MMLU-Pro rising from 75.9 to 81.2 and GPQA from 59.1 to 68.4, along with stronger front-end code generation, more reliable multi-turn rewriting, and improved function calling. It is not a separate reasoning model like DeepSeek-R1; it remains a general-purpose chat and instruction model.
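The size of the reported gains is easier to compare when expressed both in points and relative to the starting score. The short sketch below does that arithmetic using only the four benchmark numbers quoted above; the `gains` helper is an illustrative name.

```python
# (before, after) scores as quoted in the text
SCORES = {
    "MMLU-Pro": (75.9, 81.2),
    "GPQA": (59.1, 68.4),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent), rounded to 0.1."""
    absolute = after - before
    relative = absolute / before * 100
    return round(absolute, 1), round(relative, 1)


if __name__ == "__main__":
    for name, (before, after) in SCORES.items():
        points, percent = gains(before, after)
        print(f"{name}: +{points} points ({percent}% relative)")
```

This gives +5.3 points (about 7.0% relative) on MMLU-Pro and +9.3 points (about 15.7% relative) on GPQA, so the larger improvement is on GPQA by either measure.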