DeepSeek-Coder-V2-Lite-Instruct

Code

Reasoning

Tools

A 16B Mixture-of-Experts code model from DeepSeek AI with 2.4B active params, 128K context, and support for 338 programming languages.

pip install -U transformers
huggingface-cli download deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct

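# Assumes an OpenAI-compatible server (for example, vLLM) is already serving the model on localhost:8000.
curl http://localhost:8000/v1/chat/completions \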
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
    "messages": [{"role": "user", "content": "Write a quicksort in Python."}]
  }'
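For readers who prefer Python without extra dependencies, the same request can be sketched with the standard library. This is only an illustration: it assumes the same OpenAI-compatible endpoint as the curl call, and the helper names `build_chat_request` and `ask` are hypothetical.

```python
import json
import urllib.request

# Assumption: an OpenAI-compatible server is listening here, as in the curl example.
BASE_URL = "http://localhost:8000/v1"
MODEL = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {"model": MODEL, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the first choice's text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `ask("Write a quicksort in Python.")` only works once a server is up; `build_chat_request` can be inspected offline.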

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
# trust_remote_code=True loads the custom modeling code shipped with the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# Load the weights in BF16 and move them to the GPU (roughly 32 GB of memory).
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

messages = [{"role": "user", "content": "Write a quicksort in Python."}]
# Render the chat template and open the assistant turn so the model starts its reply.
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=512, eos_token_id=tokenizer.eos_token_id)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][len(inputs[0]):], skip_special_tokens=True))

import OpenAI from "openai";
// Assumes an OpenAI-compatible server (for example, vLLM) is serving the model on localhost:8000.
const client = new OpenAI({ baseURL: "http://localhost:8000/v1", apiKey: "not-needed" });
const res = await client.chat.completions.create({
  model: "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
  messages: [{ role: "user", content: "Write a quicksort in Python." }],
});
console.log(res.choices[0].message.content);

More models

Name	Size / Usage	Context	Input
DeepSeek-V4-Pro		1M	Text
DeepSeek-V4-Flash		1M	Text
DeepSeek-R1-0528	Reasoning, math, coding	128K	Text
DeepSeek-V3-0324	General chat, coding, agents	128K	Text
DeepSeek-V3.2		128K	Text
DeepSeek-R1		128K	Text

At a glance

License: DeepSeek Model License for the weights (commercial use permitted); MIT for the code repository
Context length: 128K tokens
Architecture: Mixture-of-Experts, 16B total / 2.4B active
Languages: 338 programming languages
Minimum hardware: ~10-12 GB VRAM at 4-bit
Strengths: code completion, generation, FIM insertion, math reasoning

Overview

DeepSeek-Coder-V2-Lite-Instruct is the smaller member of the DeepSeek-Coder-V2 family, released by DeepSeek AI in mid-2024. It is a Mixture-of-Experts model with 16B total parameters, of which only 2.4B are active for any given token. DeepSeek built it by continuing the pretraining of an intermediate DeepSeek-V2 checkpoint on an additional 6 trillion tokens, heavily weighted toward source code and math. The Instruct variant is the chat-tuned version meant for interactive coding help; a Base variant ships alongside it for raw completion.
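To make "only a fraction of the parameters is active per token" concrete, here is a toy sketch of top-k expert routing. It is not DeepSeek's router: the expert count, the scalar "experts", and the function names are all assumptions chosen for illustration, and the real model's routing and expert layout differ.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, top_k):
    """Return {expert_index: weight} for the top_k highest-scoring experts."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def moe_forward(x, experts, router_scores, top_k):
    """Weighted sum of the outputs of only the selected experts."""
    weights = route_token(router_scores, top_k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Hypothetical toy setup: 8 scalar "experts", 2 of them active per token.
experts = [lambda x, k=k: (k + 1) * x for k in range(8)]
random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in experts]
active = route_token(scores, top_k=2)
```

Only the experts listed in `active` are evaluated for this token; the other six are skipped, which is where the compute saving of a Mixture-of-Experts layer comes from.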

What it's good at

The model is built for programming. It supports 338 programming languages and handles code completion, fill-in-the-middle insertion, generation, and debugging. On HumanEval it scores around 81% pass@1, and it does well on MBPP and math benchmarks such as GSM8K thanks to the math-heavy pretraining. The 128K context window means it can read large files or several files at once, which helps with repository-level questions. The much larger 236B sibling reaches GPT-4-Turbo-level results on code tasks; the Lite model gives up some of that accuracy in exchange for running on modest hardware.
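For context on the pass@1 figure: the source does not say how it was computed, but the estimator commonly used with HumanEval can be sketched as follows. The function names are hypothetical; `n` is the number of samples generated per problem and `c` the number that pass the unit tests.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: n samples per problem, c of them pass the tests."""
    if n - c < k:
        # Fewer than k failing samples: any draw of k contains a passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With a single greedy sample per problem (`n = 1`, `k = 1`), this reduces to the fraction of problems solved, so a score of about 81% would mean roughly four in five problems pass on the first attempt.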

Running locally

The MoE design keeps per-token compute low, since only 2.4B parameters are active for each token, but all 16B parameters still have to be loaded. Full BF16 weights need about 32 GB of memory; 4-bit GGUF quants drop that to roughly 10-12 GB, so the Lite model runs on a single 16 GB GPU or an Apple Silicon Mac with enough unified memory. You can serve it with Hugging Face Transformers (set trust_remote_code=True), vLLM for higher throughput, or llama.cpp and Ollama using community GGUF builds. The chat template uses User:/Assistant: turns delimited by DeepSeek's begin-of-sentence and end-of-sentence special tokens.
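As a rough illustration of the User:/Assistant: turn layout, the sketch below renders a message list into one prompt string. The exact whitespace and the `<BOS>`/`<EOS>` strings are placeholders and assumptions, and `build_prompt` is a hypothetical helper; the authoritative template is the one bundled with the tokenizer, so prefer `tokenizer.apply_chat_template` in real code.

```python
def build_prompt(messages, bos="<BOS>", eos="<EOS>"):
    """Render alternating user/assistant messages into one prompt string.

    bos and eos are placeholders (assumptions); in practice read the real
    strings from tokenizer.bos_token and tokenizer.eos_token.
    """
    parts = [bos]
    for msg in messages:
        if msg["role"] == "user":
            parts.append(f"User: {msg['content']}\n\n")
        elif msg["role"] == "assistant":
            # A finished assistant turn is closed with the end-of-sentence token.
            parts.append(f"Assistant: {msg['content']}{eos}")
        else:
            raise ValueError(f"unsupported role: {msg['role']}")
    # Leave the prompt open so the model writes the next assistant turn.
    if messages and messages[-1]["role"] == "user":
        parts.append("Assistant:")
    return "".join(parts)
```

Seeing the rendered string is mostly useful for debugging servers such as llama.cpp, where a mismatched template is a common cause of degraded output.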

License

The code repository is MIT-licensed, and the weights fall under the DeepSeek Model License. That license permits commercial use, so teams can deploy the model in products, subject to the terms in the agreement. Review the model license before shipping it commercially.

Frequently asked questions

What is DeepSeek-Coder-V2-Lite-Instruct?

DeepSeek-Coder-V2-Lite-Instruct is an open-source code language model from DeepSeek AI. It uses a Mixture-of-Experts (MoE) design with 16B total parameters but only 2.4B active per token, and it was built by continuing the pretraining of DeepSeek-V2 on an extra 6 trillion tokens. The Instruct variant is tuned for chat-style coding assistance and supports a 128K context window.

What hardware do I need to run it?

Because only 2.4B of the 16B parameters are active per token, the Lite model needs far less compute per token than its size suggests, although all 16B parameters must still fit in memory. In BF16 it needs roughly 32 GB, but 4-bit GGUF quants bring that down to around 10-12 GB, so it fits on a single 16 GB GPU or a Mac with enough unified memory. The larger 236B DeepSeek-Coder-V2 model, by contrast, requires 8x80 GB GPUs for BF16 inference.
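The memory figures above follow from simple arithmetic on the parameter count. The sketch below covers weights only; the gap between the raw 4-bit number and the quoted 10-12 GB is presumably quantization metadata, tensors kept at higher precision, the KV cache, and runtime buffers, none of which is modeled here. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 16e9    # total parameters, from the text
ACTIVE_PARAMS = 2.4e9  # parameters used per token, from the text

bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)    # 32.0
int4_gb = weight_memory_gb(TOTAL_PARAMS, 4)     # 8.0, before any overhead
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # about 0.15
```

The active fraction of about 15% describes compute per token, not memory: every expert must be resident because the router may pick any of them for the next token.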

Is it free to use, including commercially?

Yes. The weights are published on Hugging Face and can be downloaded at no cost. The code repository is MIT-licensed, and the model weights are released under the DeepSeek Model License, which explicitly permits commercial use. You should review the model license terms before deploying it in a product.

Which programming languages does it support?

DeepSeek-Coder-V2 supports 338 programming languages, up from 86 in the original DeepSeek-Coder, and the Lite variant shares that coverage. It handles code completion, code insertion (fill-in-the-middle), generation, and debugging across that language set, and the 128K context lets it work over large files and repositories.

How good is it at coding?

For its small active footprint it is strong. The Lite-Instruct model scores around 81% pass@1 on HumanEval and performs well on MBPP and math benchmarks like GSM8K. The full 236B DeepSeek-Coder-V2 reaches performance comparable to GPT-4-Turbo on code-specific tasks, while the Lite model trades some of that accuracy for the ability to run on a single consumer GPU.