Kimi-K2-Instruct-0905

Updated
25.06.2026
Tools
Reasoning
Code

huggingface-cli download moonshotai/Kimi-K2-Instruct-0905
from transformers import AutoModel
model = AutoModel.from_pretrained("moonshotai/Kimi-K2-Instruct-0905")

More models

NameSize / UsageContextInput
Kimi-K2.7-Code
256KText, Image

At a glance

  • License: Other
  • Context length: 256K tokens
  • Languages: Multilingual
  • Minimum hardware: ~560 GB VRAM
  • Strengths: agentic coding and very long context

Overview

Kimi-K2-Instruct-0905 is a large language model from Moonshot AI, the September 2025 refresh of the Kimi K2 line. It uses a mixture-of-experts (MoE) design with about 1 trillion total parameters and roughly 32 billion active per token, so only a slice of the network fires on each request. The instruct variant is tuned for chat, tool use, and agentic coding rather than raw pretraining.

In Atomic Chat the appeal is keeping all of that on your own machine. Once the weights are downloaded the model runs on-device, with no request leaving your hardware and no account or connection required to use it. Your prompts, code, and documents stay local.

What it is good at

The model carries the code, tools, and reasoning capabilities, and its agentic and long-context tags point at where it earns its keep:

  • Agentic coding — it can plan a multi-step change, call tools, and work through a task across many turns, which is the use case Moonshot AI pushed hardest in the 0905 update, including frontend work.
  • Tool calling — you pass a list of available functions and the model decides when and how to invoke them, so it slots into agent loops and local automation.
  • Long-context work — the 256K-token window lets it hold a large codebase, a long transcript, or several documents in one session without losing the thread.

Running it locally

This is a server-grade model. At 1026.5B total parameters even quantized builds are heavy: community GGUF quants land around 250GB+ of combined system RAM and VRAM, and higher-precision versions need far more. The 256K context also adds memory on top of the weights, so plan for a workstation with a lot of RAM rather than a single consumer GPU.

huggingface-cli download moonshotai/Kimi-K2-Instruct-0905

From there you can serve the weights through an inference engine like vLLM or SGLang, or load a quantized build through Atomic Chat to run it on-device without wiring up a server yourself.

License

Kimi-K2-Instruct-0905 ships under a custom license (listed as "other"), which Moonshot AI describes as a Modified MIT License. The weights are openly available to download and run, including local and commercial use, with the modified terms attached by the publisher — check the license text on the model's Hugging Face page before deploying at scale.

Desktop
macOS
(M1 or better)
Download
Windows
(x64)
Download
Linux
(x86_64)
Download

Frequently asked questions

It is the September 2025 release of Moonshot AI's Kimi K2 instruct model, a mixture-of-experts LLM with about 1 trillion total parameters and roughly 32 billion active per token. It is tuned for chat, tool calling, and agentic coding, and supports a 256K-token context window. The 0905 update focused on stronger coding-agent performance and better frontend code generation.

It is demanding. Quantized GGUF builds need roughly 250GB or more of combined system RAM and VRAM, and higher-precision versions need much more, so it targets workstations and servers rather than a typical laptop. A common rule is that your RAM plus VRAM should about match the quant size; if it falls short the model still runs but slows down as layers offload to disk.

Yes. The weights are openly published on Hugging Face and free to download, under a Modified MIT License that Moonshot AI lists as a custom ("other") license. Running it locally costs only your own hardware and electricity, with no per-token API fees.

Yes, once you have downloaded the weights it runs fully offline with no internet connection. In Atomic Chat the model executes on-device, so your prompts and data never leave your machine. The only step that needs a connection is the initial download of the model files.

It is strongest at agentic coding, where it plans and carries out multi-step programming tasks while calling tools, including frontend work that improved in the 0905 release. The 256K context also makes it well suited to long sessions over large codebases or document sets. Its tool-calling support lets it drive local agents and automations.