Kimi-K2.7-Code

Updated
24.06.2026
Tools
Thinking
Vision
Reasoning
Code

huggingface-cli download moonshotai/Kimi-K2.7-Code
from transformers import AutoModel
model = AutoModel.from_pretrained("moonshotai/Kimi-K2.7-Code")

More models

No items found.
NameSize / UsageContextInput

At a glance

  • License: Other
  • Context length: 256K tokens
  • Languages: Multilingual
  • Minimum hardware: ~592 GB VRAM
  • Strengths: reasoning, coding and on-device inference

Overview

Kimi-K2.7-Code is a coding-focused large language model from Moonshot AI, part of the Kimi K2 family. It is a Mixture-of-Experts model with roughly 1058.6B total parameters, of which only a fraction activate per token, and it handles a 262144-token (256K) context window. The release is purpose-built for software engineering: multi-step code generation, agentic tool use, and reasoning across large codebases.

Because the open weights are published on Hugging Face, you can pull them down and run Kimi-K2.7-Code on your own machine through Atomic Chat. Nothing leaves your hardware, the model works with no internet connection once downloaded, and your code and prompts stay private on-device.

What it is good at

Moonshot tuned this version for end-to-end coding agents, and its capabilities reflect that. It reports +21.8% on Kimi Code Bench v2 over the previous K2.6, uses about 30% fewer thinking tokens, and beats Claude Opus 4.8 on the MCP Mark Verified agent benchmark.

  • Agentic tool calling — it drives multi-step workflows, calling tools and chaining actions to finish a task rather than answering in one shot.
  • Code reasoning and generation — it works across 10+ languages and a full production stack, from backend services and infrastructure to frontend and ML/data engineering.
  • Long-context vision — with vision and a 256K window, it can read large files, screenshots, or diagrams and reason over them in a single pass.

Running it locally

This is a heavy model. The full-precision weights run well past 600GB, so most local setups use a quantized build. A 2-bit quant lands around 350GB and needs roughly that much combined system RAM and VRAM; people run it on a single 24GB GPU with 256GB RAM via CPU offloading at low token rates, or on multi-GPU rigs for faster output. The 262144-token context adds further memory on top during long sessions.

huggingface-cli download moonshotai/Kimi-K2.7-Code

You can serve the weights with vLLM or llama.cpp, or skip the setup and load Kimi-K2.7-Code in Atomic Chat with one click, which handles the download and RAM offloading for you.

License

Kimi-K2.7-Code is released by Moonshot AI under a Modified MIT License (listed here as "other"). It permits free use, modification, and redistribution of the weights, including for commercial projects, with the terms of that license applying.

Desktop
macOS
(M1 or better)
Download
Windows
(x64)
Download
Linux
(x86_64)
Download

Frequently asked questions

Kimi-K2.7-Code is an open-weight coding model from Moonshot AI in the Kimi K2 family. It is a Mixture-of-Experts model with about 1058.6B total parameters and a 262144-token context window, tuned for agentic software engineering tasks like multi-step code generation and tool use. Moonshot reports it beats Claude Opus 4.8 on the MCP Mark Verified agent benchmark.

This is a large model, so plan for a lot of memory. A 2-bit quantized build is around 350GB and needs roughly that much combined system RAM and VRAM. People run it on a single 24GB GPU paired with 256GB of system RAM using CPU offloading at a few tokens per second, while multi-GPU rigs run it faster.

Yes. The weights are published openly on Hugging Face under a Modified MIT License, so you can download and run them at no cost. The license permits free use, modification, and redistribution, including commercial use, under its terms. Running it locally in Atomic Chat is free; the only cost is the hardware to host it.

Yes. Once you download the weights, the model runs fully on-device with no internet connection. Loading it in Atomic Chat keeps every prompt and response on your own machine, so your code stays private. You only need a connection for the initial download.

It is built for coding and agentic work: multi-step code generation, tool calling, and reasoning across large codebases in 10+ languages. It reports +21.8% on Kimi Code Bench v2 over K2.6 while using about 30% fewer thinking tokens. For general writing and conversation, Moonshot recommends the more well-rounded K2.6 instead.