Overview

MiMo-V2.5-Pro is Xiaomi's (XiaomiMiMo) flagship open-weight language model, built as a Mixture-of-Experts (MoE) network with about 1023.2B total parameters and roughly 42B active per token. It uses a hybrid attention design that interleaves Sliding Window Attention and Global Attention at a 6:1 ratio, plus Multi-Token Prediction, which keeps the KV cache small enough to sustain a 1,048,576-token context window. The weights are public on Hugging Face.

Atomic Chat runs MiMo-V2.5-Pro on your own machine. Nothing is sent to a server, the model works without a network connection, and your prompts stay on local disk. That makes it a fit for code, documents, and agent runs you would rather not hand to a hosted API.

What it is good at

MiMo-V2.5-Pro was trained for long, multi-step work rather than one-shot chat. Its capability tags point to a few clear strengths.

Agentic tool use — with native tool_calling, it can drive long task chains spanning many tool calls while staying coherent across the run.
Software engineering — strong code generation and editing, the area Xiaomi positioned it for alongside complex repo-level tasks.
Long-context reasoning — the thinking and reasoning capabilities pair with the 1M-token window to work over large codebases or long document sets in one pass. It handles English and Chinese.

Running it locally

This is a large model. At 1023.2B total parameters the full weights need a multi-GPU server (80GB-class cards such as H100, or several RTX 4090s) to hold the model and a meaningful slice of the 1,048,576-token context; quantized builds lower that bar but still demand serious memory. Pull the weights from Hugging Face:

huggingface-cli download XiaomiMiMo/MiMo-V2.5-Pro

From there you can serve it with vLLM or SGLang, load it through Transformers, or open it in Atomic Chat with one click once the weights are on disk.

License

MiMo-V2.5-Pro is released under the MIT license. You can use, modify, redistribute, and build commercial products on it, as long as the copyright and license notice travel with the code.

Frequently asked questions

MiMo-V2.5-Pro is an open-weight Mixture-of-Experts language model from Xiaomi (XiaomiMiMo), with about 1023.2B total parameters and roughly 42B active per token. It was built for agentic tool use, software engineering, and long-horizon reasoning, with a 1,048,576-token context window. The weights are public on Hugging Face under the MIT license.

The full model is large, so a single consumer GPU is not enough. Running the full weights realistically calls for a multi-GPU setup with 80GB-class cards (such as 2x H100) or several RTX 4090s. Quantized builds cut the memory needed, but this remains a heavy model compared to small local LLMs.

Yes. The weights are released under the MIT license, so downloading and running the model yourself costs nothing in license fees. Your only cost is the hardware or electricity to run it. The MIT license also permits commercial use and modification.

Yes. After you download the weights from Hugging Face once, all inference happens on your own machine with no network connection required. In Atomic Chat the model loads and runs entirely on-device, so prompts and outputs never leave your computer.

It is aimed at long, multi-step agent workflows, coding tasks, and reasoning over large inputs. It supports native tool calling and can sustain task chains across many tool calls, and the 1M-token context lets it work over big codebases or document sets in one pass. It handles both English and Chinese.

MiMo-V2.5-Pro

More models

At a glance

Overview

What it is good at

Running it locally

License

Frequently asked questions