MiniMax-M2.7

Updated: 25.06.2026
Capabilities: Thinking, Tools, Reasoning, Code

# Download the weights (shell)
huggingface-cli download MiniMaxAI/MiniMax-M2.7

# Load the model with Transformers (Python)
from transformers import AutoModel
model = AutoModel.from_pretrained("MiniMaxAI/MiniMax-M2.7")

More models

Name | Size / Usage | Context | Input
MiniMax-M3 | (not listed) | 1M | Text, Image

At a glance

  • License: Other
  • Context length: 200K tokens
  • Languages: Multilingual
  • Minimum hardware: ~128 GB of combined memory (VRAM plus system RAM), for a 4-bit quant
  • Strengths: agentic coding and long-context reasoning

Overview

MiniMax-M2.7 is a 228.7B-parameter Mixture-of-Experts (MoE) language model from MiniMaxAI, released as open weights on Hugging Face. The MoE design activates only a fraction of those parameters per token (around 10B), so per-token compute is closer to that of a much smaller dense model, even though all 228.7B parameters must still be stored in memory or offloaded. It was built for agentic and coding work, with native support for tool calls, long-horizon planning, and a 200K-token context window.
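To put the active-parameter figure in proportion, here is a small arithmetic sketch. It uses only the two numbers quoted above (228.7B total, roughly 10B active), and since the active count is approximate, the result is too. The function name is illustrative.

```python
# Figures quoted in the overview; the active count is approximate.
TOTAL_PARAMS_B = 228.7   # total parameters, in billions
ACTIVE_PARAMS_B = 10.0   # parameters used per token, in billions (approximate)


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters that take part in one token's forward pass."""
    if total_b <= 0 or not 0 <= active_b <= total_b:
        raise ValueError("expected 0 <= active_b <= total_b and total_b > 0")
    return active_b / total_b


fraction = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
print(f"Active per token: {fraction:.1%} of all parameters")   # 4.4%
print(f"Total-to-active ratio: about {1 / fraction:.0f}x")     # about 23x
```

Roughly one parameter in 23 is used for any given token. This ratio describes compute per token only. It says nothing about memory, because every expert still has to be available when the router selects it.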

On atomic.chat the appeal is local control. Atomic Chat is a free, open-source desktop app that runs open-weight models fully on your own hardware, so MiniMax-M2.7 executes on-device. Your prompts, code, and files never leave the machine, and once the weights are downloaded the model works offline.

What it is good at

The model's published capabilities are thinking, reasoning, tool use, and code. Those map to a few concrete jobs:

  • Agentic coding loops: it handles multi-file edits and code-run-fix cycles, calling the shell, test runners, and other tools to validate its own changes before finishing.
  • Long-context reasoning: the 200K window lets it hold a large repository, a long spec, or an extended chat history in view while it works through a problem step by step.
  • Tool-driven workflows: it plans and chains calls across tools such as browsers, retrieval, and code execution, using <think> tags to keep its internal reasoning separate from the final answer across multiple turns.
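A client that shows only the final answer has to split the reasoning out of the raw output. The sketch below shows one way to do that. It assumes the tags arrive as literal, well-formed `<think>` and `</think>` pairs in the generated text. The helper name `split_reasoning` and the sample string are illustrative and are not part of any MiniMax or Atomic Chat API.

```python
import re
from typing import NamedTuple


class ParsedReply(NamedTuple):
    reasoning: str  # text found inside <think>...</think>
    answer: str     # everything outside the tags


# Non-greedy match so several separate <think> blocks are handled one by one.
_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> ParsedReply:
    """Separate <think> spans from the user-facing answer (assumes well-formed tags)."""
    reasoning = "\n".join(part.strip() for part in _THINK_BLOCK.findall(raw))
    answer = _THINK_BLOCK.sub("", raw).strip()
    return ParsedReply(reasoning, answer)


# Hypothetical model output, invented for this example.
sample = "<think>The test fails on an empty list.</think>Add a guard for empty input."
parsed = split_reasoning(sample)
print(parsed.reasoning)  # The test fails on an empty list.
print(parsed.answer)     # Add a guard for empty input.
```

Output with no tags comes back unchanged as the answer, with an empty reasoning string. An unclosed `<think>` tag is not handled here, so a streaming client would need extra buffering.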

Running it locally

At 228.7B parameters this is a heavy model. The full bf16 weights are roughly 457GB, and Unsloth's dynamic 4-bit GGUF brings that down to about 108GB, which fits on a machine with 128GB of RAM. A realistic local setup is 128GB or more of combined memory (VRAM plus system RAM for offloading); a Mac Studio needs at least 128GB unified memory for the MLX build. Pull the weights from Hugging Face:

huggingface-cli download MiniMaxAI/MiniMax-M2.7

From there you can serve it with vLLM or SGLang, load it through Transformers, or open it in Atomic Chat with a one-click load that runs the model on-device.
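The memory figures in this section follow from bits-per-parameter arithmetic, sketched below for the weights only. The 16 GB headroom for the KV cache, activations, and the operating system is an illustrative assumption, not a measured value. The function names are hypothetical.

```python
PARAMS_BILLION = 228.7  # parameter count quoted for MiniMax-M2.7


def weight_size_gb(params_billion: float, bits_per_param: float) -> float:
    """Weights-only size in decimal GB: (billions of params) * bits / 8."""
    return params_billion * bits_per_param / 8


def fits(weights_gb: float, memory_gb: float, headroom_gb: float) -> bool:
    """True if the weights plus an assumed headroom fit in the given memory."""
    return weights_gb + headroom_gb <= memory_gb


bf16_gb = weight_size_gb(PARAMS_BILLION, 16)      # 2 bytes per parameter
flat_4bit_gb = weight_size_gb(PARAMS_BILLION, 4)  # uniform 4 bits per parameter

# Average bits per parameter implied by the quoted ~108 GB dynamic quant.
effective_bits = 108 * 8 / PARAMS_BILLION

print(f"bf16 weights: about {bf16_gb:.0f} GB")             # about 457 GB
print(f"uniform 4-bit: about {flat_4bit_gb:.0f} GB")       # about 114 GB
print(f"implied by 108 GB: {effective_bits:.1f} bits/param")  # 3.8 bits/param

HEADROOM_GB = 16  # assumption: room for KV cache, activations, and the OS
print(fits(108, 128, HEADROOM_GB))      # True
print(fits(bf16_gb, 128, HEADROOM_GB))  # False
```

The quoted 108 GB is below the uniform 4-bit estimate, which suggests an average a little under 4 bits per parameter. Long contexts enlarge the KV cache, so the real headroom needed on a 128 GB machine depends on how much of the 200K window is in use.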

License

MiniMax-M2.7 ships under an "other" license. It permits non-commercial use, but commercial use requires prior written authorization from MiniMaxAI, and any commercial deployment must show prominent "Built with MiniMax M2.7" attribution. Read the license on the Hugging Face model page before using it in a paid product.

Desktop

  • macOS (M1 or better)
  • Windows (x64)
  • Linux (x86_64)

Frequently asked questions

What is MiniMax-M2.7?
MiniMax-M2.7 is an open-weight 228.7B-parameter Mixture-of-Experts model from MiniMaxAI, built for coding and agentic workflows. It activates only about 10B parameters per token, supports tool calls and a 200K-token context, and uses <think> tags to separate its reasoning from its final output. You can run it locally in Atomic Chat instead of through a hosted API.

What hardware do I need to run MiniMax-M2.7?
Plan for at least 128GB of combined memory (VRAM plus system RAM for offloading). The full bf16 weights are around 457GB, while a 4-bit GGUF quant drops to roughly 108GB and fits on a 128GB-RAM machine. On Apple Silicon, the MLX build needs a Mac Studio with 128GB or more of unified memory.

Is MiniMax-M2.7 free to use?
The weights are free to download from Hugging Face and free for personal, non-commercial use. Commercial use is different: the license requires prior written authorization from MiniMaxAI and a visible "Built with MiniMax M2.7" attribution. Running it in Atomic Chat on your own machine costs nothing beyond your hardware.

Can I run MiniMax-M2.7 offline?
Yes. Once you download the weights, the model runs entirely on your hardware with no internet connection required. In Atomic Chat the model executes on-device, so your prompts, code, and files stay local and are never sent to an external server.

What is MiniMax-M2.7 best used for?
It is built for agentic coding: multi-file edits, code-run-fix loops, and test-validated repairs, plus planning and tool use across shell, browser, retrieval, and code runners. The 200K context window suits large codebases and long documents. Its published capabilities are thinking, reasoning, tool use, and code.