MiniMax-M3

Updated
24.06.2026
Thinking
Vision
Reasoning
Code

huggingface-cli download MiniMaxAI/MiniMax-M3
from transformers import AutoModel
model = AutoModel.from_pretrained("MiniMaxAI/MiniMax-M3")

More models

No items found.
NameSize / UsageContextInput

At a glance

  • License: Other
  • Context length: 1M tokens
  • Languages: Multilingual
  • Minimum hardware: ~239 GB VRAM
  • Strengths: reasoning, coding and on-device inference

Overview

MiniMax-M3 is an open-weight large language model from MiniMaxAI, the Chinese AI lab behind the MiniMax series. It is a Mixture-of-Experts (MoE) model with about 427B total parameters but only roughly 23B activated per token, so it runs far cheaper than its full size suggests. The architecture pairs MoE routing with MiniMax Sparse Attention (MSA), which is how it sustains a context window of 1,048,576 tokens.

The model is natively multimodal, trained from the start to handle interleaved text and images rather than bolting vision on afterward. On atomic.chat you can pull the weights and run MiniMax-M3 on your own hardware, so prompts, code, and documents stay on your machine, fully offline, with nothing sent to a remote API.

What it is good at

MiniMax-M3 was built around coding and agentic work, with reasoning and vision as first-class capabilities. Three things it handles well:

  • Coding and agentic tasks — it posts strong autonomous-agent scores (around 59% on SWE-Bench Pro in MiniMaxAI's reporting), so it can plan, edit across files, and run multi-step tool loops.
  • Long-context reasoning — the 1M-token window plus sparse attention lets it read entire codebases or long document sets in one pass and reason over them with the thinking mode.
  • Vision and visual understanding — being natively multimodal, it reads screenshots, diagrams, and UI mockups alongside text, which suits debugging from an image or extracting data from a chart.

Running it locally

This is a heavy model. At 427B parameters even quantized builds run large: the smallest GGUF quant lands near 128GB, and a comfortable quant wants 130GB or more of RAM or unified memory. A 512GB Mac Studio M3 Ultra can drive long generations; multi-GPU servers use 8-way tensor parallelism. The full 1,048,576-token context also needs room for the KV cache on top of the weights.

huggingface-cli download MiniMaxAI/MiniMax-M3

Once the weights are local you can serve them through vLLM, SGLang, or llama.cpp, or load the model in Atomic Chat with one click and start chatting without touching a config file.

License

MiniMax-M3 ships under a custom "other" license rather than a standard OSI license, so the exact terms come from MiniMaxAI's own agreement on the model's Hugging Face page. The weights are openly downloadable for local and offline use; check that license text before any commercial deployment to confirm what redistribution and production use it permits.

Desktop
macOS
(M1 or better)
Download
Windows
(x64)
Download
Linux
(x86_64)
Download

Frequently asked questions

MiniMax-M3 is an open-weight Mixture-of-Experts language model from MiniMaxAI, with about 427B total parameters and roughly 23B active per token. It is natively multimodal, handles a 1,048,576-token context through MiniMax Sparse Attention, and targets coding, agentic, and reasoning tasks. You can download the weights and run it locally through Atomic Chat.

A lot. The smallest GGUF quant is around 128GB on disk, so plan for at least 130GB or more of RAM or unified memory once you add the KV cache. A 512GB Mac Studio M3 Ultra can run long generations, and GPU servers typically use 8-way tensor parallelism. This is not a model for an average laptop or a single consumer GPU.

The weights are openly available to download and run yourself, so local use through Atomic Chat has no per-token cost. It is released under a custom "other" license, so read MiniMaxAI's terms on Hugging Face before commercial use. MiniMaxAI's own hosted API is separate and priced per token.

Yes. Once you download the weights from Hugging Face, the model runs entirely on your own hardware with no internet connection required. In Atomic Chat your prompts and files stay on-device, which is the main reason to self-host rather than call a cloud API.

Its strongest areas are coding and agentic workflows, where MiniMaxAI reports about 59% on SWE-Bench Pro. The 1M-token context makes it useful for reasoning over whole codebases or large document sets, and its native vision support lets it read screenshots, diagrams, and charts alongside text.