Overview
North-Mini-Code-1.0 is CohereLabs' first model built for developers, and it targets agentic software engineering rather than general chat. The cohere2moe tag points to its architecture: a decoder-only Transformer-based sparse Mixture-of-Experts model with 30.5B total parameters but only about 3B active per token, drawn from 128 experts that activate 8 at a time. That sparse design is what lets a 30B-class coding model run on a single workstation GPU instead of a server rack.
The local-AI angle is the point. With Atomic Chat you load the weights once and the model runs fully on your own hardware: no API key, no per-token billing, no code leaving the machine. It works offline after download, which matters for proprietary repositories and air-gapped setups where sending source files to a hosted endpoint isn't allowed.
What it is good at
North-Mini-Code-1.0 was post-trained on real software engineering and terminal tasks, with native tool use and interleaved thinking. That shapes what it does well:
- Agentic coding — it plans, edits files, and runs terminal commands across long task sequences, which suits it to coding agents like OpenCode rather than one-shot snippet generation.
- Tool calling and reasoning — built-in
tool_callingandthinkinglet it decide when to call a function, read the result, and reason through the next step instead of guessing in one pass. - Repository-scale work — the long context window keeps many files and a full agent trajectory in view at once, so it can trace a bug across modules or refactor against the whole codebase.
Running it locally
The model is 30.5B parameters with a context length of 500,000 tokens. The full-precision weights want a single H100 80GB (FP8) or 2x A100 40GB (BF16), but the community has published quantized builds: Unsloth GGUFs range from roughly 9GB up to full BF16, so smaller machines can load a lower-bit quant. Download the official weights with:
huggingface-cli download CohereLabs/North-Mini-Code-1.0
For serving, vLLM and SGLang support the cohere2moe architecture today; llama.cpp and Ollama need a build that includes the 128-expert support. In Atomic Chat you pick the quant that fits your VRAM and load it with one click, then chat or wire it into a coding agent locally.
License
North-Mini-Code-1.0 is released under the apache-2.0 license. That permits commercial use, modification, and redistribution, with patent protection and only an attribution requirement. Cohere also asks users to follow its Acceptable Use Policy alongside the license.
