Nex-N2-mini

Updated
25.06.2026
Tools
Thinking
Vision
Reasoning
Code

huggingface-cli download nex-agi/Nex-N2-mini
from transformers import AutoModel
model = AutoModel.from_pretrained("nex-agi/Nex-N2-mini")

More models

NameSize / UsageContextInput
Nex-N2-Pro
256KText, Image

At a glance

  • License: Apache 2.0
  • Context length: 256K tokens
  • Languages: Multilingual
  • Minimum hardware: ~20 GB VRAM
  • Strengths: reasoning and on-device inference

Overview

Nex-N2-mini is a 35.1B-parameter open-weight model from nex-agi, built on the Qwen3.5-35B-A3B-Base. It uses a Mixture-of-Experts (MoE) design: the full model holds 35B parameters, but only about 3B activate per token, so it runs far lighter than its total size suggests. nex-agi tuned it as an agent model for coding, tool calling, and long-horizon tasks, with an "Agentic Thinking" approach that decides when to reason and how deeply.

The local-AI angle is the point of listing it on Atomic Chat. You download the weights once and run the model on your own machine, so prompts and code never leave your hardware. It works fully offline after the download, which keeps private repos, documents, and chats on-device.

What it is good at

Nex-N2-mini carries the tools, thinking, vision, reasoning, and code capabilities of the Nex-N2 line. Its strongest results are in agentic coding and tool use.

  • Agentic coding — it scores 74.4 on SWE-Bench Verified and 60.7 on Terminal-Bench 2.1, so it can read a repo, edit files, and run terminal commands across multi-step tasks.
  • Tool calling and reasoning — function calling plus explicit reasoning traces let it plan, call a tool, read the result, and adjust. It reaches 82.6 on GPQA Diamond for hard reasoning questions.
  • Vision and long context — the model accepts image-text-to-text input and handles a 256K-token context, useful for screenshots, diagrams, and large codebases or documents in one session.

Running it locally

The model is 35.1B parameters total with a 256K context window, but the 3B active-parameter MoE keeps memory modest. At a Q4 quantization the weights sit around 21 GB, which fits a 24 GB GPU like an RTX 3090, 4090, or 5090, or a Mac with 32 GB or more of unified memory. With CPU offloading through llama.cpp it has been run on far smaller cards at lower speeds.

huggingface-cli download nex-agi/Nex-N2-mini

You can serve the full-precision weights with Transformers or vLLM, or use a GGUF quant in a llama.cpp-based runtime. In Atomic Chat the model loads with one click, with no server setup or command line needed.

License

Nex-N2-mini is released under the Apache-2.0 license. That permits commercial use, modification, redistribution, and private deployment, as long as you keep the license and attribution notices. You can fine-tune it and ship it inside your own products without a separate agreement.

Desktop
macOS
(M1 or better)
Download
Windows
(x64)
Download
Linux
(x86_64)
Download

Frequently asked questions

Nex-N2-mini is a 35.1B-parameter open-weight model from nex-agi, built on Qwen3.5-35B-A3B-Base. It is a Mixture-of-Experts model with about 3B active parameters per token, tuned as an agent for coding, tool calling, reasoning, and long-horizon tasks. It also accepts image input and supports a 256K-token context.

At a Q4 quantization the weights are roughly 21 GB, so a 24 GB GPU such as an RTX 3090, 4090, or 5090 handles it, as does a Mac with 32 GB or more of unified memory. Higher-precision Q8 weights need closer to 37 GB. With CPU offloading through llama.cpp it can run on smaller cards at reduced speed.

Yes. The model is released under the Apache-2.0 license, which allows free use including commercial deployment, modification, and redistribution. Running it locally in Atomic Chat has no usage fees, since the model executes on your own hardware rather than a paid API.

Yes. After you download the weights once, the model runs entirely on your machine with no internet connection. Prompts, code, and files stay on-device, which is the reason it is offered for local use in Atomic Chat.

Its strongest area is agentic coding and tool use. It scores 74.4 on SWE-Bench Verified and 60.7 on Terminal-Bench 2.1, so it can edit code across a repo and run terminal commands over multi-step tasks. It also reaches 82.6 on GPQA Diamond for hard reasoning and handles vision input and long documents.