gpt-oss-120b

Updated
25.06.2026
Tools
Thinking
Reasoning
Code

huggingface-cli download openai/gpt-oss-120b
from transformers import AutoModel
model = AutoModel.from_pretrained("openai/gpt-oss-120b")

More models

NameSize / UsageContextInput
gpt-oss-20b
128KText

At a glance

  • License: Apache 2.0
  • Context length: 128K tokens
  • Languages: Multilingual
  • Minimum hardware: ~80 GB VRAM
  • Strengths: frontier open-weight reasoning and tool use

Overview

gpt-oss-120b is an open-weight language model from OpenAI, released under the Apache-2.0 license. It uses a Mixture-of-Experts (MoE) design: of its ~120B total parameters, only about 5B are active per token, which is what lets a model this large run on a single workstation-class GPU. It handles a 128K-token context window and ships with built-in reasoning, tool calling, and code capabilities.

Because the weights are public, you download the model once and run every prompt on your own machine inside Atomic Chat. Nothing leaves the device, there is no API key, and the model keeps working with the network turned off. That makes gpt-oss-120b a fit for private documents, regulated environments, and offline use.

What it is good at

The capability tags on gpt-oss-120b are tools, thinking, reasoning, and code. Those map onto a few concrete jobs:

  • Agentic tool use — native function calling lets the model trigger searches, run code, or hit local tools and fold the results back into its answer.
  • Step-by-step reasoning — it exposes its chain of thought and supports adjustable reasoning depth, useful for math, planning, and multi-step problems. OpenAI reports it lands near o4-mini on core reasoning benchmarks.
  • Coding help — writing, explaining, and debugging code, including longer files that benefit from the 128K context.

Running it locally

gpt-oss-120b reports a 120.4B parameter count, and OpenAI ships the checkpoint in MXFP4 form at roughly 61 GiB. The clean target for local serving is a single 80 GB GPU such as an H100; setups with less VRAM can run it through CPU offloading at lower speed. The 128K context window means you can feed it large documents or long chat histories.

huggingface-cli download openai/gpt-oss-120b

From there you can serve it with Transformers or vLLM, or load it in Atomic Chat with one click and start chatting fully on-device.

License

gpt-oss-120b is released under Apache-2.0. You can use it commercially, modify it, fine-tune it, and ship products built on it, with no copyleft requirement and an explicit patent grant. Keeping the license notice is the main obligation.

Desktop
macOS
(M1 or better)
Download
Windows
(x64)
Download
Linux
(x86_64)
Download

Frequently asked questions

gpt-oss-120b is an open-weight language model from OpenAI built on a Mixture-of-Experts architecture, with about 120B total parameters and roughly 5B active per token. It is tuned for reasoning, tool use, and code, supports a 128K context window, and is released under the Apache-2.0 license so anyone can download and run it.

The practical target is a single GPU with 80 GB of VRAM, such as an NVIDIA H100, since OpenAI ships the model in MXFP4 form at around 61 GiB. You can run it on less VRAM by offloading layers to system RAM, but expect slower generation. Plan for ample system memory as well to handle long contexts.

Yes. The weights are published under the Apache-2.0 license, so the model is free to download, run, and even use commercially. Running it locally in Atomic Chat means there is no API key and no per-token cost.

Yes. Once the weights are downloaded to your machine, gpt-oss-120b runs entirely on-device with no internet connection required. Your prompts and data stay local, which is the point of running it in Atomic Chat instead of a hosted API.

It is strongest at reasoning, agentic tool use, and coding. OpenAI reports it reaches near-parity with o4-mini on core reasoning benchmarks, and its native function calling lets it drive external tools. The 128K context window also helps with long documents and extended code files.