gpt-oss-20b

Updated
25.06.2026
Tools
Thinking
Reasoning
Code

huggingface-cli download openai/gpt-oss-20b
from transformers import AutoModel
model = AutoModel.from_pretrained("openai/gpt-oss-20b")

More models

NameSize / UsageContextInput
gpt-oss-120b
128KText

At a glance

  • License: Apache 2.0
  • Context length: 128K tokens
  • Languages: Multilingual
  • Minimum hardware: ~16 GB VRAM
  • Strengths: agentic reasoning and tool use, runs on 16 GB

Overview

gpt-oss-20b is an open-weight language model from OpenAI, released in 2025 under the gpt-oss family. It uses a Mixture-of-Experts (MoE) design with about 21.5B total parameters, of which roughly 3.6B are active per token. That routing keeps inference light while the model still reaches across a large parameter pool, and it carries a 128K-token context window for long documents and extended chats.

The model runs fully on your own hardware through Atomic Chat. Nothing leaves your machine, so prompts, files, and outputs stay private, and it keeps working with the network off. For anyone who wants a capable reasoning model without sending data to a cloud API, gpt-oss-20b is built for that local-first setup.

What it is good at

gpt-oss-20b ships with tool calling, chain-of-thought reasoning, and code capabilities, which map to a few clear jobs:

  • Tool and function calling — the model can emit structured calls to external functions, so you can wire it into agents, scripts, or local apps that fetch data and run actions.
  • Step-by-step reasoning — it produces visible chain-of-thought and supports adjustable reasoning effort, useful for math, logic, and multi-step problems where you want to see the working.
  • Code generation and review — it writes functions, explains snippets, and helps debug across common languages, with the 128K context holding a sizable codebase in one session.

Running it locally

At 21.5B total parameters, gpt-oss-20b is sized for consumer hardware. With its 4-bit MXFP4 quantization it fits in about 16GB of memory, so a GPU like the RTX 5080 (16GB) handles it, and a 24GB card such as the RTX 4090 gives more headroom for longer context and the full 128K window. Around 24GB of system RAM works as a CPU fallback, though a GPU is much faster.

huggingface-cli download openai/gpt-oss-20b

You can load the weights with Hugging Face Transformers or serve them through vLLM, or skip the setup entirely and open gpt-oss-20b in Atomic Chat with a one-click download.

License

gpt-oss-20b is released under the Apache 2.0 license. That permits commercial use, modification, redistribution, and fine-tuning, with no fee and no restriction on building it into your own products, as long as you keep the license and attribution notices.

Desktop
macOS
(M1 or better)
Download
Windows
(x64)
Download
Linux
(x86_64)
Download

Frequently asked questions

gpt-oss-20b is an open-weight language model released by OpenAI in 2025. It uses a Mixture-of-Experts architecture with about 21.5B total parameters (roughly 3.6B active per token) and a 128K-token context window. It is built for local, on-device use with reasoning, tool calling, and code generation.

With its built-in 4-bit MXFP4 quantization, gpt-oss-20b fits in about 16GB of memory. A 16GB card like the RTX 5080 is the practical minimum, and a 24GB GPU such as the RTX 4090 gives more room for longer context. If you don't have a suitable GPU, around 24GB of system RAM lets it run on CPU, though slower.

Yes. gpt-oss-20b is released under the Apache 2.0 license, so the weights are free to download, run, modify, and use commercially. Running it locally through Atomic Chat means there are no API fees or per-token charges.

Yes. Once the weights are downloaded, gpt-oss-20b runs entirely on your own machine and needs no internet connection. Prompts and outputs stay on-device, which keeps your data private. Atomic Chat loads it locally so it keeps working with the network off.

It is strong at chain-of-thought reasoning, tool and function calling, and writing or debugging code. OpenAI reports it performs near o3-mini on common benchmarks while staying small enough for consumer hardware. The 128K context window also lets it handle long documents and larger codebases in a single session.