GPT-OSS-20B is OpenAI’s smaller, open-weight language model optimized for low-latency agentic tasks and local deployment. With 21B total parameters and 3.6B active parameters (MoE), it fits within 16 GB of memory thanks to native MXFP4 quantization. Designed for high-performance reasoning, it supports the Harmony response format, function calling, web browsing, and code execution. Like its larger sibling (gpt-oss-120b), it offers adjustable reasoning depth and full chain-of-thought visibility for better interpretability. It is released under the permissive Apache 2.0 license, allowing unrestricted commercial and research use. GPT-OSS-20B is compatible with Transformers, vLLM, Ollama, PyTorch, and other tools, making it a good fit for developers building lightweight AI agents or experimenting with fine-tuning on consumer-grade hardware.
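As a minimal sketch of local inference with Transformers, the snippet below loads the model and runs a chat-style generation. It assumes the Hugging Face repo id `openai/gpt-oss-20b`; check the model card for the exact id and hardware requirements.

```python
# Minimal local-inference sketch with the Transformers text-generation pipeline.
# Assumes the repo id "openai/gpt-oss-20b" and a machine with ~16 GB of accelerator memory.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",   # let Transformers pick the dtype for the MXFP4 checkpoint
    device_map="auto",    # spread the model across available GPU/CPU memory
)

messages = [
    {"role": "user", "content": "Explain what a mixture-of-experts layer is in one paragraph."},
]

# The pipeline applies the model's chat template (Harmony format) to the message list.
result = generator(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])
```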
Features
- 21B parameters, 3.6B active (MoE architecture)
- Optimized for low-latency and local use
- Harmony-format support with chain-of-thought output
- Apache 2.0 license for commercial freedom
- Native MXFP4 quantization for memory efficiency
- Fine-tuning support on consumer GPUs
- Compatible with Transformers, vLLM, Ollama, and LM Studio (see the serving sketch after this list)
- Agentic functions: browsing, code execution, and structured outputs
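For higher-throughput or server-style use, vLLM can run the same checkpoint. The sketch below shows offline chat inference; it again assumes the repo id `openai/gpt-oss-20b` and a recent vLLM release with the `LLM.chat` API.

```python
# Offline inference sketch with vLLM (assumed repo id "openai/gpt-oss-20b").
from vllm import LLM, SamplingParams

llm = LLM(model="openai/gpt-oss-20b")          # loads the MXFP4-quantized weights
params = SamplingParams(max_tokens=256)

# vLLM applies the model's chat template (Harmony format) to chat-style input.
outputs = llm.chat(
    [{"role": "user", "content": "Summarize the benefits of MXFP4 quantization in two sentences."}],
    params,
)
print(outputs[0].outputs[0].text)
```

The same model can also be exposed as an OpenAI-compatible endpoint with vLLM's serve command, or pulled into Ollama or LM Studio for desktop use.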