gpt-oss-safeguard is an open-weight reasoning model family from OpenAI built specifically for content safety and moderation tasks. Rather than outputting only a numeric “safety score,” it is trained to reason about content against a user-provided policy, so moderation definitions stay flexible and customizable instead of being fixed rules, which matters when different platforms have different safety standards. The family comes in two variants: a 120B-parameter model for heavy-duty, high-accuracy reasoning and a 20B-parameter model optimized for lower latency and smaller compute budgets. At inference time you supply both the content and your own safety policy (written as a structured prompt); the model evaluates the content and returns its justification, enabling transparent, auditable moderation decisions. It can run fully locally or on private infrastructure, with no mandatory cloud dependence.
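A minimal sketch of that inference flow is below, assuming the model is served behind an OpenAI-compatible chat-completions endpoint (for example via a local vLLM server). The base URL, model name, policy text, and message layout are illustrative assumptions, not a documented interface.

```python
# Sketch: evaluating content against a bring-your-own policy through an
# OpenAI-compatible endpoint (e.g. a local vLLM server). Base URL, model
# name, and policy wording are assumptions for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

policy = """\
Policy: Spam
Label content VIOLATING if it:
- advertises unrelated commercial products or services, or
- contains repeated links intended to drive traffic off-platform.
Otherwise label it COMPLIANT.
Return a label and a short justification.
"""

content = "Huge discounts on watches!!! Click http://example.com now!!!"

response = client.chat.completions.create(
    model="openai/gpt-oss-safeguard-20b",  # assumed deployment name
    messages=[
        {"role": "system", "content": policy},  # your custom safety policy
        {"role": "user", "content": content},   # the content to evaluate
    ],
)

# The reply contains the label plus the model's reasoning for the decision.
print(response.choices[0].message.content)
```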
Features
- Open-weight reasoning model tuned for safety and content moderation use cases
- Supports “bring-your-own-policy”: developers supply custom safety rules for content evaluation
- Returns not just a classification but also the reasoning behind each decision, which is useful for audits and transparency
- Available in multiple model sizes (e.g. 120B and 20B parameters) to suit different resource constraints and latency needs
- Released as open weights under the permissive Apache 2.0 license, enabling free use, modification, and integration
- Can run locally or on private infrastructure for privacy, control, and cloud independence (see the sketch below)
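For fully local use, the sketch below loads the 20B variant with Hugging Face Transformers. The Hub identifier, the policy wording, and the chat-template usage are assumptions; a model of this size also needs substantial GPU memory or quantization, and a recent transformers release that supports the architecture.

```python
# Sketch: fully local inference with Hugging Face Transformers.
# "openai/gpt-oss-safeguard-20b" is an assumed Hub identifier.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-safeguard-20b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [
    {"role": "system", "content": "Policy: flag harassment; otherwise allow."},
    {"role": "user", "content": "You're an idiot and everyone hates you."},
]

# Build the prompt with the model's chat template and generate a verdict.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=512)

# Print only the newly generated tokens: the label and its justification.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```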