← Back to Model Beat
10Models·Oct 29

gpt-oss-safeguard technical report

gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are two open-weight reasoning models post-trained from the gpt-oss models and trained to reason from a provided policy in order to label content under that policy. In this report, we describe gpt-oss-safeguard’s capabilities and provide our baseline safety evaluations on the gpt-oss-safeguard models, using the underlying gpt-oss models as a baseline. For more information about the development and architecture of the underlying gpt-oss models, see the original gpt-oss model model card⁠.

Covered by 1 source

Related stories

ModelsHow we built OWL, the new architecture behind our ChatGPT-based browser, AtlasOct 30ModelsIntroducing gpt-oss-safeguardOct 29ModelsDoppel’s AI defense system stops attacks before they spreadOct 28ModelsIntroducing Aardvark: OpenAI’s agentic security researcherOct 30