10 · Policy · Aug 5

Estimating worst-case frontier risks of open-weight LLMs

In this paper, we study the worst-case frontier risks of releasing gpt-oss. We introduce malicious fine-tuning (MFT), where we attempt to elicit maximum capabilities by fine-tuning gpt-oss to be as capable as possible in two domains: biology and cybersecurity.

