Models · 5h ago
HaloGuard 1.0: An Open Weights Constitutional Classifier for Multilingual AI Safety
Researchers have released HaloGuard 1.0, an open-weights safety classifier designed to detect harmful prompts across multiple languages. Built on a constitutional AI framework, the model achieves high performance while using only a fraction of the compute that such safety systems typically require.
Covered by 1 source
- arXiv CS.AI: Navaneeth Sangameswaran, Preetham S, Ashmiya Lenin (5h ago)