Policy · Apr 16

Safe-FedLLM: Delving into the Safety of Federated Large Language Models

arXiv:2601.07177v3. Abstract: Federated learning (FL) addresses privacy and data-silo issues in the training of large language models (LLMs). Most prior work focuses on improving the efficiency of federated learning for LLMs (FedLLM). However, security in open federated environments, particularly defenses against malicious clients, remains underexplored. To investigate the security of FedLLM, we conduct a preliminary study to analyze potential attack surfaces and defensive characteristics from the perspective of LoRA updates. We find two key properties of FedLLM: 1) LLMs are vulnerable to attacks from malicious clients in FL, and 2) LoRA updates exhibit distinct behavioral patterns that can be effectively distinguished by lightweight classifiers. Based on these properties, we propose Safe-FedLLM, a probe-based defense framework for FedLLM, which constructs defenses across three levels: Step-Level, Client-Level, and Shadow-Level. The core concept of Safe-FedLLM is to perform probe-based discrimination on each client's local LoRA updates, treating them as high-dimensional…
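The probe idea in the abstract (a lightweight classifier that separates benign from malicious LoRA updates before aggregation) can be illustrated with a small sketch. This is not the authors' implementation: the logistic-regression probe, the synthetic updates, the plain averaging step, and every name here (flatten_update, train_probe, malicious_score, filtered_average, make_update) are assumptions made for illustration, and the paper's Step-Level, Client-Level, and Shadow-Level defenses are not reproduced.

```python
import numpy as np

def flatten_update(lora_update):
    """Concatenate a client's LoRA matrices into one feature vector."""
    return np.concatenate([np.asarray(m, dtype=float).ravel() for m in lora_update])

def train_probe(features, labels, lr=0.1, epochs=1000):
    """Fit a logistic-regression probe (1 = malicious, 0 = benign) by gradient descent."""
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        grad = p - labels                      # gradient of the log loss w.r.t. the logit
        w -= lr * (features.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def malicious_score(probe, feature):
    """Probability, under the probe, that an update is malicious."""
    w, b = probe
    return 1.0 / (1.0 + np.exp(-(feature @ w + b)))

def filtered_average(updates, probe, threshold=0.5):
    """Average only the flattened updates that the probe scores below the threshold."""
    feats = [flatten_update(u) for u in updates]
    kept = [f for f in feats if malicious_score(probe, f) < threshold]
    return np.mean(kept, axis=0) if kept else None

# Synthetic stand-in data (an assumption): malicious updates are benign ones
# shifted by a constant. Real LoRA updates will not be this cleanly separable.
rng = np.random.default_rng(0)

def make_update(shift):
    # Two small matrices standing in for LoRA factors (rank 2, hidden size 4).
    return [rng.normal(shift, 0.1, size=(2, 4)), rng.normal(shift, 0.1, size=(4, 2))]

train_updates = [make_update(0.0) for _ in range(40)] + [make_update(0.5) for _ in range(40)]
X = np.stack([flatten_update(u) for u in train_updates])
y = np.array([0.0] * 40 + [1.0] * 40)
probe = train_probe(X, y)

# One simulated round: three benign clients and one malicious client.
round_updates = [make_update(0.0) for _ in range(3)] + [make_update(0.5)]
aggregate = filtered_average(round_updates, probe)
```

In this toy setting the server drops the shifted update and averages the remaining three. A real deployment would need labeled benign and malicious updates to train the probe, which the sketch simply assumes are available.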

Covered by 2 sources

  • arXiv CS.AI · Mingxiang Tao, Yu Tian, Wenxuan Tu, Yue Yang, Xue Yang, Xiangyan Tang · Apr 16
  • arXiv CS.AI · Mohamed Shaaban, Mohamed Elmahallawy · Apr 17

Related stories

  • Policy · Making AI operational in constrained public sector environments · Apr 16
  • Policy · The Specification Trap: Why Static Value Alignment Alone Is Insufficient for Robust Alignment · Apr 17
  • Policy · Google Told to Share Search Data With AI Rivals in EU Proposal · Apr 16
  • Policy · AutoRAN: Automated Hijacking of Safety Reasoning in Large Reasoning Models · Apr 17