Conditional Co-Ablation: Recovering Self-Repair Backups in Transformer Circuits
Researchers study self-repair in transformer models using conditional co-ablation: removing a specific component can trigger backup mechanisms that compensate for the loss and mask the circuit's original function. The finding suggests that traditional single-component ablation may underestimate a pathway's actual contribution, because it does not account for how the model redistributes information when parts are disabled.
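To make the masking effect concrete, here is a minimal toy sketch in Python. It is not the paper's method or model: the function `toy_logit`, the "primary" and "backup" components, and the values 2.0 and 1.5 are all hypothetical, chosen only to show how a backup that activates when another component is removed can hide that component's contribution from a single ablation.

```python
def toy_logit(ablate_primary=False, ablate_backup=False):
    """Toy output logit from two hypothetical components."""
    # Primary component: contributes 2.0 unless it is ablated.
    primary = 0.0 if ablate_primary else 2.0
    # Hypothetical backup: dormant while the primary is active,
    # and compensates with 1.5 once the primary signal is missing.
    backup_signal = 1.5 if primary == 0.0 else 0.0
    backup = 0.0 if ablate_backup else backup_signal
    return primary + backup


clean = toy_logit()

# Single ablation: the backup switches on and masks most of the drop.
single_effect = clean - toy_logit(ablate_primary=True)

# Ablating the backup alone shows nothing, because it is dormant.
backup_only_effect = clean - toy_logit(ablate_backup=True)

# Ablating both together exposes the primary's full contribution.
joint_effect = clean - toy_logit(ablate_primary=True, ablate_backup=True)

print(single_effect, backup_only_effect, joint_effect)
```

In this toy setup, ablating the primary alone gives a much smaller measured effect than ablating it together with its backup, and ablating the backup alone shows no effect at all. That gap is the underestimate the summary describes.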
Covered by 1 source
- arXiv CS.AI: Zhiren Gong, Zihao Zeng, Chau Yuen, Wei Yang Bryan Lim (5h ago)