4Research·5h ago
Denser $\neq$ Better: Limits of On-Policy Self-Distillation for Continual Post-Training
Researchers have found that using denser models for on-policy self-distillation does not consistently improve performance when foundation models undergo continual post-training. This study demonstrates that increasing complexity during this process does not effectively resolve the problem of catastrophic forgetting, suggesting that current methods for acquiring new knowledge without losing previous capabilities require further refinement.
Covered by 1 source
- AarXiv CS.AI↗Meng Wang, Haohan Zhao, Wenzhuo Liu, Lu Yang, Geng Liu, Haiyang Guo, Guo-Sen Xie, Gaofeng Meng, Hongbin Liu, Fei Zhu5h ago