Research · 5h ago

Denser $\neq$ Better: Limits of On-Policy Self-Distillation for Continual Post-Training

Researchers have found that using denser models for on-policy self-distillation does not consistently improve performance when foundation models undergo continual post-training. This study demonstrates that increasing complexity during this process does not effectively resolve the problem of catastrophic forgetting, suggesting that current methods for acquiring new knowledge without losing previous capabilities require further refinement.

Covered by 1 source

arXiv CS.AI · Meng Wang, Haohan Zhao, Wenzhuo Liu, Lu Yang, Geng Liu, Haiyang Guo, Guo-Sen Xie, Gaofeng Meng, Hongbin Liu, Fei Zhu · 5h ago
