← Back to Model Beat
4Research·5h ago

Denser $\neq$ Better: Limits of On-Policy Self-Distillation for Continual Post-Training

Researchers have found that using denser models for on-policy self-distillation does not consistently improve performance when foundation models undergo continual post-training. This study demonstrates that increasing complexity during this process does not effectively resolve the problem of catastrophic forgetting, suggesting that current methods for acquiring new knowledge without losing previous capabilities require further refinement.

Covered by 1 source

  • AarXiv CS.AIMeng Wang, Haohan Zhao, Wenzhuo Liu, Lu Yang, Geng Liu, Haiyang Guo, Guo-Sen Xie, Gaofeng Meng, Hongbin Liu, Fei Zhu5h ago

Related stories

ResearchOn Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMsJun 29 · 13 sourcesResearchAnti-Causal Domain Generalization: Leveraging Unlabeled DataJul 1 · 2 sourcesResearchLearning Unmasking Policies for Diffusion Language ModelsJun 29 · 6 sourcesResearchRedKnot: Efficient Long-Context LLM Serving with Head-Aware KV Reuse and SegPagedAttentionJun 29