Research · 5h ago
Distributionally Robust Listwise Preference Optimization
Researchers have introduced Distributionally Robust Listwise Preference Optimization, a training method intended to improve how language models align with human rankings. Whereas pairwise techniques compare just two options at a time, this approach accounts for uncertainty across entire ranked lists of preferences. By building robustness in at the ranking level, the method aims to keep performance more stable when the training data contains inconsistent or noisy human feedback.
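The summary does not give the paper's objective, so the following is only a generic sketch of the two ingredients it names, not the authors' method. It assumes a Plackett-Luce likelihood for the listwise loss and a KL-regularized worst case (the log-sum-exp "tilted" average) for the robustness step; the function names and the temperature `tau` are hypothetical and chosen for illustration.

```python
import math


def plackett_luce_nll(scores):
    """Listwise loss: negative log-likelihood of one ranking.

    `scores` are model scores ordered from most- to least-preferred.
    Assumption: a Plackett-Luce model, a common listwise choice.
    """
    nll = 0.0
    for i in range(len(scores) - 1):
        tail = scores[i:]
        m = max(tail)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in tail))
        nll += log_z - scores[i]
    return nll


def kl_robust_loss(losses, tau=1.0):
    """Tilted aggregate tau * log(mean(exp(loss / tau))).

    This equals the worst-case expected loss over reweightings of the
    lists, penalized by tau times their KL divergence from uniform.
    Assumption: the paper's actual uncertainty set may differ.
    """
    m = max(losses)
    mean_exp = sum(math.exp((l - m) / tau) for l in losses) / len(losses)
    return m + tau * math.log(mean_exp)


# Two ranked lists: the model agrees with the first and contradicts the second.
ranked_scores = [[2.0, 1.0, 0.0], [0.0, 1.0, 2.0]]
losses = [plackett_luce_nll(s) for s in ranked_scores]
average = sum(losses) / len(losses)
robust = kl_robust_loss(losses, tau=0.5)
```

In this sketch the robust value sits between the plain average and the largest per-list loss, so lists the model ranks poorly receive more weight; a smaller `tau` moves it closer to the worst case.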
Covered by 1 source
- arXiv CS.AI: Xudong Wu, Jian Qian, Pangpang Liu, Vaneet Aggarwal, Jiayu Chen · 5h ago