Research · 4d ago
On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs
Researchers have introduced a new reinforcement learning framework called Be Faithful When Responding, designed to improve the reliability of vision-language models during multimodal reasoning tasks. This approach addresses common instability issues by preventing models from exploiting language patterns rather than accurately interpreting visual information.
Covered by 13 sources
- Apple Machine Learning Blog · 1d ago
- arXiv CS.AI · Xin Zou, Haolin Deng, Yibo Yan, Shuliang Liu, Kening Zheng, Zhiwei Jin, Chen Chen, Haonan Lu, Xuming Hu · 3d ago
- arXiv CS.AI · Eric Peh, Debaditya Roy, Basura Fernando · 3d ago
- arXiv CS.AI · Shuimu Chen, Yuteng Chen, Yuanshen Guan, Zebang Cheng, Zeyu Zhang, Shengqian Qin, Bin Xia, Jiaran Li, Wenming Yang, Fei Ma · 4d ago
- arXiv CS.AI · Tao Cheng, Shi-Zhe Chen, Hao Zhang, Yixin Qin, Jinwen Luo, Zheng Wei · 4d ago
- arXiv CS.AI · Kaitao Chen, Weiqian Zhao, Jiamin Wu, Qihao Zheng, Shangquan Sun, Chunfeng Song, Xiaosong Wang, Mu Zhou, Mianxin Liu · 2d ago
- arXiv CS.AI · Junha Jung, Minbyul Jeong, Suhyeon Lim, Sungwook Jung, Jaehoon Yun, Taeyun Roh, Mujeen Sung, Jaewoo Kang · 2d ago
- arXiv CS.AI · Weixin Chen, Antonio Vergari, Han Zhao · 2d ago
- arXiv CS.AI · Yutao Sun, Yanting Miao, Hao-Xuan Ma, Mengyu Zhou, Mingshuai Chen, Tiancheng Zhao, Dexin Wang, Lei Lv, Li Xu, Xiaoxi Jiang, Guanjun Jiang · 3d ago
- arXiv CS.AI · Peng, Lee, Yin Zhang, Yanglin Zhang, Haonan Wu, Zishan Liu, Ruoxi Zang, Xin Zhu, Jiayin Zheng, Jian Yao, Zefeng Ji, Fei Ma · 3d ago
- arXiv CS.AI · Wenhao Zhang, Kuanwei Lin, Xuyi Yang, Wei Gao, Ge Li · 1d ago
- arXiv CS.AI · Lingxiao Li, Yifan Wang, Xinyan Gao, Chen Tang, Xiangyu Yue, Chenyu You · 1d ago
- arXiv CS.AI · Hongxing Li, Xiufeng Huang, Dingming Li, Wenjing Jiang, Zixuan Wang, Haolei Xu, Hanrong Zhang, Haiwen Hong, Longtao Huang, Hui Xue, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen · 1d ago