Xiaomi-GUI-0 Technical Report
Xiaomi researchers have introduced Xiaomi-GUI-0, an open-source framework designed to train vision-language models for executing tasks within graphical user interfaces. The project addresses current limitations in how agents perceive and interact with mobile applications by providing a structured approach for mapping screen inputs to specific navigation and text-entry actions.
Covered by 1 source
- arXiv CS.AI: Wanxia Cao, Chengzhen Duan, Pei Fu, Pengzhi Gao, Niu Lian, Fazhan Liu, Hui Liu, Heng Qu, Qinzhuo Wu, Zhehao Yu, Tongbo Chen, Shiqi Cui, Anan Du, Shukai Jia, Yuanfa Li, Yike Liu, Wenchao Lu, Haoyuan Sun, Jiatong Sun, Cheng Tan, Yajie Wang, Changqiao Wu, Tao Xiong, Jiahui Yang, Yuxuan Yuan, Ruoceng Zhang, Shaojie Zhang, Jian Zhu, Jian Luan, Cong Zou (4d ago)