ReQuest: Rethinking-based Question-Aware Frame Selection for Long-Form Video QA
Researchers have introduced ReQuest, a method designed to improve how multimodal AI models process long-form videos by selectively choosing the most relevant frames rather than sampling uniformly. By focusing on evidence-based frame selection, this technique aims to overcome the limitations of fixed input token budgets that often hinder the performance of large language models when answering questions about extended video content.
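To make the contrast concrete, here is a minimal sketch of the two strategies under a fixed frame budget. It is not the ReQuest algorithm, whose details the summary above does not give. The function names and the per-frame relevance scores are hypothetical; in practice such scores would come from some model that rates each frame against the question.

```python
from typing import List, Sequence


def uniform_sample(num_frames: int, budget: int) -> List[int]:
    """Pick up to `budget` frame indices evenly spaced across the video."""
    if budget <= 0:
        return []
    if budget >= num_frames:
        return list(range(num_frames))
    step = num_frames / budget
    # Take the frame nearest the middle of each equal-length segment.
    return [int(i * step + step / 2) for i in range(budget)]


def select_by_relevance(scores: Sequence[float], budget: int) -> List[int]:
    """Pick the `budget` highest-scoring frames, returned in temporal order.

    `scores` is a hypothetical per-frame relevance to the question.
    """
    if budget <= 0:
        return []
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


if __name__ == "__main__":
    # Toy scores: the evidence for the question sits in frames 4 and 5.
    scores = [0.1, 0.0, 0.2, 0.1, 0.9, 0.8, 0.1, 0.0, 0.3, 0.1]
    print("uniform:  ", uniform_sample(len(scores), 2))
    print("relevance:", select_by_relevance(scores, 2))
```

With a budget of two frames, uniform sampling in this toy case misses both evidence frames, while score-based selection keeps them. That is the gap question-aware selection is meant to close.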
Covered by 1 source
- arXiv CS.AI: Minkuk Kim, Suyong Yun, Young Tae Kim, Jinyoung Moon, Jinwoo Choi, Seong Tae Kim (5h ago)