Research · 5h ago

Rank-Then-Act: Reward-Free Control from Frame-Order Progress

Researchers have introduced Rank-Then-Act, a method for training AI control policies from expert video demonstrations alone, without environment rewards. A vision-language model scores chronological progress in these videos, and the system uses that signal to learn tasks through observation. The approach could simplify the development of robotic agents in settings where a precise numerical reward function is difficult or impossible to define.
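
To make the idea of a progress-based training signal concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (`progress_rewards`, `order_accuracy`), the toy scorer, and the choice of "reward = increase in progress score" are all assumptions made for illustration. In the real system the scorer would be a vision-language model applied to video frames.

```python
from typing import Callable, List, Sequence


def progress_rewards(frames: Sequence[object],
                     score: Callable[[object], float]) -> List[float]:
    """Hypothetical reward: the increase in estimated progress between frames.

    `score` stands in for a learned progress scorer; here it is any function
    that maps a frame to a number that should grow as the task advances.
    """
    scores = [score(f) for f in frames]
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def order_accuracy(frames: Sequence[object],
                   score: Callable[[object], float]) -> float:
    """Fraction of frame pairs (i < j) that the scorer ranks chronologically."""
    pairs = [(i, j) for i in range(len(frames)) for j in range(i + 1, len(frames))]
    if not pairs:
        return 1.0
    correct = sum(score(frames[i]) < score(frames[j]) for i, j in pairs)
    return correct / len(pairs)


# Toy stand-ins (assumptions): frames are just their indices, and the scorer
# maps a five-frame demonstration linearly onto [0, 1].
demo = [0, 1, 2, 3, 4]


def toy_score(frame: int) -> float:
    return frame / 4


rewards = progress_rewards(demo, toy_score)
accuracy = order_accuracy(demo, toy_score)
```

Under this assumed formulation the per-step rewards telescope: their sum equals the scorer's estimate for the last frame minus its estimate for the first, so a trajectory is rewarded only for net progress and no hand-written reward function is needed.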

Covered by 1 source

  • arXiv CS.AI · Yuriy Maksyuta, George Bredis, Ruslan Rakhimov, Daniil Gavrilov · 5h ago
