Policy · Jun 12

From Verdict to Process: Agentic Reinforcement Learning for Multi-Stage Fact Verification

arXiv:2606.13262v1

Abstract: Recent approaches combining Large Language Models (LLMs) with retrieval-augmented reasoning have shown promise for automated fact verification. To process complex claims, these verification pipelines typically execute multi-stage workflows that coordinate tightly coupled modules, including claim decomposition, evidence gathering, and verdict prediction. However, existing methods optimize individual stages in isolation or rely on fixed heuristics, which limits adaptive coordination among stages and can lead to suboptimal outcomes. In this work, we propose ProFact, an agentic reinforcement learning framework for end-to-end optimization of multi-stage fact verification trajectories. ProFact trains a unified policy to coordinate claim decomposition, evidence seeking, answer generation, and verdict prediction. To address the sparse and delayed supervision provided by final veracity labels, ProFact introduces process-aware rewards that provide stage-level learning signals throughout the verification process. Empirical evaluation shows that ProFact consistently outperforms strong baselines in both verification performance and inference efficiency. These results highlight…
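The contrast the abstract draws between a sparse final-label reward and stage-level signals can be illustrated with a small sketch. This is not ProFact's reward design, which the abstract does not specify. The stage names come from the abstract, but the `Step` type, the `stage_reward` scores, the `stage_weight` mixing coefficient, and the discount `gamma` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Stage names follow the abstract; everything else here is hypothetical.
STAGES = ("decompose", "seek_evidence", "answer", "verdict")


@dataclass
class Step:
    stage: str           # one of STAGES
    stage_reward: float  # assumed stage-level score in [0, 1]


def trajectory_returns(
    steps: List[Step],
    verdict_correct: bool,
    stage_weight: float = 0.2,  # assumed weight on stage-level signals
    gamma: float = 1.0,         # assumed discount factor
) -> List[float]:
    """Return the discounted return at every step of one verification trajectory.

    With stage_weight = 0 only the final veracity label contributes, which is
    the sparse, delayed setting the abstract describes. A positive stage_weight
    adds a learning signal at each intermediate stage.
    """
    for s in steps:
        if s.stage not in STAGES:
            raise ValueError(f"unknown stage: {s.stage}")

    # Per-step reward: weighted stage signal, plus the outcome on the last step.
    rewards = [stage_weight * s.stage_reward for s in steps]
    if rewards:
        rewards[-1] += 1.0 if verdict_correct else 0.0

    # Backward pass accumulating discounted returns.
    returns = [0.0] * len(rewards)
    running = 0.0
    for i in range(len(rewards) - 1, -1, -1):
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns


example = [
    Step("decompose", 1.0),
    Step("seek_evidence", 0.5),
    Step("answer", 0.0),
    Step("verdict", 1.0),
]
sparse = trajectory_returns(example, verdict_correct=False, stage_weight=0.0)
shaped = trajectory_returns(example, verdict_correct=False, stage_weight=0.2)
```

In this toy setup, a trajectory with a wrong verdict receives zero return at every step under the outcome-only reward, while the shaped variant still distinguishes a good decomposition from a poor answer step.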

