Research · Apr 17

GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens

arXiv:2604.15239v1. Abstract: In this work, we revisit several key design choices of modern Transformer-based approaches for feed-forward 3D Gaussian Splatting (3DGS) prediction. We argue that the common practice of regressing Gaussian means as depths along camera rays is suboptimal, and instead propose to directly regress 3D mean coordinates using only a self-supervised rendering loss. This formulation allows us to move from the standard encoder-only design to an encoder-decoder architecture with learnable Gaussian tokens, thereby unbinding the number of predicted primitives from input image resolution and number of views. Our resulting method, TokenGS, demonstrates improved robustness to pose noise and multiview inconsistencies, while naturally supporting efficient test-time optimization in token space without degrading learned priors. TokenGS achieves state-of-the-art feed-forward reconstruction performance on both static and dynamic scenes, producing more regularized geometry and more balanced 3DGS distribution, while seamlessly recovering emergent scene attributes such as static-dynamic decomposition and scene…
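To make the contrast in the abstract concrete, here is a minimal NumPy sketch of the two ways of parametrizing Gaussian means. It is an illustration under stated assumptions, not the authors' implementation: the function names, the single linear output head, and all shapes are hypothetical, and random arrays stand in for network outputs.

```python
import numpy as np


def means_from_ray_depths(origins, directions, depths):
    """Ray-bound parametrization: one Gaussian mean per input ray (pixel).

    The mean is constrained to lie on its camera ray, so the number of
    Gaussians equals the number of rays.
    """
    directions = directions / np.linalg.norm(directions, axis=-1, keepdims=True)
    return origins + depths[:, None] * directions


def means_from_tokens(tokens, weight, bias):
    """Direct parametrization: a (hypothetical) linear head maps each
    token to an unconstrained (x, y, z) mean.

    The number of Gaussians equals the number of tokens, independent of
    image resolution or view count.
    """
    return tokens @ weight + bias


rng = np.random.default_rng(0)
num_rays, num_tokens, dim = 8, 5, 16  # illustrative sizes only

# Ray-bound case: random stand-ins for camera rays and predicted depths.
origins = np.zeros((num_rays, 3))
directions = rng.normal(size=(num_rays, 3))
depths = rng.uniform(1.0, 4.0, size=num_rays)
ray_means = means_from_ray_depths(origins, directions, depths)

# Token case: random stand-ins for decoder outputs and head parameters.
tokens = rng.normal(size=(num_tokens, dim))
weight = rng.normal(size=(dim, 3))
bias = np.zeros(3)
token_means = means_from_tokens(tokens, weight, bias)

print(ray_means.shape, token_means.shape)
```

In the first function the output count is tied to the number of rays, while in the second it is set by the number of tokens, which is the decoupling the abstract describes.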

Covered by 2 sources

  • arXiv CS.AI · Jiawei Ren, Michal Jan Tyszkiewicz, Jiahui Huang, Zan Gojcic · Apr 17
  • arXiv CS.AI · Roni Itkin, Noam Issachar, Yehonatan Keypur, Anpei Chen, Sagie Benaim · Apr 17

Related stories

  • Research · MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining · Apr 16 · 2 sources
  • Research · Automated Alignment Researchers: Using large language models to scale scalable oversight - Anthropic · Apr 14 · 2 sources
  • Research · AI as scientist? Machine-written papers clear academic reviews, raise questions - MSN · Apr 13 · 2 sources
  • Research · Nvidia wants to scale robot simulation training with Lyra 2.0 · Apr 16