Opinion · Mar 29

Why Are Large Language Models So Terrible at Video Games?

Large language models (LLMs) have improved so quickly that the benchmarks themselves have evolved, adding more complex problems in an effort to challenge the latest models. Yet LLMs haven’t improved across all domains, and one task remains far outside their grasp: They have no idea how to play video games. While some models have managed to beat a few games (for example, Gemini 2.5 Pro beat Pokemon Blue in May of 2025), these exceptions prove the rule. The eventually victorious models completed games far more slowly than a typical human player, made bizarre and often repetitive mistakes, and required custom software to guide their interactions with the game. Julian Togelius, the director of New York University’s Game Innovation Lab and co-founder of AI game-testing company Modl.ai, explored the implications of LLMs’ limitations in video games in a recent paper. He spoke with IEEE Spectrum about what this lack…

