NOTE: The study “Deep Reinforcement Learning Agents are not even close to Human Intelligence” (https://arxiv.org/pdf/2505.21731) by Quentin Delfosse et al. (2025) represents, in my view, an essential milestone in our understanding of artificial intelligence. I know Quentin personally, and I appreciate both the rigor of his work and the finesse of his perspective on AI. This publication reminds us that the real challenge isn’t beating humans, but understanding them.
The genius of doing the opposite
Quentin Delfosse and Jannis Blüml, two young researchers from the Technical University of Darmstadt, had a brilliant idea. Rather than following the usual trend of making challenges more complex to test AI, they did exactly the opposite. They took AI agents that dominate Atari games (systems capable of superhuman performance on Pong, Space Invaders, or Ms. Pac-Man) and gave them an unusual test: playing easier versions of these same games.
The result? Complete collapse. These “genius” AIs suddenly become incompetent as soon as you reduce the difficulty.
This counterintuitive approach reveals a disturbing truth about our AIs. Delfosse and Blüml created HackAtari, a collection of over 224 variations of classic Atari games, primarily simplifications. The technical innovation is remarkable: by directly manipulating the games' RAM in real time, they can modify any aspect of a game without touching its proprietary source code.
The changes would seem trivial to a human. In Pong, the opponent becomes “lazy” and stays still after returning the ball. In Freeway, all cars stop, transforming the perilous crossing into a leisurely stroll. In Kangaroo, the dangerous monkeys and coconuts simply disappear.
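To make the RAM-patching idea concrete, here is a minimal sketch of what the "lazy opponent" intervention could look like with Gymnasium and ale-py. The RAM address and the frozen value are hypothetical placeholders (the real offsets are game-specific and documented in HackAtari), and this is not the library's actual code, only an illustration of the principle.

```python
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)  # makes the "ALE/Pong-v5" environment available

# HYPOTHETICAL values: the byte that stores the opponent paddle's vertical
# position differs per game; HackAtari documents the real offsets.
ENEMY_Y_ADDRESS = 21
ENEMY_FROZEN_Y = 127  # park the opponent in the middle of the screen

env = gym.make("ALE/Pong-v5")
obs, info = env.reset(seed=0)

for _ in range(1000):
    # Patch the emulated console's RAM before every step: the game code still
    # runs unmodified, it simply reads back the value we just injected.
    env.unwrapped.ale.setRAM(ENEMY_Y_ADDRESS, ENEMY_FROZEN_Y)

    action = env.action_space.sample()  # a trained agent would choose here
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```

Because the intervention happens at the memory level, any piece of game state (an enemy's position, a car's speed, the presence of a hazard) can be altered on the fly, without recompiling anything.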
These modifications aren't traps; they objectively make the games more accessible. And yet… what could have been a mere playful experiment turned into a damning assessment, because when these simplified variants are put to the test, the facade of intelligence crumbles.
The spectacular collapse of “champions”
The results are unequivocal. All tested agents collapse, regardless of their sophistication. Whether it’s DQN (Deep Q-Networks, the pioneering algorithm that revolutionized AI on Atari), PPO (Proximal Policy Optimization, a very popular policy optimization method), or C51 (an algorithm that models reward distributions rather than their simple average), all suffer dramatic performance drops of 50% or more.
IMPALA, a distributed algorithm designed for large-scale training, stands out slightly by maintaining "superhuman" performance on average. But this apparent robustness masks a more nuanced reality: even this agent suffers drops of over 50% on 10 of the 15 games.
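To see what a "50% drop" means operationally, here is a minimal sketch of the kind of evaluation the study implies: take a frozen, already-trained agent, measure its average return on the original game and on a simplified variant, and compare. The `agent.act` interface and the environment objects are placeholders, not the paper's actual evaluation code.

```python
import numpy as np

def average_return(env, agent, episodes=30):
    """Average undiscounted episode return of a frozen (non-learning) agent."""
    returns = []
    for _ in range(episodes):
        obs, info = env.reset()
        done, total = False, 0.0
        while not done:
            action = agent.act(obs)  # placeholder policy interface
            obs, reward, terminated, truncated, info = env.step(action)
            total += float(reward)
            done = terminated or truncated
        returns.append(total)
    return float(np.mean(returns))

def relative_drop(agent, original_env, variant_env):
    """Fraction of performance lost on the variant (assumes positive scores)."""
    score_original = average_return(original_env, agent)
    score_variant = average_return(variant_env, agent)
    return 1.0 - score_variant / score_original
```

A `relative_drop` above 0.5 on a variant that humans find easier is exactly the signature the study reports for most agents.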
The case of Pong perfectly illustrates the problem. The agent seems to follow the ball perfectly, but in reality it exploits hidden correlations that HackAtari reveals. Once these shortcuts are disrupted, the agent suddenly becomes blind.
Even sophisticated approaches fail. The researchers tested “object-centric” agents (AIs supposed to reason in terms of objects rather than raw pixels). These more advanced systems resist slightly better, but all still fail on essential gameplay modifications.
But to be sure these drops don't come from increased difficulty or a quirk of the test, the researchers had a simple but powerful idea: compare AIs… to humans.
Humans vs Machines
To validate that their modifications truly constituted simplifications, the researchers conducted a rigorous study with 128 participants. Each participant first learned to play the original game (10-15 minutes), then was evaluated on the standard version (15 minutes) and finally on the modified version (15 minutes).
The human results are astounding. On 13 of the 15 games tested, participants maintain or improve their performance on the simplified versions: +957% average improvement on Asterix, +1658% on Kangaroo when dangers disappear, +1013% on Riverraid with restricted shooting, +472% on Freeway with stopped cars.
This adaptability reveals what fundamentally distinguishes us: we understand the game’s intention, where machines have only memorized its surface.
If humans adapt where AIs collapse, it’s not just a difficulty problem, but a deeper learning bias… A structural flaw that this experiment brings to light.
But how do we explain this striking gap between human adaptability and machine rigidity? The answer lies in the very way our AIs learn.
Shortcut learning, or the mechanics of failure
What HackAtari reveals is that our "intelligent" AIs are actually machines for exploiting superficial correlations. They spot regularities in their training environment (a movement, a color, an object's position) and cling to them like cognitive crutches.
This phenomenon, dubbed “shortcut learning” by researchers, isn’t limited to video games. In computer vision, networks can classify images of wolves versus dogs based solely on the presence of snow, without ever “seeing” the animal.
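The mechanism is easy to reproduce on toy data. In the sketch below (NumPy and scikit-learn, with entirely synthetic data), a spurious "snow" feature tracks the label almost perfectly during training; when that correlation is broken at test time, accuracy collapses even though the underlying task is unchanged.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, snow_matches_label):
    """Two features: a weak but genuine 'animal shape' signal and a 'snow' shortcut."""
    label = rng.integers(0, 2, n)                # 0 = dog, 1 = wolf
    shape = label + rng.normal(0, 1.5, n)        # noisy but truly informative
    if snow_matches_label:
        snow = label + rng.normal(0, 0.1, n)     # shortcut: snow ~ wolf
    else:
        snow = rng.integers(0, 2, n) + rng.normal(0, 0.1, n)  # shortcut broken
    return np.column_stack([shape, snow]), label

X_train, y_train = make_data(5000, snow_matches_label=True)
X_same, y_same = make_data(1000, snow_matches_label=True)
X_shift, y_shift = make_data(1000, snow_matches_label=False)

clf = LogisticRegression().fit(X_train, y_train)
print("accuracy with the shortcut intact:", clf.score(X_same, y_same))
print("accuracy with the shortcut removed:", clf.score(X_shift, y_shift))
```

The classifier leans almost entirely on the low-noise shortcut, so its near-perfect accuracy on the original distribution says nothing about whether it ever learned to recognize the animal.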
It’s the fundamental difference between “playing well” and “understanding why you play well.” Our AIs excel at the former, fail miserably at the latter.
This research shakes a fundamental belief in the field: that performance equality implies intelligence equality. As the authors write, "achieving human-level performance in training settings does not imply human-like reasoning capabilities."
Traditional metrics (average scores, superhuman performance) mask this fragility. An agent can dominate an environment while being unable to adapt to its simplified version.
This discovery raises worrying questions for all domains where AI is deployed:
- Autonomous vehicles: If a driving AI has learned to recognize stop signs based on superficial correlations (red color, for example), what happens when it encounters a sign faded by the sun or partially obscured? The vehicle could lose its bearings in situations that are actually simpler than those it was trained on. The agent might have memorized thousands of driving scenarios without ever truly "understanding" traffic rules.
- Medical diagnosis: A medical AI could achieve impressive performance by focusing on irrelevant details in medical images (lighting quality, type of scanner used) rather than true pathological indicators. Faced with images acquired under slightly different conditions, even clearer ones, it could completely fail in its diagnosis.
- Critical systems: In sensitive domains like finance, security, or energy, our AIs could rely on fragile correlations that seem robust under normal conditions but collapse as soon as conditions change, even favorably. A fraud detection system could, for example, latch onto irrelevant technical details rather than true fraudulent patterns.
The fundamental problem is that we deploy these systems assuming they’ve “understood” their task, when they’ve only memorized local solutions to specific problems. This difference isn’t just academic, it’s potentially dangerous.
Paths toward truly intelligent AI
Delfosse and Blüml identify several promising directions, all inspired by our natural intelligence. Rather than continuing to improve raw computational power, we need to fundamentally rethink our AIs’ architecture:
- See the world in “objects” rather than pixels: Force agents to reason in terms of objects and relationships. When you look at Pac-Man, you don’t see 21,168 colored pixels but Pac-Man (round yellow object), ghosts (colored mobile objects), walls (static objects). This natural decomposition allows you to instantly understand the game’s rules.
- Understand the “why” rather than the “what”: Integrate understanding of cause-and-effect relationships. In Frogger, a human intuitively understands: “If I touch a car, I die BECAUSE cars are dangerous.” A traditional agent only learns: “When these pixels overlap, the reward becomes negative.”
- Build reusable modular skills: Develop sub-skills that transfer from one context to another. When you learn a new platform game, you automatically reuse acquired skills like “jumping over obstacles,” “avoiding enemies,” “collecting bonuses.”
- Harness the power of large language models: A particularly promising path involves using LLMs to generate symbolic representations from visual inputs. These models excel at manipulating abstract concepts and reasoning about causal relationships, exactly what current agents lack.
The idea would be to create hybrid agents. Imagine a system that would use an LLM to “describe” a Pong scene: “There’s a ball moving to the right, an opponent at the top who just returned the ball, and my paddle at the bottom left.” When the opponent becomes “lazy,” the description naturally changes, and the LLM intuitively understands this situation is simpler.
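As a thought experiment, here is a minimal sketch of what that "describe, then decide" pipeline could look like. The object-centric state, the description template, and the commented-out `query_llm` call are all hypothetical; the point is only that a symbolic description changes naturally when the game changes, which is precisely what pixel-level policies miss.

```python
from dataclasses import dataclass

@dataclass
class GameObject:
    name: str
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0

def describe_scene(objects: list[GameObject]) -> str:
    """Turn an object-centric game state into text an LLM could reason over."""
    lines = []
    for obj in objects:
        motion = "moving" if (obj.vx or obj.vy) else "stationary"
        lines.append(f"- {obj.name} at ({obj.x:.0f}, {obj.y:.0f}), {motion}")
    return "Current Pong scene:\n" + "\n".join(lines)

# A hypothetical frame from the "lazy opponent" variant: the opponent's
# velocity is zero, and the description reflects it without any retraining.
state = [
    GameObject("ball", 80, 40, vx=2.0, vy=-1.0),
    GameObject("opponent paddle", 140, 40),
    GameObject("player paddle", 16, 100),
]
prompt = describe_scene(state) + "\nWhat should the player do next, and why?"
print(prompt)
# action = query_llm(prompt)  # hypothetical call to a language model
```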
Let’s stop overinterpreting AI capabilities
This research reminds us how easy it is to overinterpret our AIs’ capabilities. It’s a deeply human bias: faced with behavior that seems intelligent, we automatically project our own cognitive processes.
Take the example of Pong in more detail. We see the agent tracking the ball perfectly and credit it with an understanding of the game's physics. But HackAtari reveals the deception: the agent never "saw" the ball; it discovered that the opponent's position correlates with the ball's future position. A useful correlation, but not understanding. When the opponent becomes "lazy" in HackAtari, this shortcut no longer works and the agent suddenly goes blind.
It’s as if we observed someone “predicting” the weather by watching ants. As long as ants are a good indicator, the prediction works perfectly. But as soon as you move the ants or change their behavior, our “meteorologist” becomes blind. He never understood clouds, atmospheric pressure, or weather systems.
This cognitive illusion has deep roots. We’re social beings, programmed to detect intention and understanding in others. When we see complex and adaptive behavior, our brain automatically activates what psychologists call “theory of mind”—this ability to attribute mental states to others.
But today’s AIs don’t have mental states in the way we understand them. They have synaptic weights, loss functions, gradients. Reality is more prosaic and more fascinating at the same time: they exploit statistical regularities with formidable efficiency, without ever building a mental model of the world.
The Space Invaders agent that seems to "strategically" avoid enemy projectiles isn't really planning; it learned that certain pixel patterns correlate with negative rewards. The Pac-Man agent that "understands" it must avoid ghosts has no concept of danger; it optimizes a complex mathematical function that indirectly encodes this notion.
This distinction isn’t just academic. It explains why these same agents, so brilliant in their original environment, suddenly become incompetent faced with HackAtari modifications. They never really understood the rules; they memorized local solutions to specific problems.
AI: the student who recites without understanding
Imagine a student in class. He perfectly recites his physics lesson: formulas, definitions, examples. Everything’s there, well organized, learned by heart. The teacher, impressed, congratulates him. On paper, he checks all the boxes for success.
But one day, he’s asked a slightly different question. Not harder, just shifted: “If the moon suddenly disappeared, what would happen to the tides?” And then, silence. A blank stare. The brain spinning its wheels.
Because he never understood. He memorized words without connecting them, stacked facts without grasping their meaning. He answers well as long as the question conforms to what he learned. But as soon as you slightly change the context, he collapses.
This is exactly what HackAtari reveals with our artificial intelligences. They too passed the standard exam. They learned to shine in calibrated, repetitive environments where finding a shortcut suffices. But when you simplify the task, when you reformulate it, they lose their footing. Not because it’s harder, but because they never understood deeply.
HackAtari is a truth test. A way of asking our AIs: “Explain to me what you’re doing, but without the cheat sheets.”
And like the student who cheated with ready-made answers, they find themselves naked, helpless, unmasked.
This comparison runs through the article because it is familiar and meaningful. It gives substance to the idea that true intelligence isn't measured by raw performance, but by the ability to transfer, understand, adapt. Like a good student—not one who recites, but one who thinks.
The Feynman test for AI
This is exactly what Richard Feynman understood, long before machines. He said: “If you can’t explain it simply, you don’t understand it well enough.”
HackAtari applies this principle to AIs. And the verdict is unequivocal: if they can’t adapt to a simplified version of a game, it’s because they never really understood what they were doing.
Like the brilliant student who flounders before a slightly off-kilter question, our AIs reveal their limits here. They optimized strategies, spotted useful correlations, but without ever building a mental representation of the task. They just play well—as long as the game doesn’t change.
It’s a valuable lesson: achieving high performance proves nothing about understanding. It’s just an indicator, sometimes misleading. And HackAtari acts as a revealer of these illusions.
What HackAtari really reveals
HackAtari doesn’t just test machines; it acts like a good teacher. One who, instead of settling for a well-recited assignment, poses a slightly askew question. A question that forces thinking, not repetition.
And there, like that model student who loses his bearings as soon as you go off-script, our AIs freeze. Not because we made the task harder, but because we changed its form. And their knowledge, in reality, wasn’t knowledge at all. They had learned to answer correctly, not to understand why.
This is what HackAtari reveals: this invisible gap between performance and understanding. It’s not enough to “win” to be intelligent. You must be able to adapt, reason, shift.
So, if we want to build artificial intelligences worthy of the name, we’ll have to teach them differently. And above all, ask them better questions.
Not those from the curriculum. Those that come after.