Richard Sutton Says LLMs Are a Dead End. He Might Be Right.
Quick summary
Richard Sutton, a founding father of reinforcement learning and author of the famous 2019 essay The Bitter Lesson, has argued that large language models are not the path to general intelligence. He thinks reinforcement learning is. Here is the argument, what it gets right, and where the debate actually stands.
Richard Sutton wrote a blog post in 2019 that the AI research community has been arguing about ever since. He called it the Bitter Lesson, and the core claim is simple: every time researchers tried to build human knowledge into AI systems, they lost. Every time they trusted computation and learning instead, they won.
Chess engines that encoded human chess strategy lost to engines that searched more positions. Speech recognition systems designed around phoneme models lost to systems that trained on raw audio. Computer vision systems built around edge detection and object part hierarchies lost to convolutional networks trained on pixels.
The lesson is bitter because it means a lot of careful, intelligent human work goes into building systems that brute-force learning eventually supersedes. The implication is uncomfortable: the structure you add is often the limitation, not the strength.
Now Sutton has extended this argument to large language models, and the AI world is divided on whether he is right.
What Sutton Actually Argues
Sutton's position is not that LLMs are useless or that they fail to demonstrate impressive capabilities. He acknowledges that they are impressive. His claim is more specific.
LLMs are trained to predict the next token in a sequence. They learn statistical regularities across enormous amounts of text. This makes them very good at generating fluent language, answering questions whose answers appear somewhere in training data, and combining information in ways that look intelligent.
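To make "predict the next token" concrete, here is a deliberately tiny sketch: a bigram model that learns which word tends to follow which from raw counts. Real LLMs use deep networks over trillions of tokens, not count tables, and the toy corpus here is invented for illustration, but the training signal is the same: given what came before, predict what comes next.

```python
# Toy next-token predictor: learn P(next | current) from bigram counts.
# This is an illustrative sketch, not how production LLMs are built.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()  # made-up corpus

# Count how often each token follows each other token.
counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    counts[current][nxt] += 1

def predict_next(token):
    """Return the most likely next token and its conditional probability."""
    followers = counts[token]
    best, n = followers.most_common(1)[0]
    return best, n / sum(followers.values())

print(predict_next("the"))  # "cat" follows "the" in 2 of its 3 occurrences
```

The point of the sketch is Sutton's point: everything the model knows comes from statistical regularities in the text it saw, which is exactly why it excels at fluent recombination and struggles with situations the corpus never covered.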
What they do not have, in Sutton's view, is genuine world models, the ability to reason about hypothetical situations that are genuinely novel, or the kind of adaptive goal-directed behavior that he thinks constitutes general intelligence.
He thinks LLMs are a very impressive form of pattern matching. He thinks general intelligence requires something more like an agent that acts in the world, receives feedback, forms goals, and updates its beliefs based on the outcomes of its actions. That is reinforcement learning. And he thinks the field's heavy investment in LLMs represents a detour from the path that actually leads to general intelligence.
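The loop Sutton has in mind can also be sketched in a few lines. This is generic textbook Q-learning on an invented toy environment (a five-state corridor with a reward at the far end), not Sutton's code; the environment, states, and hyperparameters are all illustrative. What matters is the shape of the loop: the agent acts, the world responds, and the agent updates its estimates from the consequence, with no text corpus anywhere.

```python
# Minimal agent-environment loop: tabular Q-learning on a toy 5-state
# corridor. Illustrative sketch only; environment and numbers are invented.
import random

N_STATES = 5            # states 0..4; reaching state 4 ends the episode with reward 1
ACTIONS = (-1, +1)      # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment dynamics: move, clip at the walls, reward at the goal."""
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

def choose(state):
    """Epsilon-greedy action selection with random tie-breaking."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

random.seed(0)
for _ in range(200):                  # 200 episodes of pure experience
    s = 0
    while s != N_STATES - 1:
        a = choose(s)
        s2, r = step(s, a)
        # Temporal-difference update: learn from the action's consequence.
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# After training, the agent has learned to prefer moving right everywhere.
print({s: round(Q[(s, +1)], 2) for s in range(N_STATES - 1)})
```

Nothing here is pretrained: the value table starts at zero and every number in it was earned by acting and observing outcomes. That feedback loop, scaled up from a toy corridor to the world, is what Sutton means by the agent paradigm.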
The Bitter Lesson Applied to LLMs
Here is where it gets interesting. You could argue that the LLM critics, including Sutton, are making exactly the mistake the Bitter Lesson warns against. They are saying that LLMs lack the structure we think intelligence requires: reasoning, grounding, goal-directedness. And history suggests that when people say a system lacks the right structure, the system trained on more data and more compute often wins anyway.
Sutton would push back on this in an interesting way. He is not saying LLMs should encode more structure. He is saying LLMs are not the right paradigm. The right paradigm involves an agent taking actions in an environment and learning from the consequences. More compute on the wrong approach still gives you the wrong approach.
This is a live debate, and it is genuinely unresolved.
The Case for Sutton Being Right
There are things LLMs struggle with that seem to require more than token prediction. Long-horizon planning in genuinely novel environments. Consistent goal maintenance across a conversation. Updating beliefs correctly when shown contradicting evidence. Embodied reasoning about physical causality.
OpenAI's o1 and o3 models were trained with reinforcement learning to produce long chains of thought before answering, which improved performance on many of these tasks significantly. The improvement is real. Whether it counts as genuine reasoning or sophisticated pattern matching over reasoning-like text is one of the central philosophical disputes in AI right now.
The reinforcement learning argument is also supported by the trajectory of AlphaGo and its descendants. AlphaZero, which learned chess, Go, and shogi purely through self-play with no human game data at all, played better than any system that incorporated human knowledge. If you believe the Bitter Lesson applies there, it is at least reasonable to ask whether the same principle applies at the level of general intelligence, where the "game" is the world itself.
The Case Against Sutton
The most obvious counterargument is the empirical one. LLMs keep doing things that critics said they could not do. Early critics said they could not reason. Models trained with RLHF and chain-of-thought prompting do something that looks a lot like reasoning. Critics said they could not write code. GitHub Copilot and Claude are writing production code. Critics said they could not understand context. Long-context models handle hundred-thousand-token documents.
There is also a version of the argument that Sutton's framing presents a false dichotomy. Reinforcement learning from human feedback is already part of how the major LLMs are trained. Systems like the OpenAI o-series models use reinforcement learning explicitly to improve reasoning. The boundaries between "LLM" and "RL agent" are blurring rather than sharpening.
The most generous read of the current moment is that both camps are partly right. LLMs are not sufficient on their own for general intelligence. RL-trained agents operating purely in abstract environments have their own limitations. The thing that eventually works will probably combine both, and several labs are already moving in that direction.
Why This Debate Matters
The reason to care about the Sutton argument is not that it tells you definitively who is right. It is that it sharpens your thinking about what current AI systems can and cannot do.
If you are building a product on top of an LLM, understanding the limitations that Sutton identifies helps you design better systems. LLMs are unreliable for tasks that require persistent memory, for tasks that require genuine novelty without any training analogues, and for tasks where being wrong by a small amount leads to catastrophic outcomes.
If you are thinking about the path to AGI, the debate forces you to ask a question that does not have a clean answer yet. Is the path to general intelligence scaling LLMs further, adding RL on top of them, or something architecturally different that we have not fully built yet?
Sutton has been right before. His 2019 Bitter Lesson turned out to be prescient about the rise of scale-based approaches over hand-engineered ones. Whether he is right again about LLMs, or whether the people scaling LLMs with RL are already converging on what he is asking for, is the most important open question in AI research right now.
The honest answer is that nobody knows. And anyone who tells you they do is more confident than the evidence warrants.
Written by
Abhishek Gautam
Full Stack Developer & Software Engineer based in Delhi, India. Building web applications and SaaS products with React, Next.js, Node.js, and TypeScript. 8+ projects deployed across 7+ countries.