LLMs are much worse than humans at learning from experience
Could test-time training give AI models this important capability?
It’s Future of Transformers Week at Understanding AI!
An important question about AI is whether—and how quickly—transformer-based foundation models will achieve human-like reasoning abilities. Some people believe we can get there simply by scaling up conventional LLMs. Others believe we’ll need to augment LLMs with the ability to search through possible solutions to difficult problems—as OpenAI did with o1.
I was fairly impressed with o1, but I still suspect something more fundamental is missing. I predict that it will take at least one—and possibly several—transformer-sized breakthroughs to get AI models to reason like humans.
In a series of posts this week, I’ll explore the limitations of today’s LLMs and recent efforts to address those shortcomings.
LLMs—including models like o1 that “think” before responding—seem incapable of learning new concepts at inference time. LLMs learn a great many concepts at training time. But if a concept wasn’t represented in an LLM’s training data, the LLM is unlikely to learn it by generalizing from examples in its context window.
In contrast, our brains continue to learn new concepts from everyday experiences long after we finish formal schooling. In other words, we stay in “training mode” throughout our lives. And this seems to make our minds far more adaptable than today’s AI models.
Our brains do “training” and “inference” simultaneously
In July, Google DeepMind announced a new model called AlphaProof, focused on solving problems from the International Math Olympiad (IMO). The announcement caught the eye of Steve Newman, an entrepreneur best known as the cofounder of the startup that became Google Docs.
Newman was a math prodigy in his youth, representing the United States at the IMO in 1983 and 1984. In October, he wrote a couple of fascinating articles about how AI systems approach challenging math problems.
In his first post, Newman walked through the steps he took to solve one particular problem from the 2024 IMO. The problem requires proving that a sequence of numbers defined by a certain algorithm always winds up in a repeating pattern. Here’s how Newman described his process for solving the problem:
Break the problem down into pieces.
Re-read each piece until I fully understand it.
Work through a simple example.
Identify a pattern.
Find an explanation for the pattern.
Try to prove that this pattern will always hold…
…but eventually decide that it’s hopeless.
Work through more examples; discover that the initial pattern did not always hold.
Notice some more general patterns, and prove that they always hold.
Pursue an idea for completing the proof…
…and then abandon it.
At first, Newman attacked the problem more or less at random. He chose some simple starting conditions and worked out the sequence of numbers that resulted. As he worked through more sequences, he started to recognize higher-level patterns, which allowed him to think about the problem more abstractly and rigorously.
During this process, Newman’s brain was operating simultaneously on two levels, which I will cheekily call inference and training.
Start with inference: “To solve the Olympiad problem, I relied on a collection of hard-won strategies for math problems,” Newman wrote in his second article about AI and challenging math problems. “Play with examples, look for patterns. If you can’t prove something is true, look for a counter-example. Assign shorthand names to important ideas.”
These “hard-won strategies” have a paint-by-numbers quality to them. They don’t require any particular insight into the problem being solved. Indeed, it should be straightforward to train a large language model to execute them given enough examples of correct proofs.
What makes a problem like this difficult is that the space of possible approaches is practically infinite. There’s an unlimited number of examples to work through, a vast number of hypotheses that could be proven or disproven, and a practically unbounded number of ways to write a mathematical proof.
And this is where the second level of cognition—which I’m calling training—becomes important. “As we work, we learn more about the problem, forcing us to constantly replan,” Newman wrote.
As Newman worked through examples and tried out possible solutions, he developed an intuition for the problem that allowed him to become more discerning about his next steps. He shifted from exploring the problem in an open-ended way to zeroing in on a solution.
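The post’s subtitle asks whether test-time training could give models this kind of on-the-fly learning. As a rough illustration only (not a description of AlphaProof or any production system), here is a minimal PyTorch sketch in which a model takes a few gradient steps on examples generated while working on a single problem; the tiny model, the generate_examples helper, and the loss are all hypothetical placeholders.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained model; in practice this would be a
# large language model, not a two-layer network.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def generate_examples(problem, n=8):
    """Hypothetical helper: derive training pairs from the problem at hand,
    e.g. simpler variants or worked-out special cases."""
    xs = torch.randn(n, 16)      # placeholder inputs
    ys = xs.roll(1, dims=-1)     # placeholder targets
    return xs, ys

def solve_with_test_time_training(problem, steps=10):
    # "Training mode" at inference time: adapt the weights to this one problem.
    for _ in range(steps):
        xs, ys = generate_examples(problem)
        optimizer.zero_grad()
        loss = loss_fn(model(xs), ys)
        loss.backward()
        optimizer.step()
    # "Inference mode": answer the original query with the adapted weights.
    with torch.no_grad():
        return model(torch.randn(1, 16))  # placeholder for the real query
```

The point of the sketch is only the shape of the loop: the model keeps updating itself on material it encounters while solving, rather than freezing its weights the moment training ends.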
AlphaProof’s brute force approach
Now let’s compare Newman’s problem-solving strategy to that of DeepMind’s AlphaProof. Here’s how Newman describes AlphaProof’s approach:
Google has not said much about how AlphaProof works. My understanding is as follows: when presented with a problem, it attempts to simply write out a proof, using a language model trained on millions of proofs. It sounds like it may use a tree-search approach, similar to chess AIs. That means it would pick a few promising candidates for the first step of the proof; for each of those, it would try a few continuations, and then several continuations to each of those continuations, and so forth. It continues exploring possibilities until it finds a valid proof – potentially trying millions (billions?) of paths along the way.
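Newman is careful to say Google hasn’t disclosed AlphaProof’s internals, so the following is only a generic best-first tree search over candidate proof steps, not AlphaProof’s actual algorithm. The propose_steps, is_valid_proof, and score functions are hypothetical placeholders for a proof-step generator, a proof checker, and a learned estimate of how promising a partial proof looks.

```python
import heapq
import itertools

def tree_search(problem, propose_steps, is_valid_proof, score,
                branching=4, max_expansions=100_000):
    """Generic best-first search over partial proofs.

    propose_steps(problem, partial_proof) -> candidate next steps (hypothetical)
    is_valid_proof(problem, partial_proof) -> True if the proof checks out
    score(problem, partial_proof)          -> higher means more promising
    """
    counter = itertools.count()  # tie-breaker so the heap never compares proofs
    frontier = [(-score(problem, []), next(counter), [])]
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, proof = heapq.heappop(frontier)
        if is_valid_proof(problem, proof):
            return proof
        # Expand only the few most promising continuations of this partial proof.
        for step in propose_steps(problem, proof)[:branching]:
            candidate = proof + [step]
            heapq.heappush(
                frontier,
                (-score(problem, candidate), next(counter), candidate))
    return None  # no valid proof found within the search budget
```

With a large branching factor and a weak score function, a search like this can easily burn through the “millions (billions?) of paths” Newman mentions, which is why the quality of the guidance matters so much.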
Steve Newman’s exploration of possible solutions to an IMO problem was guided by an intuition for the problem that he developed while working on the problem. Interestingly, AlphaProof has its own process for learning about a problem as it works through it. Here’s how DeepMind described that process in its July announcement: