Predictions of AI doom are too much like Hollywood movie plots
Stories from the Doomer Cinematic Universe aren't always realistic.
A few years ago I watched The Imitation Game, a biopic about Alan Turing. Like every computer scientist, I admire Turing and the role he played in breaking the German Enigma codes during World War II. But I was a bit annoyed at how much the film dumbed down the technical details of Turing’s work.
For example, the movie’s climax comes when Turing realizes the Germans might use common phrases (like “weather report” or “Heil Hitler”) in multiple messages, allowing the British to execute a known-plaintext attack. The fictional Turing has this realization at a bar, then races back to the hut containing his massive code-breaking machine, called a bombe. Once he gets there, he is able to break the German codes in less than three minutes.
This scene struck me as silly because the concept of a known-plaintext attack would have been widely understood by cryptographers of the era. Turing’s actual contributions were far more sophisticated—and probably more difficult to explain in a movie.
A few years later I visited the museum at Bletchley Park, where the events of the film occurred. I realized another way the film is misleading: it vastly undersells the scale of the British codebreaking effort. In the film, Turing leads a team of about six mathematicians who are depicted as designing, building, and operating the bombe on their own.
In reality, thousands of people worked at Bletchley Park. The construction of the bombes involved hundreds of people across several manufacturing facilities. By the war’s end, the UK had dozens of bombes operated by hundreds of women.
Turing indisputably played an important role in British codebreaking efforts, but he was only able to accomplish what he did with the support of thousands of other men and women performing a wide range of tasks.
Stories about the existential risk from AI exhibit some of the same storytelling simplifications you see in Hollywood movies. Doomers expect an AI system to achieve artificial general intelligence, start improving its own design, and quickly transform itself into a superintelligence. Then, they fear, it could take over the world and kill all human beings.
People sometimes dismiss this kind of scenario as science fiction, which has always struck me as silly. AI has already enabled technologies like self-driving cars and talking computers that were science fiction a generation ago. It’s entirely reasonable to expect the future to have even more of a science fiction vibe.
The problem is that many doomer scenarios feel oversimplified in a movie-like way. Movies tend to involve fewer characters and simpler plotlines than real historical events. The fate of the world sometimes hinges on a handful of crucial decisions by a movie’s main character. This is rarely true in the real world.
Of course, the point of a movie is entertainment, not historical accuracy or technical plausibility. But stories of AI doom purport to be predictions about what will actually happen. So it’s not a good sign that they so often feel like the script of a Hollywood movie.
AI won’t have a red-pill moment
Hollywood plotlines are sometimes built around specific moments that “change everything.” Alan Turing’s realization in The Imitation Game that he could use a known-plaintext attack is one example. Others include Neo taking the red pill in The Matrix and Peter Parker being bitten by a radioactive spider in Spider-Man.
The real world rarely works like this. Turing and his colleagues actually spent months refining their code-breaking techniques, gradually reducing the amount of time it took to decrypt German communications. Everyone knows about the Wright Brothers’ first powered flight at Kitty Hawk in 1903, but few people know about the years of difficult work—both before and after their first flight—required to build a commercially viable airplane.
In stories of AI doom, the Hollywood-style turning point is the moment when an AI system achieves “artificial general intelligence” and begins a “fast takeoff” toward superintelligence.
Here’s how philosopher Nick Bostrom describes that moment in his influential 2014 book Superintelligence:
A successful seed AI would be able to iteratively enhance itself: an early version of the AI could design an improved version of itself, and the improved version—being smarter than the original—might be able to design an even smarter version of itself, and so forth. Under some conditions, such a process of recursive self-improvement might continue long enough to result in an intelligence explosion—an event in which, in a short period of time, a system’s level of intelligence increases from a relatively modest endowment of cognitive capabilities (perhaps sub-human in most respects, but with a domain-specific talent for coding and AI research) to radical superintelligence.
In reality, the process of recursive self-improvement has already started: companies like Meta make heavy use of LLMs to help them build the next iteration of LLMs. Here are some examples from the Llama 3.1 white paper that Meta published last month:
“We create a training set of cleaned web documents, describe the quality requirements, and instruct Llama 2’s chat model to determine if the documents meet these requirements.”
“Both the code and reasoning classifiers are DistilledRoberta models trained on web data annotated by Llama 2.”
“We perform quality ranking of multilingual documents using a multilingual Llama 2-based classifier.”
“We prompt Llama 3 checkpoints to rate each sample on a three point scale.”
“We use Llama 3 and the code expert to generate a large quantity of synthetic SFT dialogs.”
In total, I count more than 30 times Meta used Llama-based models to either filter out low-quality training data or generate high-quality training data. Other leading AI labs have not been as transparent as Meta, but I’d be surprised if they weren’t using similar techniques.
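To make this concrete, here is a minimal sketch of the kind of LLM-based quality filtering Meta describes. It is not Meta’s actual pipeline: the prompt, the three-point scale, the threshold, the model name, and the use of the OpenAI client are all illustrative assumptions on my part.

```python
# Illustrative sketch only: an LLM rates each web document on a three-point
# quality scale, and low-scoring documents are dropped from the training set.
# The prompt, model name, and threshold are assumptions, not Meta's actual setup.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Rate the quality of the following web document as language-model training "
    "data on a scale of 1 (spam or boilerplate) to 3 (well written and "
    "informative). Reply with a single digit.\n\nDocument:\n{doc}"
)

def rate_document(doc: str) -> int:
    """Ask the model for a 1-3 quality score; fall back to 1 if unparsable."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in for a Llama-based classifier
        messages=[{"role": "user", "content": PROMPT.format(doc=doc[:4000])}],
    )
    content = response.choices[0].message.content or ""
    try:
        return int(content.strip()[0])
    except (ValueError, IndexError):
        return 1

def filter_corpus(documents: list[str], threshold: int = 2) -> list[str]:
    """Keep only documents whose quality score meets the threshold."""
    return [doc for doc in documents if rate_document(doc) >= threshold]
```

In practice a lab would distill these judgments into small, fast classifiers (like the DistilRoberta models mentioned above) so they can run over billions of documents, but the division of labor is the same: the previous generation of models grades the data that trains the next one.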
So when it comes to filtering and augmenting training data, AI systems are already doing a lot of the work to build their successors. I expect this to become increasingly true over time. But it will be many years—if ever—before companies stop hiring human beings to oversee the process.
So too with improving the architecture of AI systems. I expect that leading AI companies are already using LLM-based code completion tools to help them write their code, and that these tools will get better over time. In a few years there may be coding agents writing 80, 90, or 95 percent of the code for the next generation of AI models.
But I don’t think there will be any clear-cut moment when AIs “take over” this job from human programmers. And even if this did happen, it wouldn’t necessarily lead to a sudden increase in productivity because AIs will already be doing most of the work prior to that point.
So Bostrom was right to anticipate a process of recursive self-improvement. But I think he was wrong to predict a big discontinuity when AIs begin improving themselves. Rather, the future is likely to look a lot like the past, with each generation of technology making it a little bit easier to create the next generation.
One thing I think Bostrom missed was just how big and complex AI systems would become. Bostrom envisions an AI that can “design an improved version of itself,” which seemed reasonable in 2014. But nobody designs today’s most powerful AI systems. Rather, modern frontier models are the result of increasingly complex pipelines that involve gathering training data, developing better architectures, and building massive computing clusters.
So at the same time the Llama team has increasingly relied on LLMs, Meta has also dramatically expanded the human workforce behind them. The “Contributors and Acknowledgements” section of the Llama 3.1 paper lists 220 “core contributors,” 312 “contributors,” and thanks another 204 people for their “invaluable support” or “helpful contributions.” That’s 736 people who helped create the Llama 3.1 models. By comparison, around 170 people are credited in the Llama 2 paper.
I expect future AI systems to be even more complex than Llama 3. So even with a ton of help from AI, it will still take a large staff of human programmers, and months of work, to get the job done. Humans and AIs will complement each other. For example, humans may increasingly focus on real-world tasks like building larger data centers and convincing subject matter experts to supply specialized training data.
Bostrom’s scenario where an AI system achieves superintelligence in “minutes, hours, or days” may have sounded plausible in 2014. But it makes no sense now that we know how complex the AI training process can get.
There won’t be just one superintelligent AI
A lot of movies—especially science fiction films and comic book adaptations—focus on individuals with unique skills. There is only one Spider-Man, and only a small number of people in the Star Wars universe have the potential to be Jedi. The extraordinary abilities of these select few give them great power over ordinary people.
This makes for good stories, but the real world mostly doesn’t work like this. Yes, people with extraordinary skill, intelligence, or courage can make a difference in the world. But the most consequential efforts tend to be the collective work of thousands of people, like the codebreakers at Bletchley Park.
Bostrom predicts that the first superintelligent AI system will be so powerful that it will become a “singleton,” a single entity that gains control over the whole world. Just as modern chess software can beat the best human chess players, so a superintelligent AI will be able to beat human beings in all aspects of life, from business to military conquest.
I’ve long been skeptical of this concept of superintelligence for reasons that I’ve written about before. But even if you take the concept at face value, it’s important to consider another question: how many superintelligent AI systems will there be?
Bostrom envisions the existence of a single superintelligent AI system that acts like a supervillain in a comic book movie, rapidly improving its abilities and accumulating money and power. But when OpenAI releases a new LLM, each customer gets their own instance of ChatGPT that operates separately from all the others.
This matters for AI takeover scenarios.
If you pit a human being against today’s best chess software, the software is going to win. But if you give the human player her own powerful chess software, it’s going to be much more of a fair fight.
So too in a hypothetical future with superintelligent AI systems. In a world where there’s only one instance of the most powerful AI system, that instance might find a way to gain vast power. But if everyone has access to their own superhuman AI assistant, the picture looks different. A human being can say “Hey Google, it looks like a chatbot is trying to take over the world. What’s the best way to defend myself?”
You might wonder whether the company that invents the first superintelligent AI would change its business strategy and tightly limit access to it. But there’s actually a technical reason for AI companies to continue their current approach.
There’s a famous riddle that illustrates the issue: If it takes one man 60 seconds to dig a hole, does that mean 60 men can dig a hole in 1 second? Obviously not.
Something similar is true for computing power. The world has vast amounts of computing power, far more than a single instance of an AI model can efficiently use. So if a company wants to get maximum value out of a newly trained model, it’s going to have to create many instances of the model and make them available to different customers. This should help to limit how much power any single instance of an AI model can accumulate.
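A back-of-the-envelope calculation illustrates the point. Every number below is invented for the sake of the example; the only claim is that inference compute parallelizes across instances rather than within one.

```python
# Back-of-the-envelope sketch with invented numbers. Past a certain number of
# chips, a single instance of a model can't put more hardware to productive
# use, so a large fleet naturally gets carved into many independent instances.
fleet_gpus = 100_000      # assumed size of a company's inference fleet
gpus_per_instance = 16    # assumed chips needed to serve one copy of the model

instances = fleet_gpus // gpus_per_instance
print(f"One instance occupies about {gpus_per_instance} GPUs.")
print(f"The same fleet can serve roughly {instances:,} independent instances.")
```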
AI systems are not people
So far I’ve focused on the scenario sketched out by Nick Bostrom in his hugely influential book Superintelligence. But in the decade since that book was published, other writers have published alternative doom scenarios. Two examples are Holden Karnofsky and Dan Hendrycks.
These thinkers do not expect a “fast takeoff” scenario where a single AI model takes over the world. Instead, they envision a future where people create billions of AI systems for a variety of tasks. Human beings lose control over these AI systems and over time the AI systems become wealthier and more powerful than human beings.
Underlying these scenarios, I think, is a powerful intuition that advanced AI systems will be like people—that they will have distinct personalities and will have human characteristics like greed and ambition. And it’s natural for people to think of AI systems as virtual people, since until recently the only intelligent entities we knew about were human beings. Movies like to portray AI systems as singular entities with human-like personalities: think of C-3PO from Star Wars, HAL from 2001, or the robot in Short Circuit.
But most real-world AI systems aren’t like this. Think about the software powering Waymo’s self-driving cars, for example. Not only does the Waymo Driver lack any personality to speak of, it also has no capacity for longer-term planning, because its state gets reset at the end of every ride.
Every powerful AI system I can think of has this characteristic. Nobody worries about DeepMind’s AlphaFold (for protein folding) or AlphaGo (for playing Go) developing goals of their own and eventually outcompeting human beings. ChatGPT does have something of a personality, but it gets reset at the start of every conversation.
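This statelessness is visible in how chat APIs work: nothing persists between conversations unless the calling application deliberately stores the history and resends it. Below is a minimal sketch using the OpenAI client; the model name is a placeholder.

```python
# Minimal illustration of per-conversation state: the chat's "memory" is just
# this list, held by the calling application. When the function returns, the
# history is discarded and the next conversation starts from scratch.
from openai import OpenAI

client = OpenAI()

def run_conversation(user_turns: list[str]) -> list[str]:
    history = [{"role": "system", "content": "You are a helpful assistant."}]
    replies = []
    for turn in user_turns:
        history.append({"role": "user", "content": turn})
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=history,     # the model sees only what we resend each call
        )
        reply = response.choices[0].message.content
        history.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies  # `history` goes out of scope; no state survives the session
```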
Doomers, of course, are predicting that future AI systems will be more “agentic.” They believe that models with the capacity to form and pursue long-term goals will be far more useful, and so they expect models like that to replace many of today’s more limited AI systems.
I think this argument mixes up different aspects of human intelligence.
It’s certainly useful for AI systems to have the capacity to creatively overcome short-term setbacks. For example, if the Waymo Driver detects that a vehicle is parked in a travel lane, it will sometimes cross a double yellow line to go around it the way a human being would. In other words, the Waymo Driver is agentic in a limited sense.
But it wouldn’t make sense for the Waymo Driver to be broadly agentic—for example, letting it decide which customers to pick up or how much money to charge. Those high-level decisions are set by human Waymo employees or by more traditional software algorithms.
I expect a similar division of labor to apply to other AI systems.
For example, people envision a future where AI systems serve as scientists, and assume that such a system would need to be highly agentic to do its job. But it seems more likely that we’ll have a science chatbot that helps a human scientist design experiments and analyze the results. It might generate code that instructs robots in automated labs to actually carry out the experiments.
But a human scientist is going to want to have the final say over which experiments actually get carried out. And there would be no reason for a science chatbot to form goals of its own beyond the narrowly scientific ones set for it by its human user.
Conceivably a science chatbot might tell its human user “you could complete this experiment more quickly if you hacked into someone else’s data center” and offer to write exploit code to accomplish that. But there’d be no reason to give it the power to actually do something like that—and plenty of reasons not to.
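One way to picture this arrangement is as an approval gate: the model can propose experiments and even draft the automation code, but nothing runs until a named human signs off. The sketch below is my own illustration of the pattern, not any lab’s real system, and all of the names in it are invented.

```python
# Illustrative sketch of a human-approval gate for an AI "scientist": the
# model can only queue up proposals; a human decision is what actually
# releases an experiment to the automated lab. All names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ExperimentProposal:
    description: str           # what the model wants to test
    protocol_code: str         # code the model drafted for the lab robots
    approved: bool = False
    approved_by: str | None = None

@dataclass
class ApprovalGate:
    pending: list[ExperimentProposal] = field(default_factory=list)

    def submit(self, proposal: ExperimentProposal) -> None:
        """The model's only privilege: adding proposals to the review queue."""
        self.pending.append(proposal)

    def approve(self, index: int, scientist: str) -> ExperimentProposal:
        """Only an explicit human decision releases an experiment."""
        proposal = self.pending.pop(index)
        proposal.approved = True
        proposal.approved_by = scientist
        return proposal

def run_in_lab(proposal: ExperimentProposal) -> None:
    # The lab automation refuses anything that skipped human review.
    if not proposal.approved:
        raise PermissionError("Experiment was never approved by a human scientist.")
    print(f"Running experiment approved by {proposal.approved_by}: {proposal.description}")
```

The point of the sketch is simply that tactical autonomy (drafting protocols, writing code) and strategic authority (deciding what actually gets run) can sit on opposite sides of a very simple boundary.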
The lone genius is overrated
I’m not the first person to make this argument, of course. Back in 2016, the writer Gwern Branwen wrote an influential essay arguing that “tool AIs” (like chatbots) will inevitably be outcompeted by agentic AI systems with the capacity to act directly in the world.
He argued that keeping a human being in the loop is economically inefficient. An AI system that is able to take actions directly will be faster and more efficient than an AI system that relies on a human being to implement its recommendations. And hence in a competitive market we should expect agentic AI systems to displace tool AI systems.
But I think this confuses strategic and tactical decision-making. It obviously wouldn’t work for a self-driving system to seek human approval each time it needed to turn the steering wheel. But it works fine to let the human rider pick the destination. More broadly, it’s perfectly feasible to give an AI system autonomy over short-term tactical decisions while deferring to humans on big-picture strategic choices.
A common counterargument here is that the human being won’t be smart enough to evaluate the AI’s recommendations. A superintelligent AI might have insights that are too sophisticated for ordinary humans to understand, so giving a human being a veto can only make things worse.
This is similar to the Hollywood trope of the misunderstood genius whose work is too complex for others to understand. In The Imitation Game, Alan Turing spends much of the movie defending his expensive codebreaking machine against skeptics who believe it will never work.
The real world rarely works like that, however. The mark of a true genius is often his ability to explain his ideas in ways that lesser minds can understand. Turing not only needed to explain the concept of the bombe to his superiors in the British government, he also needed to make the bombe’s interface simple enough that hundreds of ordinary women could be trained to operate the machines.
Turing also developed a set of codebreaking techniques that he taught to other mathematicians. In short, part of Turing’s genius was his ability to make his insights understandable to others without his intellectual gifts.
By the same token, it’s not too much to expect a superhuman AI system to explain its recommendations in terms ordinary human beings can understand. This will be particularly important because many high-stakes decisions have both moral and practical dimensions. There are often tradeoffs between performance, cost, safety, and other factors. Human beings are going to want to make those tradeoffs themselves, not go along with whatever a neural network happens to prefer.
Reader comments

The effort to engage with the ideas is still appreciated, but I think this largely argues against straw-man versions of AI risk arguments. I would love to respond if I had more time.
Two quick things:
1) Recursive self-improvement only becomes a dominant force once the last few human bottlenecks are automated.
2) Agentic AI that takes the human out of the loop will outcompete safe, responsible systems that don't. An AGI CEO or an AGI military decision-making system would decimate adversaries that keep humans in charge. Also, at some point we'll stop understanding why an AI comes to its conclusions. Even if it explains them to us, we wouldn't be able to vet them. So even if AI stays in an advisory role, companies and countries effectively still have to do what the advisor tells them or their adversaries will outcompete them.
Love this whole piece. It strikes at the core of what ultimately drove me away from LessWrong-style thinking after many years steeped in it (including hosting an ACX meetup).
Software engineering is a cycle of finding a bug, fixing it, uncovering the next bug, fixing that one, and so on. Hard takeoff is based on the idea that we will suddenly break out of that paradigm and AI will debug itself better than humans can. That simply doesn't resemble the way software development has ever worked or will ever work. We get faster and faster at debugging, and we handle and abstract away bug-prone patterns, but there's always another bug waiting when you venture into completely unexplored territory.
I would extend the Waymo analogy to note that the pattern of thinking around hard takeoff right now is falling into the exact same trap we once fell into with self-driving cars: looking at the rate of change for the first (and easiest) 80% of the problem and assuming it will continue into the last 20%, where the nastiest edge cases lie.
Before we get anything resembling truly human-level agentic AGI we will probably go through that same process, where we gradually hit more and more exotic edge cases. But as with self-driving cars, it only takes one edge case to make the whole system spin out of control in a way that makes it unusable. That's a major impediment for current-gen agents, and while I think we'll whittle away at the problem with each advance in capability, the idea that we'll hit some unexpectedly critical threshold and it'll abruptly disappear is a form of magical thinking.