The case for AI doom isn't very convincing
I read Eliezer Yudkowsky's new book, "If Anyone Builds It, Everyone Dies."
A striking thing about the AI industry is how many insiders believe AI could pose an existential risk to humanity.
Just last week, Anthropic CEO Dario Amodei described himself as “relatively an optimist” about AI. But he said there was a “25 percent chance that things go really really badly.” Among the risks Amodei worries about: “the autonomous danger of the model.”
In a 2023 interview, OpenAI CEO Sam Altman was blunter, stating that the worst-case scenario was “lights out for all of us.”
No one has done more to raise these concerns than rationalist gadfly Eliezer Yudkowsky. In a new book with co-author Nate Soares, Yudkowsky doesn’t mince words: If Anyone Builds It, Everyone Dies. Soares and Yudkowsky believe that if anyone invents superintelligent AI, it will take over the world and kill everyone.
Normally, when someone predicts the literal end of the world, you can write them off as a kook. But Yudkowsky is hard to dismiss. He has been warning about these dangers since the early 2010s, when he (ironically) helped get some of the leading AI companies off the ground. Legendary AI researchers like Geoffrey Hinton and Yoshua Bengio take Yudkowsky’s concerns seriously.
So is Yudkowsky right? In my mind, there are three key steps to his argument:
Humans are on a path to develop AI systems with superhuman intelligence.
These systems will gain a lot of power over the physical world.
We don’t know how to ensure these systems use their power for good rather than evil.
Outside the AI industry, debate tends to focus on the first claim; many normie skeptics think superintelligent AI is simply too far away to worry about. Personally, I think these skeptics are too complacent. I don’t know how soon AI systems will surpass human intelligence, but I expect progress to be fast enough over the next decade that we should start taking these questions seriously.
Inside the AI industry, many people accept Yudkowsky’s first two premises—superintelligent AI will be created and become powerful—but they disagree about whether we can get it to pursue beneficial goals instead of harmful ones. There’s now a sprawling AI safety community exploring how to align AI systems with human values.
But I think the weakest link in Yudkowsky and Soares’s argument is actually the second claim: that an AI system with superhuman intelligence would become so powerful it could kill everyone. I have no doubt that AI will give people new capabilities and solve long-standing problems. But I think the authors wildly overestimate how transformational the technology will be—and dramatically underestimate how easy it will be for humans to maintain control.
Grown, not crafted
Over the last two centuries, humans have used our intelligence to dramatically increase our control over the physical world. From airplanes to antibiotics to nuclear weapons, modern humans accomplish feats that would have astonished our ancestors.
Yudkowsky and Soares believe AI will unlock another, equally large, jump in our (or perhaps just the AI’s) ability to control the physical world. And the authors expect this transformation to happen over months rather than decades.
Biology is one area where the authors expect radical acceleration.
“The challenge of building custom-designed biological technology is not so much one of producing the tools to make it, as it is one of understanding the design language, the DNA and RNA,” Yudkowsky and Soares argue. According to these authors, “our best wild guess is that it wouldn’t take a week” for a superintelligent AI system to “crack the secrets of DNA” so that it could “design genomes that yielded custom life forms.”
For example, they describe trees as “self-replicating factories that spin air into wood” and conclude that “any intelligence capable of comprehending biochemistry at the deepest level is capable of building its own self-replicating factories to serve its own purposes.”
Ironically, I think the first four chapters of the book do a good job of explaining why it’s probably not that simple.
These chapters argue that AI alignment is a fool’s errand. Due to the complexity of AI models and the way they’re trained, the authors say, humans won’t be able to design AI models to predictably follow human instructions or prioritize human values. I think this argument is correct, but it has broader implications than the authors acknowledge.
Here’s a key passage from Chapter 2 of If Anyone Builds It:
The way humanity finally got to the level of ChatGPT was not by finally comprehending intelligence well enough to craft an intelligent mind. Instead, computers became powerful enough that AIs can be churned out by gradient descent, without any human needing to understand the cognitions that grow inside.
Which is to say: engineers failed at crafting AI, but eventually succeeded in growing it.
“You can’t grow an AI that does what you want just by training it to be nice and hoping,” they write. “You don’t get what you train for.”
The authors draw an analogy to evolution, another complex process with frequently surprising results. For example, the long, colorful tails of male peacocks make it harder for them to flee predators. So why do they have them? At some point, early peahens developed a preference for large-tailed males, and this led to a self-reinforcing dynamic in which males evolved ever-larger tails to improve their chances of finding a mate.
“If you ran the process [of evolution] again in very similar circumstances you’d get a wildly different result” than large-tailed peacocks, the authors argue. “The result defies what you might think natural selection should do, and you can’t predict the specifics no matter how clever you are.”
I love this idea that some systems are so complex that “you can’t predict the specifics no matter how clever you are.” But there’s an obvious tension with the idea that after an AI system “cracks the secrets of DNA” it will be able to rapidly invent “custom life forms” and “self-replicating factories” that serve the purposes of the AI.
Yudkowsky and Soares believe that some systems are too complex for humans to fully understand or control, but superhuman AI won’t have the same limitations. They believe that AI systems will become so smart that they’ll be able to create and modify living organisms as easily as children rearrange Lego blocks. Once an AI system has this kind of predictive power, it could become trivial for it to defeat humanity in a conflict.
But I think the difference between grown and crafted systems is more fundamental. Some of the most important systems—including living organisms—are so complex that no one will ever be able to fully understand or control them. And this means that raw intelligence only gets you so far. At some point you need to perform real-world experiments to see if your predictions hold up. And that is a slow and error-prone process.
And not just in the domain of biology. Military conflicts, democratic elections, and cultural evolution are other domains that are beyond the predictive power—and hence the control—of even the smartest humans. Many doomers expect that superintelligent AIs won’t face such limitations—that they’ll be able to perfectly predict the outcome of battles or deftly manipulate the voting public to achieve their desired outcomes in elections.
But I’m skeptical. I suspect that large-scale social systems like this are so complex that it’s impossible to perfectly understand and control them no matter how clever you are. Which isn’t to say that future AI systems won’t be helpful for winning battles or influencing elections. But the idea that superintelligence will yield God-like capabilities in these areas seems far-fetched.
Chess is a poor model
Yudkowsky and Soares repeatedly draw analogies to chess, where AI has outperformed the best human players for decades. But chess has some unique characteristics that make it a poor model for the real world. Chess is a game of perfect information; both players know the exact state of the board at all times. The rules of chess are also far simpler than the physical world, allowing chess engines to “look ahead” many moves.
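To make the “look ahead” point concrete, here is a minimal sketch (a toy example of my own, not anything from the book) of exhaustive game-tree search. Because the toy game below, a simplified Nim, has perfect information and trivially simple rules, a few lines of Python can search every future position and play perfectly; the physical world offers nothing like this.

```python
# A toy illustration of "looking ahead" in a perfect-information game:
# exhaustive minimax search over a simplified version of Nim. Players
# alternate removing 1-3 stones; whoever takes the last stone wins.

from functools import lru_cache

@lru_cache(maxsize=None)
def best_outcome(stones: int) -> int:
    """Return +1 if the player to move can force a win, -1 otherwise."""
    if stones == 0:
        return -1  # the previous player took the last stone and already won
    # Look ahead: try every legal move and assume the opponent replies optimally.
    return max(-best_outcome(stones - take) for take in (1, 2, 3) if take <= stones)

if __name__ == "__main__":
    for n in range(1, 9):
        result = "win" if best_outcome(n) == 1 else "loss"
        print(f"{n} stones: forced {result} for the player to move")
```

Chess engines add pruning and heuristic evaluation on top of this idea, but it is the perfect-information structure of the game that makes deep lookahead possible at all.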
The real world is a lot messier. There’s a military aphorism that “no plan survives contact with the enemy.” Generals try to anticipate the enemy’s strategy and game out potential counter-attacks. But the battlefield is so complicated—and there’s so much generals don’t know prior to the battle—that things almost always evolve in ways that planners don’t anticipate.
Many real-world problems have this character: smarter people can come up with better experiments to try, but even the smartest people are still regularly surprised by experimental results. And so the bottleneck to progress is often the time and resources required to gain real-world experience, not raw brainpower.
In chess, both players start the game with precisely equal resources, and this means that even a small difference in intelligence can be decisive. In the real world, in contrast, specific people and organizations start out with control over essential resources. A rogue AI that wanted to take over the world would start out with a massive material disadvantage relative to governments, large corporations, and other powerful institutions that won’t want to give up their power.
There have been historical examples where brilliant scientists made discoveries that helped their nations win wars. Two of the best known are from World War II: the physicists in the Manhattan Project who helped the US build the first nuclear weapons and the mathematicians at Bletchley Park who figured out how to decode encrypted Nazi communications.
But it’s notable that while Enrico Fermi, Leo Szilard, Alan Turing, and others helped the Allies win the war, none of them personally wound up with significant political power. Instead, they empowered existing Allied leaders such as Franklin Roosevelt, Winston Churchill, and Harry Truman.
That’s because intelligence alone wasn’t sufficient to build an atomic bomb or decode Nazi messages. To make the scientists’ insights actionable, the government needed to mobilize vast resources to enrich uranium, intercept Nazi messages, and so forth. And so despite being less intelligent than Fermi or Turing, Allied leaders had no trouble maintaining control of the overall war effort.
A similar pattern is evident in the modern United States. Currently the most powerful person in the United States is Donald Trump. He has charisma and a certain degree of political cunning, but I think even many of his supporters would concede that he is not an intellectual giant. Neither was Trump’s immediate predecessor, Joe Biden. But it turns out that other characteristics—such as Trump’s wealth and fame—are at least as important as raw intelligence for achieving political power.
We can use superintelligent AIs as tools
I see one other glaring flaw with the chess analogy. There’s actually an easy way for a human to avoid being humiliated by an AI at chess: run your own copy of the AI and do what it recommends. If you do that, you’ve got about a 50/50 chance of winning the game.
And I think the same point applies to AI takeover scenarios, like the fictional story in the middle chapters of If Anyone Builds It. Yudkowsky and Soares envision a rogue AI outsmarting the collective intelligence of billions of human beings. That seems implausible to me in any case, but it seems especially unlikely when you remember that human beings can always ask other AI models for advice.
This is related to my earlier discussion of how much AI models can accelerate technological progress. If it were true that a superintelligent AI system could “crack the secrets of DNA” in a week, I might find it plausible that it could gain a large enough technological head start to outsmart all humans.
But it seems much more likely that the first superhuman AI will be only slightly more intelligent than the smartest humans, and that within a few months rival AI labs will release their own models with similar capabilities.
Moreover, it’s possible to modify the behavior of today’s AI models through either prompting or fine-tuning. There’s no guarantee that future AI models will work exactly the same way, but it seems pretty likely that we’ll continue to have techniques for making copies of leading AI models and giving them different goals and behaviors. So even if one instance of an AI “goes rogue,” we should be able to create other instances that are willing to help us defend ourselves.
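As a rough illustration of that last point, here is a minimal sketch, assuming an OpenAI-compatible chat API; the model name, prompts, and helper function are placeholders of my own, not anything the authors describe. Two instances of the same underlying model are given different system prompts, so copies of a single model end up pursuing different roles.

```python
# A sketch of "same weights, different goals" via prompting alone, assuming an
# OpenAI-compatible chat API. The model name is a placeholder, and the script
# expects an OPENAI_API_KEY environment variable.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o-mini"  # placeholder; any chat-completions model works

def make_instance(system_prompt: str):
    """Return a callable that answers questions under a fixed system prompt."""
    def ask(question: str) -> str:
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question},
            ],
        )
        return response.choices[0].message.content
    return ask

# Two "instances" of the same base model, steered toward different roles.
red_team = make_instance("You look for flaws and failure modes in any plan you are shown.")
blue_team = make_instance("You suggest safeguards and defenses for any plan you are shown.")

plan = "Give one AI agent unsupervised control of a company's production servers."
print("Red team:\n", red_team(plan))
print("Blue team:\n", blue_team(plan))
```

Fine-tuning goes deeper than a system prompt, but the basic point is the same: whoever can run the weights can steer additional copies toward their own purposes, including defensive ones.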
So the question is not “will the best AI become dramatically smarter than humans?” It’s “will the best AI become dramatically smarter than humans advised by the second-best AI?” It’s hard to be sure about this, since no superintelligent AI systems exist yet. But I didn’t find Yudkowsky and Soares’s pessimistic case convincing.