The AI safety debate is focusing on the wrong threats
Singularism vs. physicalism: two very different ways of looking at AI risk.
Geoffrey Hinton is a legendary computer scientist whose work laid the foundation for today’s artificial intelligence technology. He co-authored two of the most influential papers in AI: a 1986 paper describing a foundational technique called backpropagation that is still used to train deep neural networks, and a 2012 paper demonstrating that deep neural networks could be shockingly good at recognizing images.
That 2012 paper helped spark the deep learning boom of the last decade. Google hired the paper’s authors in 2013, and Hinton had been helping Google develop its AI technology ever since. But last week Hinton quit Google so that he could speak freely about his fears that AI systems would soon become smarter than us and gain the power to enslave or kill us.
“There are very few examples of a more intelligent thing being controlled by a less intelligent thing,” Hinton said in an interview on CNN last week.
This is not a new concern. The philosopher Nick Bostrom issued similar warnings in his widely read 2014 book Superintelligence. At the time, most people saw these dangers as too remote to worry about, but a few found arguments like Bostrom’s so compelling that they devoted their careers to the problem. As a result, there’s now a tight-knit community convinced that AI poses an existential risk to the human race.
I’m going to call their viewpoint singularism—a nod not only to Vernor Vinge’s concept of the singularity, but also to Bostrom’s concept of a singleton, an AI (or other entity) that gains control over the world. The singularists have been honing their arguments for the last decade and today they largely set the terms of the AI safety debate.
But I worry that singularists are focusing the world’s attention in the wrong direction. They are convinced that a superintelligent AI would become powerful enough to kill us all if it wanted to. And so their main focus is on figuring out how to ensure that this all-powerful AI winds up with goals aligned with our own.
But it’s not so obvious that superior intelligence will automatically lead to world domination. Intelligence is certainly helpful if you’re trying to take over the world, but you can’t control the world without manpower, infrastructure, natural resources, and so forth. A rogue AI would start out without control of any of these physical resources.
So a better way to prevent an AI takeover may be to ensure humans remain firmly in control of the physical world—an approach I’ll call physicalism. That would mean safeguarding our power plants, factories, and other physical infrastructure from hacking. And it would mean being cautious about rolling out self-driving cars, humanoid robots, military drones, and other autonomous systems that could eventually become a mechanism for AI to take over the world.
The intelligence explosion
In 1997, IBM’s Deep Blue computer beat the reigning world chess champion, Garry Kasparov. In the years since, chess engines have gotten better and better. Today, the strongest chess software has an Elo rating of around 3,500, high enough that we should expect it to win almost every game against the strongest human players (who have Elo ratings around 2,800). Singularists see this as a template for AI mastery of every significant activity in the global economy, including scientific discovery, technological innovation, and warfare.
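For readers who want to sanity-check the “almost every game” claim, here is a minimal Python sketch of the standard Elo expected-score formula. The function name and the round-number ratings are mine, chosen purely for illustration:

def elo_expected_score(rating_a, rating_b):
    # Standard Elo formula: expected score of player A against player B.
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# A roughly 3,500-rated engine against a roughly 2,800-rated human player:
print(elo_expected_score(3500, 2800))  # ~0.98, i.e. about 98 percent of the available points

An expected score of roughly 0.98 means the engine should take about 98 percent of the points on offer, which is why “almost every game” is a fair summary of the gap.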
A key step on the road to AI dominance will be when AI systems get better than people at designing AI systems. At this point, singularists predict that we’ll get an “intelligence explosion” where AI systems work to recursively improve their own code. Because it’s easy to make copies of computer programs, we could quickly have millions of virtual programmers working to improve AI systems, which should dramatically accelerate the rate of progress in AI technology.
I find this part of the singularist story entirely plausible. I see no reason to doubt that we’ll eventually be able to build computer systems capable of performing cognitive tasks at a human level—and perhaps beyond.
Once an AI achieves superintelligence, singularists envision it building some kind of superweapon to take over the world. Obviously, since none of us possess superhuman intelligence, it’s hard to be sure whether this is possible. But I think a good way to sanity-check it is to think about the history of previous superweapons.
Take the atomic bomb, for example. In 1939, physicist Leo Szilard realized that it would be possible to create a powerful new kind of bomb using nuclear fission. So did he go into his garage, build the first atomic bomb, and use it to become the most powerful person on the planet?
Of course not. Instead, Szilard drafted a letter to President Franklin Roosevelt and got Albert Einstein to sign it. That led to the Manhattan Project, which wound up employing tens of thousands of people and spending billions of dollars over a six-year period. When the first atomic bombs were finished in 1945, it was President Harry Truman, not Szilard or other physicists, who got to decide how they would be used.
Maybe a superintelligent AI could come up with an idea for a powerful new type of weapon. But like Szilard, it would need help to build and deploy it. And getting that help might be difficult—especially if the AI wants to retain ultimate control over the weapon once it’s built.
Taking over the world is hard
When I read Bostrom’s Superintelligence, I was surprised that he devotes less than three pages (starting on page 97 in this version) to discussing how an AI takeover might work in concrete terms. In those pages, Bostrom briefly discusses two possible scenarios. One is for the AI to create “self-replicating biotechnology or nanotechnology” that could spread across the world and take over before humans know what is happening. The other would be to create a supervirus to wipe out the human race.
Bostrom’s mention of nanotechnology is presumably a reference to Eric Drexler’s 1986 book envisioning microscopic robots that could construct other microscopic objects one atom at a time. Twenty years later, in 2006, a major scientific review found that the feasibility of such an approach “cannot be reliably predicted.” As far as I can tell, there’s been no meaningful progress on the concept since then.
We do have one example of a nanoscale technology that has made significant progress in recent years: integrated circuits now have features just a few atoms wide, allowing billions of transistors to be packed onto a single chip. But the equipment required to build these nanoscale devices is fantastically expensive and complex: companies like TSMC and Intel spend billions of dollars to build a single chip fabrication plant.
I don’t know if Drexler-style nano-assemblers are possible. But if they are, building the first ones is likely to be a massive undertaking. Like the atomic bomb, it would likely require many skilled engineers and scientists, large amounts of capital, and large research labs and production facilities. It seems hard for a disembodied AI to pull that off—and even harder to do so while maintaining secrecy and control.
Part of Bostrom’s argument is that superintelligent AI would have a “social manipulation superpower” that would enable the AI to persuade or trick people into helping it accomplish its nefarious ends.
Hinton, the deep learning pioneer, voiced similar concerns in his CNN interview. “If it gets to be much smarter than us, it’ll be very good at manipulation, because it will have learned that from us,” he said.
Again, no one has ever encountered a superintelligent AI, so it’s hard to make categorical statements about what it might be able to do. But I think this misunderstands how persuasion works.
Human beings are social creatures. We trust longtime friends more than strangers, and we are more likely to trust people we perceive as similar to ourselves. In-person conversations tend to be more persuasive than phone calls or emails.
A superintelligent AI would have no friends or family and would be incapable of having an in-person conversation with anybody. Maybe it could trick some gullible people into sending it money or sharing confidential information. But what an AI would really need is co-conspirators: people willing to help out with a project over the course of months or years, while keeping their actions secret from friends and family. It’s hard to imagine how an AI could inspire that kind of loyalty among a significant number of people.
The power of specialization
I expect that nothing I’ve written so far is going to be persuasive to committed singularists. Singularists have a deep intuition that more intelligent entities inevitably become more powerful than less intelligent ones.
“One should avoid fixating too much on the concrete details, since they are in any case unknowable and intended for illustration only,” Bostrom writes in Superintelligence. “A superintelligence might—and probably would—be able to conceive of a better plan for achieving its goals than any that a human can come up with. It is therefore necessary to think about these matters more abstractly.”
Stephen Hawking articulated this intuition in a vivid way a few years ago. “You’re probably not an evil ant-hater who steps on ants out of malice,” Hawking wrote. “But if you’re in charge of a hydroelectric green-energy project and there’s an anthill in the region to be flooded, too bad for the ants. Let’s not place humanity in the position of those ants.”
But it’s worth thinking harder about the relationship between human intelligence and our power over the natural world.
If you put a modern human in a time machine and sent him back 100,000 years, it’s unlikely he could use his superior intelligence to establish dominance over a nearby Neanderthal tribe. Even if he were an expert on modern weaponry, he wouldn’t have the time or resources to make a gun before the Neanderthals killed him or he simply starved to death.
Humanity’s intelligence gave us power mainly because it enabled us to create progressively larger and more complex societies. A few thousand years ago, some human civilizations grew large enough to support people who specialized in mining and metalworking. That allowed them to build better tools and weapons, giving them an edge over neighboring civilizations.
Specialization has continued to increase, century by century, until the present day. Modern societies have thousands of people working on highly specialized tasks from building aircraft carriers to developing AI software to sending satellites into space. It’s that extreme specialization that gives us almost godlike powers over the natural world.
My favorite articulation of this point came from entrepreneur Anton Troynikov in a recent episode of the Moment of Zen podcast.
“The modern industrial world requires actuators starting from the size of an oil refinery and going down to your scanning electron microscope,” Troynikov said. “The reason that we need all of this vast array of things is that the story of technology is almost the story of tool use. And every one of those tools relies on another layer of tools below them.”
The modern world depends on infrastructure like roads, pipelines, fiber optic cables, ports, warehouses, and so forth. Each piece of infrastructure has a workforce dedicated to building, maintaining, and repairing it. These workers not only have specialized skills and knowledge, they also have sophisticated equipment that enables them to do their jobs.
AI needs us more than we need it
Which brings me to Bostrom’s second scenario for AI takeover. Bostrom predicts that a superintelligent AI might create a virus that wipes out humanity. It’s conceivable that an AI could trick someone into synthesizing a virus in an existing biology lab. I don’t know if an AI-designed virus could literally wipe out humanity, but let’s assume it can for the sake of argument.
The problem, from the AI’s point of view, is that it would still need some humans around to keep its data centers running.
As consumers, we’re used to thinking of services like electricity, cellular networks, and online platforms as fully automated. But they’re not. They’re extremely complex systems with large staffs of people constantly fixing things as they break. If everyone at Google, Amazon, AT&T, and Verizon died, the Internet would quickly grind to a halt—and so would any superintelligent AI connected to it.
Could an AI dispatch robots to keep the Internet and its data centers running? Today there are far fewer industrial robots in the world than human workers, and the vast majority of them are special-purpose robots designed to do a specific job at a specific factory. There are few if any robots with the agility and manual dexterity to fix overhead power lines or underground fiber optic cables, drive delivery trucks, replace failing servers, and so forth. Robots also need human beings to repair them when they break, so without people the robots would eventually stop functioning too.
Of course this could change. Over time we may build increasingly capable robots, and in a few decades we may reach the point where robots are doing a large share of physical work. At that point, an AI takeover scenario might become more plausible.
But this is very different from the “fast takeoff” scenario envisioned by many singularists, in which AI takes over the world within months, weeks, or even days of an intelligence explosion. If AI takes over, it will be a gradual, multi-decade process. And we’ll have plenty of time to change course if we don’t like the way things are heading.
Let’s lock down the physical world
Singularists predict that the first superintelligent AI will be the last superintelligent AI because it will rapidly become smart enough to take over the world. If that’s true, then the question of AI alignment becomes supremely important because everything depends on whether the superintelligent AI decides to treat us well or not.
But in a world where the first superintelligent AI won’t be able to immediately take over the world—the world I think we live in—the picture looks different. In that case, there are likely to eventually be billions of intelligent AIs in the world, with a variety of capabilities and goals.
Many of them will be benevolent. Some may “go rogue” and pursue goals independent of their creators. But even if that doesn’t happen, there will definitely be some AIs created by terrorists, criminals, bored teenagers, or foreign governments. Those are likely to behave badly—not because they’re “misaligned,” but because they’re well-aligned with the goals of their creators.
In this world, anything connected to the Internet will face constant attacks from sophisticated AI-based hacking tools. In addition to discovering and exploiting software vulnerabilities, rogue AI might be able to use technologies like large language models and voice cloning to create extremely convincing phishing attacks.
And if a hacker breaches a computer system that controls a real-world facility—say a factory, a power plant, or a military drone—it could do damage in the physical world.
Last week I asked Matthew Mittelsteadt, an AI and cybersecurity expert at the Mercatus Center, to name the most important recent examples of hacks like this. He said these were the three most significant in the last 15 years:
In 2010, someone—widely believed to be the U.S. or Israeli government—unleashed a computer worm known as Stuxnet on computer systems associated with Iran’s nuclear program, slowing Iran’s efforts to enrich uranium.
In 2015, hackers with suspected ties to Russia breached computers controlling part of the Ukrainian power grid. The attack caused about 200,000 Ukrainians to lose power, but utility workers were able to restore it within a few hours by bypassing the compromised computers.
In 2021, a ransomware attack hit the billing infrastructure for the Colonial Pipeline, which moves gasoline from Texas to the Southeastern United States. The attack shut down the pipeline for a few days, leading to brief fuel shortages in the affected states.
This list makes it clear that such attacks are a real problem we should take seriously. But overall I found the list reassuring. Even if AI makes attacks like these 100 times more common and 10 times more damaging in the coming years, they would still be a nuisance rather than an existential threat.
Mittelsteadt points out that the good guys will be able to use AI to find and fix vulnerabilities in their systems. Beyond that, it would be a good idea to make sure that computers controlling physical infrastructure like power plants and pipelines are not directly connected to the Internet. Mittelsteadt argues that safety-critical systems should be “air gapped”: made to run on a physically separate network under the control of human workers located on site.
This principle is particularly important for military hardware. One of the most plausible existential risks from AI is a literal Skynet scenario where we create increasingly automated drones or other killer robots and the control systems for these eventually go rogue or get hacked. Militaries should take precautions to make sure that human operators maintain control over drones and other military assets.
Last fall, the US military publicly committed not to put AI in control of nuclear weapons. Hopefully other nuclear-armed powers will do the same.
Notably, these are all precautions we ought to be taking whether or not we think attacks by rogue AIs are an imminent problem. Even if superintelligent AI never tries to hack our critical infrastructure, it’s likely that terrorists and foreign governments will.
Over the longer term, we should keep the threat of rogue AIs in mind as we decide whether and how to automate parts of the economy. For example, at some point we will likely have the ability to make our cars fully self-driving. This will have significant benefits, but it could also increase the danger from misaligned AI.
Maybe it’s possible to lock down self-driving cars so they are provably not vulnerable to hacking. Maybe these vehicles should have “manual override” options where a human passenger can shut down the self-driving system and take the wheel. Or maybe locking down self-driving cars is impossible and we’ll ultimately want to limit how many self-driving cars we put on the road.
Robots today are neither numerous nor sophisticated enough to be of much use to a superintelligent AI bent on world domination. But that could change in the coming decades. If more sophisticated and autonomous robots become commercially viable, we’ll want to think carefully about whether deploying them will make our civilization more vulnerable to misaligned AI.
The bottom line is that it seems easier to minimize the harm a superintelligent AI can do than to prevent rogue AI systems from existing at all. If superintelligent AI is possible, then some of those AIs will have harmful goals, just as every human society has a certain number of criminals. But as long as human beings remain firmly in control of assets in the physical world, it’s going to be hard for a hostile AI to do too much damage.
Coming up this week
I’m planning two live online events for Understanding AI readers. Please join us!
Tomorrow (Wednesday) at 1pm Eastern/10am Pacific, I’m going to watch the Google I/O keynote address and share my thoughts in real time using Substack’s chat feature. If you’re on a desktop computer, you should be able to access the chat via this link. On mobile, you’ll need to download the Substack app and then look for the chat icon at the bottom of the screen.
On Thursday at 2pm Eastern/11am Pacific, I’m going to host a virtual reading group on Twitter Spaces. We’ll talk about “Attention Is All You Need,” the 2017 Google paper that introduced the transformer. That’s the deep learning model that powers everything from large language models (the T in ChatGPT stands for transformer) to DeepMind’s protein folding work. To participate, please visit my Twitter profile around the start time. There should be a button to join the space. I recommend reading the paper ahead of time, and I’d love to have a few volunteers to help lead the discussion. If you’re interested in doing that please let me know by email. Or just show up on Thursday to listen and ask questions.