Six principles for thinking about AI risk
Two Princeton computer scientists argue AI is unlikely to pose an existential risk.
When OpenAI released GPT-4 in March 2023, its surprising capabilities triggered a groundswell of support for AI safety regulation. Dozens of prominent scientists and business leaders signed a statement calling for a six-month pause on AI development. When OpenAI CEO Sam Altman called for a new government agency to license AI models at a Congressional hearing in May 2023, both Democratic and Republican senators seemed to take the idea seriously.
It took longer for skeptics of existential risk to find their footing. This might be because few people outside the tight-knit AI safety community were paying attention to the issue prior to the release of ChatGPT. But in recent months, the intellectual climate has changed significantly, with skeptical arguments gaining more traction.
Last month a pair of Princeton computer scientists published a new book that includes my favorite case for skepticism about existential risks from AI. In AI Snake Oil, Arvind Narayanan and Sayash Kapoor write about AI capabilities in a wide range of settings, from criminal sentencing to moderating social media. My favorite part of the book is Chapter 5, which takes the arguments of AI doomers head-on.
Some skeptics of AI doom are skeptics of AI in general. They downplay the importance of generative AI and question whether it will ever be useful. At the opposite extreme are transhumanists who argue it would actually be a good thing if powerful AI systems superseded human beings.
Narayanan and Kapoor stake out a sensible position between these extremes. They write that generative AI has many “beneficial applications,” adding that “we are excited about them and about the potential of generative AI in general.” But they don’t believe LLMs will become so powerful that they’re able to take over the world.
Some of Narayanan and Kapoor’s arguments are similar to points I’ve made in my newsletter over the last 18 months. So as an alum of the Princeton computer science program where Narayanan teaches and Kapoor is studying, I’m going to label their perspective the Princeton School of AI Safety.
The Princeton School emphasizes continuity between past and future progress in computer technology. It predicts that improvements in AI capabilities will often require a lot of real-world data—data that can only be gathered through slow and costly interactions in the real world. This makes a “fast takeoff” in AI capabilities very unlikely.
The Princeton School is skeptical that future AI systems will have either the capacity or the motivation to gain power in the physical world. Its proponents urge policymakers to focus on specific threats, such as cyberattacks or the creation of synthetic viruses. Often the best way to counter these threats is to beef up security in the physical world—for example, by regulating labs that synthesize viruses or requiring that power plants be “air gapped” from the Internet—rather than by trying to limit the capabilities of AI models.
Here are six principles for thinking about existential risk articulated by Narayanan and Kapoor in AI Snake Oil.
1. Generality is a ladder
Two of today’s leading AI labs—OpenAI and Google’s DeepMind—were explicitly founded to build artificial general intelligence. A third, Anthropic, was founded in 2021 by OpenAI veterans worried OpenAI wasn’t taking the safety risks from AGI seriously enough.
In a recent essay, Anthropic CEO Dario Amodei predicted that AGI (he prefers the term “powerful AI”) will dramatically accelerate scientific progress.
Views like this are common in the technology industry. A widely read June essay by former OpenAI employee Leopold Aschenbrenner predicted that leading labs would create AGI before the end of the decade, and that AGI would provide a “decisive economic and military advantage” to whichever country gets it first.
Narayanan and Kapoor see things differently.
“We don’t think AI can be separated into ‘general’ and ‘not general,’” they write. “Instead, the history of AI reveals a gradual increase in generality.”
The earliest computing devices were designed for one specific task, like tabulating census results. In the middle of the 20th Century, people started building general-purpose computers that could run a variety of programs. In AI Snake Oil, the Princeton authors argue that machine learning represented another step toward generality.
“The general-purpose computer eliminated the need to build a new physical device every time we need to perform a new computational task; we only need to write software. Machine learning eliminated the need to write new software; we only need to assemble a dataset and devise a learning algorithm suited to that data.”
Pretrained language models like GPT-4 represent yet another step toward generality. Users don’t even need to gather training data. Instead, they can simply describe a task in plain English.
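To make the ladder concrete, here is a minimal sketch—my own illustration, not an example from the book—contrasting the machine-learning rung (assemble a dataset and pick a learning algorithm) with the pretrained-model rung (describe the task in plain English). The spam-filter data is made up, and the commented-out LLM call is a hypothetical stand-in for any model API.

```python
# Two rungs on the "ladder of generality." The data and the LLM call
# below are illustrative assumptions, not a real product or pipeline.

# Rung 1: machine learning. To get a spam filter, we don't write
# filtering rules by hand; we assemble labeled examples and pick a
# learning algorithm suited to the data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

emails = ["win a free prize now", "meeting moved to 3pm",
          "cheap pills online", "lunch tomorrow?"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

spam_filter = make_pipeline(TfidfVectorizer(), LogisticRegression())
spam_filter.fit(emails, labels)
print(spam_filter.predict(["claim your free prize"]))  # model learned from data, not rules

# Rung 2: a pretrained language model. No dataset and no training loop;
# the task is simply described in plain English.
prompt = "Is the following email spam? Answer yes or no.\n\nEmail: win a free prize now"
# response = some_llm_api.complete(prompt)  # hypothetical call; any LLM API would do
```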
Many companies today are trying to build even more powerful AI systems, including models with the capacity to make long-term plans and reason about complex problems.
“Will any of the thousands of innovations being currently produced lead to the next step on the ladder of generality? We don’t know,” Narayanan and Kapoor write. “Nor do we know how many more steps on the ladder there are.”
These authors view AGI as a “serious long-term possibility.” But they believe that “we’re not very high up on the ladder yet.”
2. Recursive self-improvement isn’t new
One reason many people view AGI as an important threshold is that it could enable a process called recursive self-improvement. Once we have an AI model that is as intelligent as a human AI researcher, we can make thousands of copies of that model and put them to work creating even more powerful AI models.
Narayanan and Kapoor don’t deny that this could happen. Rather, they point out that programmers have been doing this kind of thing for decades.
At the dawn of the software industry, programmers had to write software in binary code, a tedious and error-prone process that made it difficult to write complex programs. Later people created software called compilers to automate much of this tedium. Programmers could write programs in higher-level languages like COBOL or Fortran and a computer would automatically translate those programs into the ones and zeros of machine code.
Over the decades, programmers have created increasingly powerful tools to automate the software development process. For example, cloud computing platforms like Amazon Web Services allow a programmer to set up a new server—a process that used to take hours—with a few clicks.
“There is no way we could have gotten to the current stage in the history of AI if our development pipelines weren’t already heavily automated,” the Princeton pair write. “Generative AI pushes this one step further, translating programmers’ ideas from English (or another human language) to computer code, albeit imperfectly.”
In August, I pointed out another example of a company using AI to create better AI: programmers at Meta used older Llama models to generate data they used to train the Llama 3.1 herd of models.
We should absolutely expect this AI-improving-AI process to continue, and even accelerate, in the coming years. But there’s no reason to expect a discontinuity at the moment a frontier AI lab “achieves AGI.” Rather, we should expect smooth acceleration as AI systems become more powerful and the AI development process becomes more automated. By the time we reach AGI, that process may already be so thoroughly automated that there isn’t much room left for AGI to accelerate it further.
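For readers who want to picture what “AI improving AI” looks like in practice, here is a heavily simplified sketch of the synthetic-data pattern. It is my own illustration of the general idea, not Meta’s actual pipeline; the object names and methods (generate, judge_quality, finetune) are placeholders.

```python
# A simplified sketch of the synthetic-data pattern: use an existing
# model to draft and grade candidate training examples for the next
# model. All names here are placeholders, not any lab's real pipeline.

def synthesize_training_data(teacher_model, prompts, min_score=0.8):
    """Have an existing model draft answers, then keep only the ones it rates highly."""
    dataset = []
    for prompt in prompts:
        answer = teacher_model.generate(prompt)               # older model drafts a response
        score = teacher_model.judge_quality(prompt, answer)   # and grades its own draft
        if score >= min_score:
            dataset.append({"prompt": prompt, "response": answer})
    return dataset

def train_next_model(base_model, teacher_model, prompts):
    """Fine-tune a successor model on the filtered synthetic dataset."""
    data = synthesize_training_data(teacher_model, prompts)
    return base_model.finetune(data)                          # successor learns from the teacher's output
```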
3. Real-world experience is essential for new capabilities
At this point I expect some readers will object that I—and the authors of AI Snake Oil—are not taking exponential growth seriously. In February 2020, many people dismissed COVID because it seemed that only a handful of people were getting infected per day. But thanks to the power of compounding growth, thousands of people were getting infected daily by the end of March.
So maybe there won’t be a sudden change at the precise moment when an AI system “achieves AGI.” But you might still expect exponential increases in computing power to produce AI systems that are far more capable than humans.
That might be a reasonable assumption if better algorithms and more computing power were the only things required to make AI systems more capable. But data is the third essential ingredient for any AI system. And unlike computing power, data is not fungible. If you want an AI model to design rockets, you need training data about rockets. Data about French literature or entomology isn’t going to help.
Over the last 15 years, AI companies have enjoyed a massive infusion of data scraped from the Internet. That enabled the creation of broad and capable models like GPT-4o and Claude 3.5 Sonnet. But more progress is needed to reach human-level capabilities on a wide range of tasks. And that isn’t just going to take more data—it’s going to require different kinds of data than AI companies have used in the past.
In a piece last year, I argued that real-world experience is essential to mastering many difficult tasks:
There’s a famous military saying that “no plan survives contact with the enemy.” The world is complex, and military planners are invariably working with incomplete and inaccurate information. When a battle begins, they inevitably discover that some of their assumptions were wrong and the battle plays out in ways they didn’t anticipate.
Thomas Edison had a saying that expresses a similar idea: “genius is one percent inspiration and 99 percent perspiration.” Edison experimented with 1,600 different materials to find a good material for the filament in his most famous invention, the electric light bulb.
“I never had an idea in my life,” Edison once said. “My so-called inventions already existed in the environment—I took them out. I’ve created nothing. Nobody does. There’s no such thing as an idea being brain-born; everything comes from the outside.”
In other words, raw intelligence isn’t a substitute for interacting with the physical world. And that limits how rapidly AI systems can gain capabilities.
Narayanan and Kapoor have a similar view.
“Most human knowledge is tacit and cannot be codified,” they write. “Beyond a point, capability improvements will require prolonged periods of learning from actual interactions with people. Much of the most prized and valuable human knowledge comes from performing experiments on people, ranging from drug testing to tax policy.”
The pair point to self-driving cars as an example. In the early years, these vehicles seemed to make rapid progress. By the late 2010s, many companies had built prototype vehicles with basic self-driving abilities.
But the authors write that more recently, progress “has been far slower than experts originally anticipated because they underestimated the difficulty of collecting and learning from real-world interaction data.” Self-driving cars have to deal with a long list of “edge cases” that must be discovered through trial and error on real public streets. It took more than a decade for Waymo to make enough progress to launch its first driverless taxi service in Phoenix. And even Waymo’s cars still rely on occasional assistance from remote operators.
And “unlike self-driving cars, AGI will have to navigate not just the physical world but also the social world. This means that the views of tech experts who are notorious for misunderstanding the complexity of social situations should receive no special credence.”
4. The superintelligence is us
Maybe it will take a while to invent AI systems with superhuman intelligence, but doomers still insist that these systems could be extremely dangerous once they’re invented. Just as human intelligence gives us power over chimpanzees and mice, so the extreme intelligence of future AI systems could give them power over us.
But Narayanan and Kapoor argue that this misunderstands the source of our power over the natural world.
“Humans are powerful not primarily because of our brains but because of our technology,” they write. “Prehistoric humans were only slightly more capable at shaping the environment than animals were.”
So how did humans get so powerful? Here’s how I described the process in an essay I wrote last year:
Humanity’s intelligence gave us power mainly because it enabled us to create progressively larger and more complex societies. A few thousand years ago, some human civilizations grew large enough to support people who specialized in mining and metalworking. That allowed them to build better tools and weapons, giving them an edge over neighboring civilizations.
Specialization has continued to increase, century by century, until the present day. Modern societies have thousands of people working on highly specialized tasks from building aircraft carriers to developing AI software to sending satellites into space. It’s that extreme specialization that gives us almost godlike powers over the natural world.
Doomers envision a conflict where most of the AI systems are on one side and most of the human beings are on the other. But there’s little reason to expect things to work out that way. Even if some AI systems eventually “go rogue,” humans are likely to have AI systems they can use for self-defense.
“The crux of the matter is that AI has already been making us more powerful and this will continue as AI capabilities improve,” Narayanan and Kapoor write. “We are the ‘superintelligent’ beings that the bugbear of humanity-ending superintelligence evokes. There is no reason to think that AI acting alone—or in defiance of its creators—will in the future be more capable than people acting with the help of AI.”
5. Powerful AI can be used for both offense and defense
This isn’t to say we shouldn’t worry about possible harms from new AI systems. Most new technologies enable new harms, and AI is no exception. For example, we’ve already started to see people commit fraud using deepfakes created with generative AI.
AI systems have the potential to be very powerful—and hence to enable even more significant harms in the future. So maybe it would be wise to press pause?
However, AI systems could also have tremendous benefits. And the benefits of a new technology are often closely connected to the harms.
Take cybersecurity as an example. There is little doubt that foundation models will enable the creation of powerful tools that hackers could use to identify and exploit vulnerabilities in computer systems. But Narayanan and Kapoor argue that this isn’t new:
Hackers have long had bug-finding AI tools that are much faster and easier to use than manually searching for bugs in software code.
And yet the world hasn’t ended. Why is that? For the simple reason that the defenders have access to the same tools. Most critical software is extensively tested for vulnerabilities by developers and researchers before it is deployed. In fact, the development of bug-finding tools is primarily carried out not by hackers, but by a multibillion-dollar information security industry. On balance, the availability of AI for finding software flaws has improved security, not worsened it.
We have every reason to expect that defenders will continue to have the advantage over attackers even as automated bug-detection methods continue to improve.
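The automated bug-finding the authors describe long predates generative AI. As a rough illustration (mine, not the book’s), here is the core idea behind fuzzing: throw large numbers of randomized inputs at a piece of software and log anything that makes it crash. Production tools such as AFL and libFuzzer are far more sophisticated, but attackers and defenders alike rely on this same basic technique.

```python
# A bare-bones fuzzer: hammer a target function with random inputs and
# record any that make it crash. The toy parser is a stand-in for real
# software under test.
import random
import string

def parse_record(text: str) -> dict:
    """Toy parser: expects 'name,age' and crashes on anything else."""
    name, age = text.split(",")             # raises if there isn't exactly one comma
    return {"name": name, "age": int(age)}  # raises if age isn't a number

def fuzz(target, trials=10_000, max_len=20):
    crashes = []
    for _ in range(trials):
        candidate = "".join(random.choices(string.printable, k=random.randint(0, max_len)))
        try:
            target(candidate)
        except Exception as exc:             # a crash is a potential bug worth triaging
            crashes.append((candidate, repr(exc)))
    return crashes

if __name__ == "__main__":
    found = fuzz(parse_record)
    print(f"{len(found)} crashing inputs found; first few: {found[:3]}")
```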
6. Safety regulations should focus on specific threats
Many AI safety experts advocate legislation that focuses on the safety of AI models. California’s failed SB 1047, for example, would have required AI companies to certify that their models were safe before releasing them to the public.
But Narayanan and Kapoor question whether this approach will work.
“The alignment research we can do now with regard to a hypothetical future superintelligent agent is inherently limited,” they write. “We can currently only speculate about what alignment techniques might prevent future superintelligent AI from going rogue. Until such AI is actually built, we just can’t know for sure.”
Instead, they advocate a focus on specific real-world risks. In other words, we should lock down the physical world rather than AI models.
Take biological risks as an example.
“It’s possible that in the future, AI might make it easier to develop pandemic-causing viruses in the lab,” they write. “But it’s already possible to create such viruses in the lab. Based on the available evidence, the lab-leak theory of COVID remains plausible.”
“We need to improve security to diminish the risk of lab leaks. The steps we take will also be a defense against AI-aided pandemics. Further, we should (continue to) regulate the lab components needed to engineer viruses.”
I advocated a similar approach in an essay I wrote last year:
It would be a good idea to make sure that computers controlling physical infrastructure like power plants and pipelines are not directly connected to the Internet. [Matt Mittelsteadt, a scholar at the Mercatus Center] argues that safety-critical systems should be “air gapped”: made to run on a physically separate network under the control of human workers located on site.
This principle is particularly important for military hardware. One of the most plausible existential risks from AI is a literal Skynet scenario where we create increasingly automated drones or other killer robots and the control systems for these eventually go rogue or get hacked. Militaries should take precautions to make sure that human operators maintain control over drones and other military assets.
These precautions won’t just protect us from AI systems that “go rogue”; they will also provide an extra layer of protection against terrorists, foreign governments, and other human attackers—whether or not they use AI in their attacks.
If you enjoyed this article, I encourage you to subscribe to Narayanan and Kapoor’s excellent newsletter, which is also called AI Snake Oil.