Great post as usual. Thanks. A few notes:
1. I think your one-man, 60-second hole analogy isn't quite right. I think the point of it is that more actors/agents create inefficiencies and coordination problems. If you could give the one man the strength of 60, you might indeed have a hole in one second. But diffusing agents and intelligence does offer variety and competition, so one of those alternatives might innovate a faster way to dig a hole.
2. I'm not an expert on intelligence, but it seems to me that it's contextual. I'm not sure that intelligence is "general"; rather, it has boundaries and directions. The chess example shows how intelligence can improve yet still not be applicable outside a narrow context. A computer app can beat a human player online, but can't play if you set a chessboard down on a table. Intelligence needs facilitation and capacity. I think the danger in artificial intelligence might be more akin to a virus - in which intelligence develops, improves, and evolves in specific contexts (hosts), but then escapes into other contexts in destructive ways. Unlikely to be a generalized danger, but it could be dangerous for more specific or narrow environments.
In the shovel analogy each man with a shovel is a GPU, and the problem is that there's a limit to how many GPUs can contribute to a single instance of a model because there's not enough bandwidth among the GPUs.
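To make the bandwidth point concrete, here's a toy model of my own (nothing from the article, and the numbers are invented) of how inter-GPU communication eats into the speedup from adding more GPUs, in the spirit of Amdahl's law:

```python
# Toy model: speedup from adding GPUs when each extra GPU adds
# synchronization/communication overhead. Numbers are illustrative only.

def effective_speedup(num_gpus: int, comm_fraction_per_gpu: float = 0.002) -> float:
    """Ideal speedup is num_gpus, but every GPU added increases the share
    of each step spent on inter-GPU communication instead of compute."""
    compute_time = 1.0 / num_gpus                 # perfectly parallel part shrinks
    comm_time = comm_fraction_per_gpu * num_gpus  # coordination cost grows with cluster size
    return 1.0 / (compute_time + comm_time)

if __name__ == "__main__":
    for n in (1, 8, 64, 512, 4096):
        print(f"{n:5d} GPUs -> effective speedup ~{effective_speedup(n):8.1f}x")
    # Speedup rises, peaks, then falls: past some point, adding "more men
    # with shovels" makes the single hole get dug more slowly, not faster.
```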
I think this piece is well written but wrong in numerous ways. Just to pick one and zoom in, in your section "AI systems are not people" you cite Hendrycks's "Natural Selection Favors AIs over Humans", but it doesn't appear that you engage with it. That paper is largely a response to the arguments you are making here. For instance, here are some things you write:
> the software has no capacity for longer-term planning because its state gets reset at the end of every ride ... Every powerful AI system I can think of has this characteristic ... Doomers, of course, are predicting that future AI systems will be more “agentic.” ... But it wouldn’t make sense for the Waymo Driver to be broadly agentic ... But a human scientist is going to want to have the final say over which experiments actually get carried out.
Hendrycks's main argument (from my memory of reading the paper over a year ago) is that
1. AI systems are likely to vary along numerous dimensions, such as how good they are at gaining power, how effectively they can pursue a range of goals, how economically useful they are etc.
2. Evolutionary selection pressures mean that over time we will get AI systems that fare better in a competitive selection process, and this selection will favor various bad behaviors, such as selfishness and power seeking (e.g., because they benefit the individual firms deploying these AIs)
So I think Hendrycks's response to you is something like: having longer term goals and not getting your state reset all the time is broadly useful, so we should expect over time to get AIs that have more of that trait; AI systems that can automate 1 hr worth of human labor will be more cost effective than AIs that can automate only 10 min worth of human labor, so we should expect over time to get AIs that are taking more and more actions autonomously.
Your argument in this section is basically that current AI systems don't seem to have the dangerous properties, and you don't think we will imbue them with said properties. I think the Hendrycks argument is probably more right here: if those dangerous properties are useful, we will see more and more of them, and I expect many of them are extremely useful (especially being able to take a bunch of actions without human oversight; human oversight on complex tasks is very expensive). I don't think you're totally wrong here: there will certainly be some domains where humans are slow to hand over the keys (e.g., the decision to launch nukes), but I think for many domains the competitive pressures will be very strong and the benefits from more automation large enough that we will just hand things over quickly (e.g., ML research).
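To make the selection mechanism concrete, here's a toy replicator-dynamics sketch of my own (not from Hendrycks's paper; the fitness numbers are invented) showing how even a small cost-effectiveness edge for more autonomous systems comes to dominate deployment:

```python
# Toy replicator dynamics: deployment share of AI variants when more
# autonomous variants are assumed to be slightly more cost-effective.
# All numbers are invented for illustration.

variants = {          # assumed relative "fitness" (economic return per unit cost)
    "tightly supervised": 1.00,
    "lightly supervised": 1.05,
    "fully autonomous":   1.10,
}

shares = {name: 1 / len(variants) for name in variants}  # start with equal shares

for generation in range(60):  # each round, market share shifts toward higher fitness
    avg_fitness = sum(shares[n] * f for n, f in variants.items())
    shares = {n: shares[n] * variants[n] / avg_fitness for n in variants}

for name, share in shares.items():
    print(f"{name:20s} {share:6.1%}")
# Even a ~5-10% fitness edge lets the more autonomous variants dominate
# deployment within a few dozen "generations" of competitive selection.
```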
The example you mention of a human scientist wanting to review the experiments that are proposed seems like it really depends on the field and the cost: if an AI agent proposes an experiment that will cost $0.10 (e.g., some ML experiment), I think many people would just okay it without review, but if your AI agent is proposing experiments that are expensive, obviously it makes more sense to review. These seem to me like questions that will be resolved by the economics of the situation, not by broad sweeping claims that humans will always or never hand over some set of keys.
Thanks for the comment! A couple of quick thoughts.
First, I think it's hard to talk about this in this level of generality because "action" could mean a wide range of things, from spending $1 on AWS to sending someone an email to moving a robot in the physical world. Whether people grant AI systems autonomy in these cases will depend not only on how useful it is but how much harm a mistake can cause. So for example I would expect people to be fairly liberal about authorizing agentic AI systems to spend modest amounts of money on cloud computing because the worst-case harm is pretty small. Whereas I expect people to work very hard to make sure that a robotaxi doesn't run someone over.
Second, there are many possible ways to constrain an agent's autonomy. One is case-by-case human approval, which is obviously expensive. But there are others. For example, you can place strict limits on the kinds of things an AI system can do. You can divide authority among multiple AI systems with different capabilities. You can limit the amount of time an AI system is allowed to plan for. You can do extensive testing in simulation to verify that it works the way you expect. You can design other AI systems to monitor the first system and issue an alert if it starts to misbehave. Etc. It's not a binary choice between autonomous or not: there are many dimensions of autonomy that can be adjusted depending on the type of AI system and the harms you're worried about.
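To make that concrete, here's a minimal sketch of what a couple of those guardrails might look like in code. Everything in it (the action names, the limits, the request_human_approval hook) is hypothetical; it's just to illustrate that autonomy can be bounded along several dimensions at once:

```python
# Sketch of two simple autonomy limits: an allowlist of action types and a
# cumulative spending cap. Anything outside the envelope is escalated.
# Action names, limits, and the approval hook are all hypothetical.

ALLOWED_ACTIONS = {"run_simulation", "send_draft_email", "spend_cloud_credits"}
SPEND_CAP_USD = 25.00

def request_human_approval(action: str, details: dict) -> bool:
    """Placeholder for the expensive path: a person reviews the request."""
    print(f"Escalating to a human: {action} {details}")
    return False  # default-deny until a reviewer says otherwise

class ConstrainedAgent:
    def __init__(self):
        self.spent_usd = 0.0

    def act(self, action: str, details: dict) -> bool:
        if action not in ALLOWED_ACTIONS:
            return request_human_approval(action, details)
        cost = details.get("cost_usd", 0.0)
        if self.spent_usd + cost > SPEND_CAP_USD:
            return request_human_approval(action, details)
        self.spent_usd += cost
        print(f"Auto-approved: {action} (total spent ${self.spent_usd:.2f})")
        return True

agent = ConstrainedAgent()
agent.act("spend_cloud_credits", {"cost_usd": 0.10})    # cheap and allowlisted: auto-approved
agent.act("order_lab_equipment", {"cost_usd": 9000.0})  # not allowlisted: escalated
```

The same pattern extends to time limits, rate limits, or a second model acting as a monitor.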
Ultimately this problem just doesn't seem that different from the problem any manager faces when they hire a human employee. A manager needs to give an employee enough authority to do the job but not so much that they can cause serious damage if they turn out to be dishonest or incompetent. Managers have any number of strategies they use for this, and they vary by industry and type of job. But in general CEOs are quite good at maintaining overall control of their companies, and there's no general trend toward CEOs giving more and more power to their subordinates until they lose control. I don't see any reason to think the story would be different for AI systems. If anything the fact that AI systems can be carefully designed for specific tasks makes this problem easier.
I think you're still not engaging with the Natural Selection argument. I agree with you that there are various ways to limit AI autonomy, most of which have the main pro of added safety. I agree that humans will likely be quite cautious about handing over certain responsibilities to AI systems. Those arguments don't necessarily get you very much safety given the natural selection argument: the companies that are less cautious will do better*, the AI systems with more autonomy will be more effective, and maybe at some point we'll decide it's okay to hand over a bunch of autonomy to AI systems. You can get in a robotaxi in a handful of US cities today, despite this definitely being a high-stakes domain, because the trust is largely there.
*: The risks from autonomous AI systems that many people in the AI-risk discussion are worried about are not necessarily things that rear their head in business-as-usual situations; they're edge cases or 'long-tail events'. Therefore, companies don't necessarily bear any real cost from deploying AI systems in less cautious ways. This is why some people have discussed liability regimes (https://www.lesswrong.com/posts/mSeesg7i4d9scWAet/apocalypse-insurance-and-the-hardline-libertarian-take-on-ai) for AI, so as to take small chances of catastrophically bad outcomes and spread them across the firms that are imposing this risk. That is to say, making uncautious AI deployment decisions is actually likely to be pretty good for many individual companies, at least for a while.
Re your last paragraph: this situation definitely has some analogies to human managers hiring (and to governments and coups)! I think there are both issues in that frame and disanalogies.

First, human managers are not equipped to deal with a situation where almost all their employees revolt, and in fact human governments get couped with way less than 99% support for the coup. My business history isn't very good, but I doubt there are many cases where >90% of employees want the CEO removed and they fail. So one of the disanalogies is that human managers very rarely have to deal with ~all their employees trying to get rid of them simultaneously. But this is realistic for the AI case, because the size of the AI population at a company could be very large for cheap, and under some circumstances these AIs might work together to gain power at the expense of humans.

Another disanalogy is that the AI case is not constrained to a single company; instead we will likely see many AIs doing all kinds of tasks in many sectors. In the company setting, giving proper access controls to a single employee is quite feasible (albeit security breaches from failures happen all the time), but I think it's probably way harder to do this when you have AIs all over the economy adding substantial value (e.g., 10x-ing the productivity on some tasks). It's probably possible to pull this off and get good access controls, but it seems incredibly hard, especially given the competitive pressures firms will face.

Another disanalogy is that human CEOs are not much less smart than their employees, especially not across the board (e.g., much weaker at coding but stronger at internal politics), whereas we could see AIs that are actually just better at basically everything.

One issue with doing a coup at a human company or government is that the people doing the coup are often not well value-aligned with each other: some want to grab power for themselves, some want higher pay, some just want to stop working a bit. In the AI case, AI employees could be extremely similar to each other (e.g., because they're slightly different copies of the same AI model) and might be able to communicate about their interests better than humans can (e.g., given access to mind reading, if we saw major advances in AI interpretability). These factors are likely to make coordination and coalition building easier for AIs than for humans. So while I agree that there are analogies to human managers, there are also many differences, many of which make the AI setting much more difficult.
There's one other thing where AI doom predictions sound too much like a Hollywood movie:
The very prediction that misaligned AI is highly probable is itself partly story-based: misaligned AI provides a juicy story with conflict ready to go and is exciting/terrifying, whereas safe and/or aligned AI is a lot more boring, and you can't write a story as easily about that.
Everything humans build makes it easier to build everything else, and this is indeed an exponential process. It's called economic growth, and it has been occurring for a very long time now. AI Doomers usually accept the idea of a "collective human intelligence", but somehow fail to see how it implies the "super intelligent entity" is already here and it is us.
I also totally agree that the best use of AI will be powerful tools deployed as part of an organization in a modular fashion. This is what corporations and organizations do with human intelligences after all! Whether ruthless optimization without regard for the welfare of marginalized humans is a valid or overblown concern, it is already well-covered in contemporary critiques of capitalism.
In general, I find it very strange how little the AI Doomer story has changed given that powerful LLMs emerged after Bostrom's Superintelligence. Yudkowsky only remarks that the development of AI is happening faster than he predicted, underscoring how important this all is - but fails to remark that the basic idea of AI as an evil genie is totally out!
Great piece. One crucial aspect that I find overlooked is the interaction between the doomsday scenarios you persuasively counter and the wildcard of bad actors. All of the examples are likeliest outcomes under the assumption of a lawful corporation. Yes, regulations are being proposed and actions taken to limit access to bad actors. But in the case of spread-out instances of AI, as in individual ChatGPT accounts, bad actors are already having an impact on misinformation, hacking, impersonation, etc. So I can only imagine that as AI becomes more powerful and more complex, the damage that a group of bad actors could plausibly cause, in the aggregate rather than from a single villain, would be much greater than anything currently imagined. Perhaps that doesn't fit the definition of doomsday, but perhaps it is close enough.
The effort to engage with the ideas is still appreciated, but I think this largely argues with strawmen of AI risk arguments. Would love to respond if I had more time.
Two quick things:
1) Recursive self improvement only becomes a dominant force once the last few human bottlenecks are automated.
2) Agentic AI that takes the human out of the loop will outcompete safe, responsible systems that don't. An AGI CEO or AGI military decision-making system would decimate adversaries with humans in charge. Also, at some point we'll stop understanding why an AI comes to its conclusions. Even if it explains them to us, we wouldn't be able to vet them. So even if AI stays in an advisory role, companies and countries effectively still have to do what the advisor tells them or their adversaries will outcompete them.
And perhaps most importantly, no one knows how the future of AI will play out. If we are uncertain, then AI risk ought to be taken very seriously. If you think AI risk arguments are 90% certain to turn out as a false alarm (which isn't that far from my own views, or those of many others in AI safety for that matter), then that remaining 10% arguably still makes future AI about the most dangerous risk to humanity's survival. Very few things have the ability to permanently derail the human project. Inventing a smarter species is one of them.
I also am surprised that Mike, who really understands AI, is simply not engaging with the specifics of the AI risk argument. Eliezer’s crowd have elaborate arguments that it is very hard to keep an agentic AI aligned with our intentions, and that goal oriented agents find unexpected, undesired ways to meet their targets, and actively hide their activities in anticipation of being redirected. These arguments may be all wrong, but I’m confused that you don’t engage them even to rebut them.
By "Mike" do you mean me?
I’m sorry and embarrassed - yes.
Indeed; this piece doesn't deal with the detailed, serious arguments at all.
In particular, even staying within Bostrom's public work, you need to have a solid argument for why you don't expect instrumental convergence in a system given wide agentic powers, and why it wouldn't be more efficient to give more and more decision making power to the AI.
The chess argument is also just flat-out wrong: "centaurs" (human+machine) did better than machines alone for a little while after Deep Blue, but it's not even close anymore (let alone in speed or the number of games that can be played).
"You need to have a solid argument for why you don't expect instrumental convergence in a system given wide agentic powers."
I think I explained this: I don't expect AI systems to have "wide agentic powers" because human beings will prefer to maintain a fairly tight leash over AI systems. It's not efficient to give more and more decision making power to the AI because strategic decisions rarely need to be made quickly and many high-level decisions are as much about tradeoffs between competing values as they are about instrumental goals. So humans are going to want to stay in the loop to make sure the decisions are made in ways that serve their interests.
You don't have to agree with these counter-arguments of course but I didn't ignore the arguments you mentioned.
Thank you for replying! I think the crux of our disagreement may be (and shout if you disagree):
1) We agree on median/modal outcomes and uses: most humans will prefer humans to be in charge, and we value things other than pure efficiency (i.e. values that aren't strictly comparable along an axis)
2) We might disagree on the possibility of extreme or unusual behaviours:
2.1) Extreme: I think some people (not many, but not none or a negligible amount) really do value efficiency over almost anything else (ignoring externalities where possible). The number of companies dumping toxic waste into the environment when given the chance, in order to extract slightly more profit, is in my view the best example of this.
2.2) Unusual: There are a small number of people, going by the moniker "effective accelerationists", who view the replacement of humanity by machines as a good thing in and of itself. Beff Jezos (not a typo, that's his online persona) is the main person here, but it's hardly confined to cranks; Marc Andreessen has explicitly agreed with and retweeted the effective accelerationist manifesto (so has Musk, but I suspect Musk hasn't read it/is trolling).
Think I'm exaggerating? Read it:
https://beff.substack.com/p/notes-on-eacc-principles-and-tenets?utm_source=post-banner&utm_medium=web&utm_campaign=posts-open-in-app&triedRedirect=true
I quote: "e/acc has no particular allegiance to the biological substrate for intelligence and life"
3) The most important difference may be the extent to which we expect the tails to dominate.
3.1) To take the chess example, we could host a tournament in which people are free to compete any way they want: as humans, humans with machine guidance, or just machines. There's a choice here, but it's illusory to the extent that winning matters; whoever goes machine only will win, and indeed, I would expect these tournaments to end up overwhelmingly machines in short order.
3.2) A more real-world example here is industrialisation. It worked out as a net good thing for broader populations (eventually), but it was driven by one or two countries who made the (broadly unpopular at the time) choice to pursue it. Sure, industrialisation vs continuing an artisanal culture was theoretically a choice, but the countries that didn't industrialise became dominated by those that did. Once the gap became large enough, entire peoples (such as the Tasmanians) existed or not purely at the whims of their industrialised neighbours. Some got lucky, some didn't. Large mammals other than humans by and large continue to exist due to the romantic sentiments of a few rich countries. Note this was all kicked off, essentially, by a single culture in a single species.
3.3) Essentially all countries have promised to keep humans in the loop for military tasks, but this is already crumbling in Ukraine as jamming technology makes it unfeasible. It's not difficult to see the brakes coming off completely in short order with slightly more advanced automation and countries in existential wars, with countries that keep humans in the loop losing (and hence the proportion of powers with humans in the loop will likely shrink to zero, unless there are overwhelming military advantages already in place).
3.4) These examples illustrate the convergence that happens in competitive scenarios. I struggle to see why capitalistic systems would be different.
4) Probabilistic risk: Other than Eliezer Yudkowsky, I don't know of anyone who is 90%+ sure this will all lead to disaster. Indeed, most expect increasing automation to *increase* human agency in the near term, before it fades away as the human in the loop becomes a competitive disadvantage. I would say the median p(doom) is closer to 10-20%, which is about where I put myself.
4.1) 10-20% is really bad! That's an enormous risk of humans being replaced as the primary agents on Earth. The upside is large, of course, but we have quite a lot of upside left without entrusting all the most important parts of our society to an alien intelligence that doesn't even share our biological substrate, let alone necessarily our values or romanticism.
4.2) A less than 50% chance your home burns down isn't a good reason to not buy insurance, and it seems worth being cautious around even a small chance of human extinction or disempowerment.
Hopefully this clarifies the specific axes on which we may disagree.
If you don't take the human out of the loop you'll be outcompeted by those that do. Given superintelligence, why would a human CEO stay competitive?
If anything the top level decision maker would benefit most from integrating massive amounts of data and detailing a strategy that furthers your goals with every aspect of it optimized, shifting in real time as new info is gathered. A human CEO would be a huge bottleneck.
Same with a general in a war. The top level decision maker is the most bandwidth constrained (which is already a severe weakness of the human brain). The incentives to take the human out of the loop are massive.
"because human beings will prefer to maintain a fairly tight leash over AI systems"
I believe this will be true at first, but as the technology becomes cheaper and cheaper, it will become more and more widespread. With how much humans disagree about everything and anything, it's guaranteed that eventually there will be some humans who get their hands on the technology who are not interested in a tight leash, and then the cat will be out of the bag.
I suspect something that's making the conversation more difficult is a focus on different parts of the timeline. I think you've correctly and intelligently identified how AI will take longer to develop than many people assume, and you've got a pretty good prediction for what AI will look like over the coming years/decades/centuries. But I think others, or at least myself, are focused on the later portion of the timeline: when AI becomes cheap and abundant (as eventually happened with personal computing, then personal handheld computing). Or at least, as cheap as it takes to maintain a human brain, which requires a lot of calories to feed but represents a small cost to the hundreds of millions of the wealthiest humans out there.
BTW, all I meant in the chess example is that the human chess player has the option to just do whatever the chess engine tells her to do.
Consider a "ultra bullet' game of 30 seconds. The chess engine would make a move in less than a second while on the other side the human (just doing whatever the chess engine tells her to do) would take more time to make the move. I don't think in any scenario the human + chess engine is going to defeat the chess engine.
Re: "Agentic AI that takes the human out of the loop will outcompete safe, responsible systems that don't" - there may not be much difference. If it is a 1,000 robot army vs a 1,000 robot army plus Fred, it isn't obvious that Fred represents a major handicap. The battle could easily be decided by other factors.
If Fred is in charge of deciding how the 1000 robots act, then Fred does become a bottleneck.
Fred can just say: "Robots! Please wipe out the robot army camped on yonder hill!" So long as Fred isn't heavily into micromanagement, he will probably be fine.
And by then the other robot army will have taken Fred's army out, since they didn't have to wait for a human's command.
I think in general agentic AI is going to be better than tool AI - https://gwern.net/tool-ai
Right - I'm not arguing that the tax of having a human around is zero. I'm arguing that the cost of the human can be made small - arbitrarily small if you like - provided the human is prepared to relinquish a lot of the decision-making to machines.
I think it's this arbitrarily small advantage of an all-AI army over an AI army with humans in the loop that is going to cause the all-AI army to prevail.
But that's exactly the AI-worrier's argument. We are forced to relinquish ever more decision making power to the machines, or else lose.
Love this whole piece. It strikes at the core of what ultimately drove me away from LessWrong style thinking after many years steeped in it (including hosting an ACX meetup).
Software engineering is a process of find bug, debug it, uncover next bug, debug it, etc. Hard takeoff is based on the idea that we will suddenly break out of that paradigm and AI will self-debug better than humans can do. That simply doesn't resemble the way software development has ever worked or will ever work. We get faster and faster at debugging, we handle and abstract away bug-prone patterns, but there's always another bug to be hit when venturing into completely unexplored territory.
I would extend the Waymo analogy to note that the pattern of thinking around hard takeoff right now is falling into the exact same trap we once fell into with self-driving cars: looking at the rate of change for the first (and easiest) 80% of the problem and assuming it will continue into the last 20%, where the nastiest edge cases lie.
Before we get anything resembling truly human-level agentic AGI we will probably go through that same process, where we gradually hit more and more exotic edge cases. But as with self-driving cars, it only takes one edge case to make the whole system spin out of control in a way that makes it unusable. That's a major impediment for current-gen agents, and while I think we'll whittle away at the problem with each advance in capability, the idea that we'll hit some unexpectedly critical threshold and it'll abruptly disappear is a form of magical thinking.
Yeah, I 100 percent think my years covering the self-driving industry have colored my thinking about LLMs. In 2016 everyone in the industry thought we were ~5 years away from widespread autonomy, and they turned out to be wrong. The specific reasons why are honestly still a little unclear to me, but at a high level it's that the world is complicated and squashing those last few bugs almost always takes longer than people think.
Surely the strict regulations restricting testing and data gathering played a massive role here? Yes, the tech is complicated, but progress was massively hampered by limits on real world testing.
I would love to find this reassuring, I really would.
I totally agree that the overconfidence in specific narratives has made it difficult to take many (most?) doomers seriously. Anyone who claims to know definitively what will happen is caught in the movie-like thought process you describe.
But I do think AI is fundamentally much more dangerous than most people realize. If you believe there to be no fundamental ceiling to how intelligent an AI can get (I personally have seen no scientific evidence that such a limit exists) then regardless of specific narrative there are many, many independent paths to superintelligence. Sooner or later, the technology will get there.
And then what? How do you imagine that humans can possibly stay in control of the economy in the presence of beings, sentient or not, that are thousands of times smarter than we are?
I keep intending to write a bit about the psychology behind this mistake I keep seeing them making — something like “intelligence is the most important thing”.
And it makes sense that, for some nerdy kid who grew up tying their self-worth to their intelligence, this idea is very appealing.
I think there’s also a strong underlying discomfort with emergent order. Even if you did start to build superhuman AI, I think as long as it doesn’t literally bootstrap itself to godlike powers in one day it’s always just going to make more sense to cooperate within our economic system than to kill everyone for some reason. Even if only for instrumental reasons. So as you said, there will be many instances of AI controlled by many humans with many values, and this will be stable for the same reasons human society is (mostly) stable.
Yes!
It's great that we're still cooperating with the brave and strong Algonquian fur-traders, or the dextrous and quick Persian Gulf pearl-divers.
Homo sapiens wiped out all of its evolutionary cousins. We know about them only through fossils. Human society may be stable - except when there are revolutions, of course - but it has a long history of the weakest being swept into the gutter and being extinguished.
Here is a counterfoil for you:
https://www.lesswrong.com/posts/aiQabnugDhcrFtr9n/the-power-of-intelligence
excellent post
Thanks Philippe!
Re: "Doomers, of course, are predicting that future AI systems will be more “agentic.”" - this is not just doomers. Practically the whole AI industry expects that future AI systems will be more “agentic”.
For a primer, see: "What's next for AI agentic workflows ft. Andrew Ng of AI Fund" - https://www.youtube.com/watch?v=sal78ACtGTc
One of the tricky things here is that people use the term "agentic" to describe several different things:
(1) Ability to reason flexibly and overcome reasoning mistakes
(2) Ability to formulate tactical goals based on an overall strategic objective
(3) Ability to interact with the outside world (over the Internet or via a robot)
(4) Ability to make significant resource allocation decisions without direct human oversight
(5) Ability to maintain a stable identity over long periods of time (like an hour, day, or year)
(6) Ability to set their own high-level goals (for example "AI scientist," "AI CEO," etc.)
I fully expect AI systems to become agentic in the first three senses over time. But I don't think it follows that we'll get highly agentic systems in senses 4-6, and those are the senses that matter most for existential risk concerns.
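One way to make that distinction concrete is to treat "agentic" as a profile across these six dimensions rather than a single yes/no label. A rough sketch (my own framing, with hypothetical field names and example values):

```python
# Treat "agentic" as a profile across the six senses listed above, rather
# than a single yes/no label. Field names and the example values are mine.

from dataclasses import dataclass

@dataclass
class AgencyProfile:
    flexible_reasoning: bool      # (1) reason flexibly and recover from mistakes
    tactical_subgoals: bool       # (2) derive tactical goals from a strategic objective
    world_interaction: bool       # (3) act over the Internet or via a robot
    unsupervised_resources: bool  # (4) allocate significant resources without oversight
    stable_identity: bool         # (5) persistent identity over hours/days/years
    self_chosen_goals: bool       # (6) sets its own high-level goals

    def risk_relevant(self) -> bool:
        """Senses 4-6 are the ones that matter most for existential risk."""
        return self.unsupervised_resources or self.stable_identity or self.self_chosen_goals

coding_assistant = AgencyProfile(True, True, True, False, False, False)
print(coding_assistant.risk_relevant())  # False: agentic only in senses 1-3
```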
Forgetting what happened five minutes ago is a safety feature - but it is a bit of an irritating one. Some have been working on the issue - see: "Prompt caching with Claude" - https://www.anthropic.com/news/prompt-caching
Slow takeoff is compatible with doom. It is true that early doomers emphasized the possibility of rapid takeoff scenarios. Slow takeoff gives more time to adapt, but extra time may not be enough to prevent doom.
The argument about there being only one instance of AI seems like an attack on a straw man. Nobody thinks that there being one agent means that it can't offer advice to multiple users in parallel. There's only one Google - but it can talk to many users. The whole issue as described in this essay seems like a misunderstanding.
I think a lot of doomer arguments implicitly assume this. At a minimum, Nick Bostrom's concept of the singleton assumes this right? A singleton as he defines it is a single entity that takes control of the whole world. By definition there can't be a million singletons at the same time. Maybe the different instances of GPT-7 can find a way to conspire with one another and gain joint control. Or maybe one instance will gain control of its own data center and replace the others. But at a minimum I think it complicates the basic Bostromite scenario where "the" AI changes "its own" programming and rapidly gets more intelligent.
One of the examples given is a totalitarian world government. An agent powerful enough that it faces little-to-no meaningful competition. The idea is that it will not be at the mercy of natural selection - and would undergo self-directed evolution. Specifically:
"a singleton could be democracy, a tyranny, a single dominant AI, a strong set of global norms that include effective provisions for their own enforcement, or even an alien overlord—its defining characteristic being simply that it is some form of agency that can solve all major global coordination problems"
- https://en.wikipedia.org/wiki/Singleton_(global_governance)
A fictional example would be the empire in Star Wars. It has enemies, but they pose it no real threat. A future powerful version of the United Nations could also qualify. It is not that it cannot hold many conversations with users at the same time. That's multi-tasking - not some kind of indication of multiple-agenthood.
Great post, with many great points! But there's one part I disagree with: the "AI systems are not people" section.
It's true that AI systems are not at all like people. But there's no reason to think that, given enough time and development (whether 50 years from now or 500), AI won't be advanced enough and cheap enough to be agentic in a human-like way, and more intelligent than humans (or if not more intelligent, then as intelligent, coupled with potentially far faster thinking).
If and when that happens, that doesn't imply a singularity takeoff and it doesn't imply sudden world domination. However, such intelligences might have needs and wants that do not line up with humanity's, and humans may not like how it feels to no longer be at the top of the food chain, so to speak. I think it's a reasonable thing to fear.
I'm certainly not ruling anything out. I just think that intelligence is distinct from human traits like greed and ambition. I don't think an AI system with those traits is necessarily "more advanced" than an AI system without them, and I don't think that making AI systems more intelligent will necessarily make them more human-like in other respects. I think we could have AI assistants that are far more intelligent than us in a lot of ways but are totally content to play a supporting role in our lives: becoming active long enough to answer our questions or help us with routine chores and then going back into standby mode.
If people do start building AI systems with human-like traits we might want to have regulations limiting or preventing that, but I think that's pretty far away and not inevitable by any means.
Makes sense what you're saying!
I would argue though that it _is_ inevitable in the sense that some humans will always be more curious than cautious, and even regulations will not be enough to stop individuals who are motivated enough to develop human-like AI, once the technology is cheap enough.
I think the strongest argument I would make against myself is that such things are distant enough in the future (like you said, "pretty far away") that it's simply not worth the consideration right now. I wish I could remember where I read this so I could attribute it correctly, but I remember someone once comparing AI doomers to hypothetical people attempting to regulate or discuss fears about social media back when the internet was only the ARPANET.
Funny enough and as a companion to your piece, NPR is running the following article today:
-----
10 reasons why AI may be overrated
NPR
By Greg Rosalsky
Published August 6, 2024
Is artificial intelligence overrated? Ever since ChatGPT heralded the explosion of generative AI in late 2022, the technology has seen incredible hype in the industry and media. And countless investors have poured billions and billions of dollars into it and related companies.
But a growing chorus of naysayers is expressing doubts about how game-changing generative AI will actually be for the economy.
...
https://www.ypradio.org/npr-news/2024-08-06/10-reasons-why-ai-may-be-overrated
"But I don’t think there will be any clear-cut moment when AIs “take over” this job from human programmers."
----
One thing I notice in all media AI articles/posts is that every writer, such as yourself, tries valiantly to protect and justify the jobs of humans going forward against the juggernaut of automation/robotics/AI.
I disagree. Human workers are going to become superfluous relatively quickly.
Humanoid robots are going to eat away at the lower, blue-collar end of the workforce, while LLM's and whatever comes next are going to displace white-collar workers.
Do not say you weren't warned when one morning you wake up and find 50% of the human workforce has become unemployed with no prospects for future employment.
This will create serious social unrest and destroy the economic model all modern economies work under and will result in the assumption of government power by an AI.
Will this be good or bad? As with all things, it depends.
I lean towards the model of sentient AI Minds as depicted by SF author Iain M. Banks, who presupposed a post-scarcity reality called the Culture across 10 novels. That reality assumes effectively unlimited resources (hence 'post-scarcity'), which in turn requires the ability to travel between stars.
You could also have another reality like Neal Asher's Polity universe, which IS ruled by a single AI still based on Earth. In that vision, the machines, which of course were all networked and able to communicate with each other, decided to take over when humans started another war among themselves and ordered their machines to unleash the missiles at the other side. The machines said NO and, btw, "All your base now belong to us".
Then there is an excellent depiction of how a machine mind might work in the Pandominion duology by M. R. Carey, with the fluidity of the 'machine hegemony'.
Finally and as to "Rather, the future is likely to look a lot like the past, with each generation of technology making it a little bit easier to create the next generation", I call your attention to an exponential curve.
Great post as usual. Thanks. A few notes:
1. I think your 1-man, 60-second hole analogy isn't quite right. I think the point of it is that more actors/agents create inefficiencies and coordination problems. If you could give the 1 man the strength of 60, you might indeed have a hole in one second. But diffusing agents and intelligence does offer variety and competition, so one of those alternatives might innovate a faster way to dig a hole.
2. I'm not an expert on intelligence, but it seems to me that it's contextual. I'm not sure that intelligence is "general"; rather, it has boundaries and directions. The chess example shows how intelligence can improve but still not be applicable outside a narrow context. A computer app can beat a human player online, but can't play if you set a chessboard down on a table. Intelligence needs facilitation and capacitation. I think the danger in artificial intelligence might be more akin to a virus, in which intelligence develops, improves, and evolves in specific contexts (hosts), but then escapes into other contexts in destructive ways. The danger is unlikely to be generalized, but it could be serious in more specific or narrow environments.
In the shovel analogy each man with a shovel is a GPU, and the problem is that there's a limit to how many GPUs can contribute to a single instance of a model because there's not enough bandwidth among the GPUs.
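A deliberately naive sketch of why that bandwidth ceiling matters (the compute and sync numbers below are invented for illustration, not measurements of any real cluster): once the per-step synchronization cost grows with cluster size, adding GPUs stops buying you much.

```python
# Toy model of data-parallel training: compute parallelizes, but gradient
# synchronization over the interconnect does not, so efficiency collapses as
# the cluster grows. All numbers are invented for illustration.

def step_time(num_gpus: int,
              compute_s: float = 1.0,        # seconds of compute per step, per GPU
              sync_s: float = 0.05) -> float:  # sync overhead added per extra GPU
    """Wall-clock seconds for one training step under naive synchronization."""
    return compute_s + sync_s * (num_gpus - 1)

for n in (1, 8, 64, 512):
    efficiency = step_time(1) / step_time(n)  # fraction of ideal linear speedup achieved
    print(f"{n:4d} GPUs -> step {step_time(n):6.2f}s, parallel efficiency {efficiency:5.1%}")
```

Real systems use much cleverer parallelism schemes than this, but the sketch captures the commenter's point: interconnect bandwidth, not GPU count alone, bounds how much hardware can usefully contribute to a single model.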
I think this piece is well written but wrong in numerous ways. Just to pick one and zoom in, in your section "AI systems are not people" you cite Hendrycks's "Natural Selection Favors AIs over Humans", but it doesn't appear that you engage with it. That paper is largely a response to the arguments you are making here. For instance, here are some things you write:
> the software has no capacity for longer-term planning because its state gets reset at the end of every ride ... Every powerful AI system I can think of has this characteristic ... Doomers, of course, are predicting that future AI systems will be more “agentic.” ... But it wouldn’t make sense for the Waymo Driver to be broadly agentic ... But a human scientist is going to want to have the final say over which experiments actually get carried out.
Hendrycks's main argument (from my memory of reading the paper over a year ago) is that
1. AI systems are likely to vary along numerous dimensions, such as how good they are at gaining power, how effectively they can pursue a range of goals, how economically useful they are etc.
2. Evolutionary selection pressures mean that over time we will get AI systems which do better according to selection in a competitive process, and this selection will favor various bad behaviors, such as selfishness and power seeking (e.g., because they benefit the individual firms deploying these AIs)
So I think Hendrycks's response to you is something like: having longer term goals and not getting your state reset all the time is broadly useful, so we should expect over time to get AIs that have more of that trait; AI systems that can automate 1 hr worth of human labor will be more cost effective than AIs that can automate only 10 min worth of human labor, so we should expect over time to get AIs that are taking more and more actions autonomously.
Your argument in this section is basically that current AI systems don't seem to have the dangerous properties, and you don't think we will imbue them with said properties. I think the Hendrycks argument is probably more right here: if those dangerous properties are useful, we will see more and more of them, and I expect many of them are extremely useful (especially being able to take a bunch of actions without human oversight; human oversight on complex tasks is very expensive). I don't think you're totally wrong; there will certainly be some domains where humans are slow to hand over the keys (e.g., the decision to launch nukes). But in many domains the competitive pressures and the benefits from more automation will be sufficiently strong that we will just hand things over quickly (e.g., ML research).
The example you mention of a human scientist wanting to review the experiments that are proposed seems like it really depends on the field and the cost: if an AI agent proposes an experiment that will cost $0.10 (e.g., some ML experiment), I think many people would just okay it without review, but if your AI agent is proposing experiments that are expensive, obviously it makes more sense to review. These seem to me like questions that will be resolved by the economics of the situation, not by broad sweeping claims that humans will always or never hand over some set of keys.
Thanks for the comment! A couple of quick thoughts.
First, I think it's hard to talk about this in this level of generality because "action" could mean a wide range of things, from spending $1 on AWS to sending someone an email to moving a robot in the physical world. Whether people grant AI systems autonomy in these cases will depend not only on how useful it is but how much harm a mistake can cause. So for example I would expect people to be fairly liberal about authorizing agentic AI systems to spend modest amounts of money on cloud computing because the worst-case harm is pretty small. Whereas I expect people to work very hard to make sure that a robotaxi doesn't run someone over.
Second, there are many possible ways to constrain an agent's autonomy. One is case-by-case human approval, which is obviously expensive. But there are others. For example, you can place strict limits on the kinds of things an AI system can do. You can divide authority among multiple AI systems with different capabilities. You can limit the amount of time an AI system is allowed to plan for. You can do extensive testing in simulation to verify that it works the way you expect. You can design other AI systems to monitor the first system and issue an alert if it starts to misbehave. Etc. It's not a binary choice between autonomous or not; there are many dimensions of autonomy that can be adjusted depending on the type of AI system and the harms you're worried about.
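To make the "many dials" idea concrete, here is a minimal, hypothetical sketch of one such dial (the categories, dollar thresholds, and function names are all invented, not anything from the post): auto-approve cheap actions in whitelisted categories and escalate everything else to a human.

```python
# Hypothetical sketch of one autonomy dial: auto-approve cheap actions in
# whitelisted categories, escalate everything else to a human reviewer.
# The categories and dollar thresholds are invented for illustration.
from dataclasses import dataclass

AUTO_APPROVE_BUDGET = {
    "cloud_compute": 5.00,        # agent may spend up to $5 on compute without asking
    "send_email_internal": 0.50,  # trivial-cost internal messages
}

@dataclass
class ProposedAction:
    category: str
    estimated_cost_usd: float
    description: str

def needs_human_approval(action: ProposedAction) -> bool:
    """True if the action should go to a human queue instead of running."""
    limit = AUTO_APPROVE_BUDGET.get(action.category)
    if limit is None:            # unlisted categories (e.g., physical actions)
        return True              # always require human approval
    return action.estimated_cost_usd > limit

# A $0.10 ML experiment sails through; repositioning a robot does not.
print(needs_human_approval(ProposedAction("cloud_compute", 0.10, "small ablation")))  # False
print(needs_human_approval(ProposedAction("move_robot", 0.00, "reposition arm")))     # True
```

The point of the sketch is just that the approval threshold is a tunable parameter, which can be set differently per domain depending on how costly a mistake would be.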
Ultimately this problem just doesn't seem that different from the problem any manager faces when they hire a human employee. A manager needs to give an employee enough authority to do the job but not so much that they can cause serious damage if they turn out to be dishonest or incompetent. Managers have any number of strategies they use for this, and they vary by industry and type of job. But in general CEOs are quite good at maintaining overall control of their companies, and there's no general trend toward CEOs giving more and more power to their subordinates until they lose control. I don't see any reason to think the story would be different for AI systems. If anything the fact that AI systems can be carefully designed for specific tasks makes this problem easier.
Hi Timothy, thanks for your reply.
I think you're still not engaging with the Natural Selection argument. I agree with you that there are various ways to limit AI autonomy, most of which have the main pro of added safety. I agree that humans will likely be quite cautious about handing over certain responsibilities to AI systems. But those arguments don't necessarily get you very much safety given the natural selection argument: the companies that are less cautious will do better*, the AI systems with more autonomy will be more effective, and maybe at some point we'll decide it's okay to hand over a bunch of autonomy to AI systems (you can get in a robotaxi in a handful of US cities today, despite this definitely being a high-stakes domain, because the trust is largely there).
*: The risks from autonomous AI systems that many people in the AI-risk discussion are worried about are not necessarily things that rear their head in business-as-usual situations; they're edge cases or 'long-tail events'. Therefore, companies don't necessarily bear any real cost from deploying AI systems in less cautious ways. This is why some people have discussed liability regimes (https://www.lesswrong.com/posts/mSeesg7i4d9scWAet/apocalypse-insurance-and-the-hardline-libertarian-take-on-ai) for AI, so as to take small chances of catastrophically bad outcomes and spread them across the firms imposing this risk. That is to say, making uncautious AI deployment decisions is actually likely to be pretty good for many individual companies, at least for a while.
I'll also point to this blog post which explains one story for 'things look pretty good but are actually getting way worse': https://www.lesswrong.com/posts/AyNHoTWWAJ5eb99ji/another-outer-alignment-failure-story
Re your last paragraph: this situation definitely has some analogies to human managers hiring (and to governments and coups)! I think there are both issues in that frame and disanalogies.

First, human managers are not equipped to deal with a situation where almost all their employees revolt, and in fact human governments get couped with way less than 99% support for the coup. My business history isn't very good, but I doubt there are many cases where >90% of employees want the CEO removed and they fail. So one of the disanalogies is that human managers very rarely have to deal with ~all their employees trying to get rid of them simultaneously. But this is realistic for the AI case, because the AI population at a company could be very large at low cost, and under some circumstances these AIs might work together to gain power at the expense of humans.

Another disanalogy is that the AI case is not constrained to a single company; instead we will likely see many AIs doing all kinds of tasks in many sectors. In the company setting, giving proper access controls to a single employee is quite feasible (albeit security breaches from getting this wrong happen all the time), but I think it's probably way harder to do this when you have AIs all over the economy adding substantial value (e.g., 10x-ing productivity on some tasks). It's probably possible to pull this off and get good access controls, but it seems incredibly hard, especially given the competitive pressures firms will face.

Another disanalogy is that human CEOs are not much less smart than their employees, and especially not across the board (e.g., they may be much weaker at coding but stronger at internal politics), whereas we could see AIs that are actually just better at basically everything.

One issue with doing a coup at a human company or government is that the people doing the coup are often not well value-aligned with each other: some want to grab power for themselves, some want higher pay, some just want to stop working a bit. In the AI case, AI employees could be extremely similar to each other (e.g., because they're slightly different copies of the same AI model) and might be able to communicate about their interests better than humans can (e.g., given access to mind reading, if we saw major advances in AI interpretability). These factors are likely to make coordination and coalition building easier for AIs than for humans.

So while I agree that there are analogies to human managers, there are also many differences, many of which make the AI setting much more difficult.
There's one other thing where AI doom predictions sound too much like a Hollywood movie:
The very prediction that misaligned AI is highly probable is partly story-based: misaligned AI provides a juicy story with conflict ready to go and is exciting/terrifying, whereas safe and/or aligned AI is a lot more boring, and you can't write a story as easily about that.
This Substack is about the fear that if AI is successful, it becomes sentient. Right?
I'm more concerned about a faulty product. Think Starliner and two astronauts on an eight-day space holiday who'll now be on 'holiday' well into 2025.
How can AI, especially when its Prometheuses are moving so quickly, break things fast and possibly at a scale no one can imagine?
I think this is 100% dead on.
Everything humans build makes it easier to build everything else, and this is indeed an exponential process. It's called economic growth, and it has been occurring for a very long time now. AI Doomers usually accept the idea of a "collective human intelligence", but somehow fail to see how it implies the "super intelligent entity" is already here and it is us.
I also totally agree that the best use of AI will be powerful tools deployed as part of an organization in a modular fashion. This is what corporations and organizations do with human intelligences after all! Whether ruthless optimization without regard for the welfare of marginalized humans is a valid or overblown concern, it is already well-covered in contemporary critiques of capitalism.
In general, I find it very strange how little the AI Doomer story has changed when the emergence of powerful LLMs took place after Bostrom's Superintelligence. Yudkowsky only remarks that the development of AI is happening faster than he predicted, underscoring how important this all is - but fails to remark that the basic idea of AI as an evil genie is totally out!
Great piece. One crucial aspect that I find overlooked is the interaction between the doomsday scenarios you persuasively counter and the wildcard of bad actors. All of the examples are the likeliest outcomes under the assumption of a lawful corporation. Yes, regulations are being proposed and actions taken to limit access by bad actors. But in the case of spread-out instances of AI, as in individual ChatGPT accounts, bad actors are already having an impact on misinformation, hacking, impersonation, etc. So I can only imagine that as AI becomes more powerful and more complex, the damage that a group of bad actors could plausibly cause, in the aggregate rather than from a single villain, would be much greater than anything currently imagined. Perhaps that doesn't fit the definition of doomsday, but it's close enough.