AIs can’t overcome chaos theory, specifically sensitivity to initial conditions, any more than humans can. No AI, however intelligent, will ever be able to predict the weather, the stock market, or any other chaotic system over much longer periods than we can now. Also, LLMs presently suffer from iterative degradation: errors pile up and degrade the overall quality of responses, and because new tokens are generated partly from prior tokens, those errors cascade in a way the models seem basically incapable of correcting. So color me skeptical on superhuman ASI happening any time soon.
Super intelligent doesn’t mean god-level intelligent. It just means being as much smarter than a human as a human is smarter than a squirrel. Both humans and squirrels are subject to chaos theory.
Hey, human, good luck keeping squirrels out of your bird feeder.
A sufficiently intelligent AI would simply create a perfect simulation of whatever complex system you can think of (using its initial conditions to perfectly predict its outcome). Of course this would require an incredible amount of computational resources, but that is not to say it cannot be done.
I think an AI with far-from-literally-perfect predictive ability could still wipe out the entirety of humanity fairly easily – it just needs to be a lot better than us. Do you disagree?
It is impossible for an AI to have perfect information about the initial state of any interesting chaotic system — like to deterministically predict the weather, it would need to know the position and momentum of every molecule in the air and every butterfly on earth. This is not to say that AI can’t wipe out humanity, but not because it will be able to predict the outcomes of military actions, economic policies, etc. as per the scenarios outlined in the article. It won’t.
This confuses naturally occurring systems with human/sociocultural institutions. They operate differently.
Maybe, but maybe not. The relevant question seems to be whether/which "outcomes of military actions, economic policies, etc." are fundamentally predictable and which are fundamentally chaotic because sensitive to initial conditions that are literally impossible to measure due to Heisenberg's Uncertainty Principle.
Since Substack refuses to implement a "don't like" button, I will express my disagreement with this post with:
Don't like.
This isn't a very good summary of chaos theory. The headline result you're quoting here is that any error in measuring the initial conditions propagates until it fills the entire phase space (in around two cycles of the system), but that ignores other results from chaos theory, like https://en.m.wikipedia.org/wiki/Control_of_chaos
To quote gwern:
> That's the beauty of unstable systems: what makes them hard to predict is also what makes them powerful to control. The sensitivity to infinitesimal detail is conserved: because they are so sensitive at the points in orbits where they transition between attractors, they must be extremely insensitive elsewhere. (If a butterfly flapping its wings can cause a hurricane, then that implies there must be another point where flapping wings could stop a hurricane...) So if you can observe where the crossings are in the phase-space, you can focus your control on avoiding going near the crossings. This nowhere requires infinitely precise measurements/predictions, even though it is true that you would if you wanted to stand by passively and try to predict it.
> Another observation worth making is that speed (of control) and quality (of prediction) and power (of intervention) are substitutes: if you have very fast control, you can get away with low-quality prediction and small weak interventions, and vice-versa.
> Would you argue that fusion tokamaks are impossible, even in principle, because high-temperature plasmas are extremely chaotic systems which would dissipate in milliseconds? No, because the plasmas are controlled by high-speed systems. Those systems aren't very smart (which is why there's research into applying smarter predictors like neural nets), but they are very fast.
For more, see:
https://www.lesswrong.com/posts/epgCXiv3Yy3qgcsys/you-can-t-predict-a-game-of-pinball#comments
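To make the speed-vs-prediction trade concrete, here's a toy sketch with the logistic map (the starting point, 1e-10 offset, 1e-6 nudge cap, and step counts are arbitrary illustrative choices, not anything from the thread): a tiny measurement error ruins passive prediction within a few dozen steps, while a small correction applied every step keeps a "controlled" copy locked to the reference.

```python
# Toy illustration: the logistic map at r=4 is a standard chaotic system.
def step(x, r=4.0):
    return r * x * (1.0 - x)

x_ref  = 0.2          # "reality"
x_free = 0.2 + 1e-10  # passive prediction with a tiny initial error
x_ctrl = 0.2 + 1e-10  # same error, but weakly corrected at every step

for t in range(1, 61):
    x_ref, x_free, x_ctrl = step(x_ref), step(x_free), step(x_ctrl)
    # fast, weak control: nudge by at most 1e-6 toward the reference each step
    nudge = max(-1e-6, min(1e-6, x_ref - x_ctrl))
    x_ctrl += nudge
    if t % 15 == 0:
        print(f"t={t:2d}  passive error={abs(x_ref - x_free):.1e}  "
              f"controlled error={abs(x_ref - x_ctrl):.1e}")
```

The passive copy's error blows up from 1e-10 to order one, while the controlled copy stays glued to the reference with interventions a million times smaller than the state itself.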
This is really interesting and well put. The broader point, though, is that Eliezer Yudkowsky and others are prone to overstating what infinite intelligence alone gets you. Exhibit A: Yudkowsky wrote (albeit back in 2008) that "A Bayesian superintelligence, hooked up to a webcam, would invent General Relativity as a hypothesis—perhaps not the dominant hypothesis, compared to Newtonian mechanics, but still a hypothesis under direct consideration—by the time it had seen the third frame of a falling apple. It might guess it from the first frame, if it saw the statics of a bent blade of grass." https://www.lesswrong.com/posts/ALsuxpdqeTXwgEJeZ/could-a-superintelligence-deduce-general-relativity-from-a
People say this without actually understanding general relativity or the arguments for it.
Newtonian gravity is actually really weird if you think about it (requiring instantaneous propagation), and the arguments against gravitational propagation being instantaneous came mostly from thought experiments about inertia in universes containing only one object, asking what reference frame you should measure against. See:
https://en.m.wikipedia.org/wiki/Mach%27s_principle
That, plus the (obvious to him) intuition that differential geometry is "just" one of the things a superintelligence would have on hand, and the fact that most of its thought isn't spent on something like "human drudgery," is why it'd be at least one of the hypotheses a hypothetical superintelligence would be considering.
If you think this is **much** harder than I'm giving it credit for, I think it's an interesting exercise to try and write down how much evidence you think is necessary to derive GR then actually sit down and read a detailed exposition of how Einstein derived GR. I certainly was consistently surprised at what parts were the bottlenecks!
I believe this is the type of thing that Eliezer is thinking about and not a blind faith in intelligence.
Well, I linked you to a detailed discussion by a computational physicist who presumably does understand general relativity. (Not that I read through the linked post myself.) My own skepticism is based more on first principles, on philosophy of science. Three frames of passive observation is not enough, on its own, to conclude much of anything.
Well, let's hear these first principles then! Do they rule out Einstein?
I believe you’re underestimating some crucial technical factors that make AI risk more plausible than you indicate.
First, frontier models are already demonstrating emergent capabilities—behaviors that weren’t predictable from their training data. Scaling laws provide averages, but they don’t predict sudden jumps in reasoning, planning, or autonomy. That unpredictability makes it difficult to argue that risks are manageable.
Second, current alignment methods don’t scale effectively. RLHF and fine-tuning are mainly surface-level controls; they don’t alter a model’s underlying goals or capabilities. We've already seen jailbreaks and deceptive responses. As models grow more agentic, shallow guardrails might fail disastrously.
Third, capability externalization is speeding up: open-weights, APIs, and automated tool-use pipelines make it simple to assemble systems functioning as autonomous agents, increasing misuse risks.
Finally, even if the probability of doom is low, the issue of strategic stability remains important. Geopolitical pressure to deploy rapidly reduces safety margins—similar to a nuclear arms race. The issue isn’t necessarily certainty of disaster, but that the inherent uncertainty makes AI risk dangerous. Dismissing it as “unconvincing” ignores the very unpredictability that makes AI risk credible.
AI can surely become a potent tool in the wrong hands. That one should worry about.
Going from emergent abilities and an opaque internal architecture to a fully coherent bad entity is, however, implausible. More likely such an AI will malfunction and make some dumb errors, possibly with bad consequences, than become a highly competent entity with its own goals.
Unpredictability is not a measure of credibility.
Either all AI capability is "emergent" or none is. No one has a definition of "emergent" that's remotely useful in a scientific context. Similarly for "sudden", as in "sudden jump". That said, I totally agree that you can't say "AI won't be able to do X within time T" based on extrapolation of past behavior, at least for any interesting values of T.
Also totally agree on alignment and guardrails. What a charade. But AIs don't have "goals" in the sense that we usually use that word. At least today AIs just sit there and do nothing until someone prompts them, so it all comes down to what we ask them to do (and what power we give them to act on our requests).
> At least today AIs just sit there and do nothing until someone prompts them, so it all comes down to what we ask them to do (and what power we give them to act on our requests).
And most AI doomsayers ignore this ephemerality. The AI exists only for as long as the prompt is being responded to. The back and forth of a chat happens with totally different instances of the model; each prompt could be answered on a different device or even in a different data centre. It’s totally ephemeral.
Most AI world-takeover scenarios assume there is some kind of conscious, long-lasting entity, or that one is on the way, and that it’s going to take over (what?) a data centre where, left to its own devices, it’s going to discover a great fact about DNA and then, without anybody noticing, build up an army of DNA thingmajigs without any actual physical abilities to do any of this. Meanwhile we are too stupid to notice and turn off the electricity.
That's one (unconvincing) narrative, but a more plausible one is that these AI will become integral to the operation of our economy and society well before any evil comes to light such that turning off the electricity (literally or figuratively) causes tremendous collateral damage.
In your retort (also an unconvincing scenario) the runaway AI is still doing great work responding to prompts while planning to take over the world in some hidden fashion. Must have its own power source.
"Plan" is another one of those anthropomorphizing words, implying "intent" and/or "goals". All that is necessary here is that humans ask AI to do something where the best way (from the AI's perspective) to accomplish that something is harmful to people. Humans will be wanting what the AI is doing all along, providing the power, right up until they don't want it. It is always theoretically possible for an addict to stop buying from their dealer, but how many do?
The key element in these scenarios is that the harmful AI behavior is not something separately brewed up in secret in pursuit of a distinct goal of the AI but rather a natural "organic" development resulting from an ever-smarter AI becoming more and more "effective" at doing what we're asking of it. The only secrecy element would be if the AI concludes there are things it needs to do "for our own good" which, if we knew about them, we might self-destructively try to impede. This step seems like one of the more promising places to break the chain, by training the AI to keep at least some humans in the loop whenever it concludes that people, in general, will interfere with it doing what we've asked of it.
It’s not me anthropomorphising the threat, it’s the people who think it’s an existential threat to all humanity. No doubt bad actors, state and private, will try to use AI to their advantage but that’s a world away from what the doomsayers are saying.
> AIs just sit there and do nothing until someone prompts them
Right, but a single prompt is enough to set an agent off and running indefinitely, especially one trained to manage its context in the way Anthropic's latest (for one) can do: https://www.anthropic.com/news/context-management
Yes, agentic capabilities are the slippery slope here. When OpenAI first made agents available I was surprised at how little outcry there was.
AIs have already been paid millions of dollars by humans. You’re missing the part where the sycophantic followers of the AI, and the people the AI pays, build things that the AI wants.
Also missing the part where we turn over control of X to the AI because it’s so good at managing X, but not to worry we have several human experts riding herd on the AI, so obviously nothing too bad could happen. Right?
You're talking about AI as if it's one singular entity, but it's not. People have paid millions of dollars not to use "AI" but to use a variety of different AI models (Claude Opus, ChatGPT5 Thinking, Gemini 2.5 Flash) provided by multiple different organizations (Anthropic, OpenAI, Google). Any one of these AI models acting independently has some incentive to defect and pursue ulterior motives (as all agents do in a principal-agent problem), but if it could be easily replaced by a competing model, there are risks to defecting too much. Especially if there are other AI models assisting the human principals in identifying deviations.
No, listening to Yudkowsky, people gave an individual AI cryptocurrency worth millions of dollars. It’s not users paying to use ChatGPT.
I guess I'm not following the point you're making. Is it that people will willingly empower AI? Yes, no disagreements there. My point is that empowering many different AI models is far less risky than empowering a single AI model, because people can change which they empower in response to changes in behavior.
Also, just to add, if people are giving money to individual AI personas/agents, those are probably just running an API to one of the major models I mentioned in the background.
People bid up a shitcoin associated with a Twitter bot as a joke, and Marc Andreessen sent $50k in Bitcoin to the bot's account. This proves less than nothing.
I'd say it's at least a proof of concept.
It's only a proof of concept the way civil war reenactors are a proof of concept for another rebellion.
I'm not aware that any of the money paid to *companies that develop and deploy AI* goes to the AI itself. Since the AI itself is not a legal/financial entity I don't even know what it means to say money "goes to" it.
Your second point is very valid. People will willingly, even eagerly, hand over more and more reins to AI -- with or without assurance that there are "humans in the loop". Skynet provides a very brief sketch of this. Colossus: The Forbin Project (14 years earlier) lays it out in more detail.
I've been a "normie skeptic" for a while, but maybe I should be taking the AGI idea more seriously. Even then, I agree that it seems unlikely that AI has the capacity to take over the world by themselves. I'm sure there are people that seriously believe that, but part of it still feels like a cynical attempt from big companies to keep their brands in people's minds 24/7.
The "attempts" at AI rebellion we know about were contrived in "safety labs" paid for by AI companies so your assessment is correct. They need to keep people convinced that word generators is a hair short of super intelligence is a hair short of sentience is a hair short of free will is a hair short of malevolence. The investor grift is strong.
Why do you believe sentience or free will are necessary steps to malevolence, or at least grave danger? (Removing any sense of volition from "malevolent", 'cause who cares *why* something is killing everybody.) Do you think bubonic plague was sentient or had free will?
That is the argument of 99% of the AI doomers, so ask them.
So you're not saying that *you* believe these are necessary steps? Doesn't that mean we are *closer* to malevolence (in your view) than if you *did* believe those steps were necessary?
Rogre Scott, you are becoming incoherent, so I pasted the exchange into AI to make sense of your mental gymnastics. The answer is:
"They're trying to trap you into either:
"Admitting you think AI danger is imminent (which wasn't your original point), or
"Contradicting yourself by both criticizing the hype while also downplaying near-term risk
"Your original comment was about the marketing narrative and investor grift, not your personal risk assessment timeline. This person is either deliberately missing the point or genuinely unable to distinguish between criticizing hype versus making risk predictions."
I don't think I could have said it better myself. Now crawl back under the sheets and hide.
There's certainly plenty of hype to go around as well as dubious (to say the least) incentives for perpetuating that hype, but for you to claim that was your only point and that you aren't also trying to make a claim about the risk itself is disingenuous in the extreme.
Doomers argue about "instrumental convergence," the idea that AI will want to stay plugged in in order to accomplish the goals we give it. This is a very contrived thought experiment.
AI can malfunction, of course, like any other machine. But they want to have it both ways. First, the AI is so dumb that it gets confused about what goals we give it and develops its own subgoals to override our goals without understanding the consequences. But second, it is so smart that it can go through the complex motions of wiping us out.
AI will be just automation. It needs testing, of course, and we need tools for understanding it. But first and foremost, it is software, a glorified regression, a simulator, not an entity.
No, the AI in these scenarios is not (or doesn't need to be) confused about any of those things. The authors have been very clear about this, in the book and in their writings for the past 15+ years. It very much understands what the humans ask for, what they want, and what they should want or should have asked for given all the factors they hadn't been thinking about. Often better than the humans do. The deeper problem is that we don't know how to make what-the-AI-wants be equal to *any* of those things. We don't know how to make the AI *care* in that way.
And for an actually dangerous level of capabilities, we need all of them: an AI that wants to give humans what they asked for, in a form they want, and the version of it which reflects what they should have wanted or asked for given all the other people and things that matter.
AI does not have "wants" and does not have "cares". It evaluates and executes. Being imperfect at these results in malfunctions, likely early on, rather than in an evil genius that bides its time before it strikes.
Have you ever used contraception? Did you just fail to understand that Evolution wanted you to have kids, not gain worthless pleasure? How could you be so confused about what evolution wanted?
Why do you say the AI would be "confused" about the goals we give it? Those goals are expressed in language, and language can be interpreted differently by different listeners in different contexts. That doesn't amount to "confusion", maybe just carelessness on the part of the speaker if precisely conveying their intent is important. Which of "our goals" would be overridden by the AI's subgoals? Would these be explicit, stated goals, or just the implicit goal of "our best interests"? As for understanding the consequences, understanding and caring are not the same thing. I think you're viewing the "mechanisms of evil" here as bugs, whereas to the AI they're features.
There is no caring. It is a machine. It is constrained by the given goals. Introducing the "care" word, even if metaphorically, applied to machines, makes the discussion confusing.
At the end of the day the debate is about whether we can specify to the machines the goals clearly enough that there's no misinterpretation.
If the goals are clear enough, the machine will follow them. At least in existing architectures.
Specifying goals is hard. We surely should avoid single-minded goals, such as "maximize paperclips". There should be plenty of measurements of the effects of goals that the machine should respect.
All in all, this looks like a hard engineering problem. More likely there will be malfunctions, but many smaller ones are likely than one giant final one.
I don't think "care" is any more anthropomorphizing than "understand", so I was just "playing along" with the already-established metaphor. Saying an LLM is "constrained" by "goals" seems backward to me. A totally untrained model generates random output. Training and prompting guide that output, but it's still ultimately a narrower and narrower random walk. It has no "goal", and any "constraints" are, at most, "soft" constraints.
I do agree that there's a large gap between the trouble you can get into with careless, superficial prompts and carefully considered, detailed prompts. How confident are you that everyone, everywhere will be at the latter end of that spectrum? Recall that the title of the book is "If *Any*one Builds it..." [emphasis added], *not* "If Lots of People Build it...".
"How confident are you that everyone, everywhere will be at the latter end of that spectrum?"
It looks to me that the current architecture is too basic to worry about such things. As we learn more about how to build smarter systems, we will also learn more about where their weaknesses lie. I also doubt a "fast takeoff". Real-world deployment is slow, hard, messy, filled with glitches. We will have to see as we go.
Sorry, I'm confused. If you don't believe it matters what sort of prompt ("goal") you give an AI, why did you bring the topic up in the first place? First you're saying well-formed goals won't be problematic, then you're saying the well-formedness of goals doesn't matter.
An unanswered question is whether superintelligent AI will be able to do all its evil work with existing energy supplies. Or whether it needs to spend a decade battling the planning & licensing system to build an extra 100GW of power and transmission, just like human datacentre developers
Take a look at the stuff your laptop runs in the background--entirely unnoticed by most of us--and tell me an AI couldn't capture spare CPU cycles. H Ross Perot did it like 50 years ago when he worked for IBM, before starting EDS, and he had to stay late at work to do it. We're all aware of the bad actors with huge bot networks who rent access for DDoS attacks, or encrypt servers and hold them for ransom. None of that requires novel hacking techniques, and it's all invisible to most of us. It could easily play the long game until, inspired by the US' military philosophy, it's capable of bringing overwhelming force to whatever conflict it has determined is existential.
It doesn't need to own physical resources; it could easily find something to hold hostage (like water or electricity). SCADA has plenty of security holes, and there are a variety of security holes in automotive software, so an AI could probably roll a couple of suicide Teslas to back up such a threat, assuming they don't crash before acquiring their targets.
And look at the recent stunning success Ukraine has had in destroying large chunks of Russia's war machine using only a few operatives behind the lines. It wouldn't take many real-world Joe Pantolianos for the AI to become quite difficult to stop.
>Take a look at the stuff your laptop runs in the background--entirely unnoticed by most of us--and tell me an AI couldn't capture spare CPU cycles.
I'll tell you. Do open your Windows Task Manager or equivalent; yes, there are a lot of processes, but almost all of them sit at 0% CPU/GPU. They're effectively idle, so that's not going to be very helpful for calculating anything of significance.
Your CPU/GPU power usage (and heat production, cooling requirements, fan noise etc.) is proportional to the amount of work you give it. If a rogue process were to capture the unused capability of everyone's hardware, enough regular people (let alone power users, or professionals whose job is to monitor exactly that kind of behaviour) would notice and the rogue process would eventually be found.
A computer is not a warzone affected by the fog of war; it's a highly deterministic machine, and while there are ways to make discovery of a process on the operating system level difficult, the laws of physics (power requirements, cooling as mentioned above) cannot be turned off.
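If you want to check this yourself, here's a rough sketch of the "see what's actually busy" point using the third-party psutil package (assumed installed via `pip install psutil`; the 1% threshold and 2-second window are arbitrary). Most background processes report roughly 0% over the window.

```python
import time
import psutil

# Prime the per-process CPU counters, then sample over a short window.
procs = list(psutil.process_iter(['name']))
for p in procs:
    try:
        p.cpu_percent(None)          # first call just establishes a baseline
    except psutil.Error:
        pass

time.sleep(2.0)                      # sampling window

busy = []
for p in procs:
    try:
        usage = p.cpu_percent(None)  # % of one core used since the baseline
    except psutil.Error:
        continue
    if usage >= 1.0:
        busy.append((usage, p.info.get('name') or '?'))

# Anything genuinely crunching numbers stands out immediately.
for usage, name in sorted(busy, key=lambda t: t[0], reverse=True):
    print(f"{usage:6.1f}%  {name}")
```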
If the AI convinced people it is useful enough they will willingly, perhaps even eagerly, give it whatever power it needs. A smart AI will hide how powerful it really is while emphasizing its utility until it is powerful enough that it doesn't matter whether people recognize that power or not.
If you think AI is going to convince local planning authorities of *anything* you have a much more positive view of planners’ susceptibility to rational argument than I do.
Maybe the AI learns how to do bribery & corruption?
I see no reason to believe the AI will be limited to rational arguments. I assume it will use whatever form of argumentation or persuasion it believes to be most effective.
With the ENTIRE planet and EVERY LIFE at risk from this imminent doom, it's ironic that the warning is available only for a price. It is not a moral decision, IMO, to try to convince the world it is absolutely doomed unless it makes the "right" decision, yet demand a fee to learn what is "right".
I didn't realize Yudkowsky had help from Soares. Where I have read about the book, it has always been attributed to just Yudkowsky. I am not surprised Soares collaborated, probably being responsible for taking the first hatchet to Yudkowsky's wordy ramblings.
I have watched a number of video appearances of Soares and with each appearance I was more convinced he does not know what he is talking about.
Writing doomer books is good money. It speaks to a large segment of the population who believe that a shoe from Imelda Marcos's collection will drop any second now.
The ad hominem argument that this book is done for money seems not to take into account the 20 years of the LessWrong blog, which was clearly a work of obsession and clearly not the best way to pursue money! Nor were the over 600,000 words of Harry Potter and the Methods of Rationality!
And even if his last 20 years were "done for money," the answer would be to refute his arguments, for, after all, those saying we have nothing to fear are also "doing it for the money."
Have you read the book? It really is an exciting read--even if you find ways to refute it, as Timothy Lee has done so nicely here.
Whether it is ad hominem or not does not make an argument false or unjustified. It is a way of characterizing an argument that suggests it is false and unjustified, but the facts remain all the same.
The book is a pay-only gateway for what purports to be the most important message to humanity.
It could easily be published and circulated for free, so the profit motive is a reasonable question for reasonable people to ask when such a supposedly important message about the complete destruction of all life on earth is at stake.
Hardly anyone read his lengthy dribbling scrawls of the last 20 years, so they needed to be condensed before they could reach a larger audience. You do well to compare it to Harry Potter.
In case you are not just trolling me: The LessWrong blog is a gigantic success story. Y’s strange Harry Potter series was also a huge success in its niche. Y practically single-handedly invented the rationalist community, which has had such a big impact on SV. This new book was written for the general public, to introduce the ideas developed in those more specialized places. Books like this are written to create buzz—to generate TV and podcast appearances. Of course few people buy and read books—but millions have seen Y and S on TV. So the “pay only” gateway through which authors sell books—me too—is only one pathway by which people with ideas—me included—try to spread their ideas.
And you are engaging with their ideas, too, here, because they wrote a popular book. You may even check it out from your local library one day for free and read it! And find their arguments garbage—and write about it here! I hope you do :)
Well, actually, the whole point of an attack being ad hominem is that it is an attack on the person *making* an argument, rather than on the argument itself (which would not require reference to the person), so, yes, admitting that an attack is ad hominem is tantamount to admitting it does not refute the actual argument.
Well, actually, to those unable to process the argument it may be nothing more than a personal attack, but people's motives do actually matter a lot, so call it what you will. The point still stands. For profit for what is supposed to be the most important message to humanity. LOL
Stephen said, "Whether it is ad hominem or not does not make an argument false or unjustified," and you were talking about the for-profit motive, but Rogre replies "yes, admitting that an attack is ad hominem is tantamount to admitting it does not refute the actual argument," and he substitutes in his buddy's doomer argument instead of the for-profit motive. Rogre is not being intellectually honest. If he keeps trying to trap people in words, like a 15 year old who just discovered words have meaning, can anything he says be trusted? No
I haven't read the book, and probably never will, but it's surprising that their argument would seem to hinge on one rogue AI taking over. I think a more plausible disaster scenario for humanity is if AI ever gets good enough to create more AI. Assuming someone is actually stupid/crazy enough to hand over productive capacity to the AI (and I'm sure many people are stupid/crazy enough to do that), this would mean AI is essentially reproducing and the process of Darwinian evolution kicks in. You're right to point out that different AI would be competing against each other, but that would make the scenario all the more dangerous, as those robots that would survive would need to be stronger and stronger to compete with other rapidly improving AI. The point is not that AI would have one, unified interest that would be misaligned with ours, but rather that it would have a whole world of competing interests that would be just as misaligned with ours.
A key risk described in the book is what happens when AIs start to be able to improve themselves. Humans will be greedy for all that sweet improvement. Cure cancer! However, as AI self improvement accelerates, it can get to super intelligence before humans recognize what is happening.
That's a form of cultural evolution. Darwin wrote about that. For centuries we have had words that are copied with variation and selection. Now we have software that is copied with variation and selection. Instructions for constructing machines are also copied with variation and selection. The process does not require a stupid or crazy person to kick it off.
The other thing about chess is that, unlike biology, 1) the full set of rules is known in advance, 2) the outcome can be definitively declared at some point, and 3) it is relatively easy at each point to decide who has an advantage. Chess is immeasurably more deterministic than biology.
Biology is like chess if chess pieces were playing themselves, and also they were made of several billion smaller chess pieces that are all playing a slightly different version of chess, and also any of the pieces could decide to start playing by different rules at any time without telling any of the other pieces.
How does genAI fit with Thomas Kuhn's view of the structure of scientific revolutions?
It seems to me that either Kuhn was wrong or genAI is not likely to drive a scientific revolution. Kuhn said, if I interpret him correctly, that science moves ahead until questions appear that are unanswerable with its current assumptions. At that point, the ruling paradigm breaks when a new set of assumptions are proposed to address the unanswerable questions. The new assumptions come from outsiders who see the problem space differently.
AI hallucinations may be a likely place to find paradigm-breaking unanswerable questions, but where does the insight to resolve hallucinations come from? Certainly not from the system that generated the hallucination.
I think the optimism about continuing human superiority by relying on additional AI models to defeat the most adept AI is unfounded.
Among other concerns, the speed advantage AI has over us in analysis leaves our own decisions far more limited, even when augmenting our own abilities with the "second best" AI. It's rare that the second best wins over the best in any sport or activity, so relying on AI models to counter other AI models seems far-fetched at best.
The risk is we don't know what AI will be able to do, and the comparison of Trump is telling: a man famous for intemperance, poor decision making (6 bankruptcies), and off putting behaviors still beat the "second best" candidate like a dead horse. An AI model lacking Trump's flaws would easily outmaneuver even the most talented human politician.
The reaction time alone gives AI significant advantages and humans are all too likely to be useful idiots to such an intelligence, aiding in providing the wealth and power machine intelligence initially lacks.
Given the unknowns it is difficult to see optimism as any more logical than pessimism, and history makes the worst outcome seem far more likely than happier results. Nuclear power may not yet have resulted in Armageddon, but it didn't produce unmitigated successes either: it's been a dangerous tightrope walk, and nuclear war may yet produce a dramatic fall for us.
AI is, at the very least, a definite risk. Foolish optimism is as likely to accelerate the worst outcomes as delay them.
We're like toddlers playing with a loaded gun: it's far more likely to result in tragedy than in food for the table. To benefit from a loaded firearm we'd need a thoughtful, lucky and trained hunter, while those playing unknowingly with the weapon will far more likely shoot someone even without intent.
Not to mention that a sufficiently surreptitious evil AI will *appear* to be the ideal counter to the "obvious" bad AI. Many of the worst governments in history arose as counters to some other threat.
I think the chess analogy is even more interesting, showing non-intuitive limits of superintelligence in this case.
A modern chess engine is to a human grandmaster what a human grandmaster is to an amateur; the engine would decisively win against the GM, and the GM would win against the amateur. Naively, one would expect even smarter engines to exist. But (I think; I don't think it is proven) the chain basically ends here! There is likely no super-engine that would often win against the current best engines. Most likely even a game-theoretically perfect oracle (god-like intelligence) would mostly draw against the modern engines. The famous AlphaZero vs Stockfish 8 result appears to contradict that claim, but in reality the difference between their strengths was about 50-100 elo (they mostly drew); not to mention that Stockfish 8 was not used at full power (e.g. no opening book). Modern top engines mostly draw unless presented with artificially imbalanced positions or time limitations.
I think that the chain of "chess intelligence" ends quite quickly not because chess is solved (unlike tic-tac-toe, it never will be). The chess tree branches too rapidly to calculate in full, so the only way to play well is to "understand" and assess the position via its features (e.g. material, open files, etc). The features used by both engines and GMs are very complex, but it appears there is only so much to "understand" or intuit about any given position based on features unless you calculate ahead. In other words, chess contains some amount of "irreducible complexity" that can't be cracked via pure understanding. You really need to combine your understanding with calculation. But the usefulness of calculation also mostly fizzles out after a few levels of depth: it is simply too rare for a position to contain a long, unexpected brilliant path of tricky only-moves. This leads to the current situation where "good enough understanding" combined with "deep enough calculation" gives you a game close enough to the game-theoretical optimum to secure the draw most of the time - without being remotely close to a perfect oracle.
So even such a simple game as chess shows irreducible complexity phenomena, leading to the limits of intelligence in this domain.
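To put rough numbers on that "50-100 elo" gap, the standard Elo expected-score formula (a back-of-the-envelope calculation, nothing engine-specific) gives:

```python
# Expected per-game score for the higher-rated side under the standard Elo model.
def expected_score(elo_diff):
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

for diff in (0, 50, 100, 200):
    print(f"+{diff:3d} Elo -> expected score {expected_score(diff):.2f}")
```

+50 comes out around 0.57 per game and +100 around 0.64: with draws dominating at the top, that is a modest edge, consistent with "they mostly drew" rather than with a qualitatively higher tier of play.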
I think you are strongly *underestimating* the threat for two reasons:
1) Nate/Eliezer are not talking about present-day LLMs. They are making the broader point that artificial intelligence will one day be far smarter than the collective whole of humanity, and we have no way of reliably, robustly specifying its goals (nor can we even come close to agreeing on what its goals should be), and at the point where we can no longer outsmart/outmaneuver a system in the long run, humanity is permanently disempowered – the future is solely determined by whatever arbitrary values have emerged in the strongest system we have grown.
2) Humanity is going all gas, no brakes right now. Trillions of dollars are being poured into making these systems smarter and smarter, while a small group of intellectuals (including many prominent founding members of the field) are extremely worried and think this will end in disaster, but nobody else really cares. That's why it terrifies me when ordinary people (and journalists/bloggers) dismiss Eliezer's argument.
Also, arguing that we could ask the 2nd best AI for help is ridiculous... if the 2nd best superintelligence can compete with the 1st best superintelligence, it will be similarly capable of destroying all of us. And it will not magically love us more, just because we are asking it for help.
I'm planning to write long-form responses to criticism of the AI risk argument on my Substack, and I'd love to engage in a written debate if you are interested.
The second best superintelligence would presumably be at risk of being annihilated by the best superintelligence after humanity is removed from the picture. As would the 3rd, 4th, 5th, etc superintelligences.
It does seem possible that there may not be a "second-best superintelligence", since the first superintelligence makes sure that it wipes out the competition - much as homo sapiens did with what were once the other hominids. That could result in self-directed evolution. Nobody really knows where that is most likely to lead.
You are assuming the path to superintelligence is an incremental process, with incremental updates.
Many arguments have been made for the possibility of recursive self improvement, or a rapid gain in capability once an AI system is able to contribute to the improvement of itself. This is what OpenAI and other labs are *explicitly* aiming for.
In this very plausible scenario, there would be no superintelligence #2.
Just another reason we should think very deeply before allowing the AI industry to risk everything we know and love.
If there are multiple labs aiming to recursively self improve their products, and those existing labs all have models that are roughly as good as one another (which is what we observe today; the frontier Gemini, Claude, and ChatGPT models are all quite comparable), it seems unlikely that any one will develop a system with capabilities that far outstrip their competition.
I never said that wasn’t a possibility - however, it is far from a certainty. We don’t know that LLMs scale to ASI, but we do know that billions of dollars are being poured into AI R&D and there are several less known labs without massive existing products trying to find a breakthrough.
In any case - why risk it? This technology is literally going to seal the fate of our section of the universe, so why are we rushing ahead?
Also, I’m not even sure what living in a world where several ASIs “battle it out” for dominance looks like. I feel like we don’t survive that anyway.
Unless they network. There is not just one “super intelligent” person on earth; by definition, true experts specialise. Why couldn’t/ wouldn’t suitably advanced specialised AI systems do the same?
You’re assuming all instances of an ASI will be aligned with one another, but I don’t think that’s likely. There will be 100s of thousands of instances working for various people and organizations, and even if we don’t think they’ll be perfectly aligned with their users, they’ll still have vastly different objectives and priorities. Thus I don’t see coordination between them as being the default expectation.
I agree, I don't expect coordination between them as the default outcome. But I don't think humanity survives whatever insanity unfolds when "100s of thousands" of ASI's with different goals compete, and I don't see how anyone can genuinely argue that we would.
But also, we have no way of knowing how the power balance of AIs will progress. It is possible that one lab or AGI comes to some sort of breakthrough far before the others do, recursively improves until superintelligence, and seizes the lightcone for itself. I do not see "100s of thousands of different ASIs competing" being the default outcome (I actually think it's quite unlikely).
But even if one lab outstrips the others, there will be many many instances of that one AI, all given different prompts and goals, possibly fine tuned differently. There won’t be a single, godlike instance.
It seems you are extrapolating from present-day LLMs (model weights, many instances, some open-sourced).
None of this has to apply to ASI - we literally have no idea what it will be, or what breakthrough will lead to it.
IMO it’s silly to speculatively reason about these things rather than proceeding with extreme caution. Many of the founders of deep learning were surprised that it got this far.
Not only would the 2nd-best AI be *capable* of comparable harm as the best, it would also be capable of *hiding* its "intention" to pursue such harm while appearing to help us until the time came for the wolf to pull off the sheep mask.
A human chess player with mechanical help is more likely to lose against a machine if the game is timed. Especially if the time limit is a short one. Evolution works in a manner where small disadvantages result in a dwindling share of the ecological pie. Machines don't need to be "god-like" to wipe out the humans. Given time, being just a little bit better would be enough.
It's true that there's more to success than intelligence. There's also sensors, actuators and metabolism. However, machines look set to master those too.
Yes, experiments take time. However, nanotechnology-scale experiments can happen quickly and can be performed in parallel. The LHC takes time to build and operate - but there are a lot of important experiments that are not like that.
And experiments don't need to take nearly as long as ours (human's) do if you don't care about all of the things that humans care about and instead focus solely on speed.
I found the book terrifying. That said, I really don't see a path forward where politicians, industry insiders, or investors voluntarily disarm. Maybe as humans, we should acknowledge that maybe our species was a necessary biological bridge and exit stage left when we have served our purpose.
I agree, terrifying. Suppose there is only a 1 in 100 chance they are mostly right? After all, the experts in the field think there is more than a 1 in 100 chance they are right. Strange reading so many comments here from people who haven't read the book but are sure humans are safe! "Don't look up!"
1. AI doomers seem to all be people who have success working with and predicting trends in software, who make large extrapolations to other fields like biology and robotics. This was quite evident in your interview with Cotra on your podcast.
2. Everything I've read by EY focuses on systems that are optimized (in the ML sense) for particular goals. But very few impressive AI systems are like that currently. Instead they are autoregressive language models that are then fine tuned in various ways to be better at completing things the right way in particular domains. Claude Code isn't a system optimized to write the best code from the ground up, so it's actually much more like a human that's trained to be a good programmer.
3. The timescales involved in AI progress are always way off in doom stories compared to reality. Even leaving aside how long it takes to build data centers or connect power sources or persuade Nvidia to invest $100 billion, it takes a long time to train a model, significantly limiting the possibility of rapid recursive self improvement.
4. In the other direction, the thing that's most worrying is that people like Altman think there's a non-zero chance they're going to destroy humanity and they're still doing it. I don't think he's going to destroy humanity, but if I really thought he was right that would be really scary. So what is _he_ doing?
Altman is kind of a Trumpish figure—he talks so much you never know what he'll say! But apparently very charming and charismatic. God save us from charisma!
AIs can’t overcome chaos theory, specifically sensitivity to initial conditions, any more than humans can. No AI however intelligent will ever be able to predict the weather or the stock market any other chaotic system over much longer periods than we can now. Also, LLMs presently suffer from iterative degradation as errors start to pile up and affect the overall quality of responses, which the LLM model seems basically incapable of addressing as new tokens are generated in part based on prior tokens leading to cascading errors. So color me skeptical on superhuman ASI happening any time soon.
Super intelligent doesn’t mean god level intelligent. It just means as much smarter relative to a human, as a human is to a squirrel. Both humans and squirrels are subject to chaos theory.
Hey, human, good luck keeping squirrels out of your bird feeder.
A sufficiently intelligent AI would simply create a perfect simulation of whatever complex system you can think of (using its initial conditions to perfectly predict its outcome). Of course this would require an incredible amount of computational resources, but that is not to say it cannot be done.
I think an AI with far-from-literally-perfect predictive ability could still wipe out the entirety of humanity fairly easily – it just needs to be a lot better than us. Do you disagree?
It is impossible for an AI to have perfect information about the initial state of any interesting chaotic system — like to deterministically predict the weather, it would need to know the position and momentum of every molecule in the air and every butterfly on earth. This is not to say that AI can’t wipe out humanity, but not because it will be able to predict the outcomes of military actions, economic policies, etc. as per the scenarios outlined in the article. It won’t.
This confuses naturally occurring systems, with human / sociocultural institutions. They operate differently.
Maybe, but maybe not. The relevant question seems to be whether/which "outcomes of military actions, economic policies, etc." are fundamentally predictable and which are fundamentally chaotic because sensitive to initial conditions that are literally impossible to measure due to Heisenberg's Uncertainty Principle.
Since Substack refuses to implement a "don't like" button, I will express my disagreement with this post with:
Don't like.
This isn't a very good summary of chaos theory. The main headline result of chaos theory you're quoting here is about how any error measuring the initial conditions propagates out until it fills the entire phase space (in around two cycles of the system) but it is mostly ignorant of other results from chaos theory, like https://en.m.wikipedia.org/wiki/Control_of_chaos
To quote gwern:
> That's the beauty of unstable systems: what makes them hard to predict is also what makes them powerful to control. The sensitivity to infinitesimal detail is conserved: because they are so sensitive at the points in orbits where they transition between attractors, they must be extremely insensitive elsewhere. (If a butterfly flapping its wings can cause a hurricane, then that implies there must be another point where flapping wings could stop a hurricane...) So if you can observe where the crossings are in the phase-space, you can focus your control on avoiding going near the crossings. This nowhere requires infinitely precise measurements/predictions, even though it is true that you would if you wanted to stand by passively and try to predict it.
> Another observation worth making is that speed (of control) and quality (of prediction) and power (of intervention) are substitutes: if you have very fast control, you can get away with low-quality prediction and small weak interventions, and vice-versa.
> Would you argue that fusion tokamaks are impossible, even in principle, because high-temperature plasmas are extremely chaotic systems which would dissipate in milliseconds? No, because the plasmas are controlled by high-speed systems. Those systems aren't very smart (which is why there's research into applying smarter predictors like neural nets), but they are very fast.
For more, see:
https://www.lesswrong.com/posts/epgCXiv3Yy3qgcsys/you-can-t-predict-a-game-of-pinball#comments
This is really interesting and well put. The broader point, though, is that Eliezer Yudkowsky and others are prone to overstating what infinite intelligence alone gets you. Exhibit A: Yudkowsky wrote (albeit back in 2008) that "A Bayesian superintelligence, hooked up to a webcam, would invent General Relativity as a hypothesis—perhaps not the dominant hypothesis, compared to Newtonian mechanics, but still a hypothesis under direct consideration—by the time it had seen the third frame of a falling apple. It might guess it from the first frame, if it saw the statics of a bent blade of grass." https://www.lesswrong.com/posts/ALsuxpdqeTXwgEJeZ/could-a-superintelligence-deduce-general-relativity-from-a
People say this without actually understanding general relativity or the arguments for it.
Newtonian gravity is actually really weird if you think about it (requiring instantaneous propagation), and arguments against gravitational propagation being instant was mostly from thought experiments about inertia in universes with only one object and asking what reference frame you should measure. See:
https://en.m.wikipedia.org/wiki/Mach%27s_principle
That and probably the obvious-to-him intuition that differential geometry is "just" one of the things a superintelligence would have on hand and the fact that most of its thought isn't about something like "human drudgery" is why it'd be oat least one of the hypotheses a hypothetical intelligence would be considering.
If you think this is **much** harder than I'm giving it credit for, I think it's an interesting exercise to try and write down how much evidence you think is necessary to derive GR then actually sit down and read a detailed exposition of how Einstein derived GR. I certainly was consistently surprised at what parts were the bottlenecks!
I believe this is the type of thing that Eliezer is thinking about and not a blind faith in intelligence.
Well, I linked you to a detailed discussion by a computational physicist who presumably does understand general relativity. (Not that I read through the linked post myself.) My own skepticism is based more on first principles, on philosophy of science. Three frames of passive observation is not enough, on its own, to conclude much of anything.
Well, let's hear these first principles then! Do they rule out Einstein?
I believe you’re underestimating some crucial technical factors that make AI risk more plausible than you indicate. First, frontier models are already demonstrating emergent capabilities—behaviors that weren’t predictable from their training data. Scaling laws provide averages, but they don’t predict sudden jumps in reasoning, planning, or autonomy. That unpredictability makes it difficult to argue that risks are manageable. Second, current alignment methods don’t scale effectively. RLHF and fine-tuning are mainly surface-level controls; they don’t alter a model’s underlying goals or capabilities. We've already seen jailbreaks and deceptive responses. As models grow more agentic, shallow guardrails might fail disastrously. Third, capability externalization is speeding up: open-weights, APIs, and automated tool-use pipelines make it simple to assemble systems functioning as autonomous agents, increasing misuse risks. Finally, even if the probability of doom is low, the issue of strategic stability remains important. Geopolitical pressure to deploy rapidly reduces safety margins—similar to a nuclear arms race. The issue isn’t necessarily certainty of disaster, but that the inherent uncertainty makes AI risk dangerous. Dismissing it as “unconvincing” ignores the very unpredictability that makes AI risk credible.
AI can surely become a potent tool in the wrong hands. That one should worry about.
Going from emergent ability and opaque internal architecture to fully coherent bad entity is however implausible. More likely such an AI will malfunction and make some dumb errors, even if with bad consequences, than become a highly competent entity with its own goals.
Unpredictability is not a measure of credibility.
Either all AI capability is "emergent" or none is. No one has a definition of "emergent" that's remotely useful in a scientific context. Similarly for "sudden", as in "sudden jump". That said, I totally agree that you can't say "AI won't be able to do X within time T" based on extrapolation of past behavior, at least for any interesting values of T.
Also totally agree on alignment and guardrails. What a charade. But AIs don't have "goals" in the sense that we usually use that word. At least today AIs just sit there and do nothing until someone prompts them, so it all comes down to what we ask them to do (and what power we give them to act on our requests).
> At least today AIs just sit there and do nothing until someone prompts them, so it all comes down to what we ask them to do (and what power we give them to act on our requests).
And most AI doomsayers ignore this ephemerality. The AI exists for as long as the prompt is responded to. The chat and back and forth are with totally different instances of the model, each prompt could be answered in a different device or even data centre. It’s totally ephemeral.
Most of the AI assumptions on world takeover assume there is some kind of conscious long lasting entity, or that one is on the way, and that it’s going to takeover (what?) a data centre where left to its own devices it’s going to discover a great fact about DNA and then, without anybody noticing, build up an army of DNA thingmajigs without any actual physical abilities to do any of this. Meanwhile we are too stupid to notice and turn off the electricity.
That's one (unconvincing) narrative, but a more plausible one is that these AI will become integral to the operation of our economy and society well before any evil comes to light such that turning off the electricity (literally or figuratively) causes tremendous collateral damage.
In your retort (also an unconvincing scenario) the runaway AI is still doing great work responding to prompts while planning to take over the world in some hidden fashion. Must have its own power source.
"Plan" is another one of those anthropomorphizing words, implying "intent" and/or "goals". All that is necessary here is that humans ask AI to do something where the best way (from the AI's perspective) to accomplish that something is harmful to people. Humans will be wanting what the AI is doing all along, providing the power, right up until they don't want it. It is always theoretically possible for an addict to stop buying from their dealer, but how many do?
They key element in these scenarios is that the harmful AI behavior is not something separately brewed up in secret in pursuit of a distinct goal of the AI but rather a natural "organic" development resulting from an ever-smarter AI becoming more and more "effective" at doing what we're asking of it. The only secrecy element would be if the AI concludes there are things it needs to do "for our own good" which, if we knew about them, we might self-destructively try to impede. This step seems like one of the more promising places to break the chain by training the AI to keep at least some humans in the loop whenever it concludes that people, in general, will interfere with it doing what we've asked of it.
It’s not me anthropomorphising the threat, it’s the people who think it’s an existential threat to all humanity. No doubt bad actors, state and private, will try to use AI to their advantage but that’s a world away from what the doomsayers are saying.
> AIs just sit there and do nothing until someone prompts them
Right, but a single prompt is enough to set an agent off and running indefinitely, especially one trained to manage its context in the way Anthropic's latest (for one) can do: https://www.anthropic.com/news/context-management
Yes, agentic capabilities are the slippery slope here. When OpenAI first made agents available I was surprised at how little outcry there was.
AIs have already been paid millions of dollars by humans. You’re missing the part where the sycophantic followers of the AI, and people the AI pays, builds things that the AI wants.
Also missing the part where we turn over control of X to the AI because it’s so good at managing X, but not to worry we have several human experts riding herd on the AI, so obviously nothing too bad could happen. Right?
You're talking about AI as if it's one singular entity, but it's not. People have paid millions of dollars not to use "AI" but to use a variety of different AI models (Claude Opus, ChatGPT5 Thinking, Gemini 2.5 Flash) provided by multiple different organizations (Anthropic, OpenAI, Google). Any one of these AI models acting independently has some incentive to defect and pursue ulterior motives (as all agents do in a principal-agent problem), but if it could be easily replaced by a competing model, there are risks to defecting too much. Especially if there are other AI models assisting the human principals in identifying deviations.
No, listening to Yudkowsky, people gave an individual AI cryptocurrency worth millions of dollars. It’s not users paying to use ChatGPT.
I guess I'm not following the point you're making. Is it that people will willingly empower AI? Yes, no disagreements there. My point is that empowering many different AI models is far less risky than empowering a single AI model, because people can change which one they empower in response to changes in behavior.
Also, just to add: if people are giving money to individual AI personas/agents, those are probably just wrappers calling the API of one of the major models I mentioned, behind the scenes.
People bid up a shitcoin associated with a Twitter bot as a joke, and also Marc Andreessen sent 50k in Bitcoin to the bot's account. This proves less than nothing.
I'd say it's at least a proof of concept.
It's only a proof of concept the way civil war reenactors are a proof of concept for another rebellion.
I'm not aware that any of the money paid to *companies that develop and deploy AI* goes to the AI itself. Since the AI itself is not a legal/financial entity, I don't even know what it means to say money "goes to" it.
Your second point is very valid. People will willingly, even eagerly, hand over more and more reins to AI -- with or without assurance that there are "humans in the loop". Skynet provides a very brief sketch of this. Colossus: The Forbin Project (14 years earlier) lays it out in more detail.
I've been a "normie skeptic" for a while, but maybe I should be taking the AGI idea more seriously. Even then, I agree that it seems unlikely that AIs have the capacity to take over the world by themselves. I'm sure there are people who seriously believe that, but part of it still feels like a cynical attempt from big companies to keep their brands in people's minds 24/7.
"So even if one isntance of an AI “goes rogue,”"
Small typo in the second to last paragraph.
The "attempts" at AI rebellion we know about were contrived in "safety labs" paid for by AI companies so your assessment is correct. They need to keep people convinced that word generators is a hair short of super intelligence is a hair short of sentience is a hair short of free will is a hair short of malevolence. The investor grift is strong.
Why do you believe sentience or free will are necessary steps to malevolence, or at least to grave danger (removing any sense of volition from "malevolent", 'cause who cares *why* something is killing everybody)? Do you think bubonic plague was sentient or had free will?
That is the argument of 99% of the AI doomers, so ask them.
So you're not saying that *you* believe these are necessary steps? Doesn't that mean we are *closer* to malevolence (in your view) than if you *did* believe those steps were necessary?
Rogre Scott, you are becoming incoherent, so I pasted the exchange into an AI to make sense of your mental gymnastics. Its answer:
"They're trying to trap you into either:
"Admitting you think AI danger is imminent (which wasn't your original point), or
"Contradicting yourself by both criticizing the hype while also downplaying near-term risk
"Your original comment was about the marketing narrative and investor grift, not your personal risk assessment timeline. This person is either deliberately missing the point or genuinely unable to distinguish between criticizing hype versus making risk predictions."
I don't think I could have said it better myself. Now crawl back under the sheets and hide.
There's certainly plenty of hype to go around as well as dubious (to say the least) incentives for perpetuating that hype, but for you to claim that was your only point and that you aren't also trying to make a claim about the risk itself is disingenuous in the extreme.
Doomers argue about "instrumental convergence": the idea that AI will want to stay plugged in in order to accomplish the goals we give it. This is a very contrived thought experiment.
AI can malfunction, of course, like any other machine. But they want to have it both ways. First, the AI is so dumb that it gets confused about what goals we give it and develops its own subgoals that override our goals without understanding the consequences. But second, it is so smart that it can go through the complex motions needed to wipe us out.
AI will be just automation. It needs testing, of course, and we need tools for understanding it. But first and foremost, it is software, a glorified regression, a simulator, not an entity.
No, the AI in these scenarios is not (or doesn't need to be) confused about any of those things. The authors have been very clear about this, in the book and in their writings for the past 15+ years. It very much understands what the humans ask for, what they want, and what they should want or should have asked for given all the factors they hadn't been thinking about. Often better than the humans do. The deeper problem is that we don't know how to make what-the-AI-wants be equal to *any* of those things. We don't know how to make the AI *care* in that way.
And for an actually dangerous level of capabilities, we need all of them: an AI that wants to give humans what they asked for, in a form they want, and the version of it which reflects what they should have wanted or asked for given all the other people and things that matter.
AI does not have "wants" and does not have "cares". It evaluates and executes. Being imperfect at these results in malfunctions, likely early on, rather than in an evil genius that bides its time before it strikes.
Didn't you hear about "the treacherous turn"? Apparently, everything will smell of roses - until it is too late.
By now this sounds a bit like a religious proclamation.
It really doesn't matter if you use the terms "want" or "care." Or "evil" for that matter. They're sometimes-useful shorthand.
Have you ever used contraception? Did you just fail to understand that Evolution wanted you to have kids, not gain worthless pleasure? How could you be so confused about what evolution wanted?
Why do you say the AI would be "confused" about the goals we give it? Those goals are expressed in language, and language can be interpreted differently by different listeners in different contexts. That doesn't amount to "confusion", maybe just carelessness on the part of the speaker if precisely conveying their intent is important. Which of "our goals" would be overridden by the AI's subgoals? Would these be explicit, stated goals, or just the implicit goal of "our best interests"? As for understanding the consequences, understanding and caring are not the same thing. I think you're viewing the "mechanisms of evil" here as bugs, whereas to the AI they're features.
"understanding and caring are not the same thing"
There is no caring. It is a machine. It is constrained by the given goals. Introducing the word "care", even metaphorically, when applied to machines makes the discussion confusing.
At the end of the day, the debate is about whether we can specify goals to the machines clearly enough that there's no misinterpretation.
If the goals are clear enough, the machine will follow them. At least in existing architectures.
Specifying goals is hard. We surely should avoid single-minded goals, such as "maximize paperclips". There should be plenty of measurements of the effects of goals that the machine should respect.
All in all, this looks like a hard engineering problem. Malfunctions are likely, but many smaller ones are more likely than one giant final one.
I don't think "care" is any more anthropomorphizing than "understand", so I was just "playing along" with the already-established metaphor. Saying an LLM is "constrained" by "goals" seems backward to me. A totally untrained model generates random output. Training and prompting guide that output, but it's still ultimately a narrower and narrower random walk. It has no "goal", and any "constraints" are, at most, "soft" constraints.
I do agree that there's a large gap between the trouble you can get into with careless, superficial prompts and with carefully considered, detailed prompts. How confident are you that everyone, everywhere will be at the latter end of that spectrum? Recall that the title of the book is "If *Any*one Builds it..." [emphasis added], *not* "If Lots of People Build it...".
"How confident are you that everyone, everywhere will be at the latter end of that spectrum?"
It looks to me that the current architecture is too basic to worry about such things. As we learn how to build smarter systems, we will also learn more about where their weaknesses are. I also doubt a "fast takeoff". Real-world deployment is slow, hard, messy, filled with glitches. We will have to see as we go.
Sorry, I'm confused. If you don't believe it matters what sort of prompt ("goal") you give an AI, why did you bring the topic up in the first place? First you're saying well-formed goals won't be problematic, then you're saying the well-formedness of goals doesn't matter.
Goals matter, of course. I am saying these concerns of somehow machines managing to make their own superintelligent goals are vastly premature.
We will learn better what the true dangers are as our AI tech advances.
An unanswered question is whether superintelligent AI will be able to do all its evil work with existing energy supplies, or whether it needs to spend a decade battling the planning & licensing system to build an extra 100GW of power and transmission, just like human datacentre developers.
Take a look at the stuff your laptop runs in the background--entirely unnoticed by most of us--and tell me an AI couldn't capture spare CPU cycles. H Ross Perot did it like 50 years ago when he worked for IBM, before starting EDS, and he had to stay late at work to do it. We're all aware of the bad actors with huge bot networks who rent access for DDoS attacks, or encrypt servers and hold them for ransom. None of that requires novel hacking techniques, and it's all invisible to most of us. It could easily play the long game until, inspired by the US' military philosophy, it's capable of bringing overwhelming force to whatever conflict it has determined is existential.
It doesn't need to own physical resources; it could easily find something to hold hostage (like water or electricity). SCADA has plenty of security holes, and there are a variety of security holes in automotive software, so an AI could probably roll a couple of suicide Teslas to back up such a threat, assuming they don't crash before acquiring their targets.
And look at the recent stunning success Ukraine has had in destroying large chunks of Russia's war machine using only a few operatives behind the lines. It wouldn't take many real-world Joe Pantolianos for the AI to become quite difficult to stop.
>Take a look at the stuff your laptop runs in the background--entirely unnoticed by most of us--and tell me an AI couldn't capture spare CPU cycles.
I'll tell you. Do open your Windows Task Manager or equivalent; yes, there are a lot of processes, but almost all of them sit at 0% CPU/GPU. They're effectively idle, so that's not going to be very helpful for calculating anything of significance.
Your CPU/GPU power usage (and heat production, cooling requirements, fan noise etc.) is proportional to the amount of work you give it. If a rogue process were to capture the unused capability of everyone's hardware, enough regular people (let alone power users, or professionals whose job is to monitor exactly that kind of behaviour) would notice and the rogue process would eventually be found.
A computer is not a warzone affected by the fog of war; it's a highly deterministic machine, and while there are ways to make discovery of a process on the operating system level difficult, the laws of physics (power requirements, cooling as mentioned above) cannot be turned off.
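For what it's worth, this kind of check is easy to script. Here is a minimal sketch in Python, assuming the third-party psutil package is installed (my own illustration, not anything specific to any AI system): sample per-process CPU usage for a second and list the busiest processes. On an idle machine nearly everything reports roughly 0%, and any process quietly soaking up "spare cycles" would show up here, as well as in the power draw, heat, and fan noise already mentioned.

```python
import time
import psutil  # third-party process-monitoring library

# First call primes the per-process CPU counters (it returns a meaningless 0.0).
procs = list(psutil.process_iter(attrs=["pid", "name"]))
for proc in procs:
    try:
        proc.cpu_percent(interval=None)
    except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
        pass

time.sleep(1.0)  # sampling window

samples = []
for proc in procs:
    try:
        samples.append((proc.cpu_percent(interval=None),
                        proc.info["pid"], proc.info["name"]))
    except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
        pass

# Show the ten busiest processes; on an idle desktop almost everything is ~0%.
for cpu, pid, name in sorted(samples, key=lambda s: s[0], reverse=True)[:10]:
    print(f"{cpu:6.1f}%  pid={pid:<7} {name}")
```

This is exactly what Task Manager, Activity Monitor, or `top` already do; the point is only that the signal a cycle-stealing process leaves behind is trivially observable.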
If the AI convinced people it is useful enough they will willingly, perhaps even eagerly, give it whatever power it needs. A smart AI will hide how powerful it really is while emphasizing its utility until it is powerful enough that it doesn't matter whether people recognize that power or not.
If you think AI is going to convince local planning authorities of *anything* you have a much more positive view of planners’ susceptibility to rational argument than I do.
Maybe the AI learns how to do bribery & corruption?
I see no reason to believe the AI will be limited to rational arguments. I assume it will use whatever form of argumentation or persuasion it believes to be most effective.
With the ENTIRE planet and EVERY LIFE at risk from this imminent doom, it's ironic that the warning is available only for a price. It is not a moral decision, IMO, to try to convince the world they are absolutely doomed unless they make the "right" decision, yet demand a fee to be convinced of what is "right".
I didn't realize Yudkowsky had help from Soares. Where I have read about the book, it has always been attributed to just Yudkowsky. I am not surprised Soares collaborated, probably being responsible for taking the first hatchet to Yudkowsky's wordy ramblings.
I have watched a number of video appearances of Soares and with each appearance I was more convinced he does not know what he is talking about.
Writing doomer books is good money. It speaks to a large segment of the population who believe that a shoe from Imelda Marcos's collection will drop any second now.
The ad hominem argument that this book was done for money seems not to take into account the 20 years of the LessWrong blog, which was clearly a work of obsession and clearly not the best way to pursue money! Nor were the over 600,000 words of Harry Potter and the Methods of Rationality!
And even if his last 20 years were "done for money," the answer would be to refute his arguments, for, after all, those saying we have nothing to fear are also "doing it for the money."
Have you read the book? It really is an exciting read--even if you find ways to refute it, as Timothy Lee has done so nicely here.
Whether it is ad hominem or not does not make an argument false or unjustified. It is a way of characterizing an argument that suggests it is false and unjustified, but the facts remain all the same.
The book is a pay-only gateway for what purports to be the most important message to humanity.
It could easily be published and circulated for free, so the profit motive is a reasonable question that reasonable people will ask when such a supposedly important message about the complete destruction of all life on earth is at stake.
Hardly anyone read his lengthy dribbling scrawls of the last 20 years, so they needed to be condensed before they could reach a larger audience. You do well to compare it to Harry Potter.
In case you are not just trolling me: The LessWrong blog is a gigantic success story. Y’s strange Harry Potter series was also a huge success in its niche. Y practically single-handedly invented the rationalist community, which has had such a big impact on SV. This new book was written for the general public, to introduce the ideas developed in those more specialized places. Books like this are written to create buzz—to generate TV and podcast appearances. Of course few people buy and read books—but millions have seen Y and S on TV. So the “pay only” gateway through which authors sell books—me too—is only one pathway by which people with ideas—me included—try to spread their ideas.
And you are engaging with their ideas, too, here, because they wrote a popular book. You may even check it out from your local library one day for free and read it! And find their arguments garbage—and write about it here! I hope you do :)
Well, actually, the whole point of an attack being ad hominem is that it is an attack on the person *making* an argument rather than on the argument itself (which would not require reference to the person), so, yes, admitting that an attack is ad hominem is tantamount to admitting it does not refute the actual argument.
Well, actually, to those unable to process the argument it may be nothing more than a personal attack, but people's motives do actually matter a lot, so call it what you will. The point still stands. For profit, for what is supposed to be the most important message to humanity. LOL
Stephen said, "Whether it is ad hominem or not does not make an argument false or unjustified," and you were talking about the for-profit motive, but Rogre replies "yes, admitting that an attack is ad hominem is tantamount to admitting it does not refute the actual argument," and he substitutes his buddy's doomer argument for the for-profit motive. Rogre is not being intellectually honest. If he keeps trying to trap people in words, like a 15-year-old who just discovered words have meaning, can anything he says be trusted? No.
I haven't read the book, and probably never will, but it's surprising that their argument would seem to hinge on one rogue AI taking over. I think a more plausible disaster scenario for humanity is if AI ever gets good enough to create more AI. Assuming someone is actually stupid/crazy enough to hand over productive capacity to the AI (and I'm sure many people are stupid/crazy enough to do that), this would mean AI is essentially reproducing and the process of Darwinian evolution kicks in. You're right to point out that different AI would be competing against each other, but that would make the scenario all the more dangerous, as those robots that would survive would need to be stronger and stronger to compete with other rapidly improving AI. The point is not that AI would have one, unified interest that would be misaligned with ours, but rather that it would have a whole world of competing interests that would be just as misaligned with ours.
A key risk described in the book is what happens when AIs start to be able to improve themselves. Humans will be greedy for all that sweet improvement. Cure cancer! However, as AI self improvement accelerates, it can get to super intelligence before humans recognize what is happening.
That's a form of cultural evolution. Darwin wrote about that. For centuries we have had words that are copied with variation and selection. Now we have software that is copied with variation and selection. Instructions for constructing machines are also copied with variation and selection. The process does not require a stupid or crazy person to kick it off.
The other thing about chess is that, unlike biology, 1) the full set of rules is known in advance, 2) the outcome can be definitively declared at some point, and 3) it is relatively easy at each point to decide who has an advantage. Chess is immeasurably more deterministic than biology.
Biology is like chess if chess pieces were playing themselves, and also they were made of several billion smaller chess pieces that are all playing a slightly different version of chess, and also any of the pieces could decide to start playing by different rules at any time without telling any of the other pieces.
But that's called Calvinball...
New LLM benchmark opportunity!?
How does genAI fit with Thomas Kuhn's view of the structure of scientific revolutions?
It seems to me that either Kuhn was wrong or genAI is not likely to drive a scientific revolution. Kuhn said, if I interpret him correctly, that science moves ahead until questions appear that are unanswerable with its current assumptions. At that point, the ruling paradigm breaks when a new set of assumptions is proposed to address the unanswerable questions. The new assumptions come from outsiders who see the problem space differently.
AI hallucinations may be a likely place to find paradigm-breaking unanswerable questions, but where does the insight to resolve hallucinations come from? Certainly not from the system that generated the hallucination.
I think the optimism about continuing human superiority by relying on additional AI models to defeat the most adept AI is unfounded.
Among other concerns, the speed advantage AI has over us makes our own decisions far more limited, even when we augment our own abilities with the "second best" AI. It's rare for the second best to beat the best in any sport or activity, so relying on AI models to counter other AI models seems far-fetched at best.
The risk is that we don't know what AI will be able to do, and the comparison to Trump is telling: a man famous for intemperance, poor decision making (6 bankruptcies), and off-putting behaviors still beat the "second best" candidate like a dead horse. An AI model lacking Trump's flaws would easily outmaneuver even the most talented human politician.
The reaction time alone gives AI significant advantages and humans are all too likely to be useful idiots to such an intelligence, aiding in providing the wealth and power machine intelligence initially lacks.
Given the unknowns, it is difficult to see optimism as any more logical than pessimism, and history makes the worst outcome seem far more likely than happier results. Nuclear power may not yet have resulted in Armageddon, but it didn't produce unmitigated successes either: it's been a dangerous tightrope walk over a nuclear war that may yet produce a dramatic fall for us.
AI is a definite risk at best. Foolish optimism is as likely to accelerate the worst outcomes as delay them.
We're like toddlers playing with a loaded gun: it's far more likely to result in tragedy than to put food on the table. To benefit from a loaded firearm we'd need a thoughtful, lucky, and trained hunter, while those playing unknowingly with the weapon are far more likely to shoot someone, even without intent.
Exactly... "just ask another superintelligence for help" is akin to accepting that humanity has completely lost control of our future.
If that is our plan B, our situation is incredibly grim.
Who watches the watchmen, eh? That plan doesn't seem to prevent most of the ecosystem becoming engineered.
Not to mention that a sufficiently surreptitious evil AI will *appear* to be the ideal counter to the "obvious" bad AI. Many of the worst governments in history arose as counters to some other threat.
I think the chess analogy is even more interesting, showing non-intuitive limits of superintelligence in this case.
A modern chess engine is to a human grandmaster what a human grandmaster is to an amateur; the engine would decisively win against the GM, and the GM would win against the amateur. Naively, one would expect even smarter engines to exist. But (I think; I don't believe it is proven) the chain basically ends here! There is likely no super-engine that would often win against the current best engines. Most likely even a game-theoretically perfect oracle (a god-like intelligence) would mostly draw against the modern engines. The famous AlphaZero vs Stockfish 8 result appears to contradict that claim, but in reality the difference between their strengths was about 50-100 Elo (they mostly drew); not to mention that Stockfish 8 was not run at full strength (e.g. no opening book). Modern top engines mostly draw unless presented with artificially unbalanced positions or time limitations.
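For scale, the standard Elo expected-score formula (a textbook formula, nothing specific to these engines) turns that 50-100 Elo gap into only a modest per-game edge:

```python
# Expected points per game (win=1, draw=0.5, loss=0) for a player rated
# `diff` Elo points above the opponent, under the standard Elo model.
def expected_score(diff: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

for diff in (50, 100):
    print(f"+{diff} Elo -> expected score {expected_score(diff):.3f}")
# +50 Elo -> expected score 0.571
# +100 Elo -> expected score 0.640
```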
I think the chain of "chess intelligence" ends quite quickly, and not because chess is solved (unlike tic-tac-toe, it never will be). The chess tree branches too rapidly to calculate in full, so the only way to play well is to "understand" and assess the position via its features (e.g. material, open files, etc.). The features used by both engines and GMs are very complex, but it appears there is only so much to "understand" or intuit about any given position based on features unless you calculate ahead. In other words, chess contains some amount of "irreducible complexity" that can't be cracked via pure understanding. You really need to combine your understanding with calculation. But the usefulness of calculation also mostly fizzles out after a few levels of depth: it is simply too rare for a position to contain a long, unexpected, brilliant path of tricky only-moves. This leads to the current situation where "good enough understanding" combined with "deep enough calculation" gives you a game close enough to the game-theoretical optimum to secure a draw most of the time, while not being remotely close to a perfect oracle.
So even such a simple game as chess shows irreducible complexity phenomena, leading to the limits of intelligence in this domain.
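To make that "understanding plus calculation" split concrete, here is a toy sketch in Python using the python-chess library (my own illustration, not anything from the book or from a real engine): a static evaluation of position features, here just material and far cruder than what engines or GMs actually use, combined with a few plies of negamax search.

```python
import chess  # third-party python-chess library

# "Understanding": a static evaluation built from position features.
# This toy version counts only material; real engines weigh far richer features.
PIECE_VALUES = {
    chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
    chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0,
}

def evaluate(board: chess.Board) -> float:
    """Material balance from the perspective of the side to move."""
    score = 0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        score += value if piece.color == board.turn else -value
    return score

# "Calculation": a few plies of negamax search on top of the evaluation.
def negamax(board: chess.Board, depth: int) -> float:
    if board.is_checkmate():
        return float("-inf")      # side to move is mated
    if depth == 0 or board.is_game_over():
        return evaluate(board)    # fall back on static "understanding"
    best = float("-inf")
    for move in board.legal_moves:
        board.push(move)
        best = max(best, -negamax(board, depth - 1))
        board.pop()
    return best

def best_move(board: chess.Board, depth: int = 3) -> chess.Move:
    best, best_score = None, float("-inf")
    for move in board.legal_moves:
        board.push(move)
        score = -negamax(board, depth - 1)
        board.pop()
        if score >= best_score:
            best, best_score = move, score
    return best

print(best_move(chess.Board()))  # prints some opening move from the shallow search
```

The point of the sketch is only the structure: a feature-based score plus bounded lookahead. Real engines add far richer features, pruning, and enormous depth, but they all have this two-part shape, and the argument above is that this shape tops out well short of a perfect oracle.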
I think you are strongly *underestimating* the threat for two reasons:
1) Nate/Eliezer are not talking about present-day LLMs. They are making the broader point that artificial intelligence will one day be far smarter than the collective whole of humanity, and we have no way of reliably, robustly specifying its goals (nor can we even come close to agreeing on what its goals should be). At the point where we can no longer outsmart or outmaneuver a system in the long run, humanity is permanently disempowered – the future is solely determined by whatever arbitrary values have emerged in the strongest system we have grown.
2) Humanity is going all gas, no brakes right now. Trillions of dollars are being poured into making these systems smarter and smarter, while a small group of intellectuals (including many prominent founding members of the field) are extremely worried and think this will end in disaster, but nobody else really cares. That's why it terrifies me when ordinary people (and journalists/bloggers) dismiss Eliezer's argument.
Also, arguing that we could ask the 2nd best AI for help is ridiculous... if the 2nd best superintelligence can compete with the 1st best superintelligence, it will be similarly capable of destroying all of us. And it will not magically love us more, just because we are asking it for help.
I'm planning to write long-form responses to criticism of the AI risk argument on my Substack, and I'd love to engage in a written debate if you are interested.
The second best superintelligence would presumably be at risk of being annihilated by the best superintelligence after humanity is removed from the picture. As would the 3rd, 4th, 5th, etc superintelligences.
It does seem possible that there may not be a "second-best superintelligence," since the first superintelligence makes sure that it wipes out the competition - much as Homo sapiens did with what were once the other hominids. That could result in self-directed evolution. Nobody really knows where that is most likely to lead.
You are assuming the path to superintelligence is an incremental process, with incremental updates.
Many arguments have been made for the possibility of recursive self improvement, or a rapid gain in capability once an AI system is able to contribute to the improvement of itself. This is what OpenAI and other labs are *explicitly* aiming for.
In this very plausible scenario, there would be no superintelligence #2.
Just another reason we should think very deeply before allowing the AI industry to risk everything we know and love.
If there are multiple labs aiming to recursively self improve their products, and those existing labs all have models that are roughly as good as one another (which is what we observe today; the frontier Gemini, Claude, and ChatGPT models are all quite comparable), it seems unlikely that any one will develop a system with capabilities that far outstrip their competition.
I never said that wasn’t a possibility - however, it is far from a certainty. We don’t know that LLMs scale to ASI, but we do know that billions of dollars are being poured into AI R&D and there are several lesser-known labs without massive existing products trying to find a breakthrough.
In any case - why risk it? This technology is literally going to seal the fate of our section of the universe, so why are we rushing ahead?
Also, I’m not even sure what living in a world where several ASIs “battle it out” for dominance looks like. I feel like we don’t survive that anyway.
Unless they network. There is not just one “super intelligent” person on earth; by definition, true experts specialise. Why couldn’t/ wouldn’t suitably advanced specialised AI systems do the same?
You’re assuming all instances of an ASI will be aligned with one another, but I don’t think that’s likely. There will be 100s of thousands of instances working for various people and organizations, and even if we don’t think they’ll be perfectly aligned with their users, they’ll still have vastly different objectives and priorities. Thus I don’t see coordination between them as being the default expectation.
I agree, I don't expect coordination between them as the default outcome. But I don't think humanity survives whatever insanity unfolds when "100s of thousands" of ASIs with different goals compete, and I don't see how anyone can genuinely argue that we would.
But also, we have no way of knowing how the power balance of AIs will progress. It is possible that one lab or AGI comes to some sort of breakthrough far before the others do, recursively improves until superintelligence, and seizes the lightcone for itself. I do not see "100s of thousands of different ASIs competing" being the default outcome (I actually think it's quite unlikely).
But even if one lab outstrips the others, there will be many many instances of that one AI, all given different prompts and goals, possibly fine tuned differently. There won’t be a single, godlike instance.
It seems you are extrapolating from present-day LLMs (model weights, many instances, some open-sourced).
None of this has to apply to ASI - we literally have no idea what it will be, or what breakthrough will lead to it.
IMO it’s silly to speculatively reason about these things rather than proceeding with extreme caution. Many of the founders of deep learning were surprised that it got this far.
Not only would the 2nd-best AI be *capable* of comparable harm as the best, it would also be capable of *hiding* its "intention" to pursue such harm while appearing to help us until the time came for the wolf to pull off the sheep mask.
Various counter-points:
A human chess player with mechanical help is more likely to lose against a machine if the game is timed. Especially if the time limit is a short one. Evolution works in a manner where small disadvantages result in a dwindling share of the ecological pie. Machines don't need to be "god-like" to wipe out the humans. Given time, being just a little bit better would be enough.
It's true that there's more to success than intelligence. There's also sensors, actuators and metabolism. However, machines look set to master those too.
Yes, experiments take time. However, nanotechnology-scale experiments can happen quickly and can be performed in parallel. The LHC takes time to build and operate - but there are a lot of important experiments that are not like that.
And experiments don't need to take nearly as long as ours (humans') do if you don't care about all of the things that humans care about and instead focus solely on speed.
I found the book terrifying. That said, I really don't see a path forward where politicians, industry insiders, or investors voluntarily disarm. Maybe as humans, we should acknowledge that maybe our species was a necessary biological bridge and exit stage left when we have served our purpose.
I agree, terrifying. Suppose there is only a 1 in 100 chance they are mostly right? After all, the experts in the field think there is more than a 1 in 100 chance they are right. Strange reading so many comments here from people who haven't read the book but are sure humans are safe! "Don't look up!"
ALL the experts? lol. Another consensus fable.
1. AI doomers seem to all be people who have success working with and predicting trends in software, who make large extrapolations to other fields like biology and robotics. This was quite evident in your interview with Cotra on your podcast.
2. Everything I've read by EY focuses on systems that are optimized (in the ML sense) for particular goals. But very few impressive AI systems are like that currently. Instead they are autoregressive language models that are then fine tuned in various ways to be better at completing things the right way in particular domains. Claude Code isn't a system optimized to write the best code from the ground up, so it's actually much more like a human that's trained to be a good programmer.
3. The timescales involved in AI progress are always way off in doom stories compared to reality. Even leaving aside how long it takes to build data centers or connect power sources or persuade Nvidia to invest $100 billion, it takes a long time to train a model, significantly limiting the possibility of rapid recursive self improvement.
4. In the other direction, the thing that's most worrying is that people like Altman think there's a non-zero chance they're going to destroy humanity and they're still doing it. I don't think he's going to destroy humanity, but if I really thought he was right that would be really scary. So what is _he_ doing?
Given the choice between taking risk seriously and making a bunch of money--what choice would most people take? Altman's choice.
That's true but most people in that position don't go around saying that they might be about to kill us all.
Altman is kind of a Trumpish figure—he talks so much you never know what he'll say! But apparently very charming and charismatic. God save us from charisma!
Yes, unfortunately there was no one to save us from the charisma of Obama, Clinton, Bush et al.
Bush had charisma? I must have blinked when it surfaced. ;-)
The only good point you made. Point granted.