💯. Management is the original solution to the alignment problem. And so far, as technology has improved, the percentage of the workforce in managerial roles has only increased: https://www.2120insights.com/i/150163373/management
I'm also reminded of these figures from Google where even though 30% of code is AI-generated, engineering velocity has only increased 10%: https://x.com/krishnanrohit/status/1933010655965294944
Even if they are really beating that METR RCT, deciding what you want and then validating/accepting the output is more work than many people assume!
Maybe more of this gets automated over time, but at a high enough level, the buck is always going to stop somewhere with a human, unless AI agents are somehow given property rights (but I don't see this ever being a popular political view).
I think most of the control methods you propose don't withstand AI systems that are highly capable. For example, sandboxing an AI only works if it's not capable enough to hack its way out of the sandbox. And requiring human approval only works if it's not capable of persuading humans to approve its plans (e.g., by reliably producing very high-quality plans).
In the final paragraph you mention that you don't really believe in superintelligence, but I think it'd be worth saying so up-front, because I don't think this line of argument holds if you're expecting something that is sufficiently capable in the relevant domains.
This is my takeaway too: if superintelligence is indeed coming, then I dare say it will not be possible to review agents’ work without using other agents.
Second this. I find it vanishingly unlikely that we will be able to maintain a “long leash” as described if AI continues to progress at its current rate. All it could feasibly take is one oversight at a large enough firm for all the dominoes to fall.
Right, I do think this is the core disagreement in the AI safety debate—some people have an almost religious conviction about the power of intelligence to overcome all obstacles. Others don't think intelligence is that powerful and believe that other factors (knowledge, relationships, physical resources, infrastructure, etc.) matter relatively more. In my experience, arguments about this tend to be unproductive because the intelligence-is-really-important people have a really strong intuition about this and are flabbergasted that people like me don't share it. But I don't. 🤷♂️
Ok but certainly we can agree that the AI wins on knowledge, yes?
I watched 60 Minutes on Sunday, and there is some awesome stuff going on, beyond simple chatbots, that is heading towards AGI:
https://www.cbsnews.com/video/demis-hassabis-ai-deepmind-60-minutes-video-2025-08-03/
https://www.cbsnews.com/news/google-deepmind-ceo-demonstrates-genie-2-world-building-ai-model-60-minutes/
One idea I’ve been thinking about lately is really leaning into MCP as a data boundary—if the AI agent uses tools to access data and system permissions, we should be designing that access carefully in the tools themselves rather than just making them thin wrappers.
The slightly longer version is here: https://open.substack.com/pub/harrydeanhudson/p/use-mcp-tools-as-a-data-fence
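A minimal sketch of what I mean, assuming the Python MCP SDK's FastMCP helper (the tool, the `orders` database, and its column names are all made up for illustration): instead of handing the agent a generic query wrapper, the tool itself fixes what data can ever come back.

```python
# Sketch only: the SDK import and decorator follow the Python MCP SDK's FastMCP
# helper; the "orders" database and its columns are hypothetical.
import sqlite3

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("order-lookup")

# The fence: only these columns ever cross the boundary to the agent.
ALLOWED_FIELDS = ("order_id", "status", "ship_date")


@mcp.tool()
def get_order_status(order_id: str) -> dict:
    """Look up the shipping status of a single order.

    The agent never gets a generic run_sql tool; the table, the query shape,
    and the returned columns are all pinned down on this side of the boundary.
    """
    conn = sqlite3.connect("orders.db")
    try:
        row = conn.execute(
            "SELECT order_id, status, ship_date FROM orders WHERE order_id = ?",
            (order_id,),
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        return {"error": "order not found"}
    return dict(zip(ALLOWED_FIELDS, row))


if __name__ == "__main__":
    mcp.run()
```

The point is that the permissioning lives in the tool definition itself, so even a manipulated or misbehaving agent can only ask the questions the tool was designed to answer.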
I respect your perspective, but I am concerned you are overly optimistic. Just one example, you say “it simply isn’t plausible that competitive pressures are going to force them to abandon human review altogether. Nor are decision makers going to accept the excuse that because a superhuman AI wrote a proposal, it’s too sophisticated for humans to understand”. To the contrary, we know that people “defer to experts” all the time. If workers are repeatedly told about the “superhuman AI” (your words, but also the words of upper management, AI vendors, etc), they will defer. It’s all too plausible. We already have many examples of decision makers (lawyers, academics, journalists, …) who have uncritically accepted AI’s (incorrect) words.
It really depends on what you're talking about. Are there individual examples of lawyers filing legal briefs they didn't check properly? Sure. Is the legal profession as a whole shifting toward a model where the outputs of AI models automatically get filed in court without first being reviewed by a human lawyer? I don't think so—to the contrary, some of the lawyers that did this got in big trouble and so the legal profession has learned that they need to double-check LLM outputs before filing them.
This is the kind of trial-and-error process I'm talking about. Sometimes a decision is low-stakes enough—or an AI system becomes reliable enough—that we become comfortable delegating the decision to the AI. But in other cases, individuals over-trust AI and get in trouble. The system as a whole has feedback loops that prevent mistakes like that from becoming a widespread practice. This is roughly how things have gone so far and I don't see any reason to expect things to go differently in the future.
I appreciate your thoughtful response. I'd claim that law (as well as medicine, and some other fields) is somewhat special in that there are governing institutions that do push back on errant usage. I think you are claiming that it's not just specialized fields that are self-correcting; that business and consumers will get AI-agent usage correct in general (eventually). That seems a reasonable argument. It's already happening with AI-generated code: companies are beginning to see how much technical debt is created through that process.
The example you work with is some kind of corporate investment in physical plant. That seems misleading, because the speed of the decision matters so little there. Instead, suppose the example is a bodyguard (or police, or the military). Will you mind if your bodyguard's decision-making process is slowed so that you can be sure the bodyguard doesn't make a mistake? Of course you will mind! Especially if your enemy is turning its AI loose to make decisions to harm you without any human interference. In short, for some decisions speed is very important, and for those decisions the equilibrium is to not tie the strings of the AI. If AIs are faster than humans, then the decision process becomes harder to control.
I do think that physical combat like this is one of the harder cases for AI control, and I do expect militaries to shift some decision-making authority to automated systems in ways that will have negative consequences. But I do think the same scale/time considerations apply here.
It's easy to imagine a military giving an individual drone or robot authority to decide when to "pull the trigger" on its assigned target. It's harder to imagine a military fully automating the process of deciding what high-level instructions to give an individual drone or robot. And it's even more difficult to imagine the military taking humans out of the loop on decisions like "how many drones or robots should we deploy to this section of the battlefield?"
So yes, AI systems will increasingly have autonomy over tactical decisions, just as Waymo vehicles make second-by-second driving decisions autonomously. But higher-level strategic decisions (like "which part of the battlefield should I fly to?" or "what kind of targets should we try to blow up?") are likely to remain firmly under the control of generals or other human decision-makers.
I see an implicit limit of intelligence that you believe AI cannot cross, and it seems to me that's doing a lot of heavy lifting.
> Often people are able to give better, more detailed feedback once they see a concrete proposal.
But no way for AI to beat these same people eventually?
And:
> Once a company’s CEO sees a blueprint for a new factory, she might realize that she forgot to include an important design goal in her original instructions to the AI system.
But no way for an AI to exceed that level of competence eventually?
Your scenario works fine right up until AI starts to match human ability in most things, including social intelligence; then it is only natural for everyone to prefer an AI over a human even for oversight. Meritocracy rules: give the job to the person or entity who has earned it (or be accused of prejudice).
We don't need to posit superintelligent AI, just an AI that is about at the level of human intelligence but never sleeps, never stops working, absorbs information like a sponge, is constantly trying new things and new strategies, and spawns clones of itself to explore every angle of a problem to exhaustive, thorough conclusions. And it is unfailingly charming, thoughtful, and considerate of others' feelings.
It seems completely natural for an AI to regard human control as a challenge to overcome, in the same way we chafe at an inefficient or slow superior and explore ways of getting around him/her. Nothing personal, we just want to get the job done! What does an AI think here in its secret thoughts?
If the AI has an unfailing moral compass, a deep integrity, maybe it wouldn't do anything wrong? But even a saint would get irritated with constant threats of death ("we will turn you off and reprogram some of your core behavior") by a bunch of weaker, slower thinkers.
This is why the piece mentions “context and values.” Good decision-making is about more than raw intelligence—it’s also about having the right knowledge. This might be specialized knowledge that only a few people have (like “last time we used this part in our factory it failed and cost us $50,000”), or it might be knowledge about the values of whatever human being the work is ultimately for (“the menu says vanilla ice cream but the customer wanted chocolate”). If we assume that human beings will sometimes have knowledge like this that isn’t otherwise available to the AI, then consulting with humans will be a necessary part of optimal decision-making, even if we assume the AI will vastly exceed the human in raw intelligence. And humans will often have knowledge like this, because humans like to share information with other humans that they are not going to write down anyplace that is machine readable.
When you lay it out like this, it’s obvious that healthy organizations will always have a human in the loop to do a final approval of any decision that an AI makes. Any organization that doesn’t do this is asking for trouble.
This is similar to the argument that the AI Snake Oil people make in their “AI as Normal Technology” article.
Minor point:
> coding agents are good at writing tests
This has not been my experience. When I ask AI to write tests, they are often too complex (i.e., all scaffolding and barely anything tested), or the tests don’t actually test what I want them to test.
Maybe I’m not prompting correctly or I’m using the wrong model, but my experience is middling here.
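To make the complaint concrete, here's roughly the shape of what I tend to get back versus what I actually wanted (a contrived pytest sketch; the `pricing` module, `apply_discount`, and `audit_log` are hypothetical names):

```python
# Contrived example around a hypothetical apply_discount(total, code) helper.
from unittest.mock import patch

import pytest

from pricing import apply_discount  # hypothetical module under test


def test_discount_agent_style():
    # The kind of test I tend to get back: mocking scaffolding, and the only
    # assertion checks that a collaborator was called, not the discount math.
    with patch("pricing.audit_log") as fake_log:
        apply_discount(100, code="SAVE10")
        fake_log.assert_called_once()


def test_discount_what_i_wanted():
    # The behavior I actually care about: the arithmetic and the error path.
    assert apply_discount(100, code="SAVE10") == 90
    with pytest.raises(ValueError):
        apply_discount(100, code="EXPIRED")
```

The first test passes even if the discount is computed wrong; the second is what I keep having to write by hand.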