Discussion about this post

Adam · Aug 4 (edited)

I think most of the control methods you propose won't withstand highly capable AI systems. For example, sandboxing an AI only works if it isn't capable enough to hack its way out of the sandbox. And requiring human approval only works if it isn't capable of persuading humans to approve its plans (e.g., by reliably producing very high-quality plans).

In the final paragraph you mention that you don't really believe in superintelligence, but I think it'd be worth stating this up-front, because I don't think this line of argument holds if you're expecting a system that is sufficiently capable in the relevant domains.

Ethan Heppner

💯. Management is the original solution to the alignment problem. And so far, as technology has improved, the percentage of the workforce in managerial roles has only increased: https://www.2120insights.com/i/150163373/management

I'm also reminded of these figures from Google: even though 30% of code is AI-generated, engineering velocity has only increased about 10%: https://x.com/krishnanrohit/status/1933010655965294944

Even if they are really beating that METR RCT, deciding what you want and then validating/accepting the output is more work than many people assume!
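A quick back-of-envelope sketch (assuming, purely for illustration, that writing code is ~40% of an engineer's time; that figure is my guess, not from the Google thread) shows why 30% AI-generated code shouldn't be expected to mean anything close to 30% faster engineering, even before counting review overhead:

```python
# Amdahl's-law-style bound: if coding is a fraction f of total engineering
# work, and a share a of that coding is automated at zero validation cost
# (an optimistic assumption), overall velocity multiplies by 1 / (1 - f*a).
def velocity_gain(f: float, a: float) -> float:
    """Fractional speedup when share `a` of a task taking fraction `f` of work is free."""
    return 1.0 / (1.0 - f * a) - 1.0

# Assumed inputs: coding is ~40% of the job (hypothetical), 30% of it AI-generated.
gain = velocity_gain(f=0.40, a=0.30)
print(f"{gain:.1%}")  # ~13.6% upper bound; the observed ~10% sits below it
```

On those assumptions, the ~10% observed gain is most of the theoretical ceiling, consistent with validation and acceptance eating the rest.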

Maybe more of this gets automated over time, but at a high enough level, the buck is always going to stop somewhere with a human, unless AI agents are somehow given property rights (but I don't see this ever being a popular political view).

52 more comments...
