I've long wished that conversations about AI risk would focus on the specific risks that seem most plausible. To me that list is fraud, drone warfare, softness in the labor market, and cybersecurity. I wish this focus on cybersecurity had come a lot earlier, but I applaud Anthropic's attempts to address it. This seems like a very serious attempt.
The good guys look well positioned to win in the long term. Let's hope for few problems in the short term, and Godspeed to the cybersecurity professionals who do this work. If we're very lucky, this can be another Y2K.
For the detail-oriented, I highly recommend Anthropic's long post: https://red.anthropic.com/2026/mythos-preview/
Great insights, Kai. Thanks.
Excellent write-up! But it's not exactly news. See below. Just as humans are going to be too slow to conduct warfare with AIs, humans are going to have to be eliminated from coding and QA. Humans are too much of a security risk.
I recall reading some story where AIs are everywhere, battling each other. People's AI-driven devices, some implanted, are constantly working to protect themselves from viruses from other AIs.
----
Exclusive: Anthropic's new model is a pro at finding security flaws
Sam Sabin
5 Feb 2026
Anthropic's latest AI model has found more than 500 previously unknown high-severity security flaws in open-source libraries with little to no prompting, the company shared first with Axios.
Why it matters: The advancement signals an inflection point for how AI tools can help cyber defenders, even as AI is also making attacks more dangerous.
Driving the news: Anthropic debuted Claude Opus 4.6, the latest version of its largest AI model, on Thursday.
• Before its debut, Anthropic's frontier red team tested Opus 4.6 in a sandboxed environment to see how well it could find bugs in open-source code.
• The team gave the Claude model everything it needed to do the job — access to Python and vulnerability analysis tools, including classic debuggers and fuzzers — but no specific instructions or specialized knowledge.
• Claude found more than 500 previously unknown zero-day vulnerabilities in open-source code using just its "out-of-the-box" capabilities, and each one was validated by either a member of Anthropic's team or an outside security researcher.
...
https://www.axios.com/2026/02/05/anthropic-claude-opus-46-software-hunting
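For anyone unfamiliar with the tooling the article lists, here's a minimal sketch of what a coverage-guided fuzz harness looks like, assuming Google's Atheris fuzzer for Python. The parse_header target and its planted bug are made up for illustration, not taken from Anthropic's setup; a fuzzer just hammers the target with mutated inputs and reports anything that crashes.

```python
# Sketch of a coverage-guided fuzz harness, assuming Google's Atheris
# fuzzer for Python. parse_header() is a made-up toy target with a
# planted bug, not anything from Anthropic's actual setup.
import sys
import atheris

@atheris.instrument_func  # let Atheris trace coverage inside the target
def parse_header(data: bytes) -> dict:
    # Toy parser with a planted bug: it trusts a length byte from the input.
    if len(data) < 2:
        return {}
    length = data[0]
    checksum = data[1 + length]  # IndexError when length points past the buffer
    return {"body": bytes(data[1:1 + length]), "checksum": checksum}

def test_one_input(data: bytes) -> None:
    # Atheris calls this with mutated inputs and flags any uncaught exception.
    parse_header(data)

if __name__ == "__main__":
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()
```

Per the article, the model was handed tools like this (plus debuggers) with no specialized instructions; the interesting part is deciding where to point them and triaging what they turn up.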
Thanks for pointing out the Axios article about 4.6! I agree that previous versions of Claude (particularly 4.6) and other models had the capability to find security flaws. If we take Anthropic's word for it, the types of exploits Mythos Preview is capable of are significantly more impressive than the examples Anthropic gave for Opus 4.6. (Compare https://red.anthropic.com/2026/zero-days/ with https://red.anthropic.com/2026/mythos-preview/.)
That being said, one thing I wish I'd emphasized more is that there's no particular reason to think Mythos Preview is near the ceiling of models' cybersecurity capability. So I would not be surprised to see models exceed Mythos's capability in 2026.
Yeah, there's every reason to think that in 3 months cutting-edge models will find a new layer of bugs. It also looks like doubling the CPU time spent on finding these bugs would have found a few hundred more, and it's probably very expensive to get to the point where you stop finding them.
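To make that concrete with a toy model (my own back-of-envelope assumption of heavy-tailed bug difficulty, not anything from Anthropic's numbers): if bug i is found per unit of compute with probability p_i and the p_i are heavy-tailed, every doubling keeps turning up more bugs, but fewer each time.

```python
# Toy diminishing-returns model (my assumption, not from Anthropic's
# write-up): N latent bugs, where bug i is found by one unit of fuzzing
# compute with probability p_i, and the p_i are heavy-tailed (shallow
# bugs are common, deep ones rare).
import numpy as np

rng = np.random.default_rng(0)
N = 5_000                                      # hypothetical latent bug count
p = np.clip(0.05 * rng.pareto(1.5, N), 1e-6, 1.0)

def expected_found(compute_units: float) -> float:
    # P(bug i found after C units of compute) = 1 - (1 - p_i)^C
    return float(np.sum(1.0 - (1.0 - p) ** compute_units))

for c in (1, 2, 4, 8, 16):
    print(f"compute x{c:>2}: ~{expected_found(c):,.0f} bugs expected")
```

Under those made-up parameters each doubling adds a shrinking increment, and exhausting the tail of rare, deep bugs takes exponentially more compute.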
Grace and peace to you, Amigo,
https://claude.ai/public/artifacts/540acc9d-1cf3-4cde-9f00-8915c2060c80
Markets and Planning in the Surveillance Ecosystem ~ 📯💰🇦🇹🐲🤖💸💽⚡🦅📜🐳📋💱🌐
Very informative. Thank you for writing. Would it be possible to write an explainer piece about how exactly the model found the vulnerability? What were the prompts, what was the goal, and so forth? What does the structure of, say, OpenBSD look like and how does Claude parse millions of lines of code to get at this vulnerability? I have a hard time wrapping my mind around these questions.
Also, I often wonder, since I have no domain knowledge, how likely these vulnerabilities are to be exploited in practice. You mentioned that Firefox and Chrome don't really let website code get too close to the system. Many crucial systems are also not on the internet. More context would be good. I should say I'm biased to be suspicious of claims like this, because I think they often get inflated, like the METR claim about task length.