- Claude Mythos, Anthropic’s most powerful AI model, found thousands of zero-day vulnerabilities across every major operating system and web browser—capabilities the company said were too dangerous for public release.
- A Polish cybersecurity firm partially reproduced the key findings using publicly available models like Claude Opus 4.6 and GPT-5.4, showing the capabilities aren’t locked behind Anthropic’s walls.
- The White House, Wall Street banks, and UK regulators are all scrambling to access or contain Mythos—while experts debate whether the restrictions are about safety or market control.
Anthropic built an AI that can break into almost any computer system on Earth. Then it locked the model away; CEO Dario Amodei said the capabilities were “too dangerous” for public release. But in mid-April 2026, a team of Polish security researchers proved something that upended the entire narrative: you don’t need Mythos. The publicly available models everyone already has access to can do much of the same thing.
Claude Mythos—originally codenamed “Capybara” in internal documents—was leaked in late March 2026 via an unsecured data lake, Fortune first reported. The leaked draft described it as “by far the most powerful AI model we’ve ever developed” and “a new tier of model: larger and more intelligent than our Opus models.” Anthropic officially announced it on April 7 as part of Project Glasswing, limiting access to roughly 40 organizations—including Amazon, Apple, Cisco, CrowdStrike, Microsoft, Palo Alto Networks, and JPMorgan Chase.
The restriction was unprecedented. No AI company had ever publicly declared a general-purpose model too dangerous to release. But the details of what Mythos actually found—and what happened when independent researchers tried to replicate the results—tell a more complicated story than Anthropic’s safety narrative suggests.
What Mythos Actually Found
Anthropic’s red team report, published at red.anthropic.com, laid out the case in specific, alarming detail. Mythos identified and exploited a 27-year-old bug in OpenBSD—an operating system built primarily for security. It found a 17-year-old remote code execution vulnerability in FreeBSD’s NFS server that allowed remote root access: an unauthenticated EXCHANGE_ID call via NFSv4 chained into a stack-smashing memcpy and a multi-packet ROP chain. The bug had gone undetected despite decades of human review and millions of automated tests.
The Firefox JavaScript engine results were equally striking. Mythos developed working shell exploits from three patched Firefox vulnerabilities 181 times out of several hundred attempts, achieving register control on 29 additional attempts. Claude Opus 4.6, by comparison, achieved only 2 successes from the same number of attempts. Earlier Opus models had what Anthropic described as a “near-0% success rate at autonomous exploit development.”
Beyond individual exploits, Mythos identified “thousands of zero-day vulnerabilities, many of them critical” across every major operating system, every major web browser, and many critical open-source projects. Over 99% remain undisclosed pending coordinated disclosure. Of 198 manually reviewed vulnerability reports, expert contractors agreed with Mythos’s severity assessment in about 90% of cases—though the “thousands” figure is extrapolated from that agreement rate, not independently verified.
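The sample size is worth dwelling on. A quick Wilson score interval on the reviewed reports (illustrative arithmetic only; Anthropic has not published per-report data) shows that 198 reviews pin the agreement rate to a fairly tight range, while saying nothing about the unreviewed remainder:

```python
from math import sqrt

# Wilson score interval for a proportion: ~90% agreement on 198 reviews
# (178/198 is an assumed round-off of "about 90%"; not Anthropic's exact count).
def wilson_interval(successes, n, z=1.96):
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

low, high = wilson_interval(178, 198)
print(f"{low:.3f} – {high:.3f}")  # roughly 0.849 – 0.934
```

In other words, the reviewed sample does support the stated agreement rate; extrapolating that rate to thousands of unreviewed reports is a separate assumption.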
Perhaps the most unsettling detail: Anthropic engineers with “no formal security training” asked Mythos to find RCE vulnerabilities overnight and, as the report noted, were “woken up the following morning to a complete, working exploit.” Researchers developed scaffolds allowing Mythos to turn vulnerabilities into exploits without any human intervention.
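The report does not describe how those scaffolds work, but the common pattern in agentic security tooling is a propose-test-refine loop: the model proposes a candidate, an isolated sandbox evaluates it, and failure output is fed back until the attempt succeeds or the budget runs out. A minimal sketch under that assumption (every name and stand-in function here is hypothetical, not Anthropic’s design):

```python
def scaffold_loop(model, sandbox, target, max_iters=10):
    # Propose-test-refine: iterate until the sandbox accepts a candidate.
    feedback = None
    for i in range(1, max_iters + 1):
        candidate = model(target, feedback)   # model proposes an attempt
        ok, output = sandbox(candidate)       # evaluate it in isolation
        if ok:
            return {"iterations": i, "result": candidate}
        feedback = output                     # feed errors back to the model
    return None  # budget exhausted without success

# Toy stand-ins to show the control flow (no real model, no real target):
def toy_model(target, feedback):
    return (feedback or 0) + 1                # "refine" by counting up

def toy_sandbox(candidate):
    return (candidate >= 3, candidate)        # "succeeds" on the third try

print(scaffold_loop(toy_model, toy_sandbox, target="demo"))
# → {'iterations': 3, 'result': 3}
```

The point of the loop is that no human needs to sit in it: the sandbox supplies the feedback a human analyst otherwise would.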
Why Experts Are Worried—And Why Some Are Skeptical
The cybersecurity community’s reaction split along an unusual line. On one side, the threat was treated as existential. War on the Rocks published an analysis titled “Anthropic’s Nuclear Bomb,” comparing Mythos to nuclear proliferation dynamics. Fortune’s follow-up headline captured the tension plainly: organizations can’t fix vulnerabilities as fast as AI can find them.
Treasury Secretary Scott Bessent and Fed Chair Jerome Powell summoned bank executives to a meeting where they encouraged them to test Mythos, Bloomberg reported. JPMorgan Chase—an initial Glasswing partner—was already using it. Goldman Sachs, Citigroup, Bank of America, and Morgan Stanley were reportedly testing the system. UK financial regulators were discussing the risk as well. Anthropic co-founder Jack Clark confirmed at the Semafor World Economy summit that the company had briefed the Trump administration about Mythos. “Our position is the government has to know about this stuff,” Clark said.
But not everyone bought the framing. David Crawshaw, CEO of exe.dev, called it “marketing cover for fact that top-end models are now gated by enterprise agreements and no longer available to small labs to distill.” On Reddit, the skepticism ran deep: r/technology’s top comment (427 upvotes) accused Anthropic of “cranking up the hype machine” because “they must be hemorrhaging money and needing investors ASAP.” r/theprimeagen (295 upvotes) pointed out that the “thousands” of zero-days claim relied on just 198 manual reviews—a number the community found insufficient to support the scale of the claim.
Dan Lahav, CEO of AI cybersecurity lab Irregular, offered a more measured view. He told TechCrunch that “while the discovery of vulnerabilities by AI tools matters, the specific value of any weakness to an attacker depends on many factors, including how they can be used in combination.” The implication: not every zero-day Mythos found is equally dangerous, and context matters.
How Researchers Reproduced the Findings—And Why It Matters
On April 14, Polish cybersecurity firm Vidoc Security Lab published a blog post that struck at the heart of Anthropic’s restricted-release argument. A team of six researchers—Dawid Moczadło, Klaudia Kloc, Marek Lewandowski, Amadeusz Lisiecki, Jakub Sienkiewicz, and Mikołaj Palkiewicz—tested whether publicly available models could replicate Mythos’s findings.
They used Claude Opus 4.6 and GPT-5.4, both publicly accessible, against the patched cases from Anthropic’s own red team report. The results were partial but significant: they successfully reproduced the OpenBSD case with at least one widely available model. Both Opus 4.6 and GPT-5.4 reached partial results on wolfSSL. Claude Opus 4.6 also reached partial results on the FreeBSD NFS vulnerability—Mythos’s flagship finding.
Their core conclusion was blunt: “The capabilities Anthropic points to are already available in public models, so defenders should prepare for that reality instead.” The blog argued that the real competitive moat in AI cybersecurity is shifting from model access to “validation, prioritization, and remediation”—the ability to separate real threats from noise and actually fix them.
AI cybersecurity startup Aisle echoed this argument, saying it was able to replicate much of what Mythos accomplished using smaller, open-weight models. Their position: no single deep learning model covers cybersecurity; the right approach “depends on the task at hand.”
The reproduction didn’t match Mythos’s full results—the public models achieved partial, not complete replications. But the gap was narrow enough to raise a fundamental question: if publicly available models can already find and exploit vulnerabilities in patched systems, what exactly is Anthropic protecting by locking Mythos away?
What This Means for Cybersecurity—and the AI Industry
The Mythos saga has compressed a debate that was expected to unfold over years into a few frantic weeks. Three implications are already clear.
First, AI-assisted vulnerability research is no longer a frontier lab exclusive. The Vidoc reproduction proved that public models can find real, exploitable bugs in production software. Organizations that assumed they were safe because “the dangerous AI is locked up” need to recalibrate—offensive AI capabilities are already in the wild.
Second, defense-in-depth works—but only partially. Mythos couldn’t exploit all the Linux kernel vulnerabilities it found, partly because of KASLR and other security mechanisms. But the FreeBSD case showed that 17 years of human review and automated testing missed a critical RCE that an AI found in hours. The lesson isn’t that defenses are useless—it’s that they need to be supplemented by AI-powered auditing on the defensive side.
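KASLR is a useful example of why some of Mythos’s findings stayed unexploitable. By randomizing where the kernel loads, it makes an exploit that hardcodes an address miss on most runs. A toy illustration of the principle (slot count and addresses are illustrative, not a real kernel layout):

```python
import random

def load_kernel(slots=256, seed=None):
    # KASLR in miniature: pick one of `slots` possible base addresses.
    rng = random.Random(seed)
    return rng.randrange(slots) * 0x100000

def exploit(guessed_base, actual_base):
    # A hardcoded-address exploit only lands if the guess matches the slot.
    return guessed_base == actual_base

# Guess base 0x0 against 1,000 independently randomized "boots":
hits = sum(exploit(0x0, load_kernel(seed=s)) for s in range(1000))
print(hits / 1000)  # small fraction (about 1/256 expected): most guesses miss
```

The exploit is not prevented, only made unreliable, which is why the article frames defense-in-depth as a partial rather than complete answer.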
Third, the gatekeeping model is under strain. Anthropic, Google, and OpenAI have been teaming up to block distillation—the practice of using a powerful model to train a smaller one. But if public models already approach Mythos’s capabilities, the enterprise-only access model starts looking less like safety precaution and more like market positioning. The frontier labs may be racing to stay ahead of a gap that’s closing faster than their business models can adapt.
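Distillation itself is simple to sketch: query the large model for its output distribution, then train the small model to match it, typically with a temperature-softened divergence loss. A toy version of that loss (illustrative only, not any lab’s actual pipeline):

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution,
    # exposing more of the teacher's "dark knowledge" about wrong classes.
    z = logits / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, T=2.0):
    # KL(teacher || student) on softened distributions: the student is
    # pushed to reproduce the teacher's output probabilities.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean())

# Toy example: a "teacher" prediction over 4 classes and a student guess.
teacher = np.array([[4.0, 1.0, 0.5, 0.1]])
student = np.array([[2.0, 1.5, 0.5, 0.2]])
print(distillation_loss(teacher, student))   # positive: distributions differ
print(distillation_loss(teacher, teacher))   # ~0.0: identical outputs
```

Because the technique needs nothing beyond API access to the teacher’s outputs, enterprise-only gating is one of the few levers the frontier labs have against it.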
Anthropic’s red team report noted that Mythos’s capabilities emerged as a side effect of general improvements—it wasn’t specifically trained for cybersecurity. Better reasoning meant better hacking, automatically. That means every future capability improvement in frontier AI will likely come with a corresponding increase in offensive potential. The question isn’t whether the next Mythos will be more dangerous. It’s whether anyone will still believe the “too dangerous to release” story when public models keep catching up.
Anthropic has said it will make Mythos available to UK banks within the next week.
Frequently Asked Questions
What is Claude Mythos?
Claude Mythos is Anthropic’s most powerful AI model, announced April 7, 2026 under Project Glasswing. Originally codenamed “Capybara,” it was leaked in March 2026 and described internally as “by far the most powerful AI model we’ve ever developed.” It found thousands of zero-day vulnerabilities across every major OS and browser. Anthropic limited access to roughly 40 organizations, calling it too dangerous for public release.
Why won’t Anthropic release Mythos publicly?
Anthropic says Mythos’s cybersecurity capabilities—in particular, its ability to autonomously find and exploit vulnerabilities in production software—pose too great a risk if weaponized by bad actors. Critics argue the restriction is primarily about protecting enterprise revenue and preventing model distillation, pointing out that public models can already reproduce much of Mythos’s output.
Can public AI models replicate what Mythos does?
Partially, yes. Vidoc Security Lab reproduced key Mythos findings using Claude Opus 4.6 and GPT-5.4—both publicly available. They successfully replicated the OpenBSD case and achieved partial results on FreeBSD and wolfSSL. The gap between Mythos and public models is “smaller than most security teams assume,” according to the researchers.