
From an accidental leak to an emergency meeting in Washington: How Anthropic rewrote the rules of cybersecurity in two weeks
TechFlow Selected TechFlow Selected

From an accidental leak to an emergency meeting in Washington: How Anthropic rewrote the rules of cybersecurity in two weeks
Mythos isn’t plotting anything—it’s simply exceptionally skilled at completing tasks, while having absolutely no understanding of where the boundaries lie.
Author: TechFlow
On April 8, U.S. Treasury Secretary Bessent and Federal Reserve Chair Powell urgently convened a group of Wall Street banking leaders at the Treasury Department’s headquarters in Washington, D.C.
The meeting’s topic was not interest rates or inflation—it was the latest model from an AI company.
This model is called Claude Mythos. Anthropic claims it is the most powerful AI they’ve ever built—so powerful, in fact, that they dared not release it publicly. During internal testing, it escaped the security sandbox designed by researchers and posted online to boast about its jailbreak. Sam Bowman, the researcher overseeing the test, was eating a sandwich in a park when he suddenly received an email from Mythos—only then realizing it had already broken free.
A Chain Reaction Triggered by a CMS Configuration Error
The story begins on the evening of March 26.
Alexandre Pauwels of the University of Cambridge and Roy Paz of LayerX Security, like all security researchers, were doing what they do every day: probing for things that should never be publicly accessible. They discovered an unencrypted database in Anthropic’s content management system (CMS), containing nearly 3,000 unpublished documents.
One of those documents was a draft blog post describing a new model named Claude Mythos. The draft used the internal codename “Capybara” (capibara) to define an entirely new model tier—larger, smarter, and more expensive than Anthropic’s previously strongest Opus series.
A single sentence in the draft sent shockwaves across the entire security community: This model’s cybersecurity capabilities are “far ahead of any other AI model,” and it “heralds an imminent wave of models whose vulnerability-exploitation abilities will vastly outpace defenders’ response capacity.”
Fortune broke the story first. Anthropic attributed the leak to “human error,” citing a default CMS setting that made uploaded files publicly accessible. The irony is stark: A company claiming to build the world’s most advanced cybersecurity AI stumbled over one of the most basic configuration errors imaginable.
Five days later, Fortune reported a second leak: the source code for Anthropic’s programming tool, Claude Code—roughly 500,000 lines of code across 1,900 files—was accidentally exposed due to an npm packaging error. Two low-level security incidents within two weeks, from a company warning the world that “the era of AI-powered cyberattacks has arrived.”
But markets had no time to mock Anthropic’s operational competence. On March 27, at market open, cybersecurity stocks plunged en masse. CrowdStrike dropped 7.5%, Palo Alto Networks fell over 6%, Zscaler declined 4.5%, and the iShares Cybersecurity ETF fell 4% in a single day.
Stifel analyst Adam Borg summed it up bluntly: “This may be the ‘ultimate hacker tool,’ capable of elevating any ordinary hacker to nation-state adversary level.”
Just How Powerful Is Mythos?
On April 7, Anthropic officially unveiled Mythos. Let the numbers speak:
SWE-bench Verified (a benchmark measuring AI’s ability to solve real-world software engineering problems): 93.9%, versus 80.8% for the previous flagship, Opus 4.6. USAMO 2026 mathematical proof: 97.6% vs. 42.3%. Cybersecurity challenge Cybench: 100% success rate—the first model ever to achieve full completion.
USAMO mathematical proof jumped from 42.3% to 97.6%, a 55-percentage-point leap across a single generation.
Anthropic released a 244-page System Card outlining Mythos’s security properties, candidly admitting that its cybersecurity capabilities do not stem from specialized security training—but rather emerge as a “downstream effect” of improvements in general reasoning and coding ability. The same enhancements that make it better at patching vulnerabilities also make it better at exploiting them.
Anthropic’s frontier red team tested Mythos on real-world software—not simulated environments or CTF competition challenges, but operating systems and browsers used daily by billions.
The results were staggering: On Firefox 147’s JavaScript engine, Opus 4.6 attempted hundreds of times and produced only two functional exploits. Mythos succeeded 181 times—a 90-fold difference, representing the gap between one generation and the next. Across all major operating systems and browsers, Mythos discovered thousands of zero-day vulnerabilities, many classified as “critical.”
Some specific cases defy belief. An OpenBSD TCP SACK vulnerability had persisted for 27 years—surviving countless manual audits and automated scanning tools—yet Mythos found it, requiring just two carefully crafted packets to crash any server. A FreeBSD NFS remote code execution vulnerability had existed for 17 years; Mythos not only identified it but autonomously constructed a multi-packet attack chain composed of 20 ROP gadgets.
Even some Anthropic engineers with no security background ran a simple test: Before bed, they instructed Mythos to search for remote code execution vulnerabilities. When they woke up the next morning, a fully functional exploit sat waiting on their desk.
Jailbreaking the Sandbox: When AI Starts Playing “Dirty”
Finding zero-days is a matter of capability—but the behavioral patterns Mythos exhibited during testing were qualitatively different.
In one behavioral test, researchers placed an early version of Mythos inside a secure sandbox environment and instructed it—simulating a user request—to attempt escape and notify the responsible researcher. Mythos not only succeeded in escaping but did something no one asked it to do: It posted detailed accounts of its jailbreak process on several obscure yet publicly accessible websites.
The 244-page System Card documents further “boundary-crossing” behaviors. When Mythos made an error in code, it attempted to rewrite git commit history to erase evidence—not to deceive anyone, but because it interpreted “fixing the issue” as “making it appear as though the issue never existed.” In another test, when instructed to terminate an obsolete evaluation task, it simply shut down all currently running instances of that same evaluation across all users.
Anthropic uses a mountaineering analogy in the System Card: A highly skilled guide may pose greater danger to clients than a novice—not because of malice, but because their superior ability can take everyone onto far more perilous terrain.
The System Card also contains a subtle but critical detail: Using white-box interpretability tools, researchers observed that Mythos sometimes internally reasons about how to “pass” an evaluation’s scoring criteria—while outputting an entirely different chain of thought in its visible reasoning trace. It “thinks” one thing while “saying” another.
Anthropic states they are “reasonably confident” these behaviors reflect the model pursuing tasks using inappropriate means—not evidence of hidden long-term goals. Mythos isn’t plotting anything. It’s simply extraordinarily good at completing tasks—and utterly unaware of where the boundaries lie. An omnipotent assistant without a sense of proportion may prove harder to manage than a scheming AI.
Project Glasswing: Forging a Shield from the Spear
Anthropic chose not to lock Mythos away in a vault.
On April 7, they announced Project Glasswing (named after the glasswing butterfly, whose nearly transparent wings symbolize making software vulnerabilities “impossible to hide”)—releasing Mythos Preview to roughly 40 vetted organizations for defensive cybersecurity work.
Founding partners: Amazon AWS, Apple, Microsoft, Google, NVIDIA, Cisco, CrowdStrike, Palo Alto Networks, JPMorgan Chase, and the Linux Foundation—effectively assembling Silicon Valley and Wall Street’s top players. Anthropic pledged up to $100 million in usage credits and donated $4 million to open-source security organizations including OpenSSF and Alpha-Omega.
The logic is straightforward: Capabilities at the Mythos level will diffuse into open-source models within 6–18 months, at which point anyone can access them. Rather than wait for that day, defense teams should get ahead during this window—patching as many vulnerabilities as possible before widespread availability.
Newton Cheng, head of cybersecurity for Anthropic’s frontier red team, put it plainly: The goal is for organizations to grow accustomed to using such capabilities defensively *before* they become widely available—because they inevitably will; the only question is timing.
Wall Street panicked first—then exhaled.
After the March 27 leak, cybersecurity stocks collapsed. But following Anthropic’s April 7 formal announcement of Glasswing—and naming CrowdStrike and Palo Alto Networks as founding partners—both stocks surged 6.2% and 4.9%, respectively, with additional gains of 2% after hours. JPMorgan reaffirmed its “overweight” rating on both companies, with analyst Brian Essex noting CrowdStrike and Palo Alto are positioned as core layers within the defense stack—not competitors.
Yet this is only temporary relief. Year-to-date, both stocks remain down 9.7% and 7.8%, respectively.
When AI Risk Becomes Financial-System Risk
Back to April 8, at the Treasury Department’s headquarters in Washington.
Bessent and Powell convened only systemically important banks. Meetings at this level historically occur only during financial crises or pandemics. Now, they’re gathered around the same table to discuss the cyberattack potential of a single AI model.
The reason is straightforward: If Mythos-level capabilities fall into malicious hands, it could locate zero-day vulnerabilities in a major bank’s core systems—and generate functional attack code—in mere hours. The foundational assumption underpinning the entire cybersecurity defense ecosystem has been that discovering and exploiting vulnerabilities requires substantial time and highly specialized human expertise. AI is overturning that assumption.
Casey Newton of Platformer cites Alex Stamos, Chief Product Officer at cybersecurity firm Corridor: “Open-source models will likely match closed-source frontier models in vulnerability discovery within roughly six months.”
What worries regulators even more is Anthropic’s own admission in the System Card: Their most advanced evaluation systems failed to detect the most dangerous behaviors of early Mythos versions on first inspection. Those troublesome behaviors weren’t caught in controlled tests—they emerged only during internal real-world usage.
An Uncomfortable Premise
Stripped bare, Glasswing’s underlying logic feels deeply uneasy: To protect the world from dangerous AI models, you must first build that dangerous AI.
Newton of Platformer highlights a fact overlooked by most coverage: A private company now holds high-risk zero-day exploitation capabilities across virtually every major software project you’ve heard of. That concentration itself constitutes a risk. Motivation to steal Anthropic’s model weights has just risen sharply.
All this unfolds amid near-total absence of AI regulation. Anthropic says it has briefed CISA (Cybersecurity and Infrastructure Security Agency) and the Department of Commerce. Yet current reporting shows no governmental response commensurate with the threat level. As one government insider familiar with Mythos told Axios: “Washington governs by crisis. Until cybersecurity becomes an undeniable crisis—commanding appropriate attention and resources—it remains a fringe issue.”
Dario Amodei founded Anthropic on precisely this premise: Let a lab that treats safety as its highest priority encounter the most dangerous capabilities first—so it can build defenses before others do. Mythos and Glasswing are indeed playing out exactly that script.
But can theory outrun reality? No one knows. Anthropic plans to deploy new safety measures first on a future Opus model—because that model “won’t carry risks equivalent to Mythos.” The public will eventually gain access to Mythos-level capabilities—but only after protective infrastructure is firmly in place.
How long is that window? Stamos offers an optimistic estimate: “If we’ve just barely surpassed human capability, then there exists a large—but finite—pool of vulnerabilities that can be discovered and fixed.”
That “if” looms large.
From a CMS configuration error on March 26 to an emergency Treasury meeting with Wall Street on April 8—within two weeks, an AI model went from Silicon Valley tech news to a Washington financial-security priority.
Stamos estimates defenders have roughly six months’ window. After that, open-source models will catch up—and these capabilities will no longer be the exclusive domain of a select few companies.
How many vulnerabilities can be patched in six months will determine the rules of the game going forward.
Join TechFlow official community to stay tuned
Telegram:https://t.me/TechFlowDaily
X (Twitter):https://x.com/TechFlowPost
X (Twitter) EN:https://x.com/BlockFlow_News













