
When the Off Switch Belongs to the Government

Anthropic, Agentic AI, and the Case for Continuous Governance


On the evening of Friday, June 12, 2026, hundreds of millions of people lost access to two of the most powerful artificial intelligence models ever deployed to the public. Not because of a server failure. Not because of a cyberattack. Because the United States government ordered it, and one of the world's most safety-conscious AI companies had no choice but to comply.

The abrupt disabling of Anthropic's Fable 5 and Mythos 5 models is already being debated as a national security story, a geopolitical story, and a corporate governance story. It is all three. But viewed through the lens of what AI enterprise leaders actually need to do right now, it is something more urgent: a defining proof point that governance cannot be episodic. You cannot govern a frontier AI model with a pre-launch checklist. You cannot govern an agentic enterprise with a compliance audit that happens once a year. The Anthropic crisis demonstrates, in real time, what happens when governance ends at deployment, and why continuous governance of the agentic enterprise is no longer a theoretical framework. It is a survival requirement.


The Architecture of a Crisis

To understand what went wrong, it helps to understand the sequence of events - and what they reveal about the structural fault lines running through every advanced AI deployment today.

Anthropic launched Claude Fable 5 on June 9, 2026, describing it as a "Mythos-class" model, the first iteration of a capability tier the company said exceeded anything it had ever made available to the public. The company acknowledged at launch that "releasing a model this capable comes with risks," and built in what it described as deliberately broad cybersecurity guardrails to prevent misuse. Those guardrails were, by Anthropic's own admission, strong enough that many users complained they were too restrictive.

Four days later, the government declared those guardrails insufficient and ordered the model pulled. Commerce Secretary Howard Lutnick sent a letter to CEO Dario Amodei informing him that Fable 5 and Mythos 5 were now subject to export controls, effectively national security-level restrictions on commercial software. The net effect, Anthropic said, was that it had to "abruptly disable" the models not just for foreign nationals but for all customers to ensure compliance.

The proximate cause was a jailbreak. Someone, or some organization, had apparently found a method to bypass the cybersecurity guardrails on Fable 5, allowing the model to identify software vulnerabilities in ways Anthropic had explicitly tried to prevent. The government, which had received this intelligence through Amazon CEO Andy Jassy and other tech leaders who raised concerns directly with senior White House officials, concluded that this narrow exploit posed a national security risk severe enough to justify pulling the entire model from deployment.

Anthropic pushed back, hard. It argued the jailbreak was "narrow" and "non-universal," meaning it couldn't broadly unlock the model's most dangerous capabilities, only specific ones. It argued other publicly available models showed similar abilities. It argued that the government had only provided "verbal evidence of a potential narrow, non-universal jailbreak" and that applying this standard across the industry "would essentially halt all new model deployments for all frontier model providers."

The government was unmoved. Treasury Secretary Scott Bessent, who joined the calls remotely while traveling, told Amodei directly that he was making a "bad decision." Less than 24 hours after those calls, the export control directive was in effect.


The Paradox at the Heart of AI Safety

Here is the irony that should keep every board director, chief AI officer, and enterprise risk manager awake at night: Anthropic is arguably the AI company that has most publicly and consistently advocated for exactly the kind of government oversight that was just used against it.

As recently as the Wednesday before the Friday crackdown, Anthropic had publicly called for greater U.S. oversight of AI, including the ability for the government to block models with unacceptable risks. The company has defined itself, and differentiated itself from OpenAI and Meta, on the principle that safety and capability must advance together. Its "Constitutional AI" methodology, its Responsible Scaling Policy, its cautious tiered release of Mythos: these are not PR gestures. They represent a genuine, institutionally embedded conviction that AI systems require rigorous oversight.

And yet, when oversight arrived, it didn't look like the principled, "transparent, fair, clear, and grounded in technical facts" process Anthropic had advocated for. It looked like a 90-minute deadline, a verbal threat of financial penalties, and an export control directive issued without the evidentiary standard Anthropic had expected the government to apply.

The lesson here is not that Anthropic was wrong to advocate for oversight. The lesson is that advocating for governance is not the same as being prepared for governance. Governance, when it arrives, arrives fast. It arrives imprecisely. It arrives in the form of phone calls with Treasury secretaries and White House cyber directors, not carefully structured regulatory proceedings. Organizations that have built governance into their continuous operating rhythm can respond. Organizations that have treated governance as a policy posture, a set of public commitments, find themselves, as Anthropic did, defending against a government claim they cannot fully rebut because they don't share the same evidentiary baseline with the regulator.

This is the paradox at the heart of AI safety in 2026: the companies most committed to safety may be the least prepared for the governance moment, precisely because they have invested in safety as a technical problem rather than as a continuous, organizational, and institutional discipline.


The Jailbreak as a Governance Failure, Not a Technical One

It is tempting to frame the Fable 5 episode as a story about jailbreaking: the insufficiency of guardrails, the inevitability of red-teaming failures, the arms race between capability and constraint. That framing misses the deeper issue.

Jailbreaks are not exceptional events. They are predictable features of the landscape. Anthropic itself has acknowledged that "total avoidance of any jailbreaks isn't now possible for them or any other companies." Researchers at the Carnegie Endowment and cybersecurity firms have warned repeatedly that as models approach and exceed "Mythos-class" capability, the probability of exploit discovery scales with the model's power. This is not a flaw in Anthropic's safety process. It is a property of the capability curve itself.

What this means, and what the Fable 5 episode makes viscerally clear, is that deploying an advanced model is not a governance event. It is the beginning of a governance period. The risk profile of a model like Fable 5 does not stabilize after launch. It evolves. Adversaries probe. Red teams discover. The threat surface shifts. The regulatory context changes. A governance framework that front-loads safety assessment into pre-deployment review and then treats the model as "cleared" is structurally inadequate to this reality.

Anthropic did everything right by pre-2026 standards. It worked with U.S. and international government agencies to identify vulnerabilities before Fable's release. It submitted to review by the United Kingdom's AI Security Institute. The government did not object to the model's release in those conversations. And yet, four days after launch, a jailbreak surfaced that neither the company nor the government had caught, and the entire model was pulled.

The standard has changed. The question is no longer "was the model safe at launch?" The question is "do you have the institutional infrastructure to detect, assess, and respond to emerging risks continuously, from launch through the entire deployment lifecycle?" If the answer is no, if your governance framework ends at deployment, then you are not governing an agentic AI system. You are auditing a static artifact and hoping the world doesn't change.


What Continuous Governance Actually Means

This is the moment for specificity. "Continuous governance" risks becoming another piece of enterprise jargon - a phrase that sounds important but means nothing until it's operationalized. The Anthropic case provides a clear framework for what it must include.

Real-time threat monitoring. Governance cannot wait for external actors to surface vulnerabilities. Anthropic had a sophisticated safety apparatus, but it was Amazon's Andy Jassy, a commercial partner with intelligence from its own AWS infrastructure, who first raised the alarm to the White House. Enterprise AI governance must include active, continuous monitoring of how deployed models are being used, probed, and potentially exploited, not just by internal red teams but in actual deployment conditions.
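One way to picture monitoring under actual deployment conditions is a sliding-window check on guardrail refusals. The sketch below is illustrative only: the class name `ProbeMonitor`, the one-hour window, and the threshold of five refusals are all assumptions, not anything Anthropic or any other provider is known to use. It assumes timestamps arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class ProbeMonitor:
    """Flags accounts whose guardrail refusals cluster inside a time window.

    Hypothetical sketch: window and threshold values are illustrative.
    """

    def __init__(self, window_seconds=3600, threshold=5):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._refusals = defaultdict(deque)  # account_id -> refusal timestamps

    def record_refusal(self, account_id, timestamp):
        """Record one refusal; return True if the account should be escalated."""
        events = self._refusals[account_id]
        events.append(timestamp)
        # Drop refusals that have aged out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.threshold
```

A repeated pattern of refusals from one account is only a weak signal of probing, so a real program would route these flags to a human reviewer instead of acting on them automatically.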

Dynamic risk reassessment. A model's risk profile is not a fixed point. It is a function of capability, deployment context, adversary sophistication, and the ecosystem of other tools available. Fable 5 was described as uniquely dangerous for cybersecurity not because it was uniquely powerful in isolation, but because of how it interacted with the existing landscape of interconnected, legacy-dependent financial and infrastructure systems. Governance frameworks must continuously reassess risk against evolving context, not just against the baseline established at launch.
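The idea that risk is a function of several moving inputs, reassessed against a launch baseline, can be sketched in a few lines. Everything here is an assumption made for illustration: the 0.0 to 1.0 rating scale, the unweighted mean, the 0.1 tolerance, and the names `RiskFactors`, `risk_score`, and `needs_review`. A real framework would need calibrated inputs and defensible weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskFactors:
    # Each factor is a hypothetical 0.0-1.0 rating assigned by reviewers.
    capability: float
    deployment_context: float
    adversary_sophistication: float
    ecosystem_tools: float


def risk_score(f: RiskFactors) -> float:
    """Unweighted mean of the four factors (an illustrative choice)."""
    return (f.capability + f.deployment_context
            + f.adversary_sophistication + f.ecosystem_tools) / 4


def needs_review(baseline: RiskFactors, current: RiskFactors,
                 tolerance: float = 0.1) -> bool:
    """True when the current score exceeds the launch baseline by more than tolerance."""
    return risk_score(current) - risk_score(baseline) > tolerance


launch = RiskFactors(0.8, 0.5, 0.4, 0.3)
after_exploit = RiskFactors(0.8, 0.5, 0.9, 0.4)  # adversaries improved; model unchanged
print(needs_review(launch, after_exploit))
```

The point of the example is that `capability` never changes between the two snapshots. The model is the same; the context moved, and the score moved with it.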

Pre-negotiated regulator relationships. The frantic 90-minute timeline that preceded the export control directive was not just government overreach. It was a governance failure on both sides, a symptom of the absence of a pre-established, trusted process for exactly this kind of situation. The Anthropic case produced a three-way failure: the company felt it had provided adequate safety assurances pre-launch; the government felt it had been dismissed when it raised post-launch concerns; and the regulator relationship had already been poisoned by a separate dispute over autonomous weapons. Continuous governance means maintaining live, trusted, and institutionalized relationships with regulators, not negotiating them in real time during a national security crisis.

Board-level accountability infrastructure. Perhaps the most underappreciated dimension of this story is the boardroom dimension. Anthropic is pre-IPO, and its governance structures are still being built. But every company deploying advanced AI, at any stage of maturity, must ask: does our board have the AI governance literacy to oversee this? Do we have independent directors who can assess not just the technical risk brief but the regulatory and geopolitical context in which our models are operating? The answer, at most organizations, is no. That is not a technology problem. It is a leadership problem.

Internationally aware sovereignty mapping. The Anthropic episode triggered immediate debate about "AI sovereignty," the ability of nations to maintain access to and control over AI capabilities. British lawmakers called for domestic AI investment; cybersecurity leaders warned that restricting access to Anthropic's models while China continues advancing its own creates asymmetric risk. For enterprises deploying AI globally, continuous governance must include active mapping of jurisdictional exposure, understanding not just where your model is deployed but how your deployment landscape would shift overnight if U.S. export controls were applied.
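A jurisdictional exposure map can start as a simple inventory plus a "what if" query. The sketch below is a minimal illustration under stated assumptions: the `Deployment` record, the `exposed_deployments` function, and the sample deployment names are hypothetical, and the two scenarios (foreign users blocked versus every user blocked) are a simplification of how export controls actually operate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    name: str
    model_origin: str              # jurisdiction whose export rules govern the model
    user_jurisdictions: frozenset  # jurisdictions where the deployment's users sit


def exposed_deployments(deployments, origin, blocks_all_users=False):
    """Names of deployments affected if `origin` put export controls on its models.

    With blocks_all_users=False, only deployments serving users outside
    `origin` are affected. With True, every deployment on an `origin`
    model is affected, mirroring a provider disabling access for all customers.
    """
    exposed = []
    for d in deployments:
        if d.model_origin != origin:
            continue
        has_foreign_users = any(j != origin for j in d.user_jurisdictions)
        if blocks_all_users or has_foreign_users:
            exposed.append(d.name)
    return exposed


inventory = [
    Deployment("trading-agent", "US", frozenset({"US", "UK"})),
    Deployment("support-bot", "US", frozenset({"US"})),
    Deployment("triage-agent", "FR", frozenset({"FR", "DE"})),
]
print(exposed_deployments(inventory, "US"))
print(exposed_deployments(inventory, "US", blocks_all_users=True))
```

Running both scenarios side by side shows the gap between the narrow reading of a control and the outcome described above, where compliance meant disabling the models for all customers.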


The Agentic Dimension

The Anthropic episode is not primarily an agentic AI story. Fable 5 and Mythos 5 are powerful models, but they are not fully autonomous agents operating in the world on behalf of enterprises. The governance failure here was at the model layer, not the agent layer.

But the implications for agentic AI are direct and urgent, because the governance challenges at the agent layer are orders of magnitude more complex.

When an enterprise deploys an agentic system, an AI that not only generates outputs but takes actions, makes decisions, triggers workflows, and operates with delegated authority across organizational functions, the risk surface explodes. The number of potential exploit vectors multiplies. The speed at which a compromised agent can cause damage accelerates. The difficulty of monitoring, detecting, and responding to emerging failures rises sharply.

If Anthropic, with its world-class safety team, its pre-deployment government reviews, its Constitutional AI methodology, and its genuine institutional commitment to responsible deployment, could find itself in a governance crisis four days after launching a carefully safeguarded model, what does that portend for enterprises deploying autonomous agents with less safety infrastructure, less regulatory engagement, and less board-level AI governance literacy?

The answer is not hypothetical. The agentic enterprise is here. Financial institutions are deploying AI agents to execute trades, manage risk, and process compliance workflows. Healthcare systems are deploying agents to triage patients and authorize treatments. Supply chains are running on AI agents that make purchasing, logistics, and vendor selection decisions in real time. These agents are not operating in sandboxes. They are operating in the world.

And yet, in most enterprises, the governance model for these agents looks like what it looked like for Anthropic's pre-2026 model releases: a pre-deployment checklist, an ethics review, a legal sign-off, and then a handoff to operations. Governance as a launch event. Governance as a static artifact.

That model is structurally inadequate. Not because the pre-deployment work doesn't matter; it does. But because the agentic enterprise, by definition, operates in conditions that change continuously. Agents interact with systems, data, and counterparties that evolve. Threat actors probe agent interfaces. Regulatory expectations shift. The enterprise's own risk appetite changes. Continuous governance is not a premium feature for the most safety-conscious organizations. It is the minimum viable infrastructure for any organization operating in the agentic era.


The Ratings Imperative

There is one more dimension of the Anthropic story that deserves attention, because it speaks directly to a structural gap in how AI risk is currently assessed and communicated.

In the aftermath of the Fable 5 export controls, a letter signed by more than 50 cybersecurity leaders at companies including Nvidia and Adobe argued that Anthropic's models were "not uniquely capable" of finding security flaws and weaponizing exploits, and that many rival models offered similar abilities. The government's response, through White House AI adviser David Sacks and Pentagon CIO Kirsten Davies, was that national security must come before revenue cycles and pre-IPO valuations.

Both sides are right. Both sides are also talking past each other. The fundamental problem is the absence of an independent, credible, institutionalized mechanism for assessing and communicating the risk profile of advanced AI models, something analogous to what credit rating agencies do for debt instruments, or what actuarial frameworks do for insurance risk, or what independent verification organizations do for financial statements.

We have no such institution for AI. We have safety teams at labs. We have voluntary government review frameworks. We have cybersecurity researchers who probe models opportunistically. We have government agencies that can impose export controls based on intelligence that the affected company never gets to fully examine. What we do not have is a continuous, independent, transparent rating and verification infrastructure that provides enterprises, investors, regulators, and the public with a shared, credible, evidence-based framework for understanding AI risk.

That gap is not incidental. It is the enabling condition for the kind of crisis that just engulfed Anthropic. When there is no shared evidentiary standard, every governance moment becomes a negotiation, and those negotiations, as Dario Amodei discovered, happen on someone else's timeline, with someone else's intelligence, under conditions that favor decisive government action over measured deliberation.

Building that infrastructure (continuous, independent, ratings-quality governance assessment for AI systems) is not a nice-to-have for the era ahead. It is the foundational architecture of a trustworthy agentic economy. The Anthropic episode did not just reveal a jailbreak in Fable 5. It revealed a jailbreak in the governance architecture itself.


What the Board Must Ask Monday Morning

For enterprise leaders digesting this story, the strategic implications are immediate. Here are the questions that belong on every boardroom agenda:

  1. Who owns continuous AI governance in our organization? Not pre-deployment review. Not annual audits. Who is accountable, on a continuous basis, for the evolving risk profile of our deployed AI systems - including agentic systems?
  2. What is our regulator relationship infrastructure? Do we have pre-established, trusted channels with the regulators who have jurisdiction over our AI deployments? If a government official called today with an AI security concern, would the call go to someone with authority and context - or would it begin a 90-minute scramble?
  3. Does our board have the AI governance literacy to oversee this? Not just technical briefings, but the conceptual framework to ask the right questions about risk, accountability, and escalation - across geographies, across model versions, across deployment contexts?
  4. What is our agentic exposure map? Where are we deploying or planning to deploy AI agents? What is the governance framework for each? What is our incident response protocol if an agent behaves in unexpected ways - not just internally, but in ways that trigger regulatory or national security attention?
  5. Are we building toward independent verification? As AI governance standards mature - and they are maturing fast, driven by exactly these kinds of crises - enterprises that have invested in independent, credible governance infrastructure will have a significant advantage over those that have not. The ratings moment for AI is coming. Are we positioned for it?

Conclusion: The Governance Gap Is the Risk

The Anthropic episode will be remembered as a landmark moment in the history of AI regulation, the first time the U.S. government applied export controls to AI models themselves, rather than to the chips that power them. It will be studied in business schools and policy institutes for years.

But the most important lesson is not about export controls, or jailbreaks, or the geopolitics of AI sovereignty. The most important lesson is about the fundamental inadequacy of episodic governance in a world of continuous AI deployment.

Every organization operating at the AI frontier, whether as a developer, a deployer, or a dependent, is now operating in conditions where the governance moment can arrive without warning, at the speed of a government directive, with consequences that can disable your most critical systems in a matter of hours. The organizations that will navigate this era are not the ones with the best pre-launch safety reviews. They are the ones that have built continuous governance into the operating rhythm of the enterprise itself: monitoring, assessing, responding, and communicating in real time, across the full lifecycle of every AI system they deploy.

The off switch, as it turns out, always belonged to someone. The question is whether you have the governance infrastructure to make sure it is never used on you without warning, or whether, like Anthropic, you discover that truth on a Friday night.
