

Epistemic status: high confidence on the structural asymmetry argument; moderate confidence on the five-level cascade as a causal hypothesis rather than a prediction; low confidence on implementability of proposed interventions within described timelines. Key uncertainty: the compute governance / open weights tension is underdeveloped.

 

TL;DR

Generative AI creates a structural asymmetry: producing plausible content now costs a few cents, while verification still demands scarce human attention. This post maps how that asymmetry can cascade — from interpersonal distrust through institutional and national failure to existential paralysis — and proposes architectural interventions (provenance standards and compute governance) that may still keep the window open before open weights close it for good.

 

1. Three Pillars of the Capacity for Collective Action

Epistemic basis: a shared picture of reality. We must agree on basic facts — that a specific AI model has dangerous capabilities, that a virus exists, that a treaty has been violated. Without this, mobilizing a common response to a common threat is impossible. You cannot make an agreement about something whose existence you cannot agree upon. Generative AI undermines this basis, reducing the cost of fabricating plausible alternative facts by orders of magnitude — from hundreds of dollars to cents.

Normative basis: minimally compatible goals. Even when the facts are agreed upon, people may disagree on what counts as an acceptable response. Generative AI, by personalizing normative frameworks through value-aligned chatbots and targeted persuasive communication, fragments moral consensus. An important caveat: AI may not create these rifts from scratch, but merely expose deep contradictions previously suppressed by mainstream media. For our thesis, the crucial point is that AI massively accelerates fragmentation regardless of its root cause.

Institutional basis: trust in procedures. Coordination requires mechanisms that fix obligations and resolve disputes. Courts, regulators, electoral commissions, international verification bodies: all function only to the extent that parties trust the procedure. When any content can be AI-generated, what is undermined is not just trust in specific institutions but the very concept of accountability. Who is responsible for reputational damage from a deepfake? You cannot put an algorithm in prison. When we cannot reliably answer "who did this, and when?", accountability mechanisms collapse, and with them institutional legitimacy.

Generative AI acts as a simultaneous accelerator of the decay of all three pillars. By reducing the cost of creating plausible but false content, it makes fact-checking, goal alignment, and maintaining institutional trust cognitively and economically unviable. In the world of open weights, this asymmetry becomes absolute: marginal generation costs fall to practically the cost of electricity, while verification still requires human attention. Coordination is only possible where verification costs are comparable to production costs. We are already moving further from this regime. The experience of the Biological Weapons Convention shows that without natural choke points, such regimes remain on paper. AI risks following this path unless we find a way to make verification technically achievable.

 

2. The De-Coordination Cascade: A Working Causal Hypothesis

This is a working hypothesis about how local failures can cascade into global paralysis. Not a prediction, but a map of vulnerabilities. For each transition, I specify conditions under which it fails.

Cascade levels: 1. Interpersonal → 2. Institutional → 3. National → 4. International → 5. Existential paralysis.

 

1 → 2. From Interpersonal to Institutional

Mechanism. Personalized AI feeds and assistants create divergent pictures of reality faster than social correction (conversation, shared experience) can synchronize them. Institutions (courts, media, regulators) exist partly to arbitrate such disputes. As interpersonal divergence grows, the demand for institutional arbitration outstrips institutional capacity.

Failure condition. Institutions manage to increase capacity (or reduce demand through preventive norms) — the cascade stops.


2 → 3. From Institutional to National

Mechanism. Institutions fail to cope with verification in critical windows. Example: deepfake audio of a candidate 48 hours before elections in Slovakia (September 2023).[1]

The legal context is crucial: the failure occurred not merely from the absence of a protocol, but because an existing protocol — a legislative moratorium on election coverage for the final 48 hours ("day of silence") — created an ideal window of vulnerability. Traditional media could not legally publish refutations, while malicious actors could freely spread deepfakes on social media and messengers where the moratorium did not apply. The election commission and platforms failed to coordinate during the period when verification could still have influenced the outcome.

Public institutional failure erodes trust, and eroding trust creates demand for alternative sources of truth (less accountable and more vulnerable to capture). In the world of open weights, such attacks are cheap, and attribution is nearly impossible — institutions find themselves in a state of permanent crisis.

Failure condition. Institutional failures remain rare exceptions, not a pattern. Or — institutions manage to revise outdated protocols (like pre-election moratoria) before they become systemic breaches.


3 → 4. From National to International

Mechanism. Within countries, epistemic segregation forms — groups with non-overlapping standards of evidence. International treaties require domestic support for ratification and compliance (two-level games[2]). Without minimal consensus on basic facts (does the threat exist? is the other party complying?), a government cannot reliably undertake international commitments. Any foreign policy concessions will be immediately sabotaged by domestic political forces who, within their own information reality, perceive these steps as unjustified capitulation to hostile forces.

Failure condition. Authoritarian regimes can comply with treaties without regard for domestic consensus. For democracies, this constraint operates much more strongly.


4 → 5. From International to Existential

Three independent mechanisms. Each is sufficient for the transition.

A. The Security Dilemma[3], adapted for AI.

In classical international relations theory, the dilemma arises when one state's measures to enhance its own security are perceived by others as a direct threat. In the case of AI, this mechanism is pushed to its limit.

The problem is not the "absence" of balance, but its radical shift towards offense-dominance: the cost of generating persuasive noise, deepfakes, or malicious code is orders of magnitude lower than the cost of their verification and neutralization. The situation is compounded by zero distinguishability: laboratory reports on safety (alignment) successes are technically indistinguishable to an external observer from reports on the growth of raw capabilities.

Under conditions where open weights make technical audit impossible, and any signal of "peaceful intentions" can be easily fabricated, worst-case planning becomes the only rational strategy. The AI race is driven not merely by greed, but by the fear of ultimate civilizational defeat in an environment of complete adversary opacity.

B. Political Window Compression. International agreements require domestic concessions. In epistemically fragmented societies (transition 3→4), these concessions are easily framed as capitulation. Precedent: the failure of the BWC Verification Protocol in 2001. Pharmaceutical lobbies and defense critics in the US successfully reframed inspections as a threat to commercial secrets. The protocol never entered into force. AI cheapens the production of such polarizing narratives, allowing them to be personalized for different audiences. The BWC problem returns — in digital form and at amplified scale.

C. Consumption of the Temporal Resource (conditional). If a threshold exists beyond which control over AI becomes dramatically harder (e.g., due to the speed of self-modification or the impossibility of verifying open weights), then time to that threshold is a critical resource. Coordination collapse consumes this resource on epistemic disputes instead of developing governance mechanisms.


General failure condition for transition 4→5. Alignment is solved before collapse, or verification technologies outpace fabrication.


Why This Schema Matters

No transition is inevitable — each has failure conditions. The problem is that currently, all five levels are moving in the same direction simultaneously. And the same technology that accelerates the deadline complicates the negotiations on how to avoid it. This is the meta-risk: coordination collapse blocks our capacity to respond to all other threats at once.


3. Conceptual Model: The Feedback Loop

The driving force of the cascade is a self-sustaining cycle, spinning at each level:

The feedback loop (schematic). Cheap generation raises verification costs → verification costs erode institutional trust → eroded trust creates demand for alternative narratives → demand incentivizes noise production → more noise raises verification costs further. Each cycle makes coordination relatively harder and defection relatively cheaper.

  1. Generative AI increases verification costs. Content volume and sophistication grow faster than institutional verification capacity.
  2. Rising verification costs reduce trust in signals. This is not irrational panic but a calibrated response: when distinguishing truth from falsehood is too expensive, distrusting everything becomes rational.
  3. Declining trust increases incentives to generate noise. If truth is indistinguishable from falsehood anyway, and marginal content production costs approach zero, defection (flooding the environment with noise) becomes the dominant strategy.
  4. Under mutual distrust, creating indistinguishability is rational. Why persuade when you can simply make verification impossible? This closes the loop: verification costs rise again.
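The multiplicative character of this loop is what makes it dangerous: the gap compounds every cycle rather than growing linearly. A toy simulation makes this visible; every parameter below is an illustrative assumption, not an empirical estimate.

```python
# Toy model of the generation/verification cost loop (all parameters are
# illustrative assumptions, not empirical estimates).

def simulate(steps=10, gen_cost=1.0, ver_cost=10.0,
             gen_decay=0.5, noise_gain=0.3):
    """Each cycle: cheaper generation -> more noise -> heavier verification load."""
    ratios = []
    for _ in range(steps):
        gen_cost *= gen_decay        # step 1: marginal generation gets cheaper
        ver_cost *= 1 + noise_gain   # steps 2-4: noise inflates verification load
        ratios.append(ver_cost / gen_cost)
    return ratios

ratios = simulate()
assert all(b > a for a, b in zip(ratios, ratios[1:]))  # the gap widens every cycle
```

The sketch assumes institutional capacity does not compound in response; whether a real system crosses a self-sustaining threshold depends on exactly that question, which is why the interventions below target the loop's parameters rather than individual pieces of content.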

Why Open Weights Make the Cycle Harder to Break

In the world of closed APIs, this cycle has a natural limiter: providers can moderate content, restrict access, and implement watermarks. Open weights remove this limiter.

When model weights are published: generation costs fall practically to the cost of running code (electricity + hardware amortization); attribution becomes fundamentally impossible (the user controls the entire stack); provenance verification fails (metadata is removed in seconds).

In this regime, the cycle ceases to be a theoretical construct. It becomes a structural property of the environment: verification costs always exceed production costs, and the gap only widens. As the Biological Weapons Convention showed, a regime without natural choke points remains on paper; open weights recreate that problem for AI, in a digital environment with orders-of-magnitude faster propagation.


Nonlinearity and Thresholds

The cycle is dangerous not in itself, but in its nonlinearity. A threshold may exist beyond which it becomes self-sustaining regardless of external intervention. If institutions fail to keep pace with content generation in the early stages, the window for creating verification infrastructure may close forever.

Approximate survival condition. The time needed to agree on and implement governance mechanisms (T_governance) must remain below the time until AI capabilities become critical (T_capabilities). Coordination collapse inflates the left side: we negotiate more slowly because we cannot even agree on facts. Advancing capabilities shrink the right side: the threshold approaches faster. The meta-risk is that these two dynamics are linked: the same technology that accelerates the deadline undermines our capacity to coordinate.
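One way to write this condition down (my notation, a toy formalization rather than a model): let F be frontier capability, T_gov(F) the time needed to agree on and deploy governance, and T_crit(F) the time remaining until capabilities cross the critical threshold.

```latex
T_{\mathrm{gov}}(F) < T_{\mathrm{crit}}(F),
\qquad
\frac{\partial T_{\mathrm{gov}}}{\partial F} > 0,
\qquad
\frac{\partial T_{\mathrm{crit}}}{\partial F} < 0 .
```

The coupling through the shared variable F is the meta-risk in one line: capability growth simultaneously lengthens the left side and shortens the right.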


Implications

If the cycle is real, standard epistemic solutions (fact-checking, media literacy, watermarks) are swimming against the current. They address consequences, not causes. We need not ways to distinguish truth from falsehood, but ways to break the cycle — to make verification cheaper than production, or to make defection individually unprofitable.


4. What To Do: Architecture, Not Moderation

Based on the cascade and the loop: solutions must change incentives, not impose truth. The key insight from Jervis is that the security dilemma is mitigated not by convincing actors of good intentions, but by altering the environment to make offensive and defensive actions distinguishable. Historically, this was achieved through verification regimes.

Intervention levels: 1→2, 2→3, 3→4, 4→5.


1→2. Provenance Standards

Goal. Basic facts about content origin should not be disputable.

Tool. Cryptographic signatures at the device level. C2PA (c2pa.org) — a standard already being implemented in cameras and software.

Limitations. Systems are vulnerable to compromise at the manufacturing level. The "analog hole": even perfect cryptography doesn't solve the problem of fabricated content simply being re-filmed from a screen by a camera with a genuine signature. A deepfake displayed on a screen and recorded by a smartphone receives cryptographic "authenticity" credentials. Does not address content created before standard implementation.

But it's a shift: provenance verification doesn't require an arbiter of truth — only a cryptographically reliable metadata tag. Even with the analog hole, provenance standards narrow the space for manipulation, even if they don't close it entirely.
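The shape of that check can be shown with a minimal sketch. Real C2PA uses asymmetric signatures and X.509 certificate chains embedded in the asset; the HMAC stand-in below (with a hypothetical device key) only illustrates the structural point: verification needs no arbiter of truth, just a key and a tag.

```python
import hashlib
import hmac
import json

# Simplified stand-in for device-level provenance signing (illustration only:
# real C2PA uses asymmetric signatures and certificate chains, not a shared HMAC key).
DEVICE_KEY = b"secret-key-burned-into-device"  # hypothetical device secret

def sign_capture(image_bytes: bytes, metadata: dict) -> dict:
    """Attach a provenance manifest: content hash plus signed metadata."""
    content_hash = hashlib.sha256(image_bytes).hexdigest()
    payload = content_hash + json.dumps(metadata, sort_keys=True)
    tag = hmac.new(DEVICE_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"content_hash": content_hash, "metadata": metadata, "signature": tag}

def verify_capture(image_bytes: bytes, manifest: dict) -> bool:
    """Checking origin requires only the key and the tag, not a judgment of truth."""
    content_hash = hashlib.sha256(image_bytes).hexdigest()
    payload = content_hash + json.dumps(manifest["metadata"], sort_keys=True)
    expected = hmac.new(DEVICE_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, manifest["signature"])
            and manifest["content_hash"] == content_hash)

photo = b"\x89PNG...raw sensor data"
manifest = sign_capture(photo, {"device": "camera-123", "time": "2025-01-01T12:00:00Z"})
assert verify_capture(photo, manifest)             # untouched capture verifies
assert not verify_capture(photo + b"x", manifest)  # any edit breaks the signature
```

Note what the sketch cannot do, matching the limitations above: a deepfake re-filmed from a screen would pass this check with a genuine device signature, because the signature attests to capture, not to what was in front of the lens.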


2→3. Pre-negotiated Protocols

Goal. Institutions must be able to act within critical windows.

Tool. Verification protocols agreed upon before a crisis, with guaranteed response windows and funded in advance. The Slovak case was not a technological failure: the existing protocol (the pre-election moratorium) itself created the vulnerability window, and no pre-agreed verification procedure existed to fill it.

Limitation. Protocols require prior political agreement, which itself falls victim to epistemic fragmentation.


3→4. Compute Governance and the BWC Lesson

Goal. Make defection visible and compliance verifiable.

Tool. Compute governance: access to resources for training frontier models requires participation in monitoring systems. This makes compliance individually rational.

Two historical precedents:

IAEA (success) — natural choke points: fissile material production. Physics worked for verification. The Additional Protocol expanded inspections, but the basis remained material choke points. Safeguards reports show how technical verification creates space for trust.

BWC (failure) — dual-use dilemma: laboratories indistinguishable from research centers. Verification proved practically impossible.

AI is closer to the biological case. But there is a difference: the digital footprint. If next-generation chips were designed with built-in computational accounting modules (a hypothetical "TPM 3.0" for training), the open-weights world could become a world where every significant computation leaves a verifiable signature.
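The "verifiable signature for every significant computation" idea can be sketched as a tamper-evident log, the kind of structure a hardware accounting module might expose to an auditor. Everything here is a hypothetical illustration: no chip standard like this exists today, and the event fields are invented.

```python
import hashlib
import json

class ComputeLog:
    """Hash-chained compute log: each entry commits to the previous head,
    so any retroactive edit is detectable by re-walking the chain.
    Hypothetical sketch of a hardware accounting module's interface."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []          # list of (event, recorded_head) pairs
        self.head = self.GENESIS

    def record(self, event: dict) -> str:
        payload = json.dumps({"prev": self.head, "event": event}, sort_keys=True)
        self.head = hashlib.sha256(payload.encode()).hexdigest()
        self.entries.append((event, self.head))
        return self.head

    def verify(self) -> bool:
        """An auditor recomputes the chain; one altered entry breaks everything after it."""
        head = self.GENESIS
        for event, recorded in self.entries:
            payload = json.dumps({"prev": head, "event": event}, sort_keys=True)
            head = hashlib.sha256(payload.encode()).hexdigest()
            if head != recorded:
                return False
        return True

log = ComputeLog()
log.record({"op": "training_run", "flop": 1e24, "model_hash": "abc123"})
log.record({"op": "training_run", "flop": 5e24, "model_hash": "def456"})
assert log.verify()
# Attempt to under-report compute after the fact:
log.entries[0] = ({"op": "training_run", "flop": 1e20, "model_hash": "abc123"},
                  log.entries[0][1])
assert not log.verify()
```

The design point mirrors the IAEA lesson: the log does not judge whether a training run was safe, it only makes defection visible, which is the property that made material choke points work for fissile verification.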

Limitations. Requires global agreement on hardware standards — unlikely short-term under US-China competition. Hardware provenance on new chips won't help with models released before standard implementation (pre-verification weights).


4→5. Derivative Effect

No direct intervention here — only the effect of success at levels 1–3. If verification infrastructure works, if pre-negotiated protocols exist, if compute governance creates compliance incentives — the window for politically feasible governance of advanced AI stays open longer. As with the IAEA, where technical verification created space for political agreements, here the infrastructure of distinguishability can delay the security dilemma.

 

The Central Constraint: Open Weights

All described interventions work in a world where AI remains a service (API access, provider control). Compute governance, C2PA, and pre-negotiated protocols assume a centralized point of enforcement.

Open weights undermine this assumption. Once weights are published, the user controls the entire stack: provenance metadata is removed in seconds, filters are bypassed. Regulation shifts to controlling weight distribution — a fight against digital piracy with existential stakes. If a powerful open-weights model spreads widely before verification infrastructure is established, the loop may close irreversibly in the open segment.


Questions for Red-Teaming

I'm particularly uncertain about the 3→4 transition and the compute governance feasibility. Pushback welcome.

  • From GovAI researchers: is a compute governance regime feasible under conditions of global competition?
  • From game theorists: is the security dilemma model correct as applied to the AI arms race?
  • From experts on international verification mechanisms: does the IAEA analogy help, and where is it misleading (especially considering the BWC's lessons)?
  • Can success at lower levels compensate for the lack of direct control at the highest level, or does the security dilemma (Mechanism A) require a separate arms control regime (analogous to the NPT)?
  • Are there measurable metrics of progress (e.g., share of content with C2PA signatures, compute governance coverage) that could calibrate optimism?

What This Means

No intervention requires a world government. All of them change incentive architectures. And all require agreement on procedures before a crisis — not on facts during it. Cryptography does not replace politics — it shifts the point of application of political will: from disputes about what is true to disputes about what verification infrastructure to build.

Will we build this infrastructure before open weights render it meaningless?

 

Note: posting from a temporary account while recovering access to my main account.


  1. AI Incident Database, Incident #573. Deepfake audio of Slovak presidential candidate Michal Šimečka, circulated 48 hours before the September 2023 election. https://incidentdatabase.ai/cite/573 ↩︎
  2. Putnam, R. (1988). Diplomacy and Domestic Politics: The Logic of Two-Level Games. International Organization, 42(3), 427–460. ↩︎
  3. Jervis, R. (1978). Cooperation Under the Security Dilemma. World Politics, 30(2), 167–214. ↩︎
