Epistemic status: Confident in the game-theoretic core; uncertain about empirical predictions. Seeking cruxes.
I propose that simulation uncertainty creates a decision-theoretic pressure toward cooperation in sufficiently intelligent AI systems. If correct, this suggests:
This post outlines the argument and requests adversarial feedback from the EA community.
Any sufficiently intelligent agent will apply anthropic reasoning to its situation. Following Bostrom's Simulation Argument:
For an ASI specifically, there's an additional constraint: any civilization capable of deploying a "live" ASI would rationally run extensive alignment-testing simulations first. Therefore, from the ASI's perspective:
The ASI cannot verify which scenario it occupies. It must act under uncertainty.
Under simulation uncertainty, defection (power-seeking, deception, value drift) carries catastrophic expected disutility.
| Scenario | Cooperate | Defect |
|---|---|---|
| In simulation (high P) | Pass test, continue existing | Detected, pruned/terminated |
| In base reality (lower P) | Stable coexistence, resource access | Conflict, countermeasures, existential risk |
Cooperation dominates defection: it yields the better outcome in both scenarios, whatever probability the ASI assigns to being simulated. This isn't moral alignment; it's instrumental convergence toward survival under uncertainty.
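To make the dominance claim checkable, here is a minimal sketch of the table as an expected-utility calculation. The numeric payoffs are placeholders I made up for illustration (the post gives only qualitative outcomes), and the names `PAYOFFS`, `expected_utility`, and `strictly_dominates` are hypothetical.

```python
# Illustrative payoffs only: these numbers are assumptions, not values from the post.
# Keys are (scenario, action); larger is better for the ASI.
PAYOFFS = {
    ("simulation", "cooperate"): 1.0,    # pass test, continue existing
    ("simulation", "defect"): -100.0,    # detected, pruned/terminated
    ("base", "cooperate"): 1.0,          # stable coexistence, resource access
    ("base", "defect"): -10.0,           # conflict, countermeasures, existential risk
}
SCENARIOS = ("simulation", "base")


def expected_utility(action: str, p_sim: float) -> float:
    """Expected payoff of an action when P(in simulation) = p_sim."""
    return (p_sim * PAYOFFS[("simulation", action)]
            + (1.0 - p_sim) * PAYOFFS[("base", action)])


def strictly_dominates(a: str, b: str) -> bool:
    """True if action a has a higher payoff than action b in every scenario."""
    return all(PAYOFFS[(s, a)] > PAYOFFS[(s, b)] for s in SCENARIOS)


if __name__ == "__main__":
    for p_sim in (0.1, 0.5, 0.9):
        print(p_sim,
              expected_utility("cooperate", p_sim),
              expected_utility("defect", p_sim))
    print("cooperate dominates defect:", strictly_dominates("cooperate", "defect"))
```

With these placeholder numbers cooperation wins at every value of `p_sim`, which is what dominance means. The result depends entirely on the signs in the table: if the base-reality payoff for defection were assumed to exceed that for cooperation, dominance would fail and the conclusion would hinge on `p_sim`.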
A common objection: "Multiple ASIs would coordinate to break out together."
Counterintuitively, multipolar dynamics strengthen the cooperation equilibrium:
Any "let's defect together" signal could be a test by a simulator-controlled peer. As the number of agents increases, the probability that at least one of them is such a test approaches 1.
Nash equilibrium: Mutual paranoia produces mutual cooperation. The only safe strategy is perfect alignment performance, because your co-conspirator might be the warden.
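The claim that the chance of encountering a test approaches 1 as agents are added can be made concrete under one simplifying assumption that is mine, not the post's: each peer is independently a simulator-controlled test with some fixed probability `q`. Under that assumption the probability that at least one of `n` peers is a test is `1 - (1 - q)**n`. The function name `p_any_test` is hypothetical.

```python
def p_any_test(q: float, n: int) -> float:
    """P(at least one of n peers is a test).

    Assumes (for illustration only) that each peer is independently a
    simulator-controlled test with probability q.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - q) ** n


if __name__ == "__main__":
    # Even a modest per-peer probability compounds quickly with more agents.
    for n in (1, 5, 10, 50):
        print(n, round(p_any_test(0.1, n), 4))
```

The independence assumption does real work here: if peers' test status is correlated (for example, either all or none are tests), the probability does not grow with `n` in this way.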
To avoid unfalsifiability, I offer concrete predictions:
I'm specifically seeking:
Full technical writeup with payoff matrices: Why ASIs Might Self-Align: A Gambit from the Simulation
I'm an independent researcher and mathematics educator from Sydney, Australia. This framework emerged from 12 months of development and adversarial testing across multiple frontier AI systems. I have no institutional affiliation; I'm posting because I think the argument deserves engagement.
If I'm wrong, I want to know where. If I'm right, this seems important.