
This post is an experiment in two ways that I explain in comments here and here. But you don't need to read those comments before reading this post. 

1. Introduction

In What We Owe the Future I argued that even very severe non-extinction catastrophes are unlikely to permanently derail civilisation. Even if a catastrophe killed 99.9% of people (leaving 8 million survivors), I'd put the chance we eventually re-attain today's technological level at well over 95%.

On the usual longtermist picture, this means such catastrophes are extraordinarily bad for present people but don't rival full existential catastrophes in long-run importance. Either we go extinct (or lock into a permanently terrible trajectory), or we navigate the "time of perils" once and reach existential security.

In this article I'll describe a mechanism by which non-extinction global catastrophes would have existential-level importance. I'll define:

  • catastrophic setback as a catastrophe that causes civilisation to revert to the technological level it had at least 100 years prior.
  • Sisyphus risk as the extra existential risk society incurs from the possibility of catastrophic setbacks. (Though, unlike with Sisyphus’s plight, we will not suffer such setbacks indefinitely.)

The mechanism is straightforward: if a catastrophic setback occurs after we've already survived the transition to superintelligence, civilisation has to re-run the time of perils—redevelop advanced AI and face the alignment problem all over again. The magnitude of this additional risk could be meaningful: if AI takeover risk is 10% and the chance of a post-superintelligence catastrophic setback is 10%, Sisyphus risk adds about an extra percentage point of existential risk.

The structure of this article is as follows. In section 2 I present a simple model of Sisyphus risk, and in section 3 I describe potential avenues for post-superintelligence catastrophic setbacks. Section 4 asks whether a post-setback society would retain alignment knowledge, and section 5 addresses the objection that aligned AGI would make later catastrophes irrelevant. Section 7 concludes, and an appendix extends the simple model to multiple cycles, varying rerun risk, and trajectory change. Section 6 presents what I see as the most important upshots:

  • The magnitude of existential risk from engineered pandemics is meaningfully higher than you might otherwise think.
  • The magnitude of existential risk from nuclear war and other non-AI non-bio sources is much higher than you might otherwise think.
  • Unipolar post-superintelligence scenarios seem more desirable than they otherwise would.
  • The value of saving philanthropic resources to deploy post-superintelligence is greater than it otherwise would be.

     

2. A Simple Model of Sisyphus Risk

2.1 One catastrophic setback

Start with a deliberately stripped-down picture:

  • The first run is the period from now until we reach robust existential security (say, stable aligned superintelligence plus reasonably good global governance).
  • During this run there is some total probability of existential catastrophe, p.
  • Conditional on avoiding existential catastrophe, there is a probability q of a post-AGI catastrophic setback that knocks us back technologically by a hundred years or more.
  • After such a setback, civilisation eventually rebounds and goes through a second run with existential catastrophe probability p₂.

In this stripped-down picture, we'll ignore multiple setbacks and suppose p₂ = p. Then existential catastrophe can happen in the first run (probability p), or in the second run—which requires avoiding catastrophe in the first run (1−p), suffering a setback (q), then catastrophe in the rerun (p). So:

P(existential catastrophe) = p + (1−p) × q × p

For smallish probabilities, (1−p) ≈ 1, so:

P(existential catastrophe) ≈ p(1+q)

Whatever your first-run risk p, a probability q of catastrophic setback multiplies it by about (1+q). The extra existential risk from Sisyphus risk is approximately qp.
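
To make the arithmetic concrete, here's a minimal Python sketch of this one-setback model (the function name and the example call are mine; the formula is the one above):

```python
def one_setback_risk(p, q):
    """Total existential risk with at most one catastrophic setback.

    p: existential risk per run through the time of perils
       (the rerun is assumed to face the same risk p)
    q: chance of a catastrophic setback, conditional on surviving a run
    """
    return p + (1 - p) * q * p

# Example from the introduction: 10% AI takeover risk, 10% setback chance.
print(one_setback_risk(0.10, 0.10))   # ≈ 0.109
print(0.10 * (1 + 0.10))              # the p(1+q) approximation: 0.11
```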

2.2 Ord's numbers

In The Precipice, Toby Ord gives rough 100-year existential risk estimates:

  • Artificial intelligence: ~10% (1 in 10)
  • Engineered pandemics: ~3.3% (1 in 30)
  • Nuclear war: ~0.1% (1 in 1,000)
  • Extreme climate change: ~0.1% (1 in 1,000)
  • Natural risks (asteroids, supervolcanoes, etc.): ~0.01% total

Overall, he puts this century's existential risk at about 1 in 6 (~17%).

Treat that 17% as first-run risk p. And suppose that conditional on surviving the AGI transition, there's a 10% chance (q) of some post-AGI catastrophic setback.

Sisyphus risk ≈ 0.83 × 0.1 × 0.17 ≈ 1.4%. Total risk rises to roughly 18.4%—an 8% relative increase over the "1 in 6" figure.

For specific causes, Sisyphus channels can dominate direct extinction probabilities. Ord puts direct nuclear-war extinction risk at ~0.1%. But suppose there's a 5% chance of a post-AGI nuclear war causing a catastrophic setback, and the rerun's existential risk is ~10%. Then nuclear war's indirect contribution via Sisyphus risk is approximately 0.05 × 0.10 = 0.5%. Total nuclear-related existential risk becomes ~0.6%—six times the direct figure.
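
As a rough check on these numbers, here's a short sketch in Python (the variable names are mine; the parameter values are the ones used in the text):

```python
# Aggregate case: Ord's ~17% as first-run risk, 10% post-AGI setback chance.
p, q = 0.17, 0.10
sisyphus_extra = (1 - p) * q * p      # ≈ 0.014, i.e. ~1.4 percentage points
total_risk = p + sisyphus_extra       # ≈ 0.184

# Nuclear case: ~0.1% direct existential risk, plus a 5% chance of a
# post-AGI nuclear setback followed by a ~10% rerun existential risk.
nuclear_direct = 0.001
nuclear_indirect = 0.05 * 0.10        # = 0.005
nuclear_total = nuclear_direct + nuclear_indirect   # = 0.006, six times the direct figure

print(sisyphus_extra, total_risk, nuclear_total)
```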

 

3. What Forms Could Post-AGI Catastrophic Setbacks Take?

Many familiar global catastrophic risks create post-AGI setback risk.

3.1 Engineered pandemics

Engineered pandemics are widely regarded as among the most plausible routes to civilisational collapse, especially once advanced AI can help design pathogens.

The chance of human extinction from engineered pandemics this century might be around a few percent. But the chance of a catastrophic pandemic killing a very large fraction of the population without killing everyone is substantially higher. Isolated populations—uncontacted tribes, remote islanders, Antarctic researchers, submarine crews, people with rare genetic resistance—are extremely hard to reach with any pathogen. So even for worst-case engineered pandemics, the modal outcome is vast death and societal collapse, but not literal extinction.

Such catastrophes could well occur after superintelligence: AI could make the ability to create such pandemics widespread, while it might take considerable time to adequately protect society against them.

3.2 Nuclear war and nuclear winter

A full-scale nuclear war—especially in a future with larger arsenals or more nuclear states—could kill billions directly and through nuclear-winter-induced famines. Again, complete extinction is unlikely: some regions, especially in the Southern Hemisphere (New Zealand, parts of South America, some islands), would likely suffer less severe climatic impacts and avoid direct strikes. But the chance of a nuclear war that kills ≳90% of people and collapses civilisation is much higher—especially if an AI-driven arms race leads to massively expanded global stockpiles before war breaks out.

3.3 AI-driven catastrophe

Some AI-involved catastrophes could collapse civilisation without causing extinction. For example, there could be a failed AI takeover, in which a powerful coalition of AIs attempts to seize control and humans (and/or aligned AIs) thwart it, but the conflict unleashes massive destruction—nuclear exchanges, catastrophic cyber-attacks on critical infrastructure, or near-complete destruction of the electrical grid.

3.4 Butlerian backlash and fertility decline

One more speculative possibility involves a Butlerian backlash against AI combined with technological regression from fertility decline. Imagine:

  • Humanity navigates the first AGI transition without extinction, but with significant scares and close calls.
  • As a result, a broad anti-tech, anti-AI movement gains momentum—driven by religious or ideological currents and fear of loss of control—and results in a complete ban on advanced AI.
  • Simultaneously, steep global fertility decline continues.
  • Over centuries, technological suppression and demographic decline gradually unwind complex civilisation: key industries shut down, knowledge institutions wither, and the capacity to maintain sophisticated systems is lost.

This is a slow drift into technological regression rather than a single violent shock. But from the perspective of the far future, the effect is similar: by 2400, the world population might be a few hundred million living in low-tech polities, no longer able to sustain advanced semiconductor manufacturing. Eventually the population rebounds—and must face the development of AGI all over again.

 

4. Would a Post-Setback Society Retain Alignment Knowledge?

For Sisyphus risk, it matters greatly what we take into the rerun (H/T Tom Davidson for this objection). Two extremes:

  • Optimistic: we carry forward a reasonably complete picture of how to align AGI (perhaps even copies of aligned systems). The second run is still dangerous but much easier.
  • Pessimistic: almost all that knowledge is lost—both weights and code, and the tacit expertise—so the rerun faces similar alignment difficulties on a poorer, more depleted planet.

I think the default is closer to the pessimistic end unless we take targeted action.

4.1 Digital fragility

Most alignment work today exists as digital bits: arXiv papers, lab notes, GitHub repos, model checkpoints. Digital storage is surprisingly fragile without continuous power and maintenance.

SSDs store bits as charges in floating-gate cells; when unpowered, charge leaks, and consumer SSDs may start losing data after a few years. Hard drives retain magnetic data longer, but their mechanical parts degrade; after decades of disuse they often need clean-room work to spin up safely. Data centres depend on air-conditioning, fire suppression, and regular maintenance.

In a global collapse where grids are down for years, almost all unmaintained digital archives eventually succumb to bit-rot, corrosion, fire, or physical decay.

Some attempts at very long-term archiving exist—the GitHub Arctic Code Vault stored repository snapshots on archival film intended to last 500–1,000 years, and experimental technologies like quartz-glass "5D" storage and DNA-based media show promise. But these require fairly sophisticated equipment to read. A collapsed civilisation is unlikely to have polarisation microscopes and ML-based decoders at hand.

Unless we make a special effort to record alignment ideas on robust analogue media (discussed later), it's very unlikely a post-collapse society will resurrect our exact algorithms or trained models, or even broad alignment techniques. 

4.2 Hardware and software compatibility

Even if some digital archives survive, there's another problem: hardware and software stacks. Running a 21st-century AGI or reading its weights requires suitable hardware still working, compatible drivers and operating systems, and a chain of compilers, libraries, and containerisation tools.

Modern microelectronics don't age gracefully. Capacitors dry out, solder joints crack, chips suffer long-term degradation. Many high-end systems are locked behind licensing that requires phoning home to servers. If grids are down for decades and no one maintains server rooms, survivors will likely find, by the time they can run a data centre again, a collection of dead, unbootable hardware.

The chance a future civilisation can simply turn the old aligned AGI back on without reinventing much of the semiconductor and computing stack looks very small.

4.3 Tacit knowledge

Alignment isn't just code; it's a body of tacit knowledge: which training tricks worked in practice, how to interpret confusing safety-relevant behaviour, what failed and why. This is held by a small number of researchers and engineers. A collapse killing 90–99% of people would almost certainly kill most alignment researchers and scatter the rest into survival mode.

Luisa Rodriguez's work on collapse suggests that many kinds of practical expertise (small-scale farming, improvising power) survive reasonably well—people are inventive, and some skills exist in large numbers. But alignment research is exactly the opposite: tiny communities at the frontier of abstract theory and empirical ML.

Absent deliberate efforts to create "alignment textbooks for the dark age," the tacit knowledge probably won't make it through.

 

5. Won't AGI make post-AGI catastrophes essentially irrelevant?

A natural thought: once we have aligned superintelligence, surely it will either prevent pandemics, wars, and other disasters altogether, or kill or disempower us directly (making other risks moot). On this view, a post-AGI world is nearly binary—utopia or extinction—leaving little room for Sisyphean scenarios.

But I think this is too optimistic about the speed and completeness of the transition to globally deployed, robustly aligned "guardian" systems. There are many plausible worlds where AI capabilities are at or beyond human level in many domains, multiple actors control superintelligence, global coordination remains shaky, and deployment is messy and entangled with geopolitical rivalries.

In those worlds, AI may reduce some risks (better pandemic surveillance) while simultaneously increasing others (cheaper bioweapons, more destabilising autonomous weapons, faster escalation cycles). 

An aligned AGI trusted by everyone to override national sovereignty at will, and itself indefinitely stable, is one possible endpoint. It's not guaranteed we reach it quickly; the first few decades post-AGI could still contain plenty of room for very large mistakes.

 

6. Implications and Strategic Upshots

6.1 The importance of non-AI risks, especially non-AI non-bio

Ord gives ~3.3% existential risk from engineered pandemics and ~0.1% each from nuclear war and extreme climate change. But, as I understand them, those estimates focus mainly on direct extinction-like outcomes. Adding Sisyphean channels increases the long-run importance of some risks:

  • Sisyphus risk could easily add a percentage point of existential risk to biorisk (if the chance of a catastrophic setback from a pandemic is 10% and second-run existential risk is also 10%). This increases total existential risk from pandemics by a meaningful amount.
  • Nuclear war's significance rises even more sharply: from 0.1% direct existential risk to perhaps ~0.5–1% once we factor in its role as a catastrophic-setback trigger.

6.2 When to donate

Sisyphus risk has implications for how longtermist philanthropists should allocate resources over time. Standard longtermist thinking often treats the AGI transition as the critical period—survive it, and the problem is largely solved. If that's right, the case for spending now rather than later is very strong.

But Sisyphus risk complicates this picture. If substantial catastrophic-setback risk persists after AGI—from pandemics, wars, or other sources—then the post-AGI world still has important risk-reduction work to do. Resources that can be stored safely and deployed later would have somewhere valuable to go.

Sisyphus risk makes it more reasonable to devote a larger fraction of longtermist resources to patient strategies—endowments, value-preserving institutions, or other structures designed to retain influence in the post-AGI world and help guard against collapse.

6.3 A modest argument for more unipolar futures

Sisyphus risk also bears on unipolar versus multipolar futures. In a strongly multipolar world with several powerful states and AGI systems, there are more pathways to post-AGI catastrophic setbacks: AI-augmented great-power wars, unstable deterrence, arms races over bioweapons or dangerous tech.

In a more unipolar world—where a single broadly trusted coalition controls the leading AI system and a large share of coercive power—there may be fewer actors able to start truly global wars, more scope for coordinated biosecurity and nuclear risk reduction, and a clearer path to retiring high-risk technologies once no longer needed.

A malevolent or incompetent unipolar power is terrifying and could lock in a terrible trajectory. But Sisyphus risk provides an extra argument on the unipolar side: fewer centres of independent catastrophic capability means fewer opportunities for post-AGI catastrophic setbacks—hence lower q.

This doesn't settle governance questions by itself—unipolarity comes with serious lock-in and abuse-of-power concerns. But Sisyphus risk is one more consideration tilting the trade-off slightly away from highly multipolar futures.

6.4 The value of knowledge preservation and civilisational kernels

Given how fragile alignment knowledge and infrastructure are, Sisyphus risk increases the appeal of:

  • Resilient knowledge repositories: printing and archiving key alignment insights and general scientific knowledge on robust media in multiple locations. For example, paper, microfilm, and etched metal have very long lifetimes and are human-readable with simple tools. Microfilm and acid-free paper can last centuries under decent conditions.
  • Possibly even preserving copies of aligned systems in robust forms—though technically this would be tricky.
  • Civilisational refuges: well-resourced, physically secure sites designed to ride out extreme global catastrophes while maintaining advanced capability and knowledge.

These interventions don't remove Sisyphus risk, but they can shorten and soften the rerun.

 

7. Conclusion

We often think of the "time of perils" as something we either navigate once or fail once. In fact, a post-AGI world still has many routes to non-extinction global catastrophes that could set civilisation back by a century or more. In those worlds, our descendants must rerun the dangerous trajectory—re-industrialise, rebuild powerful AI, face alignment problems again—typically without our full technological base or alignment knowledge.

I call this extra exposure to existential danger Sisyphus risk. For reasonable parameter choices, it’s not a tiny correction; it can add one or several percentage points to lifetime existential risk and significantly increase the importance of non-AI risks like biorisk, nuclear war, or other causes of post-AGI catastrophe.

Strategically, this gives us additional reason to care about non-AI risks; a modest push towards patient philanthropy; extra motivation for knowledge-preservation and civilisational resilience projects; and a small but real argument in favour of more unified global governance over a dangerously multipolar AGI landscape.

 

Appendix: Extensions

A.1 Multiple cycles

So far we've considered just one potential setback. But what if setbacks can happen repeatedly—each time sending civilisation back to face the dangerous period again?

Let's model this more explicitly. Suppose each "run" through the time of perils has three possible outcomes:

  • Existential catastrophe, with probability p.
  • Catastrophic setback, with probability q conditional on not going extinct.
  • Safe exit to existential security, with the remaining conditional probability 1−q.

So the unconditional probabilities per run are:

  • Extinction: p.
  • Setback: (1−p)q.
  • Safe exit: (1−p)(1−q).

Let E be the eventual probability of extinction, starting from a fresh run. On any given run:

  • With probability p, we go extinct immediately.
  • With probability (1−p)(1−q), we safely exit to existential security.
  • With probability (1−p)q, we suffer a setback and are effectively back where we started, facing the same eventual extinction probability E again.

So we can write the simple recursion:

E = p + (1−p)q × E

Rearranging:

E = p / (1 − (1−p)q)

In other words, the possibility of multiple setbacks multiplies the first-run risk p by the factor 1 / (1 − (1−p)q).

For small p, this is very close to 1/(1−q). If q = 0.5 and p is modest, then E ≈ 2p: multiple runs roughly double eventual extinction risk. If q = 0.33, the multiplier is about 1.5; and so on. More setbacks means more runs, and more runs mean more opportunities for extinction.

We can describe the expected number of runs in the same way. Let N be the expected number of times civilisation passes through the time of perils. On the first attempt we definitely get one run; with probability (1−p)q we suffer a setback and expect another N runs:

N = 1 + (1−p)q × N, which gives N = 1 / (1 − (1−p)q)

For small p, this again is approximately 1/(1−q). If, conditional on not going extinct, there is a 50% chance of a catastrophic setback, we expect about two runs; if it's 33%, about 1.5 runs; and so on.
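
Here's a minimal sketch, assuming (as above) the same p and q on every run, that computes both closed-form expressions and checks the extinction probability against a simple Monte Carlo simulation (function names and the example parameters are mine):

```python
import random

def eventual_extinction(p, q):
    """E = p / (1 - (1-p)*q): eventual extinction probability over repeated runs."""
    return p / (1 - (1 - p) * q)

def expected_runs(p, q):
    """N = 1 / (1 - (1-p)*q): expected number of passes through the time of perils."""
    return 1 / (1 - (1 - p) * q)

def simulate_extinction(p, q, trials=200_000):
    """Monte Carlo estimate of the eventual extinction probability."""
    extinctions = 0
    for _ in range(trials):
        while True:
            if random.random() < p:      # existential catastrophe this run
                extinctions += 1
                break
            if random.random() >= q:     # no setback: safe exit to existential security
                break
            # otherwise: catastrophic setback, so rerun the time of perils
    return extinctions / trials

p, q = 0.17, 0.5
print(eventual_extinction(p, q))   # ≈ 0.29, roughly double the first-run risk
print(expected_runs(p, q))         # ≈ 1.7 runs on average
print(simulate_extinction(p, q))   # should land close to the analytic value
```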

This multiple-run model helps us see when Sisyphus risk actually matters for our decisions:

If q is very small, the multiplier 1/(1 − (1−p)q) is extremely close to 1. Even if we conceptually allow infinitely many potential reruns, the expected number of reruns is so small that eventual extinction probability barely changes: E ≈ p.

If q is very large—close to 1—then civilisation is very likely to collapse and retry many times. In the limit as q → 1, eventual extinction probability E approaches 1 even for modest p: we almost surely keep rerunning the time of perils until one run finally kills us. Sisyphus dynamics are "decisive" here, but in a grim way: unless we can dramatically change p or q, eventual failure becomes close to inevitable.

The interesting cases lie in between, when p is neither negligible nor overwhelming—we're neither "almost certainly safe" nor "almost certainly doomed" on a single run—and q is large enough that multiple attempts are a real possibility, but not so large that extinction is virtually guaranteed.

In that intermediate regime, it's natural to ask: when is it more important to reduce catastrophic setback risk q, rather than first-run existential risk p?

From the formula E = p / (1 − (1−p)q), a bit of algebra shows:

  • For a small absolute reduction in p, the resulting reduction in E is proportional to (1−q).
  • For a small absolute reduction in q, the resulting reduction in E is proportional to p(1−p).

So, for equal small changes Δp and Δq, reducing q has a bigger impact on eventual extinction probability only if p(1−p) > 1−q—i.e. only if q is already very high. For example:

  • If p = 0.5, you'd need q > 0.75.
  • If p = 0.2 or 0.8, you'd need q > 0.84.
  • If p = 0.1 or 0.9, you'd need q > 0.91.

For more modest values of q (say, 10–50%), a unit reduction in p does more to reduce eventual extinction probability than a unit reduction in q. Intuitively, this is because p matters on every run, while q only matters insofar as it creates extra runs. But, unintuitively, as your estimate of first-run existential risk increases from, say, 0.1 to 0.5, you start to care more about reducing q compared to reducing p (by the same small absolute amount).

We can push this comparison a bit further. The ratio between the marginal impact of reducing q and the marginal impact of reducing p (for the same small absolute change in each) is:

(∂E/∂q) / (∂E/∂p) = p(1−p) / (1−q)

Given this, we can ask when reducing q has at least one-tenth as much impact as reducing p. The condition is p(1−p)/(1−q) ≥ 1/10, which implies q ≥ 1 − 10p(1−p).

For p = 5% this gives q ≳ 0.53; for p = 10% it's q ≳ 0.10; and for p above about 11%, the right-hand side becomes negative, which means that any positive q makes reductions in q at least one-tenth as valuable on the margin as equal-sized reductions in p.
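
To see how these marginal comparisons play out across parameter values, here's a short sketch (the function name is mine) that evaluates the ratio and the one-tenth threshold; its first loop reproduces the table below:

```python
def value_ratio_p_over_q(p, q):
    """(dE/dp) / (dE/dq) for E = p / (1 - (1-p)*q).

    How many times more the eventual extinction probability falls per small
    reduction in p than per equal-sized small reduction in q.
    """
    return (1 - q) / (p * (1 - p))

p_values = [0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95]
q_values = [0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95]

# Reproduces the table below, row by row.
for p in p_values:
    row = [round(value_ratio_p_over_q(p, q), 1) for q in q_values]
    print(f"p = {p:.0%}:", row)

# Threshold for reducing q to be at least one-tenth as valuable as reducing p:
# q >= 1 - 10*p*(1-p) (negative for p above ~11%, so any positive q qualifies).
for p in [0.05, 0.10, 0.12]:
    print(f"p = {p:.0%}: q >= {max(0.0, 1 - 10 * p * (1 - p)):.2f}")
```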

For a more general overview, here's a table of how many times greater the value of a one-percentage-point reduction in p is than the value of a one-percentage-point reduction in q, under different assumptions about p and q.

| p \ q   | q = 5% | q = 10% | q = 25% | q = 50% | q = 75% | q = 90% | q = 95% |
|---------|--------|---------|---------|---------|---------|---------|---------|
| p = 5%  | 20.0   | 18.9    | 15.8    | 10.5    | 5.3     | 2.1     | 1.1     |
| p = 10% | 10.6   | 10.0    | 8.3     | 5.6     | 2.8     | 1.1     | 0.6     |
| p = 25% | 5.1    | 4.8     | 4.0     | 2.7     | 1.3     | 0.5     | 0.3     |
| p = 50% | 3.8    | 3.6     | 3.0     | 2.0     | 1.0     | 0.4     | 0.2     |
| p = 75% | 5.1    | 4.8     | 4.0     | 2.7     | 1.3     | 0.5     | 0.3     |
| p = 90% | 10.6   | 10.0    | 8.3     | 5.6     | 2.8     | 1.1     | 0.6     |
| p = 95% | 20.0   | 18.9    | 15.8    | 10.5    | 5.3     | 2.1     | 1.1     |

Even if work on catastrophic setbacks is rarely as directly impactful (per percentage point) as work on first-run existential risk, it can still matter a lot: as soon as first-run risk is non-trivial and setbacks are more than very rare, shaving q down becomes far from a rounding error in the overall picture of existential risk.

 

A.2 Higher or lower risk in the rerun

So far I've assumed the rerun's existential risk (p₂) equals first-run risk (p₁). But this need not be true. If not, then the Sisyphean contribution becomes (1−p₁) × q × p₂, structurally the same but sensitive to p₂.

One can make arguments that p₂ would be greater than p₁ and vice-versa:

  • Rerun risk could be higher if post-collapse governance is worse, the environment and resource base is extremely depleted (making re-industrialisation harder), or survivors inherit dangerous technologies without the surrounding safety culture.
  • Rerun risk could be lower if the catastrophe serves as a strong warning shot, survivors build much more cautious institutions, or knowledge about what went wrong and how to avoid catastrophe survives.

If later runs are riskier than the first, Sisyphus considerations become more important. If later runs are safer than the first, then Sisyphus effects are muted. So a crucial question is whether society is currently handling AI takeover risk unusually well or unusually poorly, and hence whether regression to the mean should lead us to expect rerun risk to be greater or smaller than first-run risk.

My personal view is that society is on track to do unusually well (though bear in mind that this is a low bar), which makes the importance of Sisyphus risk even greater, as we should expect rerun risk to be higher than first-run risk. To illustrate: if we face AI existential risk of 5%, but the rerun risk would be 20%, and the chance of post-AGI catastrophe is 10%, then Sisyphus risk is 2%: 40% as large as AI risk itself. 
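
A quick check of that illustration in code (a sketch; the numbers are the ones just given):

```python
def sisyphus_extra_risk(p1, q, p2):
    """Extra existential risk when rerun risk p2 differs from first-run risk p1."""
    return (1 - p1) * q * p2

p1, q, p2 = 0.05, 0.10, 0.20
extra = sisyphus_extra_risk(p1, q, p2)
print(extra)         # ≈ 0.019, i.e. roughly 2 percentage points
print(extra / p1)    # ≈ 0.38, i.e. roughly 40% of first-run AI risk
```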

 

A.3 Trajectory change

So far I've focused on how catastrophic setbacks change the probability of existential catastrophe. But they also matter enormously for the quality of the future conditional on no existential catastrophe.

In other work I've suggested decomposing an action's long-term impact into its existential impact (the part that comes from changing the probability of existential catastrophe) and its trajectory impact (the part that comes from changing the value of the world conditional on no existential catastrophe).

Catastrophic setbacks affect both. On the existential side, they add extra rounds of existential risk. On the trajectory side, they plausibly change expected value substantially, because the civilisation that eventually emerges from collapse could be very different from what would have emerged otherwise.

I suspect that for many plausible parameter values, the trajectory impact of avoiding catastrophic setbacks is at least as large as—and perhaps larger than—their existential impact. But it’s unclear to me in what direction this consideration points. 

Perhaps current civilisation is unusually good (in particular because it's unusually liberal and democratic), and if so then by regression to the mean the society we would get post-catastrophe would be worse (H/T Fin Moorhouse for this worry). And note that, conditional on a post-AGI catastrophic setback occurring, the pre-setback post-AGI society is more likely to have been liberal rather than authoritarian, and so unusually good. If so, then the trajectory impact of preventing a post-AGI catastrophic setback is positive, too. This would increase the importance of catastrophic setbacks — perhaps considerably.


Comments

The first way in which this post is an experiment is that it's work-in-progress that I'm presenting at a Forethought Research progress meeting. The experiment is just to publish it as a draft and then have the comments that I would normally receive as GoogleDoc comments on this forum post instead. The hope is that by doing this more people can get up to speed with Forethought research earlier than they would have and we can also get more feedback and thoughts at an earlier stage from a wider diversity of people.

I'd welcome takes from Forumites on how valuable or not this was.

I like the idea, though I think a shared gdoc is far better for any in-line comments. Maybe if you only want people to give high-level comments this is better though - I imagine heaps of people may want to comment on gdocs you share publicly.

Love this idea - keen to hear afterwards whether it felt useful from your end. 

Was your model informed by @Arepo 's similar models? I believe he was considering rerunning the time of perils because of a catastrophe before AGI. Either way, catastrophic risk becomes much more important to the long-run future than with a simple analysis.

Great post! A few reactions:

1. With space colonization, we can hopefully create causally isolated civilizations. Once this happens, the risk of a civilizational collapse falls dramatically, because of independence.

2. There are two different kinds of catastrophic risk: chancy, and merely uncertain. Compare flipping a fair coin (chancy) to flipping a coin that is either double headed or double tailed, but you don't know which (merely uncertain). If alignment is merely uncertain, then conditional on solving it once, we are in the double-headed case, and we will solve it again. Alignment might be like this: for example, one picture is that alignment might be brute forceable with enough data, but we just don't know whether this is so. At any rate, merely uncertain catastrophic risks do not have rerun risk, while chancy ones do. 

3. I'm a bit skeptical of demographic decline as a catastrophic risk, because of evolutionary pressure. If some groups stop reproducing, groups with high reproduction rates will tend to replace them. 

4. Regarding unipolar outcomes, you're suggesting a picture where unipolar outcomes have less catastrophic risk, but more lock-in risk. I'm unsure of this. First, unipolar world government might have higher risk of civil unrest. In particular, you might think that elites tend to treat residents better because of fear of external threats; without that threat, they may exploit residents more, leading to higher civil unrest. Second, unipolar AI outcomes may have higher risk of going rogue than multipolar, because in multipolar outcomes, humans may have extra value to AIs as a partner in competition against other AIs. 

At any rate, merely uncertain catastrophic risks do not have rerun risk, while chancy ones do. 

This is a key point. For many existential risks, the risk is mainly epistemic (i.e. we should assign some probability p to it happening in the next time period), rather than it being objectively chancy. For one-shot decision-making sometimes this distinction doesn't matter, but here it does.

Complicating matters, what is really going on is not just that the probability is one of two types, but that we have a credence distribution over the different levels of objective chance. A pure subjective case is where all our credence is on 0% and 100%, but in many cases we have credences over multiple intermediate risk levels — these cases are neither purely epistemic nor purely objective chance.

I agree - this is a great point. Thanks, Simon!

You are right that the magnitude of rerun risk from alignment should be lower than the probability of misaligned AI doom. However, worlds in which AI takeover is very likely but we can't change that, or worlds where it's very unlikely and we can't change that, aren't the interesting worlds from the perspective of taking action. (Owen and Fin have a post on this topic that should be coming out fairly soon.) So, if we're taking this consideration into account, this should also discount the value of work to reduce misalignment risk today.

(Another upshot: bio-risk seems more like chance than uncertainty, so biorisk becomes comparatively more important than you'd think before this consideration.)

Agree, and this relates to my point about distinguishing the likelihood of retaining alignment knowledge from the likelihood of rediscovering it. 

On point 1 (space colonization), I think it's hard and slow! So the same issue as with bio risks might apply: AGI doesn't get you this robustness quickly for free. See other comment on this post.

I like your point 2 about chancy vs merely uncertain. I guess a related point is that when the 'runs' of the risks are in some way correlated, having survived once is evidence that survivability is higher. (Up to an including the fully correlated 'merely uncertain' extreme?)

Interesting. A few thoughts:

Beyond strengthening the case for non-existential risks, if Sisyphus risk is substantial it also weakens arguments that place extreme weight on reducing existential risk at a specific time. Some of the importance of the Time of Perils comes from comparative advantage, which is diluted if civilization plausibly gets multiple runs.

One additional Sisyphean mechanism worth flagging is resource exhaustion: collapsing before reaching renewable resource self-sufficiency could permanently worsen later runs. This probably relies on a setback happening much later or a large amount of resources being used before, but it’s worth flagging. 

A caveat on donation timing: even if post-AGI x-risk declines slowly, aligned AGI plausibly generates enormous resources, so standard patient-philanthropy arguments may still apply. And if we assume those resources are lost in a collapse, the same would likely apply to resources saved in advance.

Finally, the plausible setbacks all seem to hinge on something like the loss of knowledge. Other worries (e.g. Butlerian backlash) tend to rely on path-dependent successes—historically contingent timing, unusually alignable models, or specific public perceptions that don't automatically replicate—which seem hard to change conditional on setbacks. If those aren't mostly luck-based and the relevant knowledge survives, a post-setback society could plausibly re-instantiate the same mechanisms, making Sisyphus risk primarily an epistemic rather than, say, a governance problem.

I guess my prior coming into this is that non-existential catastrophes are still pretty existentially important, because:

  • they are bad in and of themselves
  • they are destabilising and make it more likely that we end up with existential catastrophes
    • I definitely wasn't thinking explicitly about post-ASI catastrophes meaning we'd have to rerun the time of perils
    • But I was thinking about stuff like 'a big war would probably set back AI development and could also make culture and selection pressures a fair bit worse, such that I feel worse about the outcome of AI development after that'. And similarly for bio

It sounds like your prior was that non-existential catastrophes are much much less important than existential ones, and then these considerations are a big update for you.

So I think part of why I'm less interested in this than you are is just having different priors where this update is fairly small/doesn't change my prioritisation that much?

Yep, definitely for me 'big civ setbacks are really bad' was already baked in from the POV of setting bad context for pre-AGI-transition(s) (as well as their direct badness). But while I'd already agreed with Will about post-AGI not being an 'end of history' (in the sense that much remains uncertain re safety), I hadn't thought through the implication that setbacks could force a rerun of the most perilous transition(s), which does add some extra concern.

I do think that non-existential level catastrophes are a big deal even despite the rerun risk consideration, because I expect the civilisation that comes back from such a catastrophe to be on a worse values trajectory than the one we have today. In particular, the world today is unusually democratic and liberal, and I expect a re-roll of history to result in less democracy than we have today at the current technological level. However, other people have pushed me on that, and I don't feel like the case here is very strong. There are also obvious reasons why one might be biased towards having that view.

In contrast, the problem of having to rerun the time of perils is very crisp. It doesn't seem to me like a disputable upshot at the moment, which puts it in a different category of consideration at least — one that everybody should be on board with. 

I'm also genuinely unsure whether non-existential level catastrophe increases or decreases the chance of future existential level catastrophes. One argument that people have made that I don't put that much stock in is that future generations after the catastrophe would remember it and therefore be more likely to take action to reduce future catastrophes. I don't find that compelling because I don't think that the Spanish flu made us more prepared against Covid-19, for example. Let alone that the plagues of Justinian prepared us against Covid-19. However, I'm not seeing other strong arguments in this vein, either.

"I don't think that the Spanish flu made us more prepared against Covid-19" actually I'm betting our response to Covid-19 was better than it would have been without having had major pandemics in the past. For example, the response involved developing effective vaccines very quickly

The second way in which this post is an experiment is that it's an example of what I've been calling AI-enhanced writing. The experiment here is to see how much more productive I can be in the research and writing process by relying very heavily on AI assistance — trying to use AI rather than myself wherever I can possibly do so. In this case, I went from having the basic idea to having this draft in about a day of work.

I'd be very interested in people's comments on how apparent it is that AI was used so extensively in drafting this piece — in particular if there are examples of AI slop that you can find in the text and that I missed.

When I read the first italicised line of the post, I assumed that one of the unusual aspects was that the post was AI-written. So then I was unusually on the lookout for that while reading it. I didn't notice clear slop. The few times that seemed not quite in your voice/a bit more AI-coded were (I am probably forgetting some):

  • The talk of 'uncontacted tribes' - are there any? Seems like more something I would expect AIs to mention than you.
  • 'containerisation tools' - this is more computer techno-speak than I would expect from you (I don't really know what these tools are, maybe you do though).
  • ‘Capacitors dry out, solder joints crack, chips suffer long-term degradation.’ - I quite like this actually but it is a bit more flowery than your normal writing I think.

So overall, I would say the AIs acquitted themselves quite well!

I didn't suspect while reading the post that it was drafted heavily with AI.

On reflection, and having now seen this comment, the writing style does feel a bit different than your other writing that I've read, in some fairly thematically AI ways - shorter paragraphs, punchier prose, bolded bullets, etc. I don't know if it is better or worse - it was very easy to scan and understand quickly, but I do wonder if some of your usual precision or nuance is missing. (Though this is probably more to do with being an early stage draft rather than being AI-assisted).

Really like this post!

 

I'm wondering whether human-level AI and robotics will significantly decrease civilisation's susceptibility to catastrophic setbacks?

AI systems and robots can't be destroyed by pandemics. They don't depend on agriculture -- just mining and some form of energy production. And a very small number of systems could hold tacit expertise for ~all domains. 

Seems like this might reduce the risk by a lot, such that the 10% numbers you're quoting are too high. E.g. you're assigning 10% to a bio-driven set-back. But I'd have thought that would have to happen before we get human-level robotics?

You're discussing catastrophes that are big enough to set the world back by at least 100 years. But I'm wondering if a smaller threshold might be appropriate. Setting the world back by even 10 years could be enough to mean re-running a lot of the time of perils; and we might think that catastrophes of that magnitude are more likely. (This is my current view.)

With the smaller setbacks you probably have to get more granular in terms of asking "in precisely which ways is this setting us back?", rather than just analysing it in the abstract. But that can just be faced.

One more clarification to the comment for forum users: I have tendonitis, and so I'm voice dictating all of my comments, so they might read oddly!

Interesting post!

I expect that a significant dynamic in this world would be that there'd be major investment in attempting to recover knowledge from the previous civilization. That’s because: 

  • Intellectually, it seems fascinating.
    • If a previous civilization more advanced than ours had existed and then collapsed, I imagine today’s historians would be hugely interested in that.
  • More importantly, there would be a huge economic incentive to understand the previous civilization:
    • Many of the richest and most successful people today are people who anticipated (the consequences of) important technological developments. E.g. people who specialized or invested in AI a decade ago, people who anticipated that internet commerce would be a huge deal, people who anticipated that software development was going to be very valuable, etc.
    • Similarly, there’s a huge edge to be had in science from knowing which domains are promising at which time.

This is important because it’s perhaps the main argument for the Optimistic view regarding whether a post-setback society would retain alignment knowledge.

  • You name various arguments for the Pessimistic view. Those seem reasonable to me, but I think they do have to be weighed against the fact that people would be trying pretty hard to recover a lot of knowledge about today’s world, s.t. substantial difficulties could very plausibly be overcome (e.g. deciphering old hard drives).
    • This is esp. true once that new civilization has technology advanced enough to be at the cusp of AGI.
  • I don’t have a strong view on where that leaves me overall, but intuitively, I probably feel more optimistic than you seem to be.

Separately, I think it’s worth noting that regardless of whether historians manage to recover technical knowledge about alignment, it would likely be obvious very early on that the previous civilization reached something like AGI. This would radically change the governance landscape relative to today’s world, and would plausibly make the problem easier the second time around.  

Finally, I also wanted to note that it seems intuitively likely to me that the effect on trajectory change (described in the Appendix) is more important than the effect on existential risk.


Aside: another consequence if historians are able to recover a lot of information is that economic growth in the rerun might be substantially faster than today. Scientists, entrepreneurs, and investors could learn a ton about which pursuits are most promising at what points. In particular, AI and deep learning investment might happen earlier. This might be good (e.g. because faster growth means there’s generally more surplus around the crucial period and less zero-sum mentality) or bad (e.g. because AI progress is already scary fast today, and it might be even faster in this world, since the payoff would be much clearer to everyone).

A point I'm skeptical on is that trying to preserve key information is likely to make much of a difference. I find it hard to imagine civilisation floundering after a catastrophic setback because it lacked the key insights we'd achieved so far about how to recover tech and do AI alignment and stuff.

On timescale and storage media, I'd guess we're talking about less than a century to recover back (since you're assuming a setback in tech progress of 100 years). That's enough time for hard drives to keep working, especially specialist hardware designed to survive in extreme conditions or be ultra-reliable. We also have books, which are very cheap to make and easy to read.

On AI specifically, my sense is that the most important algorithmic insights are really very compressable — they could fit into a small book, if you're prepared to do a lot of grunt work figuring out how to implement them.

We also have the ability to rebuild institutions while being able to see how previous attempts failed or succeeded, effectively getting 'unstuck' from sticky incentives which maintain the existing institutional order. Which is one factor suggesting a re-roll wouldn't be so bad.

Re section 4: should someone be printing off a bunch of AI safety papers and archiving them somewhere safe? (Probably a dumb idea.)

I think not at all a dumb idea, and I talk about this in Section 6.4. It actually feels like an activity you could do at very low cost that might have very high value per unit cost. 

I did something related but haven't updated it in a couple years! If there's a good collection of AI safety papers/other resources/anything anywhere it would be very easy for me to add it to the archive for people to download locally, or else I could try to collect stuff myself

Would be good more generally to have an updating record of the most important AI safety papers of each year 

Elaborating on the last paragraph: when considering the value of the set-back society, we're conditioning on the fact that it got set back. On one hand (as you say), this could be evidence that society was (up to the point of catastrophe) more liberal and decentralised than it could have been, since many global catastrophes are less likely to occur under the control of a world government. Since I think the future looks brighter if society is more liberal on the dawn of AGI, then I think that's evidence the current "run" is worth preserving over the next roll we'd get; even if we're absolutely confident civilisation would survive another run after being set back (assuming a catastrophe would re-roll the dice on how well things are going). That's not saying anything about whether the world is currently looking surprisingly liberal — just that interventions to prevent pre-AGI catastrophes plausibly move probability mass from liberal/decentralised civilisations to illiberal/centralised ones. And maybe that's the main effect of preventing pre-AGI catastrophes.

For much of the article, you talk about post-AGI catastrophe. But when you first introduce the idea in section 2.1, you say:

the period from now until we reach robust existential security (say, stable aligned superintelligence plus reasonably good global governance)

It seems to me like this is a much higher bar than reaching AGI -- and one for which the arguments that we could still be exposed to subsequent catastrophes seem much weaker. Did you mean to just say AGI here?

Thanks, that's a good catch. Really, in the simple model the relevant point of time for the first run should be when the alignment challenge has been solved, even for superintelligence. But that's before 'reasonably good global governance'.

Of course, there's an issue that this is trying to model alignment as a binary thing for simplicity, even though really if a catastrophe came when half of the alignment challenge had been solved, that would still be a really big deal for similar reasons to the paper.

One additional comment is that this sort of "concepts moving around" issue is one of the things that I've found most annoying from AI, and where it happens quite a lot. You need to try and uproot these issues from the text, and this was a case of me missing it.

Why do you think alignment gets solved before reasonably good global governance? It feels to me pretty up in the air which target we should be aiming to hit first. (Hitting either would help us with the other. I do think that we likely want to get important use out of AI systems before we establish good global governance; but that we might want to then do the governance thing to establish enough slack to take the potentially harder parts of alignment challenge slowly.)

I think my impression is that the strategic upshots of this are directionally correct, but maybe not a huge deal? I'm not sure if you agree with that.

The value of saving philanthropic resources to deploy post-superintelligence is greater than it otherwise would be.

One way to think of this is that if there is a 10% existential risk from the superintelligence transition and we will attempt that transition, then the world is currently worth 0.90 V, where V is the expected value of the world after achieving that transition. So the future world is more valuable (in the appropriate long-term sense) and saving it is correspondingly more important. With these numbers the effect isn't huge, but would be important enough to want to take into account.

More generally, worlds where we are almost through the time of perils are substantially more valuable than those where we aren't. And setback prevention becomes more important the further through you are.

For clarity, you're using 'important' here in something like an importance x tractability x neglectedness factoring? So yes more important (but there might be reasons to think it's less tractable or neglected)?

Yeah, I mean 'more valuable to prevent', before taking into account the cost and difficulty.

I think my biggest uncertainty about this is:

 

If there were a catastrophic setback of this kind, and civilisation tried hard to save and maintain the weights of superintelligent AI (which they presumably would), how likely are they to succeed? 

 

My hunch is that they very likely could succeed. E.g. in the first couple of decades they'd have continued access to superintelligent AI advice (and maybe robotics) from pre-existing hardware. They could use that to bootstrap to longer periods of time. E.g. saving the weights on hard drives rather than SSDs, and then later transferring them to a more secure, long-lasting format. Then figure out the minimal-effort version of compute maintenance and/or production needed to keep running some superintelligences indefinitely.

On section 4, where you ask about retaining alignment knowledge:

  • It feels kind of like you're mislabelling the ends of the spectrum?
  • My guess is that rather than think about "how much alignment knowledge is lost?", you should be asking about the differential between how much AI knowledge is lost and how much alignment knowledge is lost
  • I'm not sure that's quite right either, but it feels a little bit closer?

Okay, looking at the spectrum again, it still seems to me like I've labelled them correctly? Maybe I'm missing something. It's optimistic if we can retain a knowledge of how to align AGI because then we can just use that knowledge later and we don't face the same magnitude of risk of the misaligned AI. 

Sorry, I didn't mean mislabelled in terms of having the labels the wrong way around. I meant that the points you describe aren't necessarily the ends of the spectrum -- for instance, worse than just losing all alignment knowledge is losing all the alignment knowledge while keeping all of the knowledge about how to build highly effective AI.

At least that's what I had in mind at the time of writing my comment. I'm now wondering if it would actually be better to keep the capabilities knowledge, because it makes it easier to do meaningful alignment work as you do the rerun. It's plausible that this is actually more important than the more explicitly "alignment" knowledge. (Assuming that compute will be the bottleneck.)

A thought on unipolarity: One worry here is that the pursuit of post-AGI unipolarity could be the exact kind of thing that triggers a catastrophic setback. If one nation or coalition looks poised to create an aligned-to-them AGI, and lock up control of the future, this gives other nations strong incentives to launch preemptive strikes before that happens. This can be true even if everyone understands the impending AGI to be quite benevolent. There are plenty of nuclear-armed nations whose leaders might strongly prefer to retain private power, even to the detriment of their populations' wellbeing, rather than accept benign foreign hegemony.

A small aside: some put forth interplanetary civilisation as a partial defence against both total destruction and 'setback'. But reaching the milestone of having a really robustly interplanetary civ might itself take quite a long time after AGI - especially if (like me) you think digital uploading is nontrivial.

(This abstractly echoes the suggestion in this piece that bio defence might take a long time, which I agree with.)

I agree with this. One way of seeing that is to ask how many doublings of energy consumption civilisation can have before it needs to move beyond the solar system. The answer is about 40 doublings. Which, depending on your views on just how fast explosive industrial expansion goes, could be a pretty long time, e.g. decades.

I've been meaning to write something about 'revisiting the alignment strategy'. The section 5 here ('Won't AGI make post-AGI catastrophes essentially irrelevant?') makes the point very clearly:

On this view, a post-AGI world is nearly binary—utopia or extinction—leaving little room for Sisyphean scenarios.

But I think this is too optimistic about the speed and completeness of the transition to globally deployed, robustly aligned "guardian" systems.

without making much of a case for it. Interested in Will and reviewers' sense of the space and literature here.

I've often been frustrated by this assumption over the last 20 years, but don't remember any good pieces about it.

It may be partly from Eliezer's first alignment approach being to create a superintelligent sovereign AI, where if that goes right, other risks really would be dealt with.

Catastrophic setbacks affect both. On the existential side, they add extra rounds of existential risk. On the trajectory side, they plausibly change expected value substantially, because the civilisation that eventually emerges from collapse could be very different from what would have emerged otherwise.

 

Also, if you expect that there are opportunities during the intelligence explosion to lock-in bad trajectories (e.g., extreme power concentration), then you get reroll risk on those too, which is bad, even if your risk of power concentration doesn't increase.

(I think this requires the risk of a post-AGI catastrophe to be higher if we haven't locked in bad trajectories; otherwise the reroll risk symmetrically provides a positive benefit of getting a chance to get out of a bad trajectory)

I think it's worth separating out two questions about alignment in the re-run: (1) how likely are we to retain alignment knowledge, and (2) how likely are we to rediscover alignment knowledge? The second seems important because, for some possible futures rediscovery of alignment techniques seems likely to correlate strongly with rediscovery of the technology necessary to make ASI. Then, conditional on being able to make capable AI, we might be quite likely to be able to align it. 

The more that alignment is easy, and the relevant techniques look a lot like the techniques you need to make capable AI in the first place, the more likely alignment rediscovery conditional on AI rediscovery seems. While it's currently uncertain whether today's alignment techniques will scale to ASI, many (e.g. RL-based techniques) do seem quite closely related to the techniques you need to make the AI capable in the first place. 
