Even if we solve the AI alignment problem, we still face post-alignment problems, which are all the other existential problems[1] that AI may bring.

People have identified various imposing problems that we may need to solve before developing ASI. An incomplete list of topics: misuse; animal-inclusive AI; AI welfare; S-risks from conflict; gradual disempowerment; permanent mass unemployment; risks from malevolent actors/AI-enabled coups/gradual concentration of power; moral error.

If we figure out how to resolve one of these problems, we still have to deal with all the others. If even one problem remains unsolved, the future could be catastrophically bad. That fact diminishes the promise of working on problems individually.

A global moratorium on superintelligence buys us more time to work on alignment as well as all of the post-alignment problems. Pausing AI is in the common interest of many causes.[2]

Cross-posted from my website.

We can't delay until after ASI

If we figure out how to align ASI, can it solve post-alignment problems for us? Or can we use ASI to enable a Long Reflection? No.

To build an aligned ASI, one of two conditions must hold:

  1. The ASI has locked-in values.
  2. The ASI is corrigible: it will do what its masters say, and will allow its goals to be changed.

If values are locked in, we can't defer any problems related to moral philosophy; we must solve them in advance.[3]

If the ASI is corrigible, we can take time for a Long Reflection, figuring out The Good with the help of a superintelligent assistant. But a corrigible ASI creates problems of its own. The first person to gain access to the newly created ASI could use it to take over the world; if the ASI is widely accessible, bad actors could use it to do enormous harm. Corrigibility increases catastrophic risks from misuse and totalitarianism.

If we want a post-ASI Long Reflection, we still need the AI to be aligned, and we need some form of impartial governance that prevents rogue individuals from co-opting the Reflection. By strong default, ASI will end liberal democracy: on the current trajectory, a small group of people, either AI company leaders or government leaders, will end up with dictatorial control over advanced AI. At minimum, we need to solve the misuse and power-concentration problems before developing ASI, and we need a way to avoid value lock-in without exacerbating misuse and concentration risks.

Perhaps there's some version of value alignment/corrigibility that finds the right middle ground to avoid the problems on both sides. But anything resembling a solution looks very far off, and not enough people take these problems seriously.

What's the alternative to pausing?

Advocating for an AI pause is the most important response to post-alignment problems, but it might not be the most cost-effective one: achieving a globally coordinated pause would be difficult. It might be more cost-effective to work on individual post-alignment problems, or to search for other mitigations that reduce risk from many post-alignment problems simultaneously.

I can't confidently say that advocating for a pause is the best thing to do, but nothing else looks clearly better.

Two arguments in favor of prioritizing AI pause advocacy as an answer to post-alignment problems:

  1. If timelines are short, then we don't have time to solve post-alignment problems.
  2. Pausing AI helps with all post-alignment problems simultaneously by giving us more time to work on them.

The most compelling argument against pause advocacy is that it's intractable. A full treatment of tractability is beyond the scope of this essay, but I expect that achieving a pause is less difficult than solving every post-alignment problem without one. In an alternative world where (say) we would be home free as long as we solved AI-enabled totalitarianism, working directly on totalitarianism might beat pause advocacy. But there are many bad outcomes to avert, which makes pausing AI, difficult as it would be, easier than solving all the post-alignment problems in a short time span.

Research agendas on post-alignment problems rarely propose pausing or slowing AI development as a mitigation. That may be because the authors don't believe it's a good response. But the agendas don't consider and ultimately reject the idea of pausing AI; they simply don't address it. If I'm wrong, and a pause is not the best answer to post-alignment problems, then there is work to be done to articulate why other responses are better.


  1. Existential in the classic sense of "a permanent loss of most of the potential flourishing of the future". ↩︎

  2. This wording is borrowed from Rationality: Common Interest of Many Causes. ↩︎

  3. Our best bet might be something like Coherent Extrapolated Volition. Unfortunately, no AI developers are working on how to do that. ↩︎
