The Governance Problem and the "Pretty Good" X-Risk

Zach Stein-Perlman

14 min read · Aug 28, 2021

Comments 4

MaxRa

Really interesting post, thanks! Some random reactions.

"Pretty good" governance failure is possible. We could end up with an outcome that many or most influential people want, but that wiser versions of ourselves would strongly disapprove of. This scenario is plausibly the default outcome of aligned superintelligence: great uses of power are a tiny subset of the possible uses of power, the people/institutions that currently want great outcomes constitute a tiny share of total influence, and neither will those who want non-great outcomes be persuaded nor will those who want great outcomes acquire influence much without us working to increase it.

My first gut reaction is skepticism that this is a likely or stable state. The Earthly utopia scenario will likely not happen given that seemingly most explorations of humanity's future prominently feature its expansion to space. Additionally, I suspect that a large fraction of people who seriously start thinking about the longterm future of humanity fall into the camp that you consider "people/institutions that currently want great outcomes". If this is true, one might suspect that this will become a much stronger faction and an aligned AI will have to consider those ambitions, too?

Robin Hanson speculated that the debate between people who want to use our cosmic endowment and those who want to stay local might be the cultural debate of the future. He calls it becoming grabby vs. non-grabby. He worries that a central government will try to restrict grabby expansion because it would be nearly impossible to keep the growing civilization under its control:

If within a few centuries we have a strong world government managing capitalist competition, overpopulation, value drift, and much more, we might come to notice that these and many other governance solutions to pressing problems are threatened by unrestrained interstellar colonization. Independent colonies able to change such solutions locally could allow population explosions and value drift, as well as capitalist competition that beats out home industries. That is, colony independence suggests unmanaged colony competition. In addition, independent colonies would lower the status of those who control the central government.
So authorities would want to either ban such colonization, or to find ways to keep colonies under tight central control. Yet it seems very hard to keep a tight lid on colonies. The huge distances involved make it hard to require central approval for distant decisions, and distant colonists can’t participate as equals in governance without slowing down the whole process dramatically. Worse, allowing just one sustained failure, of some descendants who get grabby, can negate all the other successes. This single failure problem gets worse the more colonies there are, the further apart they spread, and the more advanced technology gets.

https://www.overcomingbias.com/2021/07/the-coming-cosmic-control-conflict.html

I'm kind of sceptical that the desire to have absolute control would be strong enough to stamp down any expansionist and exploratory ambitions. I suspect that humans and institutions will converge considerably towards the "making most of our endowment" stance. With increasing wealth, we will learn more about how much value we will be able to create and how much more value is possible compared to our prosaic imaginations, so an aligned AI will also work towards helping us achieve those sooner or later.

Zach Stein-Perlman

Thanks for your comments!

My first gut reaction is skepticism that [a "pretty good" scenario] is a likely or stable state.

I certainly agree that Earthly utopia won't happen; I just wrote that to illustrate how prosaic values would be disastrous in some circumstances. But here are some similar things that I think are very possible:

Scenarios where some choices that are excellent by prosaic standards unintentionally make great futures unlikely or impossible.
Scenarios where the choices that would tend to promote great futures are very weird by prosaic standards and fail to achieve the level of consensus necessary for adoption.

In retrospect, I should have thought and written more about failure scenarios instead of just risk factors for those scenarios. I expect to revise this post, and failure scenarios would be an important addition. For now, here's my baseline intuition for a "pretty good" future:

After an intelligence explosion, a state controls aligned superintelligence. Political elites
- are not familiar with ideas like long reflection and indirect normativity,
- do not understand why such ideas are important,
- are constrained from pursuing such goals (or perhaps because opposed factions can veto such ideas), or
- do not get to decide what to do with superintelligence because the state's decisionmaking system is bound by prior decisions about how powerful AI should be used (either directly, by forbidding great uses of AI, or indirectly, by giving decisionmaking power to groups unlikely to choose a great future)
So the state initially uses AI in prosaic ways and, roughly speaking, thinks of AI in prosaic ways. I don't have a great model of what happens to our cosmic endowment in this scenario, but since we're at the point where unwise individuals/institutions are empowered, the following all feel possible:
- We optimize for something prosaic
- We lock in a choice that disallows intentionally optimizing for anything
- We enter a stable state in which we do not choose to optimize for anything

I don't have much to say about Hanson right now, but I'll note that a future that involves status-seeking humans making decisions about cosmic-scale policy (for more than a transition period to locking in something great) is probably a failure; success looks more like optimizing the universe.

I suspect that a large fraction of people who seriously start thinking about the longterm future of humanity fall into the camp that you consider "people/institutions that currently want great outcomes"

Historically, sure. But I think that's due to selection: the people who think about the longterm future are mostly rationalist/EA-aligned. I would be very surprised if a similar fraction of a more representative group had the wisdom/humility/whatever to want a great future, much less the background to understand why we even have a "pretty good" future problem.

If this is true, one might suspect that this will become a much stronger faction

I suspect that humans and institutions will converge considerably towards the "making most of our endowment" stance.

This would surprise me. I expect poor discourse (in the US, at least) about how to use powerful AI. In particular, I expect:

The discourse will focus on prosaic issues like privacy and the future of work.
People will assume that the universe-scale future looks like "humans flying around in spaceships" and debate what those humans should do (rather than "superintelligent von Neumann probes optimizing for something" and debate what they should optimize for — much less recognize that we shouldn't be thinking about what they should optimize for; we should delegate that decision to a better system than current human judgment).

(Also, your comment implies that aligned superintelligence will try to optimize for all humans' preferences. I would be surprised if this occurs; I expect aligned superintelligence to try to do what its controller says.)

I would be very excited to call to discuss this further. Please PM me if you're interested.

MaxRa

If we create aligned superintelligence, how we use it will involve political institutions and processes. Superintelligence will probably be controlled by a state or a group of states. This is more likely the more AI becomes popularly appreciated and the more legibly powerful AI is created before the intelligence explosion.

It seems really useful to me to understand better how likely states will end up calling the shots. I wonder if there are potential options for big tech to keep sovereignty about AI. I'd suspect a company would prefer staying in control and will consider all options it has available. Just some random initial thoughts:

negotiate about moving headquarters with different countries to get as much autonomy as possible
somehow "diversify" across nations and continents and play them off against each other
construct advanced AI systems such that they only do a few narrow tasks that don't seem obviously urgent to be nationalized (something like Codex 3.0, or personal assistants, …) and internally assembling so much ability that they can resist nationalization?

Zach Stein-Perlman

It seems really useful to me to understand better how likely states will end up calling the shots.

Yes, absolutely. I think this largely depends on the extent to which political elites appreciate AI's importance; I expect that political elites will appreciate AI and take action in a few years, years before an intelligence explosion. I want to read/think/talk about this.

While big tech companies will probably come up with more strategies, I'm skeptical about their ability to not be nationalized or closely supervised by states. In response to your specific suggestions:

I think states are broadly able to seize property in their territory. To secure autonomy, I think a corporation would have to get the government to legally bind itself. I can't imagine the US or China doing this. Perhaps a US corporation could make a deal with another government and move its relevant hardware to that state before the US appreciates AI or before the US has time to respond? That would be quite radical. Given the major national security implications of AI, even such a move might not guarantee autonomy. But I think corporations would probably have to move somehow to maintain autonomy if there was political will and a public mandate for nationalization.
I don't understand. But if the US and China appreciate AI's national security implications, they won't be distracted.
I don't understand "assembling . . . ability," but corporations intentionally making AI feel nonthreatening is interesting. I hadn't thought about this. Hmm. This might be a factor. But there's only so much that making systems feel nonthreatening can do. If political elites appreciate AI, then it won't matter whether currently-deployed AI systems feel nonthreatening: there will be oversight. It's also very possible that the US will have a Sputnik moment for AI and then there's strong pressure for a national AI project independent of the current state of private AI in the US.

I am not aware of an existing name for the important problem of getting a superintelligence that does what its operator wants to do what is best. This problem roughly requires wisdom and caution to avoid locking in object-level values prematurely, and coordination among people with influence over using superintelligence.

Nick Bostrom defined the "political problem," complementing the control problem, as "how to achieve a situation in which individuals or institutions empowered by such AI use it in ways that promote the common good." To the extent that value is binary, it matters less whether AI promotes the common good on net and more whether AI does astronomical good. To the extent that superintelligence (not previous AI) is all that matters after superintelligence exists, it only matters how we use the superintelligence. I assume Bostrom used this less carving-at-the-joints-y definition for simplicity and to decrease inferential distance for people outside the community; I'm pretty sure that my "governance problem" is closer to how we should be thinking about the problem of using AI well.

Will MacAskill once called some related issues the "second-level alignment problem," but it's not clear what exactly he meant.

Note that, roughly, P(win) = P(aligned powerful AI) * P(great use) = P(survive until powerful AI) * P(powerful AI is aligned) * P(great use). This suggests a decomposition of the problem of achieving an existential win into three subproblems: the survival problem, the alignment problem, and the governance problem.
That is, some uses of superintelligence would have near-optimal expected value, where optimal expected value is roughly what we would achieve if we were thoughtful, wise, coordinated, and successful, by our standards.
Acausal trade is a conceivable source of value that does not necessarily require our colonizing the universe well. But it is prima facie even more politically challenging. Regardless, the prospect of it and other speculative, potentially radically effective strategies gives us additional reason to increase our collective ability to do unintuitive things with superintelligence.
These look similar in practice: rather than just telling the superintelligence to optimize for X, we'd probably have it tell us what optimizing for X would look like first, so we're effectively hearing the object-level way to optimize for X and then telling it to pursue that path.
Similarly to note 4, level number isn't really meaningful; it just matters that there's a chain of delegation that ends in something great.
More uncertain scenarios would occur if (1) the controller of superintelligence does not make decisions in a predictable way (e.g., it's a group of states with different goals, or it's an international organization without a clear mandate for using superintelligence) or (2) there is a multipolar outcome of some sort, e.g., if there is slow takeoff (in particular, no threshold-y behavior) or superintelligence is not able to form a singleton.
Since almost all of the resources eventually available to us involve colonizing the universe and it's prima facie unlikely that what sounds normal to current humans is optimal.
I am sympathetic to long reflection but will not defend it here. I merely use it as a prima facie example of a system that could have the two necessary properties for successful governance: acceptability and great decisionmaking.
Scott Alexander's Meditations on Moloch. While superintelligence could kill Moloch dead, Moloch might choose how we use it. That would be ironic.
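
The rough decomposition of P(win) in the notes above can be written with explicit conditioning. This is only a sketch: reading each factor as conditional on the ones before it is an assumption added here, since the note itself writes plain products and qualifies them with "roughly."

```latex
\begin{align*}
P(\text{win})
  &\approx P(\text{aligned powerful AI}) \cdot P(\text{great use} \mid \text{aligned powerful AI}) \\
  &= P(\text{survive until powerful AI})
     \cdot P(\text{powerful AI is aligned} \mid \text{survive until powerful AI})
     \cdot P(\text{great use} \mid \text{aligned powerful AI})
\end{align*}
```

Under this reading, the three factors line up with the survival problem, the alignment problem, and the governance problem respectively, and because they multiply, a low value for any one of them caps P(win) regardless of the other two.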

The Governance Problem and the "Pretty Good" X-Risk

I. Introduction

II. Good Uses of Superintelligence

III. Short-Term Issues

IV. Unipolar Failure Modes

V. Multipolar Failure Modes

VI. Conclusion