A week ago, Anthropic quietly weakened their ASL-3 security requirements. Yesterday, they announced ASL-3 protections.
I appreciate the mitigations, but quietly lowering the bar at the last minute so you can meet requirements isn't how safety policies are supposed to work.
(This was originally a tweet thread (https://x.com/RyanPGreenblatt/status/1925992236648464774) which I've converted into a quick take. I also posted it on LessWrong.)
What is the change and how does it affect security?
9 days ago, Anthropic changed their RSP so that ASL-3 no longer requires being robust to employees trying to steal model weights if the employee has any access to "systems that process model weights".
Anthropic claims this change is minor (and calls insiders with this access "sophisticated insiders").
But, I'm not so sure it's a small change: we don't know what fraction of employees could get this access and "systems that process model weights" isn't explained.
Naively, I'd guess that access to "systems that process model weights" includes employees being able to operate on the model weights in any way other than through a trusted API (a restricted API that we're very confident is secure). If that's right, it could be a high fraction! So, this might be a large reduction in the required level of security.
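To make my reading concrete, here's a toy sketch of the distinction as I understand it. Everything below is my own framing for illustration: the class, the field names, and the classification rule are hypothetical and don't reflect Anthropic's actual infrastructure or the RSP's wording.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    # Mediated, audited access only (e.g. querying a locked-down inference endpoint).
    can_call_trusted_api: bool
    # Any direct access to systems that process model weights, e.g. the ability to
    # run jobs on training clusters, checkpoint storage, or serving hosts.
    can_touch_weight_systems: bool


def insider_theft_must_be_prevented(employee: Employee) -> bool:
    """Under my reading of the updated wording, ASL-3 only requires robustness to
    theft attempts by insiders *without* access to weight-processing systems."""
    return not employee.can_touch_weight_systems


analyst = Employee("analyst", can_call_trusted_api=True, can_touch_weight_systems=False)
ml_engineer = Employee("ml_engineer", can_call_trusted_api=True, can_touch_weight_systems=True)

print(insider_theft_must_be_prevented(analyst))      # True: still inside the threat model
print(insider_theft_must_be_prevented(ml_engineer))  # False: carved out by the change
```

If the second category covers most technical staff (anyone who can run jobs on training or serving infrastructure), the carve-out is doing a lot of work.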
If this does actually apply to a large fraction of technical employees, then I'm also somewhat skeptical that Anthropic can actually be "highly protected" from (e.g.) organized cybercrime groups without meeting the original bar: hacking an insider and using their access is typical!
Also, one of the easiest ways for security-aware employees to evaluate security is to think about how easily they could steal the weights. So, if you don't aim to be robust to employees, it might be much harder for employees to evaluate the level of security and then complain about not meeting requirements[1].
Anthropic's justification and why I disagree
Anthropic justified the change by saying that model theft isn't much of the risk from amateur CBRN uplift (CBRN-3) and that the risks from AIs being able to "fully automate the work of an entry-level, remote-only Researcher at Anthropic" (AI R&D-4) don't depend on model theft.
I disagree.
On CBRN: If other actors are incentivized to steal the model for other reasons (e.g. models become increasingly valuable), it could end up broadly proliferating which might greatly increase risk, especially as elicitation techniques improve.
On AI R&D: AIs which are over the capability level needed to automate the work of an entry-level researcher could seriously accelerate AI R&D (via fast speed, low cost, and narrow superhumanness). If other less safe (or adversarial) actors got access, risk might increase a bunch.[2]
More strongly, ASL-3 security must suffice up until the ASL-4 threshold: it has to cover the entire range from ASL-3 to ASL-4. ASL-4 security itself is still not robust to high-effort attacks from state actors which could easily be motivated by large AI R&D acceleration.
As of the current RSP, it must suffice until just before AIs can "substantially uplift CBRN [at] state programs" or "cause dramatic acceleration in [overall AI progress]". These seem like extremely high bars indicating very powerful systems, especially the AI R&D threshold.[3]
As it currently stands, Anthropic might not require ASL-4 security (which still isn't sufficient for high effort state actor attacks) until we see something like 5x AI R&D acceleration (and there might be serious issues with measurement lag).
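As a toy illustration of why measurement lag matters (the 5x figure is from above; the three-month lag is a number I'm making up for the example):

```latex
% Baseline-equivalent AI R&D progress accumulated under only ASL-3 security while
% waiting for a confident measurement, given acceleration a and measurement lag \ell:
\[
  \text{progress during lag} \approx a \cdot \ell,
  \qquad \text{e.g.} \quad 5 \times 3\ \text{months} = 15\ \text{baseline-equivalent months.}
\]
```

That is, over a year's worth of pre-acceleration-pace AI R&D could happen before the higher security requirement even triggers.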
I'm somewhat sympathetic to security not being very important for ASL-3 CBRN, but it seems very important as of the ASL-3 AI R&D threshold and seems crucial before the ASL-4 AI R&D threshold! I think the ASL-3 AI R&D threshold should probably instead trigger the ASL-4 security requirement!
Overall, Anthropic's justification for this last minute change seems dubious and the security requirements they've currently committed to seem dramatically insufficient for AI R&D threat models. To be clear, other companies have worse security commitments.
Concerns about potential noncompliance and lack of visibility
Another concern is that this last minute change is quite suggestive of Anthropic being out of compliance with their RSP before they weakened the security requirements.
We have to trust Anthropic quite a bit to rule out noncompliance. This isn't a good state of affairs.
To explain this concern, I'll need to cover some background on how the RSP works.
The RSP requires ASL-3 security as soon as it's determined that ASL-3 can't be ruled out (as Anthropic says is the case for Opus 4).
Here's how it's supposed to go:
- They ideally have ASL-3 security mitigations ready, including the required auditing.
- Once they find the model is ASL-3, they apply the mitigations immediately (if not already applied).
- If they aren't ready, they need temporary restrictions.
My concern is that the security mitigations they had ready when they found the model was ASL-3 didn't suffice for the old ASL-3 bar but do suffice for the new bar (otherwise why did they change the bar?). So, prior to the RSP change they might have been out of compliance.
It's certainly possible they remained compliant:
- Maybe they had measures which temporarily sufficed for the old higher bar but which were too costly longer term. Also, they could have deleted the weights outside of secure storage until the RSP was updated to lower the bar.
- Maybe an additional last-minute security assessment (which wasn't required to meet the standard?) indicated inadequate security and they deployed temporary measures until they changed the RSP. It would be bad to depend on a last-minute security assessment for compliance.
(It's also technically possible that the ASL-3 capability decision was made after the RSP was updated. This would imply the decision was only made 8 days before release, so hopefully this isn't right. Delaying evals until an RSP change lowers the bar would be especially bad.)
Conclusion
Overall, this incident demonstrates our limited visibility into AI companies. How many employees are covered by the new bar? What triggered this change? Why does Anthropic believe it remained in compliance? Why does Anthropic think that security isn't important for ASL-3 AI R&D?
I think a higher level of external visibility, auditing, and public risk assessment would be needed (as a bare minimum) before placing any trust in policies like RSPs to keep the public safe from AI companies, especially as they develop existentially dangerous AIs.
To be clear, I appreciate Anthropic's RSP update tracker and that it explains changes. Other AI companies have mostly worse safety policies: as far as I can tell, o3 and Gemini 2.5 Pro are about as likely to cross the ASL-3 bar as Opus 4 and they have much worse mitigations!
Appendix and asides
I don't think current risks are existentially high (if current models were fully unmitigated, I'd guess this would cause around 50,000 expected fatalities per year) and temporarily being at a lower level of security for Opus 4 doesn't seem like that big of a deal. Also, given that security is only triggered after a capability decision, the ASL-3 CBRN bar is supposed to include some conservativeness anyway. But, my broader points around visibility stand and potential noncompliance (especially unreported noncompliance) should be worrying even while the stakes are relatively low.
You can view the page showing the RSP updates including the diff of the latest change here: https://www.anthropic.com/rsp-updates. Again, I appreciate that Anthropic has this page and makes it easy to see the changes they make to the RSP.
I find myself quite skeptical that Anthropic actually could rule out that Sonnet 4 and other models weaker than Opus 4 cross the ASL-3 CBRN threshold. How sure is Anthropic that these models wouldn't substantially assist amateurs even after the "possible performance increase from using resources that a realistic attacker would have access to"? I feel like our current evidence and understanding are very weak, and models already substantially exceed virology experts at some of our best proxy tasks.
The skepticism applies similarly or more to other AI companies (and Anthropic's reasoning is more transparent).
But, this just serves to further drive home ways in which the current regime is unacceptable once models become so capable that the stakes are existential.
One response is that systems this powerful will be open sourced or trained by less secure AI companies anyway. Sure, but the intention of the RSP is (or was) to outline what would "keep risks below acceptable levels" if all actors follow a similar policy.
(I don't know if I ever bought that the RSP would succeed at this. It's also worth noting there is an explicit exit clause Anthropic could invoke if they thought proceeding outweighed the risks despite the risks being above an acceptable level.)
This sort of criticism is quite time consuming and costly for me. For this reason there are specific concerns I have about AI companies which I haven't discussed publicly. This is likely true for other people as well. You should keep this in mind when assessing AI companies and their practices.
It also makes such complaints less legible to other employees, whereas arguments about what they themselves could do (e.g., how easily they could steal the weights) are easier for other employees to interpret.
It looks like AI 2027 would estimate roughly a 2x AI R&D acceleration for a system which was just over this ASL-3 AI R&D bar (as it seems somewhat more capable than their "Reliable agent" bar). I'd guess more like 1.5x at this point, but either way this is a big deal!
Anthropic says they'll likely require a higher level of security for this "dramatic acceleration" AI R&D threshold, but they haven't yet committed to this nor have they defined a lower AI R&D bar which results in an ASL-4 security requirement.
I appreciate the investigation here.
I'm not sure whether I agree that "quietly lowering the bar at the last minute so you can meet requirements isn't how safety policies are supposed to work". (Not sure I disagree, but I'm going to try to articulate a case against.)
I think in a world where you understand the risks well ahead of time, of course this isn't how safety policies should work. In a world where you don't understand the risks well ahead of time, you can get more clarity as the key moments approach, and this could lead you to rationally judge that a lower bar would be appropriate. In a regime of voluntary safety policies, the idea seems to me to be that each actor makes their own judgements about what policy would be safe, and it seems like you could absolutely expect to see this pattern some of the time from actors just following their own best judgements, even if those judgements are unbiased.
Of course:
- This is a pattern that you could also see because judgements are getting biased (your discussion of the object-level merits of the change is very relevant to this question).
- The difficulty, from the outside, of distinguishing cases where a fair judgement is behind the change from cases where a biased one is, is a major reason to potentially prefer regimes other than "voluntary safety policy" -- but I don't really think it's good for people in the voluntary-safety-policy regime to act as though they're already under a different regime (e.g. because this might slow down actually moving to a different regime).
(Someone might object that if people can make these changes there's little point in voluntary safety policies -- the company will ultimately do what it thinks is good! I think there would be something to this objection, and that these voluntary policies provide less assurance than it is commonly supposed they do; nonetheless I still think they are valuable, not for the commitments they provide but for the transparency they provide about how companies at a given moment are thinking about the tradeoffs.)
From my perspective, a large part of the point of safety policies is that people can comment on the policies in advance and provide some pressure toward better policies. If policies are changed at the last minute, then the world may not have time to understand the change and respond before it is too late.
So, I think it's good to create an expectation/norm that you shouldn't substantially weaken a policy right as it is being applied. That's not to say a reasonable company would never do this, just that I think it should by default be considered somewhat bad, particularly if there isn't a satisfactory explanation given. In this case, I find the object-level justification for the change somewhat dubious (at least for the AI R&D trigger), and there is also no explanation of why this change was made at the last minute.
I guess I'm fairly sympathetic to this. It makes me think that voluntary safety policies should ideally include some meta-commentary about how companies view the purpose and value-add of the safety policy, and meta-policies about how updates to the safety policy will be made -- in particular, that it might be good to specify a period for public comments before a change is implemented. (Even a short period could be some value add.)
Reducing the probability that AI takeover involves violent conflict seems leveraged for reducing near-term harm
Often in discussions of AI x-safety, people seem to assume that misaligned AI takeover will result in extinction. However, I think AI takeover is reasonably likely to not cause extinction due to the misaligned AI(s) effectively putting a small amount of weight on the preferences of currently alive humans. Some reasons for this are discussed here. Of course, misaligned AI takeover still seems existentially bad and probably eliminates a high fraction of future value from a longtermist perspective.
(In this post when I use the term “misaligned AI takeover”, I mean misaligned AIs acquiring most of the influence and power over the future. This could include “takeover” via entirely legal means, e.g., misaligned AIs being granted some notion of personhood and property rights and then becoming extremely wealthy.)
However, even if AIs effectively put a bit of weight on the preferences of current humans it's possible that large numbers of humans die due to violent conflict between a misaligned AI faction (likely including some humans) and existing human power structures. In particular, it might be that killing large numbers of humans (possibly as collateral damage) makes it easier for the misaligned AI faction to take over. By large numbers of deaths, I mean over hundreds of millions dead, possibly billions.
But, it's somewhat unclear whether violent conflict will be the best route to power for misaligned AIs and this also might be possible to influence. See also here for more discussion.
So while one approach to avoid violent AI takeover is to just avoid AI takeover, it might also be possible to just reduce the probability that AI takeover involves violent conflict. That said, the direct effects of interventions to reduce the probability of violence don't clearly matter from an x-risk/longtermist perspective (which might explain why there hasn't historically been much effort here).
(However, I think trying to establish contracts and deals with AIs could be pretty good from a longtermist perspective in the case where AIs don't have fully linear returns to resources. Also, generally reducing conflict seems maybe slightly good from a longtermist perspective.)
So how could we avoid violent conflict conditional on misaligned AI takeover? There are a few hopes:
- Ensure a bloodless coup rather than a bloody revolution
- Ensure that negotiation or similar results in avoiding the need for conflict
- Ensure that a relatively less lethal takeover strategy is easier than more lethal approaches
I'm pretty unsure which approaches here look best, or whether any are tractable at all. (It's possible that some prior work targeted at reducing conflict from the perspective of S-risk could be somewhat applicable.)
Separately, this requires that the AI puts at least a bit of weight on the preferences of current humans (and isn't spiteful), but this seems like a mostly separate angle and it seems like there aren't many interventions here which aren't covered by current alignment efforts. Also, I think this is reasonably likely by default due to reasons discussed in the linked comment above. (The remaining interventions which aren’t covered by current alignment efforts might relate to decision theory (and acausal trade or simulation considerations), informing the AI about moral uncertainty, and ensuring the misaligned AI faction is importantly dependent on humans.)
Returning to the topic of reducing violence given a small weight on the preferences of current humans, I'm currently most excited about approaches which involve making negotiation between humans and AIs more likely to happen and more likely to succeed (without sacrificing the long-run potential of humanity).
A key difficulty here is that an AI might have a first-mover advantage: getting in a powerful first strike without tipping its hand might be extremely useful for the AI. See here for more discussion (also linked above). Thus, negotiation might look relatively bad to the AI from this perspective.
We could try to have a negotiation process which is kept secret from the rest of the world, or we could try to have preexisting commitments under which we'd yield large fractions of control to AIs (effectively proxy conflicts).
More weakly, just making negotiation seem like a possibility at all might be quite useful.
I'm unlikely to spend much if any time working on this topic, but I think it probably deserves further investigation.
After thinking about this somewhat more, I don't really have any good proposals, so this seems less promising than I was expecting.
In the context of a misaligned AI takeover, making negotiations and contracts with a misaligned AI in order to allow it to take over does not seem useful to me at all.
A misaligned AI that is in power could simply decide to walk back any promises and ignore any contracts it agreed to. Humans couldn't do anything about it, because they would have lost all power at that point.
Some objections:
- Building better contract-enforcement ability might be doable. (Though it's pretty tricky both to hold humans to doing things and to force AIs to do things.)
- Negotiation could involve tit-for-tat arrangements.
- Unconditional surrenders can reduce the need for violence if we ensure there is some process for demonstrating that the AI would have succeeded at takeover with very high probability. (Note that I'm assuming the AI puts some small amount of weight on the preferences of currently alive humans, as I discuss in the parent.)