AI risk

This article forms part of 80,000 Hours' explanation of risks from artificial intelligence, and focuses on how an AI system could cause an existential catastrophe. Our full problem profile on risks from AI looks at why we're worried things like this will happen.

At 5:29 AM on July 16, 1945, deep in the Jornada del Muerto desert in New Mexico, the Manhattan Project carried out the world’s first successful test of a nuclear weapon.

From that moment, we’ve had the technological capacity to wipe out humanity.

But if you asked someone in 1945 to predict exactly how this risk would play out, they would almost certainly have got it wrong. They may have thought there would have been more widespread use of nuclear weapons in World War II. They certainly would not have predicted the fall of the USSR 45 years later. Current experts are concerned about India–Pakistan nuclear conflict and North Korean state action, but 1945 was before even the partition of India or the Korean War.

That is to say, you’d have real difficulty predicting anything about how nuclear weapons would be used. It would have been even harder to make these predictions in 1933, when Leo Szilard first realised that a nuclear chain reaction of immense power could be possible, without any concrete idea of what these weapons would look like.

Despite this difficulty, you wouldn’t be wrong to be concerned.

In our problem profile on AI, we describe a very general way in which advancing AI could go wrong. But there are lots of specifics we can’t know much about at this point. Maybe there will be a single transformative AI system, or maybe there will be many; there could be very fast growth in the capabilities of AI, or very slow growth. Each scenario will look a little different, and carry different risks. And the specific problems that arise in any one scenario are necessarily less likely to happen than the overall risk.

Despite not knowing how things will play out, it may still be useful to look at some concrete possibilities of how things could go wrong.

In particular, we argued in the full profile that sufficiently advanced systems might be able to take power away from humans. How could that possibly happen?

# How could a power-seeking AI actually take power?

Here are seven possible techniques that could be used by a power-seeking AI (or multiple AI systems working together) to actually gain power.[1]

These techniques could all interact with one another, and it’s difficult to say at this point (years or decades before the technology exists) which are most likely to be used. Also, systems more intelligent than humans could develop plans to seek power that we haven’t yet thought of.

## 1. Hacking

Software is absolutely full of vulnerabilities. The US National Institute of Standards and Technology recorded over 18,000 vulnerabilities found in systems across the world in 2021, an average of roughly 50 per day.
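To make concrete what one of those thousands of vulnerabilities can look like, here is a minimal sketch of a classic flaw: SQL injection. The `vulnerable_login` function and the toy `users` table are hypothetical, invented purely for illustration; the point is that an attacker who controls an input string can rewrite the logic of a query.

```python
import sqlite3

# Hypothetical toy example: a login check that builds its SQL query by
# string interpolation -- the classic injection vulnerability.
def vulnerable_login(conn, username, password):
    query = (
        "SELECT COUNT(*) FROM users "
        f"WHERE name = '{username}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone()[0] > 0

# Set up an in-memory database with one account.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'hunter2')")

# Legitimate credentials work as intended...
assert vulnerable_login(conn, "alice", "hunter2")
# ...but a crafted "password" rewrites the query's logic
# ("... OR '1'='1'"), granting access with no valid password at all.
assert vulnerable_login(conn, "alice", "' OR '1'='1")
```

Bugs of roughly this shape keep appearing in deployed software despite decades of known fixes (here, parameterized queries), which is why a system that could find and exploit them systematically would be dangerous.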

# 7 comments

This has always been the least convincing part of the AI risk argument for me. I'll probably sketch out more in-depth objections in a post someday, but here's a preliminary argument:

First, the scenarios where the AI takes over quickly seem to assume a level of omnipotence and omniscience on the part of an AGI that is extremely unlikely. For example, the premise of "every single person in the world suddenly dies" (with no explanation given). No plan in the history of intelligence has reached that level of perfection. There is no test data on "subjugate all of humanity at once", and because knowledge requires empirical testing and evidence, mistakes will be made. I think that since taking over the world is insanely hard, this will be enough to cause failure.

Secondly, the scenarios where the AI takes over slowly have the problem that if the accumulation of power is slow, there's enough time for multiple different AIs with different goals to exist and take form. If the AI risk reasoning is correct, it's likely they'll deduce that the other AIs are ultimately their biggest threat. They'll either war with each other, or prematurely attack humanity to ensure no more AIs are made.

Once either of these AIs is discovered, the problem reduces to a conventional war between an AI and the entire rest of planet Earth. I'd be interested in seeing a military analysis of how a conventional war with an AI would go. My intuition is that if it occurred today, the AI would be screwed, as it needs electricity to live and we don't. Also, pretty much all existing military equipment has at least some manual components. That may change as time goes on.

This is great, thank you. Honestly, it feels a little telling that this has barely been explored, despite being THE x-risk. I get that the intervention point comes before things reach this stage, but knowing the problem is pretty core to prevention.

A force smarter/more powerful than us is scary, no matter what form it takes. But we (EA) feel a little swept up in one particular vision of AI timelines that doesn't feel terribly grounded. I understand it's important to assume the worst, but it's also important to imagine what would be realistic and then intermingle the two. Maybe this is why the EA approach to AI risk feels blinkered to me: so much focus is on the worst possible outcome and far less on the most plausible outcome.

(or maybe I'm just outside the circles and all this is ground being trodden, I'm just not privy to it)

I agree that I'd love to see more work on this! (And I agree that the last story I talk about, of a very fast takeoff AI system with particularly advanced capabilities, seems unlikely to me - although others disagree, and think this "worst case" is also the most likely outcome.)

It's worth noting again though that any particular story is unlikely to be correct. We're trying to forecast the future, and good ways of forecasting should feel uncertain at the end, because we don't know what the future will hold. Also, good work on this will (in my opinion) give us ideas about what many possible scenarios will look like. This sort of work (e.g. the first half of this article, rather than the second) often feels less concrete, but is, I think, more likely to be correct, and can inform actions that target many possible scenarios rather than one single unlikely event.

All that said, I'm excited to see work like OpenPhil's nearcasting project which I find particularly clarifying and which will, I hope, improve our ability to prevent a catastrophe.

This profile by 80k is pretty bad in terms of just glossing over all the intermediate steps and reducing it all to "But one day, every single person in the world suddenly dies."

Universal Paperclips is slightly better about this, showing the process of the AI gaining our trust before betraying us, but the key power-grab step is still reduced to just "release the hypnodrones".

There are other places that have fleshed out the details of how misaligned power-seeking might play out, such as Holden Karnofsky's post AI Could Defeat All Of Us Combined.

That particular story, in which I write "one day, every single person in the world suddenly dies", is about a fast takeoff self-improvement scenario. In such scenarios, a sudden takeover is exactly what we should expect to occur, and the intermediate steps set out by Holden and others don't apply to such scenarios. Any guessing about what sort of advanced technology would do this necessarily makes the scenario less likely, and I think such guesses (e.g. "hypnodrones") are extremely likely to be false and aren't useful or informative.

For what it's worth, I personally agree that slow takeoff scenarios like those described by Holden (or indeed those I discuss in the rest of this article) are far more likely. That's why I focus on many different ways in which an AI could take over, rather than on any particular failure story. And, as I discuss, any particular combination of steps is necessarily less likely than the claim that any or all of these capabilities could be used.

But a significant fraction of people working on AI existential safety disagree with both of us, and think that a story which literally claims that a sufficiently advanced system will suddenly kill all humans is the most likely way for this catastrophe to play out! That's why I also included a story which doesn't explain these intermediate steps, even though my inside view is that this is less likely to occur.

I'm one of the AI researchers worried about fast takeoff. Yes, it's probably incorrect to pick any particular sudden-death scenario and say it's how it'll happen, but you can provide some guesses and a better illustration of one or more possibilities. For example, have you read Valuable Humans In Transit? https://qntm.org/transit

Fantastic writeup, thank you! Our university group just assigned What Failure Looks Like as a core reading for our AI safety reading group, but this has a clearer breakdown of distinct capabilities that could threaten us. We'll include it in future groups.