In late 2021, MIRI hosted a series of conversations about AI risk with a number of other EAs working in this problem area. As of today, we've finished posting the (almost entirely raw and unedited) results of that discussion.

To help with digesting the sequence now that it's out, and to follow up on threads of interest, we're hosting an AMA this Wednesday (March 2) featuring researchers from various organizations (all speaking in their personal capacity):

  • Paul Christiano (ARC)
  • Richard Ngo (OpenAI)
  • Rohin Shah (DeepMind)
  • Nate Soares (MIRI)
  • Eliezer Yudkowsky (MIRI)

You're welcome to post questions, objections, etc. on any vaguely relevant topic, whether or not you've read the whole sequence.

The AMA is taking place on LessWrong and is open to comments now. If you don't have a LessWrong account, feel free to post questions below and I'll cross-post them.


Do you believe that AGI poses a greater existential risk than other proposed x-risk hazards, such as engineered pandemics? Why or why not?

Thanks for the question! I cross-posted it here; Nate Soares replies:

For sure. It's tricky to wipe out humanity entirely without optimizing for that in particular -- nuclear war, climate change, and extremely bad natural pandemics look to me like they're at most global catastrophes, rather than existential threats. It might in fact be easier to wipe out humanity by engineering a pandemic that's specifically optimized for this task (than it is to develop AGI), but we don't see vast resources flowing into humanity-killing-virus projects, the way that we see vast resources flowing into AGI projects. By my accounting, most other x-risks look like wild tail risks (what if there's a large, competent, state-funded successfully-secretive death-cult???), whereas the AI x-risk is what happens by default, on the mainline (humanity is storming ahead towards AGI as fast as they can, pouring billions of dollars into it per year, and by default what happens when they succeed is that they accidentally unleash an optimizer that optimizes for our extinction, as a convergent instrumental subgoal of whatever rando thing it's optimizing).

I responded here - cross-posting here for convenience:

Hi, I'm the user who asked this question. Thank you for responding!

I see your point that an AGI would destroy humanity intentionally, whereas engineered pathogens would only wipe us out "by accident" -- but that's conditional on the AGI having "destroy humanity" as a subgoal. Most likely, a typical AGI will have some mundane, neutral-to-benevolent goal like "maximize profit by running this steel factory and selling steel". Maybe the AGI can achieve that by taking over an iron mine somewhere, or by taking over a country (or the world) and enslaving its citizens, or even by wiping out humanity. In general, my guess is that the AGI will try to do the least costly/risky thing needed to achieve its goal of maximizing profit. Wiping out humanity is the most expensive of these options -- setting aside that, with all of humanity extinct, the AGI would have no one to sell steel to -- and the AGI would likely get itself destroyed while trying to do it. So I think that "enslave a large portion of humanity and export cheap steel at a hefty profit" is a subgoal this AGI would likely have, but destroying humanity is not.

It depends on the use case - a misaligned AGI in charge of the U.S. Armed Forces could end up starting a nuclear war - but given how careful the U.S. government has been about avoiding nuclear war, I think they'd insist on an AGI being very aligned with their interests before putting it in charge of something so high stakes.

Also, I suspect that some militaries (like North Korea's) might be developing bioweapons and spending anywhere from 1% to 100% as much on them annually as OpenAI and DeepMind spend on AGI; we just don't know about it.

Based on your AGI-bioweapon analogy, I suspect that AGI is a greater hazard than bioweapons, but not by quite as much as your argument implies. While few well-resourced actors are interested in using bioweapons, a who's who of corporations, states, and NGOs will be interested in using AGI. And AGIs can adopt dangerous subgoals for a wide range of goals (especially resource extraction), whereas bioweapons can basically only kill large groups of people.

Toby Ord's definition of an existential catastrophe is "anything that destroys humanity's longterm potential." The worry is that misaligned AGI which vastly exceeds humanity's power would be basically in control of what happens with humans, just as humans are, currently, basically in control of what happens with chimpanzees. It doesn't need to kill all of us in order for this to be a very, very bad outcome.

E.g. the enslavement by the steel-loving AGI you describe sounds like an existential catastrophe, if that AGI is sufficiently superhuman. You describe a "large portion of humanity" enslaved in this scenario, implying a small portion remain free — but I don't think this would happen. Humans with meaningful freedom are a threat to the steel-lover's goals (e.g. they could build a rival AGI) so it would be instrumentally important to remove that freedom.

The AGI would rather write programs to do the grunt work than employ humans, since programs can be more reliable, controllable, etc. It could create such agents by inspecting its own source code and copying or modifying it; if it lacks that capability, it will spend time researching (possibly for years) until it gains it. On a thousand-year timescale, it isn't clear why an AGI would need us for anything besides, say, specimens for experiments.

Also as reallyeli says, having a single misaligned agent with absolute control of our future seems terrible no matter what the agent does.