RobBensinger

Survey on AI existential risk scenarios

Fascinating results! I really appreciate the level of thought and precision you all put into the survey questions.

Were there any strong correlations between which of the five scenarios respondents considered more likely?

Predict responses to the "existential risk from AI" survey

Survey results for Q2, Q1 (hover for spoilers):

  • OpenAI: ~21%, ~13%
  • FHI: ~27%, ~19%
  • DeepMind: (no respondents declared this affiliation)
  • CHAI/Berkeley: 39%, 39%
  • MIRI: 80%, 70%
  • Open Philanthropy: ~35%, ~16%

"Existential risk from AI" survey results

Some reasons I can imagine for focusing on 90+% loss scenarios:

  • You might just have the empirical view that very few things would cause 'medium-sized' losses of a lot of the future's value. It could then be useful to define 'existential risk' to exclude medium-sized losses, so that when you talk about 'x-risks' people fully appreciate just how bad you think these outcomes would be.
  • 'Existential' suggests a threat to the 'existence' of humanity, i.e., an outcome about as bad as human extinction. (Certainly a lot of EAs -- myself included, when I first joined the community! -- misunderstand x-risk and think it's equivalent to extinction risk.)

After googling a bit, I now think Nick Bostrom's conception of existential risk (at least as of 2012) is similar to Toby's. In https://www.existential-risk.org/concept.html, Nick divides up x-risks into the categories "human extinction, permanent stagnation, flawed realization, and subsequent ruination", and says that in a "flawed realization", "humanity reaches technological maturity" but "the amount of value realized is but a small fraction of what could have been achieved". This only makes sense as a partition of x-risks if all x-risks reduce value to "a small fraction of what could have been achieved" (or reduce the future's value to zero).

I still think that the definition of x-risk I proposed is a bit more useful, and I think it's a more natural interpretation of phrasings like "drastically curtail [Earth-originating intelligent life's] potential" and "reduce its quality of life (compared to what would otherwise have been possible) permanently and drastically". Perhaps I should use a new term, like hyperastronomical catastrophe, when I want to refer to something like 'catastrophes that would reduce the total value of the future by 5% or more'.

"Existential risk from AI" survey results

Oh, your survey also frames the questions very differently, in a way that seems important to me. You give multiple-choice questions like:

Which of these is closest to your estimate of the probability that there will be an existential catastrophe due to AI (at any point in time)?

  • 0.0001%
  • 0.001%
  • 0.01%
  • 0.1%
  • 0.5%
  • 1%
  • 2%
  • 3%
  • 4%
  • 5%
  • 6%
  • 7%
  • 8%
  • 9%
  • 10%
  • 15%
  • 20%
  • 25%
  • 30%
  • 35%
  • 40%
  • 45%
  • 50%
  • 55%
  • 60%
  • 65%
  • 70%
  • 75%
  • 80%
  • 85%
  • 90%
  • 95%
  • 100%

... whereas I just asked for a probability.

Overall, you give fourteen options for probabilities below 10%, and two options above 90%. (One of which is the dreaded-by-rationalists "100%".)
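
To make the tally concrete, here's a quick sketch (in Python; the option list is copied from the question above, the counting is mine):

```python
# The 33 answer options from the multiple-choice question above, in percent.
options = [0.0001, 0.001, 0.01, 0.1, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9,
           10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75,
           80, 85, 90, 95, 100]

below_10 = [o for o in options if o < 10]
above_90 = [o for o in options if o > 90]

print(len(below_10))  # 14 options below 10%
print(len(above_90))  # 2 options above 90% (95% and 100%)
```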

By giving many fine gradations of 'AI x-risk is low probability' without giving as many gradations of 'AI x-risk is high probability', you're communicating that low-probability answers are more normal/natural/expected.

The low probabilities are also listed first, which is a natural choice but could still have a priming effect. (Anchoring to 0.0001% and adjusting from that point, versus anchoring to 95%.) On my screen's resolution, you have to scroll down three pages to even see numbers as high as 65% or 80%. I lean toward thinking 'low probabilities listed first' wasn't a big factor, though.

"Existential risk from AI" survey results

My survey's also a lot shorter than yours, so I could imagine it filtering for respondents who are busier, lazier, less interested in the topic, less interested in helping produce good survey data, etc.

"Existential risk from AI" survey results

I have sometimes wanted to draw a sharp distinction between scenarios where 90% of humans die vs. ones where 40% of humans die; but that's largely because the risk of subsequent extinction or permanent civilizational collapse seems much higher to me in the 90% case. I don't currently see a similar discontinuity in '90% of the future lost vs. 40% of the future lost', either in 'the practical upshot of such loss' or in 'the kinds of scenarios that tend to cause such loss'. But I've also spent a lot less time than Toby thinking about the full range of x-risk scenarios.

"Existential risk from AI" survey results

Excited to have the full results of your survey released soon! :) I read a few paragraphs of it when you sent me a copy, though I haven't read the full paper.

Your "probability of an existential catastrophe due to AI" got mean 0.23 and median 0.1. Notably, this includes misuse risk along with accident risk, so it's especially striking that it's lower than my survey's Q2, "[risk from] AI systems not doing/optimizing what the people deploying them wanted/intended", which got mean ~0.401 and median 0.3.

Looking at different subgroups' answers to Q2 (see the aggregation sketch after this list):

  • MIRI: mean 0.8, median 0.7.
  • OpenAI: mean ~0.207, median 0.26. (A group that wasn't in your survey.)
  • No affiliation specified: mean ~0.446, median 0.35. (Might or might not include MIRI people.)
  • All respondents other than 'MIRI' and 'no affiliation specified': mean 0.278, median 0.26.
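
The subgroup figures above are just per-affiliation means and medians over the Q2 responses. Here's a minimal sketch of that aggregation (Python/pandas, with made-up placeholder numbers rather than the actual response data):

```python
import pandas as pd

# Placeholder numbers for illustration only -- not the real survey responses.
responses = pd.DataFrame({
    "affiliation":    ["MIRI", "MIRI", "OpenAI", "No affiliation", "Other"],
    "q2_probability": [0.80,   0.70,   0.20,     0.35,             0.26],
})

# Per-affiliation mean and median, as reported in the bullets above.
summary = responses.groupby("affiliation")["q2_probability"].agg(["mean", "median"])
print(summary)
```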

Even that last group's numbers are surprisingly high. A priori, I'd have expected MIRI on its own to matter less than the fact that the overall (non-MIRI) target populations are very different for the two surveys:

  • My survey was sent to FHI, MIRI, DeepMind, CHAI, Open Phil, OpenAI, and 'recent OpenAI'.
  • Your survey was sent to four of those groups (FHI, MIRI, CHAI, Open Phil), subtracting OpenAI, 'recent OpenAI', and DeepMind. Yours was also sent to CSER, Mila, Partnership on AI, CSET, CLR, FLI, AI Impacts, GCRI, and various independent researchers recommended by these groups. So your survey has fewer AI researchers, more small groups, and more groups that don't have AGI/TAI as their top focus.
  • You attempted to restrict your survey to people "who have taken time to form their own views about existential risk from AI", whereas I attempted to restrict to anyone "who researches long-term AI topics, or who has done a lot of past work on such topics". So I'd naively expect my population to include more people who (e.g.) work on AI alignment but haven't thought a bunch about risk forecasting; and I'd naively expect your population to include more people who have spent a day carefully crafting an AI x-risk prediction, but primarily work in biosecurity or some other area. That's just a guess on my part, though.

Overall, your methods for choosing who to include seem super reasonable to me -- perhaps more natural than mine, even. Part of why I ran my survey was just the suspicion that there's a lot of disagreement between orgs and between different types of AI safety researcher, such that it makes a large difference which groups we include. I'd be interested in an analysis of that question; eyeballing my chart, it looks to me like there is a fair amount of disagreement like that (even if we ignore MIRI).

"Existential risk from AI" survey results

People might also cluster more if we did the exact same survey again, but asking them to look at the first survey's results.

"Existential risk from AI" survey results

Yeah, a big part of why I left the term vague is that I didn't want people to get hung up on those details when many AGI catastrophe scenarios are extreme enough to swamp those details. E.g., focusing on whether the astronomical loss threshold is 80% vs. 50% is beside the point if you think AGI failure almost always means losing 98+% of the future's value.

I might still do it differently if I could re-run the survey, however. It would be nice to have a number, so we could more easily do EV calculations.
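
As a toy illustration of the kind of EV comparison a concrete threshold would enable (all numbers below are hypothetical, not survey results or anyone's actual estimates):

```python
# Toy expected-value comparison -- every number here is made up for illustration.
def expected_fraction_lost(p_catastrophe, fraction_of_future_lost):
    """Expected fraction of the future's value lost."""
    return p_catastrophe * fraction_of_future_lost

# Same 10% catastrophe probability, different assumed severities:
print(expected_fraction_lost(0.10, 0.98))  # 0.098
print(expected_fraction_lost(0.10, 0.50))  # 0.05
```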

"Existential risk from AI" survey results

Then perhaps it's good that I didn't include my nonstandard definition of x-risk, and we can expect the respondents to be at least somewhat closer to Ord's definition.

I do find it odd to say that '40% of the future's value is lost' isn't an x-catastrophe, and in my own experience it's much more common that I've wanted to draw a clear line between '40% of the future is lost' and '0.4% of the future is lost', than between 90% and 40%. I'd be interested to hear about cases where Toby or others found it illuminating to sharply distinguish 90% and 40%.
