Predict responses to the "existential risk from AI" survey

by RobBensinger · 3 min read · 28th May 2021 · 8 comments


AI forecasting · Estimation of existential risk · AI alignment

I sent a short survey to ~117 people working on long-term AI issues, asking about the level of existential risk from AI; 44 responded.

In ~6 days, I'm going to post the anonymized results. For now, I'm posting the methods section of my post so anyone interested can predict what the results will be.

[Added June 1: Results are now up, though you can still make predictions below before reading the results.]

 

Methods

You can find a copy of the survey here. The main questions (including clarifying notes) were:

1. How likely do you think it is that the overall value of the future will be drastically less than it could have been, as a result of humanity not doing enough technical AI safety research?

2. How likely do you think it is that the overall value of the future will be drastically less than it could have been, as a result of AI systems not doing/optimizing what the people deploying them wanted/intended?

_________________________________________

Note A: "Technical AI safety research" here means good-quality technical research aimed at figuring out how to get highly capable AI systems to produce long-term outcomes that are reliably beneficial.

Note B: The intent of question 1 is something like "How likely is it that our future will be drastically worse than the future of an (otherwise maximally similar) world where we put a huge civilizational effort into technical AI safety?" (For concreteness, we might imagine that human whole-brain emulation tech lets you gather ten thousand well-managed/coordinated top researchers to collaborate on technical AI safety for 200 subjective years well before the advent of AGI; and somehow this tech doesn't cause any other changes to the world.)

The intent of question 1 *isn't* "How likely is it that our future will be astronomically worse than the future of a world where God suddenly handed us the Optimal, Inhumanly Perfect Program?". (Though it's fine if you think the former has the same practical upshot as the latter.)

Note C: We're asking both 1 and 2 in case they end up getting very different answers. E.g., someone might give a lower answer to 1 than to 2 if they think there's significant existential risk from AI misalignment even in worlds where humanity put a major civilizational effort (like the thousands-of-emulations scenario) into technical safety research.

I also included optional fields for "Comments / questions / objections to the framing / etc." and "Your affiliation", and asked respondents to

Check all that apply:

☐ I'm doing (or have done) a lot of technical AI safety research.

☐ I'm doing (or have done) a lot of governance research or strategy analysis related to AGI or transformative AI.

I sent the survey out to two groups directly: MIRI's research team, and people who recently left OpenAI (mostly people suggested by Beth Barnes of OpenAI). I sent it to five other groups through org representatives (who I asked to send it to everyone at the org "who researches long-term AI topics, or who has done a lot of past work on such topics"): OpenAI, the Future of Humanity Institute (FHI), DeepMind, the Center for Human-Compatible AI (CHAI), and Open Philanthropy.

The survey ran for 23 days (May 3–26), though it took time to circulate and some people didn't receive it until May 17.

 

Results

[Image redacted]

Each point is a response to Q1 (on the horizontal axis) and Q2 (on the vertical axis). Circles denote technical safety researchers, squares strategy researchers; triangles said they were neither, and diamonds said they were both.

Purple represents OpenAI, red FHI, brown DeepMind, green CHAI or UC Berkeley, orange MIRI, blue Open Philanthropy, and black "no affiliation specified". (This includes unaffiliated people, as well as people who decided to leave their affiliation out.)

[Rest of post redacted]

 

 

Added: I've included some binary predictions below on request, though I don't necessarily think these are the ideal questions to focus on. E.g., I expect it might be more useful to draw a rough picture of what you expect the distribution to look like (or, say, what you expect the range of MIRI views is, or the range of governance/strategy researchers' views).

 

Q1:

Q2:

 

 

(Cross-posted to LessWrong)


8 comments

I'll share some low-confidence answers, plus some reasoning. 

1. How likely do you think it is that the overall value of the future will be drastically less than it could have been, as a result of humanity not doing enough technical AI safety research?

My own answer: ~4% (note: this is my all-things-considered belief, not just my independent impression)

Predicted mean survey answer: 14%

Predicted median survey answer: 6%

---

Reasoning for my own answer:

  • I previously wrote "Conditional on TAI being developed and deployed someday (which rules out e.g. it being impossible or an x-catastrophe occurring before then), I fairly arbitrarily estimate a ~10% chance of that precipitating an existential catastrophe."
    • (That phrasing implies that AI could only cause existential catastrophe once TAI is developed and deployed. I think that this is misleading, though perhaps technically true, in the sense that any AI that causes an x-catastrophe is thereby as transformative as the industrial revolution.)
    • And I also wrote: "I had a very vague sense that there was a 2% chance of x-catastrophe from anything other than TAI by 2055. This was based on basically nothing. Maybe I was just trying to be broadly consistent with my other views and with e.g. Ord's views, but without checking in detail what consistency should entail."
      • I can't remember if this was conditioning on there being no TAI-induced x-catastrophe by then, but I think it implicitly was
    • I'll defer to that past thinking of mine
    • This implies something very roughly like a 7% existential risk from AI specifically (accounting for the chance that it's for some reason impossible to build TAI or that an x-catastrophe occurs before it's built - which need not be before 2055)
  • My interpretation of this question captures only a subset of total x-risk from AI
    • "Drastically less" implies existential catastrophe, not just a more minor trajectory change, so the answer has to be equal to or less than total x-risk from AI
    • And I think existential catastrophe from AI could occur even if we put a huge civilizational effort into technical AI safety
      • E.g., we could have an aligned AI but then it's misused (including in ways most humans are happy with but that are still morally horrible or squander our potential)
      • Before reading Note B, I interpreted the question as "How likely do you think it is that the overall value of the future will be drastically less than it could have been, with the key reason being that humanity didn't do enough of the right kinds of technical AI safety research?" For that I said 2.5%.
        • Note B made me change my answer to 4%, and also makes me feel that the question is a bit weird.
        • If we had ten thousand well-managed/coordinated top researchers collaborating on technical AI safety for 200 subjective years, it seems like they'd end up just also doing a lot of moral philosophy, political science, AI governance, etc. And if we say they have to stick to technical AI safety, they'll just find ways to do the other things but make it look like technical AI safety. I think fairly early on they'll notice that the biggest remaining issues aren't really technical AI safety issues, and that it'd be crazy to just keep going further and further on the technical stuff.

---

Reasoning for my predicted survey result:

  • My impression is that people at MIRI would probably have a mean x-risk from AI estimate of ~50%, while people at the other places you mentioned would have a mean estimate of ~10% and a median of 8%.
  • With (even) less confidence, I'd say people at MIRI would give a mean of 40% to question 1, and people elsewhere would give a mean of 7% and a median of 5%.
  • Maybe the survey selects from more pessimistic or more optimistic people than average. But I didn't try to account for that.
  • I'm guessing MIRI people will be something like a quarter of your respondents.
  • This suggests the mean survey response would be ~17.5% (40*0.25 + 10*0.75)
  • It also suggests the median may be close to the median of the non-MIRI people, i.e. close to 5%.
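The arithmetic in the list above is a weighted average of group means. Here is a minimal Python sketch of it, with `mixture_mean` as a hypothetical helper name. The formula in the comment uses 10% for the non-MIRI group, while the earlier bullet guesses 7% for question 1, so both are computed for comparison.

```python
def mixture_mean(groups):
    """Weighted average of group means.

    `groups` is a list of (weight, mean_percent) pairs; weights must sum to 1.
    """
    total_weight = sum(weight for weight, _ in groups)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("group weights must sum to 1")
    return sum(weight * mean for weight, mean in groups)


# Figures as written in the comment: a quarter MIRI at 40%, the rest at 10%.
as_written = mixture_mean([(0.25, 40.0), (0.75, 10.0)])

# Same weights, but with the 7% question-1 guess for non-MIRI respondents.
with_q1_guess = mixture_mean([(0.25, 40.0), (0.75, 7.0)])

print(as_written, with_q1_guess)
```

The two inputs give 17.5% and 15.25% respectively, so the choice of non-MIRI figure moves the predicted mean by a couple of percentage points.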

---

I notice that my all-things-considered belief is decently far from what I predict survey respondents will say, even though I expect survey respondents will know much more about AI x-risk and what technical AI safety research could achieve than I do. This feels a bit weird. 

But I think it's less that I'm very confident in my independent impressions / inside-views here, and more that I think the survey will overweight MIRI and (less importantly) that I also defer to people who don't research long-term AI topics specifically. (To be clear, I don't mean I trust MIRI's judgement on this less than I trust each other group's judgement, just that I give them less than a third as much weight as all of the other mentioned groups combined.)

---

...I realised at this point I'd become nerd-sniped, and so forbade myself from doing question 2.

Thanks for registering your predictions, Michael!

Predicted mean survey answer: 14%

Predicted median survey answer: 6%

Results (hover to read):

 Mean answer for Q1 was ~30.1%, median answer 20%.

  • My impression is that people at MIRI would probably have a mean x-risk from AI estimate of ~50%, while people at the other places you mentioned would have a mean estimate of ~10% and a median of 8%.

Looking only at people who declared their affiliation: MIRI people's mean probability for x-catastrophes from "AI systems not doing/optimizing what the people deploying them wanted/intended" was 80% (though I'm not sure this is what you mean by "x-risk from AI" here), with median 70%.

People who declared a non-MIRI affiliation had a mean Q2 probability of 27.8%, median 26%.

  • With (even) less confidence, I'd say people at MIRI would give a mean of 40% to question 1, and people elsewhere would give a mean of 7% and a median of 5%.

For Q1, MIRI-identified people gave mean 70% (and median 80%). Non-MIRI-identified people gave mean ~18.7%, median 10%.

  • I'm guessing MIRI people will be something like a quarter of your respondents.

5/27 of respondents who specified an affiliation said they work at MIRI (~19%). (By comparison, 17/~117 ≈ 15% of recipients work at MIRI.)

Interesting, thanks!

(I've added some ruminations on my failings and confusions in a comment on your results post.)

This is a total nerd-snipe, but I feel like I'm missing information about how strong selection effects are (i.e., did only people sympathetic to AI safety answer the survey? Was it only sent to people within those organizations who are sympathetic?)

That said, I'm guessing an average of around 20% for both questions, both widely spread. For instance, one could have 15% for the first question and 30% for the second question. I'll be surprised if the average for either question is below 10% or above 60%.

Time taken to think about this: Less than half an hour. I tried to divide this organization by organization, but then realized uncertainties about respondent affiliation were too wide for that to be very meaningful.

SPOILER: My predictions for the mean answers from each org. The first number is for Q2, the second is for Q1 (EDIT: originally had the order of the questions wrong):

OpenAI: 15%, 11%
FHI: 11%, 7%
DeepMind: 8%, 6%
CHAI/Berkeley: 18%, 15%
MIRI: 60%, 50%
Open Philanthropy: 8%, 6%

Survey results for Q2, Q1 (hover for spoilers):

OpenAI: ~21%, ~13%

FHI: ~27%, ~19%

DeepMind: (no respondents declared this affiliation)

CHAI/Berkeley: 39%, 39%

MIRI: 80%, 70%

Open Philanthropy: ~35%, ~16%

Thanks for this Rob--I was going to post this myself but you beat me to it :)


Also, wow--I was systematically wrong. I think my (relative) x-risk optimism affected my predictions majorly.
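The per-org predictions and survey means quoted above can be compared directly. A minimal Python sketch, using only the numbers from this thread; several of the results were reported as approximate ("~"), so the differences are rough. `signed_errors` is a hypothetical helper name.

```python
# (Q2, Q1) mean probabilities in percent, copied from the thread above.
# DeepMind is omitted: no respondents declared that affiliation.
predicted = {
    "OpenAI": (15, 11),
    "FHI": (11, 7),
    "CHAI/Berkeley": (18, 15),
    "MIRI": (60, 50),
    "Open Philanthropy": (8, 6),
}
actual = {
    "OpenAI": (21, 13),             # reported as ~21%, ~13%
    "FHI": (27, 19),                # reported as ~27%, ~19%
    "CHAI/Berkeley": (39, 39),
    "MIRI": (80, 70),
    "Open Philanthropy": (35, 16),  # reported as ~35%, ~16%
}


def signed_errors(predicted, actual):
    """Return actual minus predicted, per org, as (Q2, Q1) tuples."""
    return {
        org: tuple(a - p for p, a in zip(predicted[org], actual[org]))
        for org in predicted
    }


errors = signed_errors(predicted, actual)
mean_q2_error = sum(e[0] for e in errors.values()) / len(errors)
mean_q1_error = sum(e[1] for e in errors.values()) / len(errors)
all_underestimates = all(d > 0 for e in errors.values() for d in e)

print(errors)
print(mean_q2_error, mean_q1_error, all_underestimates)
```

Every difference is positive, which matches the "systematically wrong" reading: each prediction was below the corresponding survey mean, by roughly 18 points on average for Q2 and 13.6 for Q1.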

I've added six prediction interfaces, for people to give their own probability for each Q, their guess at the mean survey respondent answer, and their guess at the median answer.