Thanks Rob, interesting question. Here are the correlation coefficients between pairs of scenarios (sorted from max to min):
So it looks like there are only weak correlations between some scenarios.
It's worth bearing in mind that we asked respondents not to give an estimate for any scenarios they'd thought about for less than 1 hour. The correlations could be stronger if we didn't have this requirement.
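For anyone who wants to run a similar check on their own survey data, here is a minimal sketch of how pairwise correlations could be computed when some respondents skipped some scenarios. It is not the code we used: the scenario names and numbers are made up, `pearson` and `pairwise_correlations` are hypothetical helpers, and dropping any respondent who skipped either scenario in a pair (pairwise deletion) is an assumption about how to handle the gaps.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined when one side has no variation
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(responses, min_pairs=3):
    """Correlate every pair of scenarios, sorted from max to min.

    responses maps a scenario name to a list of estimates, one per respondent
    in a fixed order, with None where the respondent gave no estimate.
    Returns (scenario_a, scenario_b, r, number_of_paired_respondents) tuples.
    """
    results = []
    for a, b in combinations(responses, 2):
        # Pairwise deletion: keep only respondents who answered both scenarios.
        paired = [
            (x, y)
            for x, y in zip(responses[a], responses[b])
            if x is not None and y is not None
        ]
        if len(paired) < min_pairs:
            continue  # too few shared respondents to say anything
        xs, ys = zip(*paired)
        r = pearson(xs, ys)
        if r is not None:
            results.append((a, b, r, len(paired)))
    return sorted(results, key=lambda item: item[2], reverse=True)


# Made-up illustrative data, not survey results.
responses = {
    "scenario_A": [0.10, 0.20, None, 0.40, 0.30],
    "scenario_B": [0.20, 0.40, 0.50, 0.80, 0.60],
    "scenario_C": [0.50, None, 0.20, 0.10, 0.30],
}
ranked = pairwise_correlations(responses)
for a, b, r, n in ranked:
    print(f"{a} vs {b}: r = {r:.2f} (n = {n})")
```

Reporting the number of paired respondents next to each coefficient matters here, because the skip rule means each pair is estimated from a different and often smaller subset of respondents.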
I helped run the other survey mentioned, so I'll jump in here with the relevant results and my explanation for the difference. The full results will be coming out this week.
We asked participants to estimate the probability of an existential catastrophe due to AI (see definitions below). We got:
Our question isn't directly comparable with Rob's, because we don't condition on the catastrophe being "as a result of humanity not doing enough technical AI safety research" or "as a result of AI systems not doing/optimizing what the people deploying them wanted/intended". However, because our question is unconditioned and so covers a broader set of catastrophes, our results should, if anything, be higher than Rob's.
Also, we operationalise existential catastrophe/risk differently, though I think the operationalisations are similar enough that they wouldn't affect my estimate. Nonetheless:
I think it's probably a combination of things, including this difference in operationalisation, random noise, and Rob's suggestion that "respondents who were following the forum discussion might have been anchored in some way by that discussion, or might have had a social desirability effect from knowing that the survey-writer puts high probability on AI risk. It might also have made a difference that I work at MIRI."
I can add a bit more detail to how it might have made a difference that Rob works at MIRI:
Define an existential catastrophe as the premature extinction of Earth-originating intelligent life or the permanent and drastic destruction of its potential for desirable future development (Bostrom, 2013).
Define an existential catastrophe due to AI as an existential catastrophe that could have been avoided had humanity's development, deployment or governance of AI been otherwise. This includes cases where:
AI directly causes the catastrophe.
AI is a significant risk factor in the catastrophe, such that no catastrophe would have occurred without the involvement of AI.
Humanity survives but its suboptimal use of AI means that we fall permanently and drastically short of our full potential.
We also asked participants to estimate the probability of an existential catastrophe due to AI under two other conditions.
Within the next 50 years
In a counterfactual world where AI safety and governance receive no further investment or work from people aligned with the ideas of “longtermism”, “effective altruism” or “rationality” (but there are no other important changes between this counterfactual world and our world, e.g. changes in our beliefs about the importance and tractability of AI risk issues).
I'm curious what you think would count as a current ML model 'intentionally' doing something? It's not clear to me that any currently deployed ML models can be said to have goals.
To give a bit more context on what I'm confused about: the model that gets deployed is the one that does best at minimising the loss function during training. Isn't Russell's claim that a good strategy for minimising the loss function is to change users' preferences? Then, whether or not the model is 'intentionally' radicalising people is beside the point.
(I find talk about the goals of AI systems pretty confusing, so I could easily be misunderstanding, or wrong about something)
FYI, broken link here:
I think my views on this are pretty similar to those Beckstead expresses here.
On crux 4: I agree with your argument that good alignment solutions will be put to use, in worlds where AI risk comes from AGI being an unbounded maximiser. I'm less certain that they would be in worlds where AI risk comes from structural loss of control leading to influence-seeking agents (the world still gets better in Part I of the story, so I'm uncertain whether there would be sufficient incentive for corporations to use AIs aligned with complex values rather than AIs aligned with profit maximisation).
Do you have any thoughts on this or know if anyone has written about it?
Thanks for the reply! Could you give examples of:
a) two agendas that seem to be "reflecting" the same underlying problem despite appearing very different superficially?
b) a "deep prior" that you think some agenda is (partially) based on, and how you would go about working out how deep it is?
My sense of the current general landscape of AI Safety is: various groups of people pursuing quite different research agendas, and not very many explicit and written-up arguments for why these groups think their agenda is a priority (a notable exception is Paul's argument for working on prosaic alignment). Does this sound right? If so, why has this dynamic emerged and should we be concerned about it? If not, then I'm curious about why I developed this picture.
What do you think are the biggest mistakes that the AI Safety community is currently making?
Paul Christiano is a lot more optimistic than MIRI about whether we could align a Prosaic AGI. In a relatively recent interview with AI Impacts he said he thinks "probably most of the disagreement" about this lies in the question of "can this problem [alignment] just be solved on paper in advance" (Paul thinks there's "at least a third chance" of this, but suggests MIRI's estimate is much lower). Do you have a sense of why MIRI and Paul disagree so much on this estimate?