Summary
I believe that advanced AI systems will likely be aligned with the goals of their human operators, at least in a narrow sense. I’ll give three main reasons for this:
- The transition to AI may happen in a way that does not give rise to the alignment problem as it’s usually conceived of.
- While work on the alignment problem appears neglected at this point, it’s likely that large amounts of resources will be used to tackle it if and when it becomes apparent that alignment is a serious problem.
- Even if the previous two points do not hold, we have already come up with a couple of smart approaches that seem fairly likely to lead to successful alignment.
This argument lends some support to work on non-technical interventions like moral circle expansion or improving AI-related policy, as well as work on specific aspects of AI safety like decision theory or worst-case AI safety measures.
Great point – I agree that it would be valuable to have a common scale.
I'm a bit surprised by the 1-10% estimate. This seems very low, especially given that "serious catastrophe caused by machine intelligence" is broader than narrow alignment failure. If we include possibilities like serious value drift as new technologies emerge, or difficult AI-related cooperation and security problems, or economic dynamics riding roughshod over human values, then I'd put much more than 10% (plausibly more than 50%) on something not going well.
Regarding the "other thoughtful people" in my 80% estimate: I think it's very unclear who exactly one should update towards. What I had in mind is that many EAs who have thought about this appear to not have high confidence in successful narrow alignment (not clear if the median is >50%?), judging based on my impressions from interacting with people (which is obviously not representative). I felt that my opinion is quite contrarian relative to this, which is why I felt that I should be less confident than the inside view suggests, although as you say it's quite hard to grasp what people's opinions actually are.
On the other hand, one possible interpretation (though not the only one) of the relatively low level of concern about AI risk among the larger AI community and societal elites is that people are quite optimistic that "we'll know how to cross that bridge once we get to it".