Summary
I believe that advanced AI systems will likely be aligned with the goals of their human operators, at least in a narrow sense. I’ll give three main reasons for this:
- The transition to AI may happen in a way that does not give rise to the alignment problem as it’s usually conceived of.
- While work on the alignment problem appears neglected at this point, it’s likely that large amounts of resources will be used to tackle it if and when it becomes apparent that alignment is a serious problem.
- Even if the previous two points do not hold, we have already come up with a couple of smart approaches that seem fairly likely to lead to successful alignment.
This argument lends some support to work on non-technical interventions like moral circle expansion or improving AI-related policy, as well as work on special aspects of AI safety like decision theory or worst-case AI safety measures.
Yeah, it's also much lower than my inside view, and lower than what I thought a group of such interviewees would say. Aside from Lukas's explanation, I think maybe (1) the interviewees did not want to appear too alarmist (either personally or on behalf of EA as a whole), or (2) they weren't reporting their inside views but instead giving estimates after updating toward others who have much lower risk estimates. Hopefully Robert Wiblin will see my email at some point and chime in with details of how the 1-10% figure was arrived at.