On crux 4: I agree with your argument that good alignment solutions will be put to use in worlds where AI risk comes from AGI being an unbounded maximiser. I'm less certain that they would be in worlds where AI risk comes from structural loss of control leading to influence-seeking agents. The world still gets better in Part I of the story, so I'm uncertain whether corporations would have sufficient incentive to use AIs aligned with complex values rather than AIs aligned with profit maximisation.
Do you have any thoughts on this or know if anyone has written about it?
Thanks for the reply! Could you give examples of:
a) two agendas that seem to be "reflecting" the same underlying problem despite appearing very different superficially?
b) a "deep prior" that you think some agenda is (partially) based on, and how you would go about working out how deep it is?
My sense of the current general landscape of AI Safety is: various groups of people pursuing quite different research agendas, and not very many explicit and written-up arguments for why these groups think their agenda is a priority (a notable exception is Paul's argument for working on prosaic alignment). Does this sound right? If so, why has this dynamic emerged and should we be concerned about it? If not, then I'm curious about why I developed this picture.
What do you think are the biggest mistakes that the AI Safety community is currently making?
Paul Christiano is a lot more optimistic than MIRI about whether we could align a prosaic AGI. In a relatively recent interview with AI Impacts, he said he thinks "probably most of the disagreement" about this lies in the question of "can this problem [alignment] just be solved on paper in advance" (Paul thinks there's "at least a third chance" of this, but suggests MIRI's estimate is much lower). Do you have a sense of why MIRI and Paul disagree so much on this estimate?