SamClarke

SamClarke's Comments

My personal cruxes for working on AI safety

On crux 4: I agree with your argument that good alignment solutions will be put to use in worlds where AI risk comes from AGI being an unbounded maximiser. I'm less certain that they would be used in worlds where AI risk comes from a structural loss of control leading to influence-seeking agents (the world still gets better in Part I of the story, so I'm not sure there would be sufficient incentive for corporations to use AIs aligned with complex values rather than AIs aligned with profit maximisation).

Do you have any thoughts on this or know if anyone has written about it?

I'm Buck Shlegeris, I do research and outreach at MIRI, AMA

Thanks for the reply! Could you give examples of:

a) two agendas that seem to be "reflecting" the same underlying problem despite appearing very different superficially?

b) a "deep prior" that you think some agenda is (partially) based on, and how you would go about working out how deep it is?

I'm Buck Shlegeris, I do research and outreach at MIRI, AMA

My sense of the current general landscape of AI safety is that various groups of people are pursuing quite different research agendas, with not very many explicit, written-up arguments for why these groups think their agenda is a priority (a notable exception is Paul's argument for working on prosaic alignment). Does this sound right? If so, why has this dynamic emerged, and should we be concerned about it? If not, I'm curious why I developed this picture.

I'm Buck Shlegeris, I do research and outreach at MIRI, AMA

What do you think are the biggest mistakes that the AI Safety community is currently making?

I'm Buck Shlegeris, I do research and outreach at MIRI, AMA

Paul Christiano is a lot more optimistic than MIRI about whether we could align a prosaic AGI. In a relatively recent interview with AI Impacts, he said he thinks "probably most of the disagreement" about this lies in the question of "can this problem [alignment] just be solved on paper in advance" (Paul thinks there's "at least a third chance" of this, but suggests MIRI's estimate is much lower). Do you have a sense of why MIRI and Paul disagree so much on this estimate?