I’ve written a draft report evaluating a version of the overall case for existential risk from misaligned AI, and taking an initial stab at quantifying the risk from this version of the threat. I’ve made the draft viewable as a public Google Doc here (Edit: arXiv version here, video presentation here, human-narrated audio version here). Feedback would be welcome.
This work is part of Open Philanthropy’s “Worldview Investigations” project. However, the draft reflects my personal (rough, unstable) views, not the “institutional views” of Open Philanthropy.
One thing that I think would really help me read this document would be a sense (from Joe) of "here are the parts where my mind changed the most in the course of this investigation".
Something like (note that this is totally made up): "there's a particular exploration of alignment where I had conceptualized it as being roughly about making the AI think right, but now I conceptualize it as being about the AI not thinking wrong, which I explore in section a.b.c".
Also, maybe a sense of which of the premises Joe changed his mind on the most, i.e. where the probabilities shifted a lot.
Great answer, thanks.