I’ve written a draft report evaluating a version of the overall case for existential risk from misaligned AI, and taking an initial stab at quantifying the risk from this version of the threat. I’ve made the draft viewable as a public Google Doc here (Edit: arXiv version here, video presentation here, human-narrated audio version here). Feedback would be welcome.
This work is part of Open Philanthropy’s “Worldview Investigations” project. However, the draft reflects my personal (rough, unstable) views, not the “institutional views” of Open Philanthropy.
A pattern I think I've seen with a fair number of EAs is that they'll start with a pretty well-calibrated impression of how serious AGI risk is, but then worry that if they go around quoting a P(doom) like "25%" or "70%" (especially when the cause is something that sounds as far-fetched as AI), they'll look like a crackpot. So the hypothetical EA tries to find a way to justify a probability more like 1-10%, so they can say the moderate-sounding "AI disaster is unlikely, but the EV is high" rather than the crazier-sounding "AI disaster is likely".
This obviously isn't the only reason people assign low probabilities to AI x-catastrophe, and I don't at all know whether that pattern applies here (I haven't read Joe's replies yet); and it's rude to open a conversation by psychologizing. Still, I wanted to articulate some perspectives from which there's less background pressure to try to give small probabilities to crazy-sounding scenarios, on the off chance that Joe or some third party finds it helpful:
The latter two points especially are what I was trying (and probably failing) to communicate with the phrasing "'existential catastrophe from misaligned, power-seeking AI by 2070' is true."
Define a 'science AGI' system as one that can match top human thinkers in at least two big ~unrelated hard-science fields (e.g., particle physics and organic chemistry).
If the first such systems are roughly as opaque as 2020's state-of-the-art ML systems (e.g., GPT-3) and the world order hasn't already been upended in some crazy way (e.g., there isn't a singleton), then I expect an AI-mediated existential catastrophe with >95% probability.
I don't have an unconditional probability that feels similarly confident/stable to me, but I think those two premises have high probability, both individually and jointly. This isn't the same proposition Joe was evaluating, but it maybe illustrates why I have a very different high-level take on "probability of existential catastrophe from misaligned, power-seeking AI".
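To spell out the arithmetic implicit in the last two paragraphs: a conditional estimate together with a joint probability for the premises gives a lower bound on the unconditional probability, via

$$\Pr(\text{catastrophe}) \;\ge\; \Pr(\text{both premises hold}) \times \Pr(\text{catastrophe} \mid \text{both premises hold}).$$

As a purely illustrative sketch (the comment only commits to the >95% conditional figure, not to any particular joint probability), a joint probability of 0.8 for the two premises would already imply an unconditional probability above 0.8 × 0.95 = 0.76.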