I’ve written a draft report evaluating a version of the overall case for existential risk from misaligned AI, and taking an initial stab at quantifying the risk from this version of the threat. I’ve made the draft viewable as a public Google Doc here (Edit: arXiv version here, video presentation here, human-narrated audio version here). Feedback would be welcome.
This work is part of Open Philanthropy’s “Worldview Investigations” project. However, the draft reflects my personal (rough, unstable) views, not the “institutional views” of Open Philanthropy.
It's great to see a new examination of what the core AI risk argument is (or should be). I like the focus on "power-seeking", and I think this is a clearer term than "influence-seeking".
I want to articulate a certain intuition that's pinging me. You write:
You also treat this as ~equivalent to:
This is equivalent to saying you're ~95% confident there won't be such a disaster between now and 2070. This seems like an awful lot of confidence to me!
(For the latter probability, you say that you'd "probably bump this up a bit [from 5%] -- maybe by a percentage point or two, though this is especially unprincipled (and small differences are in the noise anyway) -- to account for power-seeking scenarios that don’t strictly fit all the premises above". This still seems like an awful lot of confidence to me!)
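(Spelling out the arithmetic behind the numbers I use below, as I read your figures: the ~5% estimate bumped up by a percentage point or two, and then its complement.)

$$P(\text{catastrophe by 2070}) \approx 5\% + (1\text{ to }2)\,\text{pp} \approx 6\text{–}7\%, \qquad P(\text{no catastrophe by 2070}) \approx 93\text{–}94\%.$$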
To put my immediate reaction into words: From my perspective, the world just looks like the kind of world where "existential catastrophe from misaligned, power-seeking AI by 2070" is true. At least, that seems like the naive extrapolation I'd make if no exciting surprises happened (though I do think there's a decent chance of exciting surprises!).
If the proposition is true, then it's very important to figure that out ASAP. But if the current evidence isn't enough to raise your probability above ~6%, then what evidence would raise it higher? What would a world look like where this claim was obviously true, or at least plausibly true, rather than being (with ~94% confidence) false?
Another way of stating my high-level response: If the answer to a question is X, and you put a lot of work into studying the question and carefully weighing all the considerations, then the end result of your study shouldn't look like '94% confidence not-X'. From my perspective, that goes beyond the kind of mistake you could make in any ordinary way; it would have to involve some mistake in methodology.
(Caveat: this comment is my attempt to articulate a different framing than what I think is the more common framing in public, high-visibility EA writing. My sense is that the more common framing is something like "assigning very-high probabilities to catastrophe is extreme, assigning very-low probabilities is conservative". To give a full version of my objection, I'd need to engage with the details of your argument rather than stopping here.)
Great comment :)