I’ve written a draft report evaluating a version of the overall case for existential risk from misaligned AI, and taking an initial stab at quantifying the risk from this version of the threat. I’ve made the draft viewable as a public Google Doc here (Edit: arXiv version here, video presentation here, human-narrated audio version here). Feedback would be welcome.
This work is part of Open Philanthropy’s “Worldview Investigations” project. However, the draft reflects my personal (rough, unstable) views, not the “institutional views” of Open Philanthropy.
There are some obvious responses to my argument here, like: 'X seems likely to you because of a conjunction fallacy; what we can learn from this exercise is that X isn't likely, though it's also not vanishingly improbable.' If a claim is conjunctive enough, and the conjuncts are individually unlikely enough, then you can obviously study a question for months or years and still end up ~95% confident of not-X (e.g., 'this urn contains seventeen different colors of balls, so I don't expect the ball I randomly pick to be magenta').
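To make that arithmetic concrete, here's a quick sketch. The numbers are purely illustrative and my own; I'm assuming the urn's seventeen colors are equally likely, and treating the conjuncts as independent.

```python
# Purely illustrative numbers, not anyone's actual estimates.

# Urn example: seventeen equally likely colors.
p_magenta = 1 / 17
print(f"P(magenta) = {p_magenta:.3f}, P(not magenta) = {1 - p_magenta:.3f}")  # 0.059 / 0.941

# A conjunctive claim X whose conjuncts are individually unlikely,
# treated (for simplicity) as independent.
conjunct_probs = [0.45, 0.40, 0.30]  # hypothetical per-conjunct probabilities
p_x = 1.0
for p in conjunct_probs:
    p_x *= p
print(f"P(X) = {p_x:.3f}, P(not X) = {1 - p_x:.3f}")  # 0.054 / 0.946
```

On made-up numbers like these, ~95% confidence in not-X falls straight out of the arithmetic; the question is whether the real claim actually has that kind of structure.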
I worry there may be something rude about responding to a careful analysis by saying 'this conclusion is just too wrong', without providing an equally detailed counter-analysis or drilling down on specific premises.
(I'm maybe being especially rude in a context like the EA Forum, where I assume a good number of people don't share the perspective that AI is worth worrying about even at the ~5% level!)
You mention the Multiple Stages Fallacy (also discussed here, as "the multiple-stage fallacy"), which is my initial guess as to a methodological crux behind our different all-things-considered probabilities.
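For concreteness, here's a toy version of that crux, using my own made-up numbers and a deliberately simple model (not the report's actual decomposition): if the stages of a conjunctive argument are positively correlated, multiplying their marginal probabilities as though they were independent can substantially understate the joint probability.

```python
# Toy illustration of the multiple-stage-fallacy worry; all numbers are made up.

# Naive approach: give each of five stages a marginal probability of 0.8
# and multiply as if the stages were independent.
marginals = [0.8] * 5
naive = 1.0
for p in marginals:
    naive *= p
print(f"Naive product: {naive:.3f}")  # 0.8**5 ≈ 0.328

# Same marginals, but the stages share a common driver, so they're correlated.
# Toy mixture: with probability 0.6 we're in a world where each stage holds
# with probability 0.95; otherwise each stage holds with probability 0.575.
# The per-stage marginal is unchanged (0.6 * 0.95 + 0.4 * 0.575 = 0.8),
# yet the probability that all five stages hold is much higher than the naive product.
joint = 0.6 * 0.95**5 + 0.4 * 0.575**5
print(f"Correlated joint: {joint:.3f}")  # ≈ 0.489
```

The direction of the error depends on the correlations one ignores, but for a long conjunctive argument with positively correlated stages, multiplying point estimates tends to push the headline number down.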
But the more basic reason I felt moved to comment here is a general worry that EAs have a track record of low-balling the probability of AI risk, and of large AI impacts arriving soon, in their public writing. E.g.:
Back in Sep. 2017, I wrote (based on some private correspondence with researchers):
80,000 Hours is summarizing a research field where 80+% of specialists think that there's >10% probability of existential catastrophe from event A; they stick their neck out to say that these 80+% are wrong, and in fact so ostentatiously wrong that their estimate isn't even in the credible range of estimates, which they assert to be 1-10%; and they seemingly go further by saying this is true for the superset 'severe catastrophes from A' and not just for existential catastrophes from A.
If this were a typical technical field, that would be a crazy thing to do in a career summary, especially without flagging that that's what 80,000 Hours is doing (so readers can decide for themselves how to weight the views of e.g. alignment researchers vs. ML researchers vs. meta-researchers like 80K). You could say that AI is really hard to forecast so it's harder to reach a confident estimate, but that should widen your range of estimates, not squeeze it all into the 1-10% range. Uncertainty isn't an argument for optimism.
There are obvious social reasons one might not want to sound alarmist about a GCR, especially a weird/novel GCR. But (speaking here to EAs as a whole, since it's a lot harder for me to weigh in on whether you're an instance of this trend than on whether the trend exists at all) I want to emphasize that there are large potential costs to being quieter about "high-seeming numbers" than about "low-seeming numbers" in this domain, analogous, e.g., to the costs of experts playing down their worries in the early days of the COVID-19 pandemic. Even if each individual decision seems reasonable at the time, the aggregate effect is a very skewed group awareness of reality.