[I originally wrote this as a Facebook post, but I'm cross-posting here in case anybody finds it useful.]

Here's my current overview of the AI x-risk debate, along with a very short further reading list:

At a *very* overly simplified but I think still useful level, it looks to me like there are basically three "camps" for how experts relate to AI x-risks. I'll call the three camps "doomers", "worriers", and "dismissers". (Those terms aren't original to me, and I hope the terminology doesn't insult anybody - apologies if it does.)

1) Doomers: These are people who think we are almost certainly doomed because of AI. Usually this is based on the view that there is some "core" or "secret sauce" to intelligence that for example humans have but chimps don't. An AI either has that kind of intelligence or it doesn't - it's a binary switch. Given our current trajectory it looks entirely possible that we will at some point (possibly by accident) develop AIs with that kind of intelligence, at which point the AI will almost immediately become far more capable than humans because it can operate at digital speeds, copy itself very quickly, read the whole internet, etc. On this view, all current technical alignment proposals are doomed to fail because they only work on AIs without the secret sauce, and they'll completely fall apart for AIs with the secret sauce because those AIs will be fundamentally different than previous systems. We currently have no clue how to get a secret-sauce-type-AI to be aligned in any way, so it will almost certainly be misaligned by default. If we suddenly find ourselves confronted with a misaligned superintelligence of this type, then we are almost certainly doomed. The only way to prevent this given the state of current alignment research is to completely stop all advanced AI research of the type that could plausibly lead to secret-sauce-type-AGIs until we completely solve the alignment problem.

People in this camp often have very high confidence that this model of the world is correct, and therefore give very high estimates for "P(doom)", often >95% or even >99%. Prominent representatives of this view include Eliezer Yudkowsky and Connor Leahy.

For a good, detailed presentation of this view, see An artificially structured argument for expecting AGI ruin by Rob Bensinger.

[EDIT: Another common reason for being a Doomer is if you have really short timelines (i.e., you think we're going to hit AGI very soon), by default you think it'll be misaligned and take over, and because of short timelines you think we won't have time to figure out how to prevent this. You could of course also be a Doomer if you are just very pessimistic that humanity will solve the alignment problem even if we do have more time. But my impression is that most Doomers have such high P(doom) estimates mainly because they have very short timelines and/or because they subscribe to something like the secret sauce of intelligence theory.]

2) Worriers: These people often give a wide variety of reasons for why very advanced AI might lead to existential catastrophe. Reasons range from destabilizing democracy and the world order, to enabling misuse by bad actors, to humans losing control of the world economy, to misaligned rogue AIs deliberately taking over the world and killing everybody. Worriers might also think that the doomer model is entirely plausible, but they might not be as confident that it is correct.

Worriers often give P(doom) estimates ranging anywhere from less than 0.1% to more than 90%. Suggestions for what to do about it also vary widely. In fact, suggestions vary so widely that they often contradict each other: For example, some worriers think pushing ahead with AGI research is the best thing to do, because that's the only way they think we can develop the necessary tools for alignment that we'll need later. Others vehemently disagree and think that pushing ahead with AGI research is reckless and endangers everybody.

I would guess that the majority of people working on AGI safety or policy today fall into this camp.

Further reading for this general point of view:

- Hendrycks, et al, An Overview of Catastrophic AI Risks

- Yoshua Bengio, FAQ on Catastrophic AI Risks

(Those sources have lots of references you can look up for more detail on particular subtopics.)

3) Dismissers: People in this camp say we shouldn't worry at all about AGI x-risk and that it shouldn't factor at all into any sort of policy proposals. Why might someone say this? Here are several potential reasons:

a) AGI of the potentially dangerous type is very far away (and we are very confident of this), so there's no point doing anything about it now. See for example this article.

b) The transition from current systems to the potentially dangerous type will be sufficiently gradual that society will have plenty of time to adjust and take the necessary steps to ensure safety (and we are very confident of this).

c) Alignment / control will be so easy that it'll be solved by default, no current interventions necessary. Yann LeCun seems to fall into this category.

d) Yeah maybe it's potentially a big problem, but I don't like any of the proposed solutions because they all have tradeoffs and the proposed solutions are worse than the problems they seek to address. I think a lot of dismissers fall into this category, including for example many of those who argue against any sort of government intervention on principle, or people who say that focusing on x-risk distracts from current harms.

e) Some people seem to have a value system where AGI taking over and maybe killing everybody isn't actually such a bad thing because it's the natural evolution of intelligence, or something like that.

There are also people who claim that they have an epistemology where they only ever worry about risks that are rigorously based in lots of clear scientific evidence, or something along those lines. I don't understand this perspective at all though, for reasons nicely laid out by David Krueger here.

Part of my frustration with the general conversation on this topic is that people on all sides of the discussion often seem to talk past each other, use vague arguments, or (frequently) opt for scoring rhetorical points for their team over actually stating their views or making reasoned arguments.

For a good overview of the field similar to this post but better written and with a bit more on the historical background, see A Field Guide to AI Safety by Kelsey Piper.

If you want to get into more detail on any of this, check out stampy.ai or any of these free courses:

- ML Safety

- AI Safety Fundamentals - Alignment

- AI Safety Fundamentals - Governance






There are other things that differentiate the camps beyond technical views, such as how much you buy 'civilizational inadequacy' versus viewing that as a consequence of sleepwalk bias. But one way to cash this out is by where you fall in the green, yellow/red, or black zones on the scale of alignment difficulty: Dismissers are in the green (although they shouldn't be imo even given that view), Worriers are in the yellow/red, and Doomers are in the black (and maybe the high end of red).

I don't think "secret sauce" is a necessary ingredient for the "doomer" view. Indeed, Connor Leahy is so worried precisely because he thinks that there is no secret sauce left (see reference to "General Cognition Engines" here)! I'm also now in this camp, and think, post-GPT-4, there is reason to freak out because all that is basically needed is more data and compute (money) to get to AGI, and the default outcome of AGI is doom.

Fair. I suppose there are actually two paths to being a doomer (usually): secret sauce theory or extremely short timelines.

Looking over that comment, I realize I don't think I've seen anybody else use the term "secret sauce theory", but I like it. We should totally use that term going forward. :)

I'm not sure if the secret sauce adds anything re doomerism. Many non-doomers are arguably non-doomers because they think the secret sauce makes the AI humanlike enough that things will be fine by default - the AI will automatically be moral, "do the right thing", "see reason", or "clearly something intelligent would realise killing everyone to make paperclips is stupid" or something (and I think this kind of not-applying-the-Copernican-Revolution-to-mindspace is really dangerous).

Who is a worrier but thinks that pushing ahead with AGI research is good? I've never seen anyone suggest that.

Arguably Sam Altman?

I imagine a reasonable fraction of people at the top AI labs, especially Anthropic, believes this.


Is there a camp of people who think that there is no "secret sauce"? For example, they might argue that ChatGPT is basically how intelligence works: when something like that is placed in a human body, we collectively produce all the wonders of the modern world. In chimps, not so much, maybe because they can't talk. In computers, it gets the ability to chat like us. There is no secret sauce, just statistical pattern matching wired to a set of biological or digital APIs.

Many people who worry about AI x-risk believe some variation of this.
