Suppose you want to argue for the longtermist importance of shaping AI development.
My key point is this: you can choose to make an argument that is more specific or more general. Specific arguments include more details about what AI development looks like and how exactly it threatens humanity's longterm potential; general arguments include fewer. The most specific arguments specify a complete threat model.
I think this is an important distinction. If people were clearer about what kind of argument they are making (or are motivated by), this would improve the quality of discussion about how much AI should be prioritised as a cause area, and about how to prioritise between different AI-related interventions.
I'll clarify the distinction by giving examples of arguments that sit near the ends of the general vs specific spectrum.
Here's a highly specific argument:
- It's plausible that TAI will arrive this century
- If TAI arrives this century, then a scenario broadly similar to What failure looks like is plausible (see here for a summary if you’re unfamiliar)
- In this scenario, most of the value of the longterm future is lost
Here's a highly general argument:
- It's plausible that TAI will arrive this century
- If TAI arrives this century, it's probably one of our best bets for positively shaping the longterm future
Of course, many arguments fall somewhere in between these extremes. For example, Richard Ngo's AGI safety from first principles argues specifically for the plausibility of "AI takeover" (a scenario where the most consequential decisions about the future get made by AI systems with goals that aren’t desirable by human standards) - which is just one among many possible risks from advanced AI. And many of its arguments won't apply if the field of AI moves away from focusing on machine learning. But the argument isn't fully specific, because the author doesn't argue for the plausibility of any specific variety of AI takeover scenario (there are many), instead focusing on making a more general case for the plausibility of AI takeover.
Joe Carlsmith's report on existential risk from power-seeking AI is similar, also falling somewhere between the extremes. So do arguments that appeal to solving the inner alignment problem being necessary for AI existential safety, but very difficult.
Why this distinction matters
Mostly, for the reason I mentioned at the beginning: when arguing about the longtermist importance of shaping AI development, if people were clearer about how general or specific their arguments are (and need to be), this would improve discussion about (1) how much AI should be prioritised as a cause area, and (2) how to prioritise between different AI-related interventions.
For an example of (1): I sometimes hear people arguing against the longtermist case for shaping AI development by pointing out that there's wide disagreement and uncertainty about which risk scenarios are actually plausible, as if this were an argument against the entire case - when actually it's just an argument against one kind of case you can make. Conversely, I've heard other discussions be vaguer than they should be about how exactly AI leads to existential catastrophe.
For an example of (2): when prioritising interventions within AI as a cause area, if the strongest arguments for this kind of work are very general, this suggests a broader portfolio of work than if the strongest arguments pick out a particular class of threat models.
Finally, I personally think that the strongest case we can currently make for the longtermist importance of shaping AI development is fairly general - something along the lines of the most important century series - and yet this doesn't seem to be the "default" argument (i.e. the one presented in key EA content/fellowships/etc. when discussing AI). Instead, the "default" seems to be something more specific, focusing on the alignment problem, and sometimes even on particular threat models (e.g. recursively self-improving misaligned superintelligent AI). I would really like to see this addressed.
Acknowledgments: this distinction is alluded to in this post and this podcast episode. We discussed it while working on the Survey on AI existential risk scenarios. Thanks to Alex Holness-Tofts for helpful feedback.
I agree that the general argument is the strongest one, in the sense that it is most likely to be correct / robust.
The problem with general arguments is that they tell you very little about how to solve the problem. "The climate is messed up, probably due to human activity" doesn't tell you much about what to do to fix the climate. In contrast, "On average the Earth is warming due to increased concentrations of greenhouse gases" tells you a lot more (e.g. reduce emissions of GHGs, take them out of the atmosphere, find a way of cooling the Earth to balance it out).
If I were producing key EA content/fellowships/etc., I would be primarily interested in getting people to solve the problem, which suggests a focus on specific arguments, even though the general argument is "stronger".
Note that general arguments can motivate you to learn more about the problem to develop more specific arguments, which you can then solve. (E.g. if you didn't know about the greenhouse effect, the observation that the Earth is warming can motivate you to figure out why, before attempting a solution.) So if you're trying to get people to produce novel specific arguments for AI risk, then talking about the general argument makes sense.
I think this is true for some kinds of content/fellowships/etc, but not all. For those targeted at people who aren't already convinced that AI safety/governance should be prioritised (which is probably the majority), it seems more important to present them with the strongest arguments for caring about AI safety/governance in the first place. This suggests presenting more general arguments.
Then, I agree that you want to get people to help solve the problem, which requires talking about specific failure modes. But I think that doing this prematurely can lead people to dismiss the case for shaping AI development for bad reasons.
Another way of saying this: for AI-related EA content/fellowships/etc, it seems worth separating motivation ("why should I care?") and action ("if I do care, what should I do?"). This would get you the best of both worlds: people are presented with the strongest arguments, allowing them to make an informed decision about how much AI stuff should be prioritised, and then also the chance to start to explore specific ways to solve the problem.
I think this maybe applies to longtermism in general. We don't yet have that many great ideas of what to do if longtermism is true, and I think that people sometimes (incorrectly) dismiss longtermism for this reason.
Imagine Alice, an existing AI safety researcher, having such a conversation with Bob, who doesn't currently care about AI safety:
Alice: AGI is decently likely to be built in the next century, and if it is it will have a huge impact on the world, so it's really important to deal with it now.
Bob: Huh, okay. It does seem pretty important to make sure that AGI doesn't discriminate against people of color. And we'd better make sure that AGI isn't used in the military, or all nations will be forced to follow suit thanks to Moloch, and wars will be way more devastating.
Alice: Great, I'm glad you agree.
Bob: Okay, so what can we do to shape AGI?
Alice: Well, we should ensure that AGIs don't pursue goals other than the ones we intended, so you should work on learning from human feedback.
Bob: <works on those topics>
I feel like something has gone wrong in this conversation: you have tricked Bob into working on learning from human feedback, rather than convincing him to do so. Based on the reasons that convinced him to care, he should really be (say) advocating against the use of AI in the military.
(This feels very similar to a motte and bailey, where the motte is "AI will be a huge deal so we should influence it" and the bailey is "you should be working on learning from human feedback".)
I think it's more accurate to say that you've answered "why should I care about X?" and "if I do care about Y, what should I do?", without noticing that X and Y are actually different.
(Apologies for my very slow reply.)
I agree with this. If people become convinced to work on AI stuff by specific argument X, then they should definitely go and try to fix X, not something else (e.g. what other people tell them needs doing in AI safety/governance).
I think when I said I wanted a more general argument to be the "default", I meant something very general that doesn't clearly imply any particular intervention - like the one in the most important century series, or the "AI is a big deal" argument (I especially like Max Daniel's version of this).
Then, it's very important to think clearly about what will actually go wrong, and how to actually fix that. But I think it's fine to do this once you're already convinced that you should work on AI, by some general argument.
I'd be really curious if you still disagree with this?
I agree with that, and that's what I meant by this statement above:
The "highly specific argument" references Christiano's "What failure looks like," which describes two different scenarios that still leave a bunch of things to our imagination.
The point being: on the spectrum from general to specific, I'd argue that we aren't yet at the stage where we can make "highly specific" arguments with much confidence. (Partly because few people are conceptualizing TAI scenarios in much detail, and partly because those who do don't seem to reach much of a consensus.)
I think "not having consensus over highly specific arguments for the longtermist importance of AI governance" is a challenging situation to be in, because a lot of potential efforts are bottlenecked on figuring out which area of the problem is most urgent to address.
(All of this mostly underscores what you're saying, and it's an obvious point. Still, I thought I'd make the comment because I squinted at the phrase "here's a highly specific argument.")
Thanks, I agree with all of this.