My understanding is that most AI safety work that plausibly reduces some s-risks also reduces extinction risks, and I suspect that some futures where we go extinct because of AI (especially a single AI taking over) wouldn't involve astronomical suffering, as long as the AI has no (or sufficiently little) interest in consciousness or suffering, whether
- terminally,
- because consciousness or suffering is useful to some goal (e.g. it might simulate suffering incidentally or for the value of information), or
- because there are other agents it has to interact with, or whose preferences it should follow, who care about suffering (in extinction scenarios these agents could all be gone, ruling out s-risks from conflicts with them).
I am interested in how people are weighing (or defeating) these considerations against the s-risk reduction they expect from (particular) AI safety work.
EDIT: Summarizing:
- AI safety work (including s-risk-focused work) also reduces extinction risk.
- Reducing extinction risk increases some s-risks, especially non-AGI-caused s-risks, but also possibly AGI-caused s-risks.
So AI safety work may increase s-risks on net, depending on how these effects trade off.
I think that's an important question. Here are some thoughts (though I think this topic deserves a much more rigorous treatment):
Creating an AGI with an arbitrary goal system (one that is potentially much less satiable than humans') and arbitrary game-theoretic mechanisms, via an ML process that can involve an arbitrary amount of ~suffering/disutility, generally seems very dangerous. Some of the relevant considerations are weird and non-obvious. For example, creating such an arbitrary AGI may constitute wronging some set of agents across the multiverse (due to that AGI's goal system and game-theoretic mechanisms).
I think there's also the general argument that, due to cluelessness, trying to achieve some form of a vigilant Long Reflection process is the best option on the table, including by the lights of suffering-focused ethics (e.g. due to weird ways in which resources could be used to reduce suffering across the multiverse via acausal trading). Interventions that mitigate x-risks (including AI-related x-risks) seem to increase the probability that humanity will achieve such a Long Reflection process.
Finally, a meta point that seems important: people in EA who have spent a lot of time on AI safety (myself included), or even made it their career, probably have a motivated-reasoning bias towards believing that working on AI safety tends to be net-positive.