In my mind, interventions against s-risks from AI seem like the impartial[1] top priority of our time, being more tractable[2], important[3], and neglected[4] than alignment. Hence I’m surprised that they’re not as central as alignment to discussions of AI safety. This is a quick-and-dirty post to try to understand why so few people in the wider EA and AI safety community prioritize s-risks. (It’s a long-form version of this tweet.)

I’ll post a few answers of my own and, in some cases, add why I don’t think they are true. Please vote on the answers that you think apply or add your own.

I don’t expect to reach many people with this question, so please interpret the question as “Why do so few EAs/LWians care about s-risks from AI?” and not just “Why don’t you care about s-risks from AI?” So as a corollary, please feel free to respond even if you personally do care about s-risks!

(Here are some ways to learn more: “Coordination Challenges for Preventing AI Conflict,” “Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda,” and Avoiding the Worst (and

  1. ^

    Some people have a particular idea for how to solve alignment and so have a strong personal fit for alignment research. Thank you for everything you’re doing! Please continue. This post is not for you. 

    But many others seem resigned, seem to have given up hope of affecting how it all will play out. I don’t think that’s necessary!

  2. ^

    Tractability. With alignment we always try to align an AI with something that at least vaguely or indirectly resembles human values. So we’ll make an enemy of most of the space of possible values. We’re in an adversarial game that we’re almost sure to lose. Our only winning hand is that we’re early compared to the other agents, but just by a decade or two.

    Maybe it’s just my agreeableness bias speaking, but I don’t want to be in an adversarial game with most superintelligences. Sounds hopeless.

    That’s related to the deployment problem. If existing agents don’t want to be aligned, you have a deployment problem. (And you have to resort to morally ambiguous and highly intractable solutions like pivotal acts and long reflections to solve it.) If you have something to offer that they all want, you’ve solved the deployment problem.

    Averting s-risks mostly means preventing zero-sum AI conflict. If we find a way (or many ways) to do that, every somewhat rational AI will voluntarily adopt them, because who wants to lose out on gains from trade. Our current earliness may be enough to seed public training data with any solutions we find and with Schelling points that they can use to coordinate.

    Another intuition pump is that alignment aims at a tiny patch in value space whereas averting s-risks only aims to avert a bunch of outlier scenarios that shouldn’t be so hard to avert. When you’re at a shooting range, it’s much easier not to kill any of the people next to you than to hit the center of the target.

  3. ^

    Importance. If I imagine trading extreme suffering for extreme bliss personally, I end up with ratios of 1 to 300 million – e.g., that I would accept a second of extreme suffering for ten years of extreme bliss. The ratio is highly unstable as I vary the scenarios, but the point is that I disvalue suffering many orders of magnitude more than I value bliss.

    Clearly there are some people who feel differently, but the intuition that suffering is worse than bliss is good is widely shared. (And the factor doesn’t need to be as big as mine. Given the high tractability and neglectedness, averting s-risks from AI may even be interesting for somewhat positive-leaning utilitarians.)

    Plus, a high-probability non-dystopic not-quite-utopia may be better in expectation than a lot of low-probability utopias with dystopic counterfactuals. But I guess that depends on countless details.

    Arguably, extinction is somewhat more likely than dystopic s-risk lock-ins. But my guess is that s-risks are only a bit less likely than multipolar takeoffs, maybe 1–10% as likely, and that multipolar takeoffs are very likely, maybe 90%. (The GPT-3 to -4 “takeoff” has been quite slow. It could stop being slow at any moment, but while it’s still slow, I’ll continue updating towards month- or year-long takeoffs rather than minute-long ones.) As soon as there are multiple AIs, one coordination failure can be enough to start a war. Yes, maybe AIs are generally great at coordinating with each other. But that can be ruined by a single sufficiently powerful one that is not. (And sufficiently powerful can mean just, like, 1% as powerful as the others.) Anything from 0.1–10% s-risk between now and shortly after we have a superintelligence seems about right to me.

  4. ^

    Neglectedness. Alignment is already critically neglected, especially the approaches that Tammy calls “hard alignment.” Paul Christiano estimated some numbers in this excellent Bankless podcast interview. S-risks from AI are only addressed by the Center on Long-Term Risk, to some extent by the Center for Reducing Suffering, and maybe incidentally by a number of other groups. So in total maybe 1/10th the number of people work on it. (But the ideal solution is not for people in alignment to switch to s-risks but for people outside both camps to join s-risk research!)
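The back-of-envelope numbers in the Importance footnote can be checked with a few lines of arithmetic. This is only a sketch: every input is one of the post's own rough guesses, and the variable names are made up for illustration.

```python
# Back-of-envelope check; all inputs are the post's own rough guesses.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# One second of extreme suffering traded against ten years of extreme bliss.
bliss_seconds = 10 * SECONDS_PER_YEAR
suffering_to_bliss_ratio = bliss_seconds / 1  # bliss seconds per suffering second

# S-risk guessed at 1-10% as likely as a multipolar takeoff (itself guessed at 90%).
p_multipolar = 0.90
s_risk_low = p_multipolar * 0.01
s_risk_high = p_multipolar * 0.10

print(f"ratio: 1 to {suffering_to_bliss_ratio:.2e}")
print(f"s-risk range: {s_risk_low:.3f} to {s_risk_high:.3f}")
```

Ten years is about 3.2 × 10^8 seconds, which matches the "1 to 300 million" figure, and the product of the two probability guesses falls inside the stated 0.1–10% range.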


Too sad. Some people think that maybe working on s-risks is unpopular because suffering is too emotionally draining to think about, so people prefer to ignore it.

Another version of this concern is that sad topics are not in vogue with the rich tech founders who bankroll our think tanks; that they’re selected to be the sort of people who are excited about incredible moonshots rather than prudent risk management. If these people hear about averting suffering, reducing risks, etc. too often from EA circles, they’ll become uninterested in EA-aligned thinking and think tanks.

I want to argue with the Litany of Gendlin here, but what work on s-risks really looks like in the end is writing open source game theory simulations and writing papers. All dry academic stuff that makes it easy to block out thoughts of suffering itself. Just give it a try! (E.g., at a CLR fellowship.)

I don’t know if that’s the case, but s-risks can be reframed:

  1. We want to unlock positive-sum trades for the flourishing of our descendants (biological or not).
  2. We want to distribute the progress and welfare gains from AI equitably (i.e. not have some sizable fractions of future beings suffer extremely).
  3. Our economy only works thanks to trust in institutions and jurisprudence. The flourishing of the AI economy will require that new frameworks be developed that live up to the challenges of the new era!

These reframings should of course be followed up with a detailed explanation so as not to be dishonest. Their purpose is just to show that one can pivot one’s thinking about s-risks such that the suffering is not so front and center. This would, if anything, reduce my motivation to work on them, but that’s just me.


but what work on s-risks really looks like in the end is writing open source game theory simulations and writing papers

Research that involves game theory simulations can be net-positive, but it also seems very dangerous, and should not be done unilaterally. Especially when it involves publishing papers and source code.

Heh, I actually agree. I’m currently wondering whether it’s net positive or negative if all this research, though unpublished, still ends up in the training data of at least one AI. It could help that AI avoid coordination failures. But there will be other AIs that haven’t read it, and if they’re too many, maybe it’s unhelpful or worse? Also it probably depends a lot on what exactly the research says. Wdyt?

I am very unclear on why research that involves game theory simulations seems dangerous to you. I think I'm ignorant of something leading you to this conclusion. Would you be willing to explain your reasoning or send me a link to something so I can better understand where you're coming from?

Please pretend that I posted this as a question and that my top-level comments are the answers. 

The cross-posting seems to change the type from question to post, and I don’t know how to change it back to a question.

Too unpopular. Maybe people are motivated by what topics are in vogue in their friend circles, and s-risks are not?

Too unknown. Finally there’s the obvious reason that people just don’t know enough about s-risks. That seems quite likely to me.

Taken as intended and not as a question for me (I am personally quite concerned about s-risks, but think working on them involves similar insights to working on AI x-risks), I think the most common reason is people seeing these scenarios as quite unlikely. Roughly: astronomically bad outcomes are, like astronomically good outcomes, only a very tiny slice of possible outcomes of randomly selected optimization functions, but unlike astronomically good outcomes don't have existing powerful intelligences trying to aim for them. I think there are reasonable counters to this, but that's my impression of the most common thought here.

Yeah, that seems likely. Astronomically bad seems much more likely than astronomically good to me though.

Could you expound on this or maybe point me in the right direction to learn why this might be? 

I tend to agree with the intuition that s-risks are unlikely because they are a small part of possibility space and nobody is really aiming for them. I can see a risk that systems trained to produce eudaimonia will instead produce -1 x eudaimonia, but I can't see how that justifies thinking that astronomically bad is more likely than astronomically good. Surely a random sign flip is less likely than not.


Mainly the reason I don't think about it more[1] is that I don't see any realistic scenarios where AI will be motivated to produce suffering. And I don't think it's likely to incidentally produce lots of suffering either, since I believe that too is a narrow target.[2] I think accidental creation of something like suffering subroutines is unlikely.

That said, I think it's likely on the default trajectory that human sadists are going to expose AIs (most likely human uploads) to extreme torture just for fun. And that could be many times worse than factory farming overall because victims can be run extremely fast and in parallel, so it's a serious s-risk.


  1. ^

    It's still my second highest cause-wise priority, and a non-trivial portion of what I work on is upstream of solving s-risks as well. I'm not a monster.

  2. ^

    Admittedly, I also think "maximise eudaimonia" and "maximise suffering" are very close to each other in goal-design space (cf. the Waluigi Effect), so many incremental alignment strategies for the former could simultaneously make the latter more likely.


A bunch of scenarios are collected in the s-risk sub wiki


I can think of plenty of scenarios that are “realistic” by AI safety standards… Scenarios that are inspired by stuff that terrorists do all the time when they’re fighting powerful governments, so lots of precedents in history, and whose realism only suffers a bit because they would not be technically possible for humans with today’s technology.


You mean threats? I'm not sure what you're pointing towards with the terrorist thing.

I have meta-uncertainty here. I think I could think of realistic-ish scenarios if I gave it enough thought. (Though I'd have to depreciate the probability in proportion to how much effort I spend searching for it.) Tbh, I just haven't given it enough thought. Do you have any recs for quick write-ups of some scenarios?

NUs. Some people may think that you have to be a negative utilitarian to care about s-risks. They are not negative utilitarians, so they steer clear of the topic.

I don’t think you have to be a negative utilitarian to care about s-risks. S-risks are about suffering, but people can be concerned about suffering among other values. Classic utilitarianism is about minimizing suffering and maximizing happiness. One does not exclude the other. Neither does concern for suffering exclude self-preservation, caring for one’s family, wanting to uphold traditions or making one’s ancestors proud. All values are sometimes in conflict, but that is not cause to throw out concern for suffering in particular. 

My vague mental model of the general human population today says that concern for involuntary suffering is shared by the vast majority of people. Probably as widely shared as an aversion to death and extinction, and more widely shared than grabby alien type of values (not losing any galaxies to cosmic expansion; making consciousness as small, fast, and energy-efficient as possible and converting the energy of all suns to it; etc.).

 Averting s-risks mostly means preventing zero-sum AI conflict. If we find a way (or many ways) to do that, every somewhat rational AI will voluntarily adopt them, because who wants to lose out on gains from trade.

I don't really understand this argument: if there is some game theoretic solution such that all intelligences will avoid conflict, then shouldn't we expect that AIs would find and implement it themselves so that they can get gains from trade?

In order for this to be an argument for us working on s-risks, I would think that you need to show that only some subset of intelligences will avoid conflicts, which means we need to ensure we build only that subset.

I agree with your reasoning here—while I think working on s-risks from AI conflict is a top priority, I wouldn't give Dawn's argument for it. This post gives the main arguments for why some "rational" AIs wouldn't avoid conflicts by default, and some high-level ways we could steer AIs into the subset that would.

Agreed, and thanks for linking the article!

This article for example makes the case. 

Iirc, one problem is that there are ways to trade in positive-sum ways, but there are multiple such ways, and they don’t mix. So to agree on something, you first have to agree on the method you want to use to agree, but some may be at an advantage using one method and others using another method. 

More empirically, there have been plenty of situations in which groups of smart humans after long deliberation have made bad decisions because they thought another was bluffing, because they thought they could get away with a bluff, because their intended bluff got out of control, etc.

Tobias Baumann has thought a bit about whether perfectly rational all-knowing superintelligences might still fail to realize certain gains from trade. I don’t think he arrived at a strong conclusion even in that ideal case. (Idealized models of AIs don’t ring true to me and are at best helpful to establish hypothetical limits of sorts, I think.) But in practice even superintelligences will have some uncertainty over whether another is lying, concealing something, might not have something that they think they have, etc. Such imperfect knowledge of each other has historically led to a lot of unnecessary bloodshed.

Another source of problems is behavior in single-shot vs. iterated games. An AI might be forced into a situation where it has to allow a smaller s-risk to prevent a greater s-risk.

Folks at CLR have a ton of research into all the various failure modes, and it’s not clear to me at all what constellations of attitudes minimize or maximize s-risk. I’ve been hypothesizing that the Tit-For-Tat-heavy European culture may (if learned by AIs) lead to fewer but worse suffering catastrophes whereas the more “Pavlovian” (in the game theory sense) cultures of South Korea or Australia (iirc?) may cause more but smaller catastrophes. 

But that’s just as vague speculation as it sounds. My takeaway is rather that I think that any multipolar scenarios will lead to tons of small and large bargaining failures, and some of those may involve extreme suffering on an unprecedented scale.
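For readers unfamiliar with the two strategies named above, here is a toy sketch of how Tit-for-Tat and Pavlov (win-stay, lose-shift) each respond to a single accidental defection in an iterated prisoner's dilemma. The function names, the eight-round horizon, and the single forced error in round 2 are all assumptions made for illustration. It says nothing about real cultures or real AIs, and it is not taken from the CLR research mentioned above.

```python
from typing import Callable, List, Tuple

C, D = "C", "D"  # cooperate, defect
Strategy = Callable[[List[str], List[str]], str]


def tit_for_tat(my_history: List[str], their_history: List[str]) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return C if not their_history else their_history[-1]


def pavlov(my_history: List[str], their_history: List[str]) -> str:
    """Win-stay, lose-shift: cooperate iff both made the same move last round."""
    if not my_history:
        return C
    return C if my_history[-1] == their_history[-1] else D


def play(a: Strategy, b: Strategy, rounds: int, error_round: int) -> List[Tuple[str, str]]:
    """Play `rounds` rounds; player A's move is flipped once, at `error_round`."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    for t in range(rounds):
        move_a = a(hist_a, hist_b)
        move_b = b(hist_b, hist_a)
        if t == error_round:  # a single accidental flip (hypothetical noise)
            move_a = D if move_a == C else C
        hist_a.append(move_a)
        hist_b.append(move_b)
    return list(zip(hist_a, hist_b))


tft_rounds = play(tit_for_tat, tit_for_tat, rounds=8, error_round=2)
pavlov_rounds = play(pavlov, pavlov, rounds=8, error_round=2)
print("TFT vs TFT:      ", " ".join(x + y for x, y in tft_rounds))
print("Pavlov vs Pavlov:", " ".join(x + y for x, y in pavlov_rounds))
```

In this toy setup, the two Tit-for-Tat players fall into an alternating echo of retaliation that never returns to mutual cooperation, whereas the two Pavlov players have one round of mutual defection and then cooperate again. Whether either pattern maps onto larger or smaller catastrophes is exactly the speculative part.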

Egoism, presentism, or substratism. The worst s-risks will probably not befall us (humans presently alive) or biological beings at all. Extinction, if it happens, will. Maybe death or the promise of utopia has a stronger intuitive appeal to people if they themselves have a risk/chance of experiencing it?

I suspect that one aspect is what I would label as no path, but which we could also just describe as personal fit or as perceived personal tractability.

If you are in the process of studying machine learning and AI as part of a computer science degree, if you have the money for graduate school, if you are connected to the right institutions and have the right signals of competence, then sure: you can apply to work at an AI alignment research organization and go make a contribution[1]. But there are lots of people who don't have a clear path. Should the nurse[2] who reads a book about AI go back to get another bachelor's degree in a brand new field? If that person has enough money to support himself/herself for a few years of study and to pay for tuition, then maybe, but that feels like a big ask.

In writing this quick thought I'm only really thinking about the subset of people who both A) are familiar with the topic, and B) are convinced that it is real and worth working on. There are, of course, lots of people who don't fall into both of these categories.

  1. ^

    But remember that these orgs are super selective. Even if you have a computer science degree, an interest in AI alignment, and decent general work skills (communication, time management, organization, etc.), you have a slim chance of being employed there. I don't have the exact numbers, but someone internal to an AI research org could maybe provide a rough estimate of "what percentage of reasonably qualified applicants get job offers from us."

  2. ^

    I picked nurse arbitrarily, but you could fill in the blank with some other job or career: bookkeeper, project manager, literary translator, civil engineer, recruiter, etc.

Indeed, I think I’m in the same predicament. Around 2020, pretty much due to bio anchors, I started to think much more about how I could apply myself more to x- and s-risks from AI rather than priorities research. I tried a few options, but found that direct research probably didn’t have sufficiently quick feedback loops to keep my attention for long. What stuck in the end was improving the funding situation through, which is already one or two steps removed from the object level work. I imagine if I didn’t have any CS background, it would’ve been even harder to find a suitable angle.

Difference-making risk aversion, bounded utility function (bounded above and below), less sympathy for betting everything on one shots, or other reasons to not be a longtermist generally. This doesn't explain why longtermism is popular but s-risk work in particular isn't, though.

Quite plausible, thanks! I’ve been wondering whether the “infinity shades” from infinite ethics may play into this. Then again I don’t know many people who are very explicit about their particular way of resolving infinite ethics.

NNTs. Some might argue that “naive negative utilitarians that take ideas seriously” (NNTs) want to destroy the world, so that any admissions that s-risks are morally important in expectation should happen only behind closed doors and only among trusted parties.

That sounds to me like, “Don’t talk about gun violence in public or you’ll enable people who want to overthrow the whole US constitution.” Directionally correct but entirely disproportionate. Just consider that non-negative utilitarians might hypothetically try to kill everyone to replace them with beings with greater capacity for happiness, but we’re not self-censoring any talk of happiness as a result. I find this concern to be greatly exaggerated.

In fact, moral cooperativeness is at the core of why I think work on s-risks is a much stronger option than alignment, as explained in the tractability section above. So concern for s-risks could even be a concomitant of moral cooperativeness and can thus even counter any undemocratic, unilateralist actions by one moral system.

Note also that there is a huge chasm between axiology and morality. I have pretty strong axiological intuitions but what morality follows from that (even just assuming the axiology axiomatically – no pun intended) is an unsolved research question that would take decades and whole think tanks to figure out. So even if someone values empty space over earth today, they’re probably still not omnicidal. The suffering-focused EAs I know are deeply concerned about the causal and acausal moral cooperativeness of their actions. (Who wants to miss out on moral gains from trade after all!) And chances are this volume of space will be filled by some grabby aliens eventually, so assured permanent nonexistence is not even on the table.

Too unlikely. I’ve heard three versions of this concern. One is that s-risks are unlikely. I simply don’t think they are, as explained above, in the post proper. The second version is that s-risks are 1/10th as likely as extinction, hence less likely, hence not a priority. The third version of this take is that it’s just psychologically hard to be motivated for something that is not the mode of the probability distribution of how the future will turn out (given such clusters as s-risks, extinction, and business as usual). So even if s-risks are much worse and only slightly less likely than extinction, they’re still hard for people to work on.

There have been countless discussions of takeoff speeds. The slower the takeoff and the closer the arms race, the greater the risk of a multipolar takeoff. Most of you probably have some intuition of what the risk of a multipolar takeoff is. S-risk is probably just 1/10th of that – wild guess. So I’m afraid that the risk is quite macroscopic.

The second version ignores the expected value. I acknowledge that expected value calculus has its limitations, but if we use it at all, and we clearly do, a lot, then there’s no reason to ignore its implications specifically for s-risks. With all ITN factors taken together but ignoring probabilities, s-risk work beats other x-risk work by a factor of 10^12 for me (your mileage may vary), so if it’s just 10x less likely, that’s not decisive for me.
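The expected-value comparison in the reply to the second version is simple enough to spell out. This sketch only restates the commenter's own numbers (a 10^12 factor before probabilities, and s-risks being 10x less likely); the variable names are hypothetical, and other people's factors will differ.

```python
from fractions import Fraction

# The commenter's personal estimate of how much s-risk work beats other
# x-risk work on ITN factors, before accounting for probabilities.
value_factor = 10**12

# Second version of the objection: s-risks are 1/10th as likely.
relative_likelihood = Fraction(1, 10)

# Expected-value ratio once the lower probability is included.
expected_value_ratio = value_factor * relative_likelihood

# How much less likely s-risks would need to be before the two were equal.
break_even_likelihood = Fraction(1, value_factor)

print(f"expected-value ratio: {float(expected_value_ratio):.0e}")
print(f"break-even relative likelihood: {float(break_even_likelihood):.0e}")
```

On these inputs, a 10x lower probability still leaves a factor of 10^11, so the conclusion depends almost entirely on whether one accepts the 10^12 estimate.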

I don’t have a response to the third version.

S-risk is probably just 1/10th of that – wild guess

This feels high to me – I acknowledge that you are caveating this as just a guess, but I would be interested to hear more of your reasoning.

One specific thing I'm confused about: you described alignment as "an adversarial game that we’re almost sure to lose." But conflict between misaligned AIs is not likely to constitute an s-risk, right? You can't really blackmail a paperclip maximizer by threatening to simulate torture, because the paperclip maximizer doesn't care about torture, just paperclips.

Maybe you think that multipolar scenarios are likely to result in AIs that are almost but not completely aligned?

Maybe you think that multipolar scenarios are likely to result in AIs that are almost but not completely aligned?

Exactly! Even GPT-4 sounds pretty aligned to me, maybe dangerously so. And even if that might have nothing to do with any real goals it might have deep down if it’s a mesa optimizer, the appearance could still lead to trouble in adversarial games with less seemingly aligned agents. 

Personal fit. Surely, some people have tried working on s-risks in different roles for some substantial period of time but haven’t found an angle from which they can contribute given their particular skills.

I wonder about the opposite question: Why should we work on AI hell? I'm a 16-year-old boy and an AI outsider, so we may have a big knowledge gap on AGI. I think it would be great if you could provide me with some persuasive arguments for working on reducing s-risks. I have trouble reading the essays on s-risks online (I have read CLR, CRS, Brian Tomasik, Magnus Vinding, and Tobias Baumann) because they are too hard and theoretical for me. Also, there are some (basic) questions that I can't find answers to:

1. How likely do you think it is that AI can develop sentience, and that it's animal-like (I mean, that the sentience includes suffering, as in animals)? What are the arguments? You keep talking about AI suffering, but it's really hard to imagine AI suffering in a common-sense way.

2. Can you list scenarios in which AI never becomes sentient but still causes astronomical suffering for humans and animals? (Some I have heard of are threat scenarios, where different AI systems threaten each other with causing human suffering, and near-miss scenarios.)

Thanks for replying.

Hi Jack! Wonderful to hear that you’ve been reading up on all these sources already! 

Rethink Priorities has identified lots of markers that we can draw on to get a bit of a probabilistic idea of whether invertebrates are sentient. I wonder which of these might carry over to digital sentience. (It’s probably hard to arrive at strong opinions on this, but if we did, I’d also be worried that those could be infohazardous.) The concept of reinforcement learning (testable through classical conditioning) is a marker that I think is particularly fundamental. When I talk about sentience, I typically mean positive and negative feedback or phenomenal consciousness. That is intimately tied to reinforcement learning because an agent has no reason to value or disvalue certain feedback unless it is inherently un-/desirable to the agent. This doesn’t need to be pain or stress (just as we also can correct someone without causing them pain or stress) and it’s unclear how intense it is anyway, but at least when classical conditioning behavior is present, I’m extra cautious and when it’s absent less worried that the system might be conscious.

You’ve probably seen Tobias’s typology of s-risks. I’m particularly worried about agential s-risks where the AI, though it might not have phenomenal consciousness itself, creates beings that do, such as emulations of animal brains. But there are also incidental s-risks, which are worrying particularly if the AI ends up in the situation where it has to create a lot of aligned subagents, e.g., because it has expanded a lot and is incurring communication delays. But generally I think you’ll hear the most convincing arguments in 1:1 conversations with people from CLR, CRS, probably MIRI, and others.  

Dawn -- an important issue, and I don't know the answer.

I haven't read much about S-risk, so treat my comments as very naive.

My hunch is that people who have read the science fiction novel 'Surface Detail' (2010) by Iain M. Banks are likely to take S-risk seriously; and those who haven't, not so much. The novel portrays a world in which people's mind-states are uploaded and updated, and if their bodies die, and they're considered 'bad people', their mind-states are tormented in a virtual hell for subjective eons. It's a harrowing read.

But, the premise depends on psychopathic religious fundamentalists using machine intelligences to impose the digital hell on digital people, to align with their theological notions of who deserves punishment.

Outside the realm of vengeful religions leading powerful psychopaths to impose digital suffering on digital sentiences, it's somewhat difficult to imagine any realistic situations in which any entities would want to deliberately inflict wide-scale suffering on others.

And, even if they did, others who are less psychopathic might notice, and intervene to save the tormented digital souls (as they did in the novel). 

I haven’t read the novel, so I can’t comment on that part but, as I commented above, “I can think of plenty of scenarios that are ‘realistic’ by AI safety standards… Scenarios that are inspired by stuff that terrorists do all the time when they’re fighting powerful governments, so lots of precedents in history, and whose realism only suffers a bit because they would not be technically possible for humans with today’s technology.”

PS -- for folks who disagree-voted on this post, I'm curious what you disagreed with?


My guess is that people disagree with the notion that the novel is a significant reason for most people who take s-risks seriously. I too was a bit puzzled by that part, but I found it enlightening as a comment even if I disagreed with it.

My impression is that readers of the EA forum have, since 2022, become much more prone to downvoting stuff just because they disagree with it. LW seems to be slightly better at understanding that "karma" and "disagreement" are separate things, and that you should up-karma stuff if you personally benefited from reading it, and separately up-agree or down-agree depending on whether you think it's right or wrong.

Maybe I'm wrong, but perhaps the forum could use a few reminders to let people know the purpose of these buttons. Like an opt-out confirmation popup with some guiding principles for when you should up or downvote each dimension.

rime - thanks for your helpful reply. 

I agree that it would be nice on EA Forum for people to stay disciplined about upvotes versus agree-votes. 

It would also be very helpful if there was a norm of people disagree-voting offering, at least some of the time, explicit reasons for their disagreement -- even if only brief comments.

My mention of the Banks novel wasn't intended to be taken too literally as an explanation for why some people take S-risk seriously. (Maybe that was seen as dismissive or mocking, but it certainly wasn't meant to be.) For me personally, Surface Detail was just the only scenario I've seen portrayed in fiction, so far, where there would be any sustainable rationale for AIs to impose long-term suffering on sentient beings.