
0. Summary

This piece makes the following three points (which match the numbered sections):

  1. We have, I argue, no good reason to presume that reducing existential risks from AI, which is explicitly intended to help humans,[1] is good for other animals. This is because animal farming would not exist without empowered humans, and assuming that these same empowered humans will outweigh this harm by sufficiently increasing wild animal welfare seems untenable. And to the extent that x-risk reduction appears to be the dominant consequence of most (if not all) AI safety work, this refutes the case for AI x-risk reduction and AI safety being robustly good for all sentient beings. Importantly, while I therefore reject the position that AI x-risk reduction is net good for animals, this post does not take a stance on “it is net bad” vs “we should be agnostic” (although I argue for the latter elsewhere).

  2. Still, this need not defeat the case for AI alignment, AI control, restraining AI development, or other AI safety work not primarily aimed at benefiting animals. Such work may remain positive if we assume specific interspecies tradeoffs, set some particular uncertain effects aside, or couple it with a sufficient amount of effective AIxAnimals work. However, each of these three paths for saving the case for AI safety faces serious challenges that must be addressed.

  3. AI x-risk reduction is not the only cause area that has indirect (indeterminate or negative) effects on beings it does not target. This piece is only one specific case study showing why the value of a project should not be estimated without considering its off-target effects, despite how normalized this practice is.

Note: This essay was written and posted more quickly than I would have ideally liked. I find parts of my analysis underdeveloped, but I endorse all my claims and provide extensive literature to back them for those who want to dig deeper.

1. AI safety may not help everyone

A thought like “AI developers and future humans have both the capacity and will to benefit animals, so let’s assume they will if their values remain in control” has a strong intuitive appeal. However, I think it ceases to be compelling the moment we examine the demanding assumptions this case turns out to require.

In §1.1, I show that reducing AI x-risks indisputably increases future farmed animal suffering (even if “just a little”). Assuming future farmed animal welfare is most likely negative, this means that the only way for AI x-risk reduction to be net positive for non-human animals is if i) it overall increases future wild animal welfare, and ii) to an extent that outweighs the harm caused to farmed animals. In §1.2, I argue that condition (i) is hardly met. In §1.3, I explain how condition (ii) is an additional substantial challenge. While reducing AI x-risks could overall benefit animals (see Thürler 2025; Dickens 2025; 2026; Xia 2025; many of the comments here), I argue that this position is no more defensible than the exact opposite one. §1.4 discusses this conclusion and clarifies its scope.

  1.1 Saving humans means saving animal farming for at least some time

Animal farming would presumably[2] not exist, in any significant way, in a world without empowered humans. This is an instance of the meat-eating problem. Saving human lives or otherwise increasing the expected total human population indirectly increases farmed animal suffering.

Of course, in worlds where humanity remains in control, we might expect moral and technological progress to lead to a ban or severe restriction of animal breeding, anyway.[3] But, assuming it eventually does,[4] we still have to account for, at least, all the farmed animal suffering that would occur before then, which would be greater than in the counterfactual where AI takeover disempowers humans and ends animal farming beforehand.

  1.2 Saving humans has unclear implications for wild animal welfare

Here are four very different sets of hypotheses one could make to support that AI x-risk reduction could overall increase future wild animal welfare (in increasing order of popularity):

  • A) Empowered humans would voluntarily create many animals whose existence is a gift.
  • B) The x-risks AI poses to humans may also lead to a severe reduction (or even extinction) of wildlife,[5] and not coming into existence is a harm for wild animals.[6]
  • C) Empowered humans would voluntarily intervene in nature in substantial and robustly positive ways.[7]
  • D) Empowered humans would incidentally reduce the number of wild animals in the long run, and wild animals are better off not being born,[8] such that this benefits them.

I do not think any of these four can be presumed. Let me discuss them one by one.

First, while Sebo (2023, §6) invites (classical) utilitarians to consider terraforming other celestial bodies to create as many happy “(post-)nonhumans” as possible, I am not aware of anyone ever defending (A). If humans ever benevolently create beings with great lives in any significant numbers, I think everyone would agree these would overwhelmingly likely be biological humans or digital minds.

Second, (B) is, of course, severely compromised by its assumption that coming into existence is in the interest of wild animals. Questioning the popular (among wild animal ethicists) paradigm that wild animals would be better off not being born,[9] and calling for suspension of judgment on the question,[10] is one thing. Assuming the exact opposite is another (see Buhler n.d.).

Third, (C) relies on multiple contentious assumptions. No matter how animal-friendly people’s values get, genuine concern for the non-anthropogenic suffering of wild animals might remain fringe, and wild animal interventions have massive backfire risks. It is far from obvious that humans will ever have the necessary motivation and coordination to voluntarily and substantially affect wild animal welfare on large scales (Knutsson 2021, footnote 6; Tomasik 2019; Matthews 2021). If they do, it is even less clear that their impact will turn out positive, all things considered (Delon & Purves 2018; Rowe 2019; Graham 2025a). And if it does, this would still have to make up for all the ways humans may, in parallel, incidentally decrease the quality of life of wild animals (e.g., with painful pesticides and animal repellents, aquatic noise, and anthropogenic wildfires).

Fourth, let’s call into question (D), the most popular one. (D) has been most saliently defended by Adelstein (2025), who, after qualitatively discussing some considerations pointing in both directions,[11] “lean[s] towards yes at maybe 65% confidence”. Yet, nothing said in the post (or anywhere else, for that matter) appears to make 65% a more reasonable credence than, say, 37% (which would give the opposite verdict). Which of these two numbers may be considered most appropriate is determined by opaque judgment calls, which we have little reason, if any, to believe correlate with the truth to any extent.[12] Backing Adelstein's conclusion, Brauner and Grosse-Holz (2018) write:

For wild animals, we can extrapolate from a historical trend of decreasing wild animal populations. Even if wild animals were spread to other planets for terraforming, the animal / human ratio would likely be lower than today.

But no argument is given for why we can simply extrapolate this historical trend (given the high potential for dramatic changes) or why we can assume the second claim. Finally, the view that humans would incidentally reduce the number of wild animals (for the better) has also been defended by Grilo (2025), although with vastly different assumptions from the above authors (ignoring the long-term future, most notably). There too, the precise credences seem highly arbitrary, and Grilo himself signals having decisively changed his mind, a few months later, in this comment. Overall, saying that (D) seems wildly speculative would be both a lazy pun and an understatement.
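
To see why such credences do the decisive work, consider a minimal toy model (the symmetric stakes are my illustrative assumption, not a claim made by any of the cited authors). Suppose (D) obtains with probability $p$ and benefits wild animals by $+V$, while its negation harms them by $V$. The expected value is then

$$\mathbb{E}[\text{value}] = pV - (1-p)V = (2p-1)V,$$

which is positive at $p = 0.65$ but negative at $p = 0.37$. The sign of the verdict thus hinges entirely on a credence that, as argued above, seems arbitrary.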

  1.3 Trading off farmed and wild animal welfare

Recall this from the introduction of §1:

reducing AI x-risks indisputably increases future farmed animal suffering (even if “just a little”). This means that the only way for AI x-risk reduction to be net positive for non-human animals is if i) it overall increases future wild animal welfare, and ii) to an extent that outweighs the harm caused to farmed animals.

Let us generously assume that at least one of the following two is correct:

  • C) Empowered humans would voluntarily intervene in nature in substantial and robustly positive ways.
  • D) Empowered humans would incidentally reduce the number of wild animals in the long run, and wild animals are better off not being born, such that this benefits them.

This means condition (i) is met. And assuming future wild animals are far more numerous than future farmed animals, it is tempting to believe that (ii) inevitably follows from (i). However, it is not the number of animals that matters here, but the scale of humans’ impact on them. And while all future farmed animal suffering will be causally downstream of human activities, the positive difference humans may make for future wild animals may remain small relative to total wild animal welfare (even assuming, e.g., maximally successful large-scale interventions).
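
One minimal way to formalize this point (the symbols are mine, purely for illustration): let $H_f > 0$ be the additional future farmed animal suffering downstream of saving humans, let $S$ be the total scale of future wild animal welfare at stake, and let $f \in [0,1]$ be the fraction of that scale that human action actually improves, so that the wild animal benefit is $\Delta W = f \cdot S$. Condition (ii) then requires

$$f \cdot S > H_f.$$

Even if $S$ dwarfs $H_f$, the condition fails whenever $f$ is sufficiently small, i.e., whenever humans’ positive footprint on wild animal welfare is marginal, even though their causal responsibility for $H_f$ is total.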

  1.4 What all this does and does not imply

Taking stock of everything said so far in §1, let me point out that there are many dystopian side trails on the path to animal utopia. Assuming empowered humans (thanks to AI x-risk reduction) will go far enough down the utopian path, and soon enough, to make up for all the negative externalities they caused on their way (e.g., factory farming, including that of small creatures) seems overconfident. After all, our current period arguably is, to date, the worst in history for non-human animals, even though concern for their welfare has never been greater. We need far more than the speculations criticized so far to suppose that this trend will reverse in the future.[13]

Importantly, I have only talked about saving/empowering humans through AI x-risk reduction. Yet, AI alignment, AI control, restraining AI development, or other AI safety work not primarily aimed at benefiting animals can also improve the quality of the future, all else equal. However, this “all else equal” is crucial. In reality, it seems highly plausible that the dominant consequence of most (if not all) AI safety work is its influence on x-risks, perhaps including that of AI safety work exclusively targeted at causing trajectory changes.[14] The more this is true, the more my refutation of the case that AI x-risk work is robustly good for all sentient beings applies to AI safety work more broadly.

Finally, note that I have (intentionally) not demonstrated that AI x-risk or AI safety work overall harms non-human animals, but only that the case for the opposite is inconclusive. It is very plausible that we must remain agnostic on the sign of these off-target effects.

2. Can we defend AI safety regardless?

  2.1 Background on why cross-species robustness would have been convenient

The goodness of many altruistic projects is sensitive to parameters about which we should be highly uncertain. For instance, whether advancing technological progress is advisable may depend heavily on its uncertain effects on some large-scale risks,[15] and the badness of these large-scale risks may itself be uncertain.[16] Hence, our choice (not) to support such projects may depend on somewhat arbitrary best guesses on crucial parameters or on the assumption that these can be ignored altogether.[17] This is not to say that we should give up such projects. They may still have good reasons in their favor. However, their lack of robustness to key uncertainties must be acknowledged.

In contrast, other projects might be robust to these critical uncertainties by remaining positive regardless of such parameters,[18] which gives them a serious advantage. Reducing (AI) x-risks has been identified, by some, as a potential example. And if the case that reducing AI x-risks overall benefits non-human animals (red-teamed in §1) were convincing, this could have made AI x-risk reduction robust to uncertainty on moral weights and on the expected future size of human vs animal populations. This would have made cause prioritization easier. But AI x-risk reduction does not necessarily have to be robust in such a way to be worth supporting.
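
To make the notion of robustness slightly more precise (this formalization is mine and only illustrative): letting $\theta$ range over the parameter values we cannot rule out (moral weights, future population ratios, etc.), a project is robust in the relevant sense if

$$\mathbb{E}[\text{value} \mid \theta] > 0 \quad \text{for all plausible } \theta,$$

whereas a merely positive best guess, i.e., $\mathbb{E}[\text{value} \mid \hat{\theta}] > 0$ for one favored estimate $\hat{\theta}$, is exactly the kind of non-robust verdict described in the previous paragraph.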

  2.2 Three ways of doing without cross-species robustness

First, one can argue that the above uncertainties are not that severe, and endorse prioritizing some species over others. For example, one may believe that non-human animal welfare, as a whole, has negligible relevance in the long-term future, assuming it does not matter nearly as much as human welfare in principle, and/or that humans will vastly outnumber other animals in the long run. But one must then, of course, convincingly defend this against the case for deep uncertainty on these questions. It is not trivial to claim that AI safety helps humans more than it potentially harms other animals.[19] And it is even less trivial to claim that it does so to an extent large enough for AI safety to be more promising than helping humans in ways that we can reasonably assume do not harm other animals (e.g., maybe treating cluster headaches), or helping the latter in ways that do not harm the former (e.g., a lot of animal welfare work).[20]

Second, one may argue that we are clueless about AI safety’s all-things-considered impact on non-human animals in a way that justifies bracketing them all out. But this also comes with unresolved problems:

  • i) Why would humans and other animals be privileged categories, here? In §1, I argue that AI safety inevitably harms farmed animals and has indeterminate effects on wild animal welfare. Can I bracket in farmed animals and bracket out humans + wild animals, such that AI safety now looks negative? Why would this be any less justified than bracketing in humans and bracketing out all the other groups we could form within “other animals”?[21]

  • ii) What are we bracketing over, exactly? Causal effects? Possibilities? Individuals? Space-time regions? Groups sharing similar characteristics? Something else? (See Clifton 2025; this comment thread). Each candidate unit comes with its own problems, and not all can justify a form of bracketing that tells us AI safety is good. For instance, space-time-region bracketing, Clifton’s (2025) favorite, does not allow us to cleanly separate humans and other animals.

  • iii) Is doing what is optimal under whatever form of bracketing that might work (i.e., setting aside the above two problems) better than doing something robustly good without bracketing out anything? (See this comment thread.)

Third, one could argue that while AI safety work is not robust across species on its own, it becomes so if coupled with AIxAnimals work (cf. hedging; see St. Jules 2020, and especially this comment). If AI safety has overall negative effects on animals, these effects would be compensated for by animal-focused work. Now, for this hedging proposal to work, we need:

  1. AIxAnimals work to be robustly good for non-human animals in its own right.
  2. its (unintended) effects on humans to be positive, or small enough to be overshadowed by the benefits of AI safety.[22]
  3. the exact hedging allocation to be right. It is not enough to say “we will make up for the potential harm we do to animals by supporting some animal-focused work”. We need to make sure such support is strong enough to actually compensate, and deep uncertainty about the relevant parameters hits hard here, too (see the sketch after this list).[23]
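
Here is a sketch of why the allocation matters (the symbols and linear payoffs are my illustrative assumptions): suppose a budget share $a \in [0,1]$ goes to AIxAnimals work producing benefit $aB$ for animals, while the remaining share goes to AI safety work imposing expected harm $(1-a)H$ on them. The portfolio is non-negative for animals only if

$$aB - (1-a)H \ge 0, \quad \text{i.e.,} \quad a \ge \frac{H}{B+H}.$$

Since both $B$ and $H$ are deeply uncertain (per §1), the threshold $H/(B+H)$ is itself deeply uncertain, so one cannot verify that any given allocation actually clears it.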

3. Conclusion: The relevance of off-target effects

I have argued that we lack good reason to believe that reducing AI existential risks is beneficial for non-human animals, and that the case for its cross-species robustness is therefore unconvincing. We can still believe AI safety is good in expectation, but we cannot pretend it robustly helps everyone. To defend AI x-risk reduction or other forms of AI safety not primarily intended to benefit animals, we must prioritize some species over others, bracket uncertain effects in a particular way, or pair AI safety with animal-focused interventions in an appropriate manner. And each of these strategies faces challenges that have to be addressed.

Note that while the present piece has focused on AI x-risk reduction, it is only one example of a cause area that could, while focusing on helping some group of beings, unintentionally harm others in significant ways. One important takeaway is that we cannot escape accounting for the off-target effects of the interventions we consider. This goes against common practice and makes cause prioritization hard. But this is what we signed up for when we decided to aim for impartiality.

Acknowledgments

For helpful comments on earlier versions of this manuscript, I thank Niki Dupuis, Karen Singleton, Kevin Xia, Max Taylor, Jo_, Zoe Lu, and Nithin Ravi. I am also grateful to Shaïman Thürler and Charbel-Raphaël Segerie for sharing their thoughts on the topic with me. All claims and omissions remain my own.

  1. ^

     I am including potential future digital minds in “humans”. This means that, for the sake of argument, I will set aside the scenarios where AI x-risk reduction helps biological humans (and maybe even animals) but harms digital minds, despite the relevance of these scenarios outside the very specific topic at hand (see, e.g., Long et al. 2025; Buhler 2023a; Horta 2025; Tomasik 2019).

  2. ^

     Preventing an existential catastrophe from misaligned AI takeover could, in theory, fail in a way where a misaligned transformative AI disempowers humans while preserving animal farming, to the point where farmed animals would have been better off with humans (or aligned AIs) in charge. But this seems very unlikely: preserving animal farming would be an arbitrary choice for such an AI rather than anything resembling a convergent instrumental goal.

  3. ^
  4. ^

     Which is far from guaranteed (see, e.g., Tse 2022; Taylor 2023; Kateman 2022; Özden 2024; Stewart & Dupuis 2025; Stewart 2026; Singleton 2025). For instance, as I argue in Buhler (2025c), it seems quite plausible that the farming of fish, shrimp, and insects will blossom, even in a world where people deeply care about animal welfare.

  5. ^
  6. ^

     As argued by Balluch (2017), Mikkelson (2018), Godfrey-Smith (2024), and Laing (in this 2025 comment). Relatedly, see also Plant (2016), Reus (2018), Groff and Ng (2019), Tyé (2022; 2023), Browning and Veit (2023), and York (2024).

  7. ^
  8. ^

     Case supported by the authors listed here, among others.

  9. ^

     See earlier footnotes for support and pushback.

  10. ^
  11. ^

     Relatedly, see also Jebari and Sandberg (2022), Tomasik (2018b), Šimčikas (2022), Rowe (2020); Graham (2025b); Jo_ (2025), and utilitrustis (2023).

  12. ^

     DiGiovanni (2026) writes: “A brute intuition in favor of [a given] verdict, a “verdict-level intuition”, isn’t a reason. [...] The main counterargument: Our verdict-level intuitions might give us evidence of good reasons that we can’t explicitly articulate. But, in contexts with poor feedback on how well these intuitions track good reasons, this evidence seems weak.” See also Buhler 2025a; DiGiovanni 2025a; Tarsney et al. 2024, §3; Mogensen 2021.

  13. ^

     Also, note that to the extent that the view that AI x-risk reduction benefits animals is supposed to make this cause area robust to key uncertainties (§§0&2), it is a paradoxically uncertain and non-robust position to hold. See DiGiovanni (2025c) and Buhler (2025e, §3) on what differentiates robust considerations from (nearly) worthless evidence.

  14. ^

     See my discussions of “uncertainty regarding the value of the future” in Buhler (2023b) and of “non-X-risk lock-in scenarios” in Buhler (2025e, §2.1).

  15. ^

     Relatedly, see Clancy (2024), Clancy & Rodriguez (2024), Ord (2024), and Christiano (2013a; 2014).

  16. ^

     See Buhler (2025b) for an overview.

  17. ^
  18. ^
  19. ^

     Relatedly, see Graham (2025a) and the cluelessness literature.

  20. ^

     See the cases, referenced in footnote #18, for prioritizing projects that are robustly good over those that best maximize expected value under uncertain assumptions.

  21. ^

     Relatedly, see Graham’s (2025a) discussion of “spotlighting”, Rowe’s (2019) discussion of “nontarget effects”, DiGiovanni’s (2025a) discussion of the “many different ways of carving up the set of effects”, and this comment thread under his post.

  22. ^

     Otherwise we would have the reverse non-robustness problem: a portfolio that helps non-human animals while having negative or indeterminate effects on humans.

  23. ^

     Consider the following analogy: Supporting AI progress may be robustly positive for humanity if paired with AI safety work. But finding an appropriate allocation between the two to make this “may” a “(probably) is” would be challenging.

