
Posted as part of EA Forum AGI & Animals Debate Week, March 2026

@Jim Buhler voted 0% agree on the debate statement this week, which looks like the most pessimistic position on the board. But Buhler says the 0% reflects confident disagreement with the claim that we should *believe* AGI (I prefer Transformative AI, TAI) going well for humans will go well for animals, while remaining agnostic on whether the effect is actually negative. Cause prioritisation, Buhler argues, should not rely on any assumption about this question either way. Having read a good number of the posts and comments, I think this position is right in a way that most other responses have been inadvertently avoiding.

The debate statement says animals. Most responses have been answering a narrower question: will factory farming end if AGI goes well for humans? Worth asking? Definitely! But the animal kingdom, by number and by probable sentience, is overwhelmingly wild. Farmed animals number roughly 1–3 trillion per year, most of them aquatic. Tomasik's estimates for wild invertebrates run into the quintillions and sextillions. Even heavily discounted, the ratio is staggering. And the mechanisms that might help farmed animals (cultivated meat, welfare technology, post-scarcity economics) have no obvious direct pathway to wild animal welfare, and could even worsen it in some scenarios. If TAI-enabled expansion carries Earth-compatible biospheres to other planets, whether for atmosphere management, food chains, or simply the fun and aesthetics of having living worlds, then wild animal suffering doesn't end. It scales.

Several commenters did flag the wild animal question ( @Tristan Katz: "Most animals are wild animals, so the answer to this question should focus on them"; @OscarD🔸: "the main game is wild animals"; @MaxReith splitting the question between farmed and wild). But these threads didn't develop into sustained analysis. Buhler's post appears to be the only piece this week that gave the wild animal question the attention it deserves, examining several hypotheses for how AI safety might increase wild animal welfare and finding roughly none defensible as presumptions. (Strictly, Buhler's post isn't about "wild animal welfare under AGI going well"; it's about whether AI safety work, x-risk reduction specifically, is good for animals.) As a few have pointed out, the farmed animal problem may be self-limiting at civilisational scale. The wild animal problem, as I see it, has no such limiting factor.
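To make "the ratio is staggering" concrete, here is a back-of-envelope sketch using the rough figures above. The numbers are order-of-magnitude placeholders, not estimates: farmed-animal counts are annual flows while Tomasik-style invertebrate figures are standing populations, and the discount factor is purely illustrative, so this shows scale only.

```python
# Illustrative scale comparison, using the order-of-magnitude figures
# cited in the text. All three constants are rough placeholders.

FARMED_PER_YEAR = 3e12      # upper end of ~1-3 trillion farmed animals/year
WILD_INVERTEBRATES = 1e18   # lower end of "quintillions" (standing population)
DISCOUNT = 1e-3             # an aggressively heavy sentience/moral-weight discount

# Even after a 1000x discount, wild invertebrates outnumber a full year
# of farmed animals by a large factor.
ratio = (WILD_INVERTEBRATES * DISCOUNT) / FARMED_PER_YEAR
print(f"discounted wild-to-farmed ratio: ~{ratio:.0f}x")
```

The point is that the conclusion survives even pessimistic inputs: taking the lowest wild estimate, the highest farmed estimate, and a thousand-fold discount still leaves wild animals dominating by hundreds of times.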

@Aidan Kankyoku's post makes a strong strategic argument that animal welfare and AI alignment have merged as problems, whether either community wants this or not. The MAIGW reframe is useful, and I think it's at least directionally correct. One line added to Claude's constitution, "Welfare of animals and of all sentient beings", coincided with a significant jump in AnimalHarmBench scores between Claude 4.5 and 4.6. Kankyoku says "initial results suggest it may have had a substantial effect". So the bar for doing something turned out to be low!


Value lock-in cuts both ways, though, and I think the finding is being interpreted with more confidence than the evidence currently supports. Whichever version of animal welfare gets specified at training time is the version that gets embedded, under the moral understanding of whoever wrote that line, at this particular moment in human ethical development. Who made that call? With what understanding of wild animal suffering, non-vertebrate cognition, or the suffering implications of spreading Earth-derived biospheres to other worlds? Locking in 2025-era animal welfare norms might not be straightforwardly good if we are at an early stage of moral understanding about these questions. The pace of recent work on invertebrate sentience (RP's moral weight series, the octopus cognition findings) should give us pause about treating current welfare norms as settled enough to bake in. I do find the AnimalHarmBench result encouraging, but finding a result encouraging is different from finding the inference from it sound.


Also, the stated-versus-revealed-preferences problem runs deeper than I think we can currently capture. Consider a scenario: an AI system autonomously managing logistics for a protein supply chain, optimising for cost, throughput, and delivery speed. Nothing in its task framing invokes an ethical dilemma. But the downstream decisions on sourcing, routing, and supplier selection could be catastrophic for animal welfare, without any moment where the model recognises it is making an ethical choice at all. Gu et al. 2025 found that "a minor change in prompt format can often pivot the preferred choice regardless of the preference categories and LLMs in the test". Relatedly, Owain Evans' work on eliciting latent knowledge shows that we lack reliable methods to determine whether a model's stated commitments, such as to animal welfare, track stable internal representations in a way that would generalise under deployment conditions outside its training distribution (I'm extending Evans' work to this specific application). I think closing this gap requires interpretability tools specifically targeting animal welfare commitments in realistic agentic scenarios, and to my knowledge nobody is building them. The conversations have been about whether, and which, lines to add to constitutions and RLHF objectives, when the crux is whether those lines do anything in the deployment environments that actually matter.

On lobbying dynamics: @Beth Barnes's point about normative corrosion, which @Aidan Kankyoku highlighted, deserves even more weight, I feel. Global animal advocacy spending is on the order of a few hundred million dollars annually, with surveyed organisations accounting for roughly $250–300 million and total spending plausibly somewhat higher (these numbers are approximate and I'd welcome corrections). US agricultural lobbying alone, including trade associations and commodity groups, runs into the tens of billions. That is roughly two orders of magnitude. If the conversation about embedding values into frontier AI opens up to general lobbying, and it will, because the stakes are becoming legible to more actors every month, we can't win that fight on spending. Quincy Washington's comment put it well: to paraphrase, if AI is successfully aligned to "human values", that would include animal agriculture, perpetuating and potentially expanding animal suffering even while humans thrive. That describes what human values actually are, in aggregate, right now. Animal advocates have won genuine concessions from individual companies through reputational pressure, but assuming that mechanism scales to a multi-stakeholder political fight over AI alignment, against agricultural and pharma capital, requires a theory of change that, in my opinion, the existing track record does not yet supply.


On interplanetary farming specifically, I'm not pessimistic! Conventional animal agriculture requires land, water, specific atmospheric conditions, and energy inputs that are not available or economical off-Earth. An interplanetary civilisation running on factory farming is very implausible on logistical grounds. Cultivated meat, precision fermentation, and substrate-independent food sources are the only economically rational options off-Earth. This is one area where I think the optimists are right for the right reasons, and TAI could accelerate the crossover significantly.


The question I'm most concerned about that went unasked this week is: Will real AGI, ASI, or transformative AI be based on LLMs at all? 

The entire debate presupposes that "how do we make AI care about animals" and "how do we make language models care about animals" are the same question. They might not be. Researchers like Yann LeCun and François Chollet argue that current transformer-based systems lack persistent, causally grounded world models and therefore may struggle with the kind of planning required for true AGI. Even the likes of Richard Sutton and Ilya Sutskever have taken more nuanced positions of late, compared with the 2022–2024 hype. Recent results on ARC-AGI-3, launched a few days ago, where SOTA systems score below 1% on novel, interactive tasks, again show substantial gaps in generalisation and adaptive reasoning (though they do not by themselves establish hard architectural limits).

So, if the real TAI system that actually reshapes the world was not trained with RLHF, has no constitution, and was not pre-trained on internet text, then the AnimalHarmBench result tells us something about Claude specifically, and possibly very little about the trajectory of transformative AI. What carries over from LLM-era alignment work to whatever comes next is an open question, and the answer could be: not much.


Where does all this leave me? Roughly where Buhler is, with the caveat that Buhler's agnosticism is stronger than mine. I'm probably at around 35-40% agree rather than a true zero. But I also think agnosticism is action-guiding in a way the debate week has not fully recognised. 

If we genuinely do not know whether these interventions help or entrench the wrong norms, the priority should not be scaling the interventions. I believe that the priority should be building the infrastructure to evaluate them.

Three things seem needed, and none of them were the focus of this week's conversation. First, a research agenda that takes wild animal suffering seriously under TAI expansion scenarios: actual modelling of what biosphere spread looks like under different civilisational trajectories with welfare estimates. Second, interpretability tools for animal welfare commitments in agentic deployment, so we can verify that a model autonomously managing real-world systems applies the same welfare reasoning it displays when directly asked ethical questions. Third, resistance to treating LLM alignment progress as a proxy for TAI alignment progress. The AnimalHarmBench result might be a real signal about the future. It might also be measuring one model family's prompt-sensitivity in a way that tells us little about whatever system actually transforms the world.

I don't want us investing false precision in what I think is a basically unresolvable question; the animals bear the cost of that precision.
