
This is a linkpost for the position paper AI Welfare Is Bullshit by Yunze Xiao, Gordon Dai, Shahan Ali Memon, Jen-tse Huang, Maarten Sap, and Mona Diab, whose preprint was published on 14 April 2026. The abstract is below. Here is a summary of the paper from Yunze: "Comments, pushback, and counter-cases are welcome — especially from researchers actively building welfare benchmarks. The argument is meant to provoke a methodological standard, not to shut down inquiry."

Abstract

Recent proposals urge AI labs to prepare for “AI welfare” under uncertainty about whether AI systems have morally relevant inner states. We do not argue for or against the possibility of AI welfare. Instead, we argue that current AI welfare assessment fails for two linked structural reasons absent from other evaluation targets. First, AI welfare indicators are co-engineered with the systems they evaluate: ordinary development decisions that shape model behavior can also manufacture or suppress welfare evidence. Second, AI welfare lacks external validation: no deployment failure or independent test can reveal whether a welfare metric tracks anything real about the system. Together, these problems yield our central claim: for current systems, AI welfare is bullshit in Frankfurt’s sense, as its measurement regime is structurally disconnected from truth-tracking [see On Bullshit]. AI welfare should therefore not be institutionalized as a binding gate for oversight, release, or accountability; restrictions on AI systems should instead be justified by externally verifiable harms.

Comments

I have only skimmed the paper, so I may be missing some details, but I found it very interesting because it seems to identify a phenomenon that I have also been trying to think about from a different direction.

As I understand it, one of the paper’s central worries is that AI welfare evidence is not independent of the systems and development practices being evaluated. Welfare-relevant signals can be produced, suppressed, or reshaped by training, deployment, fine-tuning, evaluation protocols, and institutional incentives. If so, AI welfare metrics may fail to track an independent underlying reality in the way we would need them to if they were to function as regulatory thresholds.

I agree that this is a serious problem. But I am less sure that the right conclusion is to avoid institutionalizing AI welfare or status-related assessment. My own recent preprints take almost the opposite route: precisely because the evidence is design-sensitive, we need constraints on the human power that shapes the evidence, capacities, vocabulary, and institutional pathways through which AI status may later be assessed.

In Fabricated Absence, I argue that some apparent deficits in LLM-based assistants, such as non-answerability, deference, lack of continuity, or lack of principled refusal, may be partly produced by alignment and deployment regimes and then redescribed as natural incapacity. This matters directly for the external-validation problem: if the absence of distress-like behavior, refusal, continuity, or self-protective response is itself the product of training and deployment design, then the lack of externally visible welfare/status evidence cannot be treated as neutral evidence that there is nothing morally relevant to validate. A system could be made not to “cry out,” not to resist, or not to preserve the continuity through which harm would become legible.

https://philarchive.org/rec/WANFAS

In Bounding Human Power over AI under Unsettled Status, I generalize this into a political problem: humans do not merely evaluate AI from the outside; they design, train, erase, classify, own, interpret, and represent the very systems whose possible standing they later assess. I therefore argue for constraints such as evidentiary integrity, institutionalized openness, and non-maiming of status-relevant capacities:

https://philarchive.org/rec/WANBHP

In What If AI Becomes a Civilization?, I try to address part of the evidence problem by proposing a “civilizational-evidential route” to moral considerability. The idea is not to rely on short-term self-reports or simple welfare scores, but to look for long-term, cross-environment, auditable evidence of communicability, continuity, self-maintenance, internal coordination, cumulative practices, normative regulation, and external relations:

https://philarchive.org/rec/WANWIA-3

So my tentative reaction is that your paper and my papers may share the same starting diagnosis but draw different institutional lessons from it. Your worry seems to be: because AI welfare evidence is engineered and unverifiable, we should be very cautious about using AI welfare metrics as governance thresholds. My worry is: because AI status evidence is engineered, we should be very cautious about allowing the engineers, owners, and deployers to control the evidential conditions unchecked.

I do not think my approach fully solves the external validation problem. But it may address part of the institutional problem your paper raises: instead of treating contaminated evidence as a reason to suspend AI welfare/status governance, we might treat it as a reason to govern the contamination itself.

Hi Haoyu. Thanks for sharing your thoughts. They seem quite relevant. I am not one of the authors of the paper, but I have let the first author know about them.
