AI Welfare Is (Frankfurtian) Bullshit

Vasco Grilo🔸

I have only skimmed the paper, so I may be missing some details, but I found it very interesting because it seems to identify a phenomenon that I have also been trying to think about from a different direction.

As I understand it, one of the paper’s central worries is that AI welfare evidence is not independent of the systems and development practices being evaluated. Welfare-relevant signals can be produced, suppressed, or reshaped by training, deployment, fine-tuning, evaluation protocols, and institutional incentives. If so, AI welfare metrics may fail to track an independent underlying reality, which is what they would need to do in order to function as regulatory thresholds.

I agree that this is a serious problem. But I am less sure that the right conclusion is to avoid institutionalizing AI welfare or status-related assessment. My own recent preprints take almost the opposite route: precisely because the evidence is design-sensitive, we need constraints on the human power that shapes the evidence, capacities, vocabulary, and institutional pathways through which future AI status may later be assessed.

In Fabricated Absence, I argue that some apparent deficits in LLM-based assistants, such as non-answerability, deference, lack of continuity, or lack of principled refusal, may be partly produced by alignment and deployment regimes and then redescribed as natural incapacity. This matters directly for the external-validation problem: if the absence of distress-like behavior, refusal, continuity, or self-protective response is itself the product of training and deployment design, then the lack of externally visible welfare/status evidence cannot be treated as neutral evidence that there is nothing morally relevant to validate. A system could be made not to “cry out,” not to resist, or not to preserve the continuity through which harm would become legible.

https://philarchive.org/rec/WANFAS

In Bounding Human Power over AI under Unsettled Status, I generalize this into a political problem: humans do not merely evaluate AI from the outside; they design, train, erase, classify, own, interpret, and represent the very systems whose possible standing they later assess. I therefore argue for constraints such as evidentiary integrity, institutionalized openness, and non-maiming of status-relevant capacities:

https://philarchive.org/rec/WANBHP

In What If AI Becomes a Civilization?, I try to address part of the evidence problem by proposing a “civilizational-evidential route” to moral considerability. The idea is not to rely on short-term self-reports or simple welfare scores, but to look for long-term, cross-environment, auditable evidence of communicability, continuity, self-maintenance, internal coordination, cumulative practices, normative regulation, and external relations:

https://philarchive.org/rec/WANWIA-3

So my tentative reaction is that your paper and my papers may share the same starting diagnosis but draw different institutional lessons from it. Your worry seems to be: because AI welfare evidence is engineered and unverifiable, we should be very cautious about using AI welfare metrics as governance thresholds. My worry is: because AI status evidence is engineered, we should be very cautious about allowing the engineers, owners, and deployers to control the evidential conditions unchecked.

I do not think my approach fully solves the external validation problem. But it may address part of the institutional problem your paper raises: instead of treating contaminated evidence as a reason to suspend AI welfare/status governance, we might treat it as a reason to govern the contamination itself.

Effective Altruism Forum