
This post is the third in a short series about sentience, ceilings of Pain and Pleasure intensity (‘affective ceiling’, or ‘affective capacity’), and how to make interspecific welfare comparisons more explicit without creating false precision.

In the first post, we approached the problem through what was essentially a physics or information-encoding lens, treating affective experience as a system for encoding and transmitting biologically relevant information. In this view, Pain and Pleasure are signals whose properties can be analysed in terms of how information is represented and processed. Taking the human affective system as a reference point, we focused on two key features of this signalling system: the range of intensities it can represent, and the resolution with which differences in intensity can be distinguished. We then explored whether early forms of sentience would already exhibit these attributes, or whether they might instead be associated with more limited affective range and/or resolution.

In the second post, we approached the problem through an evolutionary lens, arguing that sentience and affective capacities should be treated like other biological traits: shaped by evolutionary constraints and payoffs, with extreme affective intensities requiring adaptive justification rather than arising automatically from neural complexity.

This third post marks the transition from the theoretical framework to its practical application. It introduces an experimental tool, the Interspecific Affect GPT, designed to operationalize those ideas, together with additional ones presented here, in a way that may eventually support interspecific welfare comparisons.

 

 

What problem the Interspecific Affect tool addresses

Within the Welfare Footprint Framework (WFF), welfare is quantified as time spent in different affective intensity categories, where Pain and Pleasure are operationally defined as any negative or positive affective states. Intensity categories are assigned relative to each species’ own behavioral and physiological indicators of perceived Pain and Pleasure intensity, allowing robust within-species comparisons without requiring interspecific assumptions.

A major unresolved challenge arises, however, when one tries to compare welfare across species. This difficulty is not unique to the WFF; it applies to any welfare metric, since the intensity of affective experiences across different sentient beings remains one of science’s most profound unknowns. In the WFF, this is acknowledged explicitly in Module (Ψ) Interspecific Scaling, which highlights the need for transparent methods to address cross-species comparability without distorting the integrity of species-specific welfare assessments.

At least two distinct questions arise when comparing welfare across species. First, there is the ceiling question: what is the highest affective intensity a given species can plausibly reach? Second, there is the time-mapping question: if different species can potentially experience comparable affective intensities, does a given unit of clock time correspond to the same magnitude of experienced Pain or Pleasure?

Both issues have appeared in the broader literature on welfare range and interspecific comparison, but we make the distinction explicit here because the two questions are often run together. Our view is that the ceiling question is frequently the more decisive one: if a species cannot plausibly reach very high intensities, later adjustments involving duration or aggregation cannot place it on a par with a system that can. In other words, limits on maximum intensity constrain the overall magnitude of Pain, regardless of how experience unfolds over time. The Interspecific Affect GPT is therefore designed to address this first question, explicitly and provisionally: given the available evidence, where does a species’ maximum plausible affective capacity lie relative to a human-anchored reference scale?

This prioritization should not be confused with a rejection of precautionary reasoning. The present tool is designed primarily as a disciplined method for scientific inference about affective ceilings, not as a complete decision rule for ethics or policy. Accordingly, its parsimonious default in estimating affective ceilings should be distinguished from the separate question of how uncertainty ought to be handled in downstream practical decisions when the costs of under-attribution may be high.

This tool does not offer a full solution to interspecific welfare scaling. Rather, it is an attempt to make one especially important part of that challenge more explicit, inspectable, and open to criticism. Feedback is welcome either in the comment section here or directly through the tool itself. It also serves as a test case for how large language models, current and forthcoming, may support structured reasoning on a scientifically difficult and philosophically contested issue, while offering a more transparent, disciplined, and evidence-sensitive way of approaching the problem.

The role of human-anchored reference categories

One difficulty in interspecies comparisons is that the same terms for the intensity categories (Annoying, Hurtful, Disabling, Excruciating) can be used across taxa as if they referred to comparable magnitudes. Often, they do not.

To reduce this ambiguity, the tool presented here uses human-anchored reference categories, formally introduced in this text, and identified by the (h) suffix (Box 1). These categories do not replace the usual species-internal categories used within WFF analyses; they only come into play when cross-species comparison is required.

The purpose of the notation is not to claim that two species experience the same intensity whenever they are mapped to the same reference category. Rather, it is to create an explicit shared scale for asking a narrower question: if a species is sentient, how high could its perceived Pain intensity plausibly go relative to the intensity perceived by humans?

 

Box 1. Human-Anchored Affective Intensity Categories as Absolute Reference Points

 

A core challenge in interspecific welfare analysis is how to compare affective intensity across species. To address this, the present tool uses human-anchored negative-affect reference categories, marked by the suffix (h): Annoying(h), Hurtful(h), Disabling(h), and Excruciating(h). These serve as absolute reference points for ceiling analysis of Pain intensity.

 

This notation is distinct from the standard Welfare Footprint Framework use of Annoying, Hurtful, Disabling, and Excruciating as categories that are internal to a species’ own indicators (e.g., behavioural, neurophysiological). Within a given species, these labels refer only to relative differences in affective intensity inside that species and do not imply cross-species equivalence. By contrast, the (h) categories are used only when explicitly making interspecific inferences.

 

Accordingly, Excruciating Pain in shrimp refers to the highest pain state identifiable within shrimp, whereas Excruciating(h) refers to the human reference level of pain associated with extreme phenomenological intensity and severe functional disruption. Whether the former plausibly reaches the latter is an empirical and theoretical question, not a definitional one.
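The distinction drawn in Box 1 can be made concrete with a small sketch. Keeping the species-internal labels and the human-anchored (h) categories as separate types makes it impossible to conflate them by accident; any mapping between the two scales must then be asserted explicitly, as an empirical claim. All names below are hypothetical, for illustration only:

```python
from enum import Enum

class SpeciesInternal(Enum):
    """Intensity categories relative to a species' own indicators.
    These carry no cross-species meaning on their own."""
    ANNOYING = 1
    HURTFUL = 2
    DISABLING = 3
    EXCRUCIATING = 4

class HumanAnchored(Enum):
    """Human-anchored (h) reference categories, used only when making
    explicit interspecific inferences."""
    ANNOYING_H = 1
    HURTFUL_H = 2
    DISABLING_H = 3
    EXCRUCIATING_H = 4

def map_to_human_anchor(label: SpeciesInternal,
                        ceiling: HumanAnchored) -> HumanAnchored:
    """Hypothetical helper: map a species-internal label onto the human
    scale, capped by the species' estimated affective ceiling. Whether a
    species' Excruciating reaches Excruciating(h) is thus an input to the
    mapping, never a consequence of shared labels."""
    return HumanAnchored(min(label.value, ceiling.value))
```

Because the two scales are distinct types, a shrimp's Excruciating maps to Excruciating(h) only if the ceiling argument, an empirical estimate, permits it.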

What the tool does, and does not do

The Interspecific Affect GPT is designed to answer a narrow question:

If a system is sentient, what is the plausible upper bound of the intensity of Pain it could experience, relative to a human-anchored reference scale?

That is its central task. It does not compute moral weights, produce rankings, or generate prioritization recommendations. It does not quantify the magnitude of species-to-species differences in affective capacity. Nor does it claim to settle sentience debates. Instead, it aims to clarify one upstream scientific input upon which many downstream ethical analyses rely, often without making that input explicit.

For that reason, the tool is best understood not as a calculator but as a structured reasoning scaffold designed to prevent premature convergence on comfortable answers. Its goal is not to replace expertise, but to make expertise easier to apply by forcing assumptions, inferential steps, and disagreements into a more explicit and inspectable form.

We welcome criticism and feedback on the tool itself, as well as on the reasoning structure it embodies. In a field as scientifically difficult and ethically sensitive as interspecies welfare comparisons, such criticisms are very much needed and valued.

 

How the Interspecific Affect GPT works

(Note: This description reflects the current instruction set as of this draft. Future versions may evolve, but the account below matches the present public architecture.)

The following information is not required to use the tool—its interactions are designed to be self-guided and self-explanatory—but is provided for readers who want to understand its rationale in more detail. The tool is guided not only by its stepwise instruction set, but also by a curated supporting knowledge base. This includes methodological notes, definitions, examples of good and bad outputs, and selected theoretical materials used to stabilize how the workflow is interpreted across cases. In particular, the knowledge base helps the tool keep distinct the sentience gate, the affective-ceiling analysis, and the meaning of the human-anchored reference categories. It is best understood not as a fixed database of answers, but as a supporting layer that helps the model apply the framework more consistently and transparently.

The workflow is structured so that the tool does not jump directly from a user’s query to a ceiling estimate. Instead, it separates a series of questions that are often run together in informal discussion: What taxonomic scope is appropriate? What methodological assumptions are being made? At what level should sentience be assessed? What evidence bears specifically on affective ceiling rather than merely on responsiveness or nociception? Only after those issues have been made explicit does the tool move toward a provisional ceiling estimate. The stepwise design is meant to make the reasoning easier to inspect, challenge, and revise.
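As a rough illustration of this step-gated design, the workflow can be sketched as a loop in which each step writes an explicit record into a shared state, and the user may lodge an objection before the next step proceeds. This is a minimal sketch under our own assumptions; the function and key names are ours, not the tool's internals:

```python
def run_workflow(target_taxon: str, steps, user_review) -> dict:
    """Run the stepwise analysis. Each step adds its explicit record to the
    shared state; user_review(step_name, state) returns an objection string
    (or None) before the next step runs (hypothetical sketch)."""
    state = {"target_taxon": target_taxon, "disagreement_log": []}
    for step in steps:
        step(state)  # e.g. set scopes, check commitments, classify sentience
        objection = user_review(step.__name__, state)
        if objection:
            # Objections are recorded, not discarded: disagreement becomes
            # part of the inspectable output rather than silent friction.
            state["disagreement_log"].append((step.__name__, objection))
    return state
```

Keeping the disagreement log inside the state mirrors the tool's aim of turning objections into inspectable records that travel with the analysis.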

Step 1: Input framing and scope

The user inputs a target taxon. The tool then determines, where necessary, two related but distinct taxonomic scopes: one for assessing the likelihood of sentience and one for the affective-capacity analysis. This distinction matters because evidence relevant to sentience may sometimes be available at a broader taxonomic level than evidence relevant to affective ceiling. The current instruction set therefore allows the sentience scope to be broader when the evidence base is broader, while keeping the affective-capacity scope as close as possible to the user’s target unless broader generalization is necessary. In both cases, the tool is expected to justify the taxonomic scope in terms of the evidence available and potential heterogeneity among the species in the target group.

Step 2: Methodological commitments check

Before proceeding, the tool makes its key methodological commitments explicit and asks whether the user wishes to keep or modify them. These include: the Welfare Footprint Framework use of Pain and Pleasure as umbrella terms for negative and positive affective states; the use of an epistemic sentience classification rather than a numerical sentience score; a biological parsimony default in the absence of positive evidence for broader or more extreme affective ranges; the rule against inferring one taxon’s ceiling from another without explicit justification; and the decision, for present purposes, to analyze intensity ceiling prior to potential interspecific differences in the subjective perception of time. This step is important because it turns assumptions that are often left implicit into assumptions that can be criticized, revised, or rejected before the analysis proceeds.
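A minimal sketch of this commitments check, assuming hypothetical key names: the defaults are stated up front, and any user modification is recorded explicitly before the analysis proceeds.

```python
# Default methodological commitments, made explicit so the user can keep,
# modify, or reject each one before analysis proceeds (hypothetical keys).
DEFAULT_COMMITMENTS = {
    "pain_pleasure_umbrella_terms": True,      # WFF usage for negative/positive affect
    "epistemic_sentience_classification": True,  # categories, not a numerical score
    "biological_parsimony_default": True,      # narrow ceiling absent positive evidence
    "no_cross_taxon_ceiling_inference": True,  # unless explicitly justified
    "intensity_before_subjective_time": True,  # analyze ceiling first
}

def commitments_check(user_overrides: dict) -> dict:
    """Merge user modifications over the defaults, recording exactly which
    assumptions were changed so they remain visible downstream."""
    commitments = {**DEFAULT_COMMITMENTS, **user_overrides}
    changed = {k: v for k, v in commitments.items()
               if DEFAULT_COMMITMENTS.get(k) != v}
    return {"commitments": commitments, "modified": changed}
```

The point of the structure is that a rejected default (say, parsimony) is carried forward as a visible, criticizable modification rather than a silent premise.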

Step 3: Sentience plausibility and operational gate

Using the sentience scope defined in Step 1, the tool conducts an initial evidence-based appraisal of sentience status and classifies it as Plausible, Contested/Uncertain, or Not coherent. This step is intentionally brief and classificatory rather than a full literature review. Importantly, the tool treats sentience primarily at the broader or ancestral taxonomic scope selected in Step 1, rather than reopening it by default at the level of the narrower target taxon. When the broader sentience scope is supported by convergent evidence, nested taxa are ordinarily treated as inheriting that status unless there is strong subgroup-specific counterevidence or a biologically serious reason to suspect loss or radical divergence.

The current instructions direct the GPT to consult Feinberg’s From Sensing to Sentience as one relevant source at this gate-setting stage, while cross-checking against other theoretical frameworks and empirical evidence.

Step 4: Review of evidence and indicators

Using the affective-capacity scope defined in Step 1 as the primary unit of analysis, the tool assembles the evidence specifically to inform the affective ceiling question: what level of intensity this system could plausibly reach. It maps evidence across behavioural, neural architectural, neurophysiological, pharmacological, cognitive/representational, and evolutionary domains, while allowing broader evidence only when its relevance to the narrower taxon is explicitly justified.

For each relevant indicator, the tool must state what construct it bears on, whether it informs sensitivity, capacity, both, or neither, what its main strengths and limitations are, and where it may mislead through false positives or false negatives. The point is not merely to list evidence, but to clarify what inferential work each indicator can and cannot do in constraining the plausible affective ceiling. The instructions also make explicit that nociception, defensive behaviour, or simple stimulus responsiveness do not by themselves establish advanced affective capacity.
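The per-indicator record described above can be pictured as a simple structure; the field names are illustrative, not the tool's actual schema:

```python
from dataclasses import dataclass

@dataclass
class IndicatorAssessment:
    """Step 4 record for one line of evidence (illustrative field names)."""
    indicator: str      # e.g. "conditioned place avoidance"
    domain: str         # behavioural, neural, pharmacological, ...
    construct: str      # what the indicator actually bears on
    informs: str        # "sensitivity" | "capacity" | "both" | "neither"
    strengths: str      # main inferential strengths
    limitations: str    # main inferential limitations
    failure_modes: str  # where it may mislead (false positives/negatives)
```

Making `informs` an explicit field encodes the caution in the text: nociception or simple responsiveness may fill the `sensitivity` slot without saying anything about `capacity`.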

Step 5: Affective capacity scope

Only after the evidence has been organized does the tool ask what sort of affective architecture the target plausibly has. Here the emphasis is on whether the biological hardware plausibly supports coordinated, high-intensity affective integration, whether through centralized mechanisms or functionally integrated distributed ones. This step is meant to assess architectural support and its limits, while distinguishing direct support from speculative extrapolation; it is not yet the stage at which a final or near-final human-anchored ceiling is assigned.

Step 6: Ceiling inferences and three mandatory checks

At this stage the tool proposes a provisional human-anchored ceiling, either as a single category or a range, and then stress-tests it in three ways. First, it applies a “Cost of Intensity Check”, asking: would states analogous to Disabling(h) or Excruciating(h) be evolutionarily and biologically justified by the species’ life history and metabolic budget?

Second, it applies an “Alternative Hypothesis Check”. The framework defaults to narrow ceiling estimates in the absence of positive evidence for a broad affective range, consistent with biological parsimony. At the same time, the tool checks a specific counter-argument: could a lack of regulatory control over affective states result in intense but poorly modulated negative experience? This possibility is treated as speculative unless the evidence specifically supports it, but it is not dismissed: the absence of evidence for a high ceiling is not treated as evidence of its absence.

Third, the tool uses a final “Convergence Check”: if the evidence points in conflicting directions, the output should widen the uncertainty bounds rather than force an artificially crisp answer.
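Under a stated assumption of an ordinal 1–4 scale (Annoying(h) through Excruciating(h)), the Convergence Check's behaviour can be sketched as follows: a point estimate survives only when independent evidence domains agree, and disagreement widens the bounds instead of being averaged away.

```python
def convergence_check(domain_estimates: list[int]) -> tuple[int, int]:
    """Given ceiling estimates from independent evidence domains
    (1 = Annoying(h) .. 4 = Excruciating(h)), return (low, high) bounds.
    Convergent evidence yields a point estimate (x, x); conflicting
    evidence widens the interval rather than collapsing to a mean."""
    return (min(domain_estimates), max(domain_estimates))
```

The deliberate absence of any averaging in this sketch reflects the rule in the text: conflict is expressed as wider uncertainty, not hidden inside a crisp central number.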

Step 7: Red-team / steel-man

The tool must then construct the strongest plausible case against its own conclusion. Depending on the case, this may involve arguing that the current estimate is too low, too high, or otherwise too confidently bounded. The aim is to expose where the conclusion is most vulnerable to reinterpretation and to ensure that the final output reflects engagement with the strongest serious challenge, rather than premature convergence on a comfortable answer.

Step 8: Final dossier, subjective time, and research priorities

The final output is not a moral weight or a ranking of species, but a dossier containing the sentience-plausibility judgment, the affective-ceiling estimate, a brief assessment of whether subjective time could become relevant in later analysis, a research-priority judgment identifying the single experiment most likely to change the conclusion, and a disagreement log summarizing the user’s objections.
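The dossier described above can be pictured as a structured record; this is a hypothetical sketch, and the field names are ours rather than the tool's actual output schema:

```python
from dataclasses import dataclass, field

@dataclass
class Dossier:
    """Illustrative structure for the Step 8 output (hypothetical fields)."""
    sentience_status: str    # "Plausible" | "Contested/Uncertain" | "Not coherent"
    ceiling_estimate: tuple  # single category or a (low, high) range of (h) labels
    subjective_time_note: str  # whether subjective time could matter later
    key_experiment: str      # single experiment most likely to change the conclusion
    disagreement_log: list = field(default_factory=list)  # user objections, per step
```

Notably absent from the record is any moral weight or ranking field, matching the tool's restriction to the upstream scientific question.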

At the end of each step, users are asked whether they agree, disagree, or want to revise any part of the reasoning before the next step. These interactions are designed to transform disagreement into more explicit and inspectable arguments, reducing the impression of false certainty.

 

Selected Readings and References

Alonso, W. J., & Schuck-Paim, C. (2023). Welfare metrics and welfare indicators: Clarifying essential concepts in animal welfare assessment. https://doi.org/10.17605/OSF.IO/AQ2BM  

Alonso, W. J., & Schuck-Paim, C. (2025). Welfare Footprint Framework: Methodological foundations and quantitative assessment guidelines. https://doi.org/10.17605/osf.io/94bxs

Alonso, W. J., & Schuck-Paim, C. (2025a). Do primitive sentient organisms feel extreme pain? Disentangling intensity range and resolution. EA Forum.

Alonso, W. J., & Schuck-Paim, C. (2025b). When Feeling Is Worth It: A Cost–Benefit Framework for the Evolution of Sentience. EA Forum. 

Birch, J. (2024). The edge of sentience: Risk and precaution in humans, other animals, and AI. Oxford University Press.

Browning, H., & Birch, J. (2022). Animal sentience. Philosophy Compass, 17(5), e12822.

Churchland, P. S. (2002). Brain-wise: Studies in neurophilosophy. MIT Press.

Crump, A., Browning, H., Schnell, A. K., Burn, C., & Birch, J. (2022). Sentience in decapod crustaceans: A general framework and review of the evidence. Animal Sentience, 7(32), 1.

Damasio, A. (1999). The feeling of what happens: Body and emotion in the making of consciousness. Harcourt.

Damasio, A. (2010). Self comes to mind: Constructing the conscious brain. Pantheon.

Elwood, R. W. (2011). Pain and suffering in invertebrates? ILAR Journal, 52(2), 175–184.

Feinberg, T. E. (2024). From sensing to sentience: How feeling emerges from the brain. MIT Press.

Feinberg, T. E., & Mallatt, J. (2016). The ancient origins of consciousness: How the brain created experience. MIT Press.

Ginsburg, S., & Jablonka, E. (2019). The evolution of the sensitive soul: Learning and the origins of consciousness. MIT Press.

Ginsburg, S., & Jablonka, E. (2022). Pain sentience criteria and their grading. Animal Sentience, 7(32), 13.

Godfrey-Smith, P. (2016). Other minds: The octopus, the sea, and the deep origins of consciousness. Farrar, Straus and Giroux.

Godfrey-Smith, P. (2020). Metazoa: Animal life and the birth of the mind. Farrar, Straus and Giroux.

Panksepp, J. (2005). Affective consciousness: Core emotional feelings in animals and humans. Consciousness and Cognition, 14(1), 30–80.

Sneddon, L. U., Elwood, R. W., Adamo, S. A., & Leach, M. C. (2014). Defining and assessing animal pain. Animal Behaviour, 97, 201–212.

Schukraft, J. (2020). Differences in the intensity of valenced experience across species. Rethink Priorities.

Solms, M. (2021). The hidden spring: A journey to the source of consciousness. W. W. Norton & Company.
