Thanks for spelling this out, Vasco — yes, that’s a fair clarification.
When we say that pain intensities are defined as “absolute” in WFF, this is meant in a conceptual and operational sense within a shared intensity vocabulary, not as a claim that no interspecific adjustments are needed in practice. The statement you quote is explicitly conditional (“if shrimps were capable of experiencing Excruciating pain”) and is held as a temporary, simplifying assumption to allow measurement of time spent in different intensity categories, while recognizing that the true placement of experiences on an absolute scale across taxa remains an open scientific problem.
At a personal scientific level, I find it very implausible that the affective capacity of a shrimp and that of a human are comparable. However, because this remains an unresolved empirical question, the framework itself is intentionally agnostic and requires that any interspecific adjustments be made explicitly and post-quantification, rather than being implicitly embedded in the core estimates.
Thanks, Vasco. We recognize that for most specific interspecies comparisons, affective capacity (not probability of sentience) is indeed crucial, but this remains an open scientific question. For that reason, the Welfare Footprint Framework is intentionally agnostic about correction values for interspecific scaling: welfare estimates are produced without such corrections, and any assumptions about differences in affective capacity must be applied explicitly and transparently as optional post-quantification adjustments when particular comparisons require them, rather than being implicitly folded into the core estimates.
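To make the "explicit, post-quantification" idea concrete, here is a minimal Python sketch (the variable names and numbers are hypothetical, not actual WFF estimates): the core welfare estimate carries no interspecific correction, and any assumption about relative affective capacity is applied afterwards as a clearly labeled, user-supplied factor.

```python
# Hypothetical illustration of a post-quantification adjustment.
# The core estimate is produced WITHOUT interspecific corrections;
# any affective-capacity assumption is applied afterwards, explicitly.
# All numbers below are made up for illustration.

core_estimate_hours = 100.0       # e.g., hours in a given pain category

# Explicit, transparent assumption supplied by whoever makes the comparison
# (NOT part of the framework's core output):
affective_capacity_factor = 0.1

adjusted_hours = core_estimate_hours * affective_capacity_factor
print(adjusted_hours)  # 10.0
```

Keeping the factor outside the core estimate means anyone who disputes the assumption can simply substitute their own value without re-deriving anything upstream.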
Hi Vasco,
In the Welfare Footprint Framework, pain and pleasure intensities are defined as absolute categories conditional on an experience being affective, and uncertainty about sentience is treated upstream as a separate epistemic issue rather than folded into intensity probabilities. The closest point of contact between these questions is affective capacity—since different organisms may plausibly reach different intensity ranges or resolutions, as discussed in our article—but probability of sentience is not part of the equation, because the definition of sentience we adopt is itself conditioned on the capacity to experience affective states.
Hi Itamar — congratulations on all these initiatives.
As promised in our private exchange, I wanted to lay out an architectural idea I’ve been exploring for LLM-based applications, which may be useful to others building similar tools. I don’t know how novel this is, but in a world where many tools will increasingly rely on AI, I think it’s a good general practice.
The core idea is simple: all AI prompts live in a dedicated, human-readable folder, separate from application logic.
There are two main reasons for this.
First, radical transparency. If an application makes claims, recommendations, or interpretations that matter ethically or scientifically, then the instructions guiding the AI are part of what should be open to scrutiny. Keeping prompts in a clearly accessible place makes the system legible not only to developers, but also to researchers, ethicists, and communities like EA or academia who may want to understand how conclusions are being generated, not just what the interface shows.
Second, a clean separation between scientific or ethical content and engineering plumbing. Prompts often reflect underlying assumptions, value choices, and ways of thinking about a problem. Keeping them visible and separate from the rest of the code helps ensure that changes to how the AI reasons or frames an issue are intentional and easy to review, rather than happening quietly as a side effect of technical work. In practice, this folder is meant to be the main reference for what the AI is told to do, while the surrounding code simply handles running it.
In our Welfare Food Explorer app, for example, this structure allows researchers and non-developers to easily find, read, and reason about what the AI is being instructed to do, without needing to navigate the rest of the codebase.
We adopted this approach because in applications that touch science, ethics, welfare, or normative interpretation, how the AI reasons is part of the substance of the system itself. Making prompts visible, inspectable, and discussable helps treat AI behavior as something that can be examined, debated, and improved by a broader audience.
I hope this perspective is useful to others. Cheers.
Thanks for the kind words! I am glad you found the tool useful.
A quick update: we’ve now expanded the system so it doesn’t just quantify negative affective experiences, but also positive ones. Because of this broader scope, the new version is called Hedonic-Track GPT, and it is gradually replacing the earlier Pain-Track GPT.
We may need to update this article soon so readers are directed to the most current tool. In the meantime, you can find the link to the Hedonic-Track GPT here:
Thank you very much for this thorough analysis and for the constructive comments.
Cynthia will address the points related to the results of the study, while I’ll focus here on the methodological aspects.
One of the most important points you raise touches on the core of the Welfare Footprint Framework itself: we recognize that inferring the affective states of other beings is enormously challenging—both in scope and depth. This task can never be complete; it will always require revisions and corrections as new evidence becomes available. The Welfare Footprint Framework is, in essence, an attempt to structure this challenge into as many workable, auditable pieces as possible, so that the process of inference can be progressively improved and openly scrutinized.
You are absolutely right that several painful conditions in chickens were not included in this initial analysis. This was a conscious decision—not because those harms are unimportant, but because we had to start with a subset that we judged to be among the most influential and best documented. The framework is designed precisely so that others can build upon it by incorporating additional conditions, refining prevalence estimates, or reassessing intensities. In that sense, this work should be seen as a living model, not a closed dataset.
Regarding the concern about the limited use of high-quality statistical techniques, our approach is pragmatic. Where robust statistical analyses are feasible—such as in estimating prevalence or duration—they are of course welcome and encouraged. But in areas where measurement is currently impossible—most notably the intensity of affective states—we deliberately avoid mathematical sophistication for its own sake. No amount of elegant equations can compensate for the fact that subjective experience is, for now, beyond direct measurement. What we can do is gather convergent evidence from different sources (e.g., behavior, physiology, neurology, evolutionary reasoning), generalize that evidence into transparent, revisable estimates, and make every assumption explicit so that others can challenge and adjust them.
As for the legitimacy of this approach, we believe that, while imperfect and always improvable, quantifying affective experiences is vastly more informative than relying solely on indirect indicators such as mortality. Animals can live long, physically healthy lives that are nevertheless filled with frustration, chronic pain, fear, or monotony—forms of suffering invisible to metrics that focus only on death or disease. By directing efforts toward gathering as much evidence as possible to infer the intensity and duration of each stage of negative and positive affective experience, we can begin to capture what actually matters to the animal.
The framework has also evolved since this analysis was first produced. At that time, we focused primarily on negative affective states, but we have now extended the methodology to include Cumulative Pleasure alongside Cumulative Pain. Positive affective states are now being systematically quantified using the same operational principles, creating a fuller picture of animal welfare.
Finally, we are developing an open, collaborative platform where Pain-Tracks and Pleasure-Tracks can be published, discussed, and iteratively improved by the broader scientific community. Each component of a track—for example, the probability assigned to a certain intensity within a phase of an affective experience—could be challenged and refined, potentially even through expert voting or consensus mechanisms. The aim is to make welfare quantification transparent, dynamic, and collective rather than proprietary.
Thanks again for putting our work under the microscope—this is exactly what it needs. The Framework is meant to evolve, and feedback like yours helps it grow in the right direction.
Thanks Vasco. I’d like to clarify that Disabling Pain is also a severe, intense level—think of it as the kind of crippling back pain or intense headache that prevents any enjoyment or productivity. Our study found that moving a hen from a furnished cage to a cage-free aviary prevents, on average, hundreds of hours of Disabling Pain during her laying life: specifically, transitioning to cage-free systems avoids approximately 275 hours of Disabling Pain (https://welfarefootprint.org/laying-hens).
Additionally, as argued in the book, the estimates for Excruciating Pain were extremely conservative (i.e. Cumulative Pain in both cage systems is likely higher than estimated). We'll have full estimates soon, once 'The Welfare Footprint of the Egg' is released.
Thanks, Vasco. I think we’ve clarified where our frameworks diverge—you prioritize maximizing expected welfare, assuming that equivalences across intensities are possible once the time component is introduced (an assumption I don’t share), whereas I tend to emphasize minimizing the most intense forms of suffering. Both approaches have their merits, but they naturally lead to different prioritizations. Perhaps we can just agree to disagree on this point.
Thanks, Becca — really glad you took a look and liked it.
On your point about how this relates to certifications and similar tools: we see this as strongly complementary to them, not an alternative. When Welfare Footprint estimates become available, our hope is that they’ll be usable in many different ways by different stakeholders — including certification initiatives themselves — rather than being tied to a single interface or application.
This app is best understood as an early exploratory step: a way of seeing how people actually engage with welfare information, what resonates or causes confusion, and how different framings influence choices. We hope these insights can be useful not just for us, but for anyone thinking about how WFF-style estimates might be effectively deployed beyond a single app.