
TLDR: We looked at a lot of different systems to compare welfare and ended up combining a few common ones into a weighted animal welfare index (or welfare points for short). We think this system captures a broad range of ethical considerations and should be applicable across a wide range of both farm and wild animals in a way that allows us to compare interventions.  

The goal of Charity Entrepreneurship is to compare different charitable interventions and actions so that new, strong charities can be founded. One of the necessary steps in such a process is having a way to compare different animals in different conditions. For example, how does moving a chicken from a battery cage to a cage-free system compare, welfare-wise, for the chicken? Or how does giving up red meat, resulting in one less cow being brought into existence, compare to an insect dying more humanely because of a change in which insecticide is used? These are complex questions surrounded by both ethical and epistemic uncertainty. In the health community, DALYs have become fairly common and established as a metric. Sadly, there is not the same level of consensus within the animal rights community. We expected there would be multiple competing systems, so we first outlined what we would look for in a system to assess its helpfulness to us. This could be described as the “goal” or purpose of the metric. Of course, the fundamental goal is to help us evaluate different possible actions, but more specifically, we broke down what we were looking for into the criteria below.

Underlying goals of metrics

  • Proxies’ ethical value accuracy
    • Strength of correlation between the metric and ethical value
    • Encapsulation - captures a broad range of what is important
    • Directness
    • Gamability
  • Cross-applicability
    • Cross-intervention applicability 
    • Cross-animal applicability 
    • Ethical robustness
    • Externally understandable
    • External precedent of use
  • Operationalizability
    • Amenable to numerical quantification 
    • Ease/speed of use
    • Objectiveness 
    • Generates few false positives or false negatives
    • Intuitive to work with
    • Easy to collect
    • Easy to explain

After establishing what we were looking for, the next step was to look at all current systems and see whether any of them was suitable, or could be used in part, by an organization like ours. We ended up finding quite a wide range.

EA community

We first looked within the EA community, since there had been some solid attempts at quantification and the ones below are just a few of many examples.

Within the EA community

These metrics were generally very hard, quantified, and often even explicitly cost-effectiveness focused. Sadly, they were also extremely specific and not built for generalization across different interventions and charities. Thus, for our purposes, they were more helpful as inspiration for the factors to consider, or standards that we would want to be able to measure, rather than for practical cross-intervention use.

Biology-based markers

The next set of metrics we looked at was biology-based markers. We had some background knowledge about cortisol readings as a measure of stress and hoped that we would find other objective markers that could make up part of a more inclusive system and add some objectivity to the softer systems. Some of the ones we considered (although there are many other possible biological indicators) are listed below.

Biology-based markers

  • Cortisol
  • Dopamine
  • Endocrine changes
  • Circulating catecholamines and corticosteroids
  • Death rate
  • Behavior changes
  • Visible injury rate 
  • Reduced life expectancy
  • Impaired growth
  • Impaired reproduction
  • Body damage
  • Disease 
  • Immunosuppression
  • Adrenal activity
  • Behavior anomalies
  • Self-narcotization

Biological markers were useful in that they were much less subjective than other metrics but sadly, it was also very hard to find consistent data across animals on many of them (with the death rate being a notable exception). We ended up thinking these would make up a part of a larger system, but even an index of them would not be inclusive enough to cover all the possible sources of animal welfare situations that could occur. 

Academic measures of quality

The third type of system we considered was “academic measures of quality of life”. WAS research had a great summary of many of the different systems used, but we also looked outside of their research for other possible systems.

Academic measures 

  • Five freedoms
  • The Five Domains model
  • Five Provisions model
  • Botreau’s twelve criteria
  • McMillan’s five elements, which play a fundamental role in quality of life
  • Fraser’s animal welfare’s four core values
  • Webster’s animal welfare’s three questions
  • Taylor and Mills’s domains for assessing companion animals’ quality of life
  • Swaisgood’s ten motivational theories which have currency among animal-welfare researchers

Many of these systems were beautifully comprehensive and described metrics and criteria in such a way that they would be cross-applicable to a wide range of animals across a wide range of conditions. Some even specified different grade levels (although these were generally not numeric) to provide more consistency across reports. It seemed possible that some researchers would have already used these systems, though sadly, we did not find much research showcasing their modern practical use. The main drawback of these systems was their subjectivity. Even with the ones that had specific grade levels, a lot would be left up to the evaluator when making calls between one situation and another: for example, how does not being fed for several days, while being otherwise perfectly fed, compare to semi-chronic but low-level hunger? Overall, we took a large number of elements of our system from the Five Domains model, which felt like the most extensively quantified and broadest of these models.

Systems used in global poverty

Next, we considered the current systems used in global poverty alleviation and other cause assessment areas. We thought it might be possible to modify one of these metrics to be usefully applicable to animals. 

Modified poverty based metrics

  • Animal QALYs
  • Animal DALYs
  • Animal Income
  • Animal subjective well-being estimates
  • Equivalent lives saved
  • Preference from behind the veil of ignorance 

Generally, these metrics were either inapplicable (e.g. income) or would have required considerably more time to adapt to the animal welfare context (e.g. DALYs have no way to represent a net negative existence, which is a key consideration in the case of factory farmed animals).

Creating our own system

Finally, we considered creating a cross-applicable system from scratch.

Our own ideas for possible systems

  • SAD - suffering-adjusted life-day
  • Sentience-adjusted suffering years
  • Net negative lives averted 
  • Total world net expected value 
  • Numerical criteria for animals’ quality of life, e.g. a -100 to 100 rating

We did end up using some of the ideas drawn from considering this option but, overall, found that taking elements from other systems would both increase quality and reduce the time that we would otherwise spend on creating a new system from scratch. 

Results: an inclusive index 

We ended up putting many of these systems onto a spreadsheet and comparing them on the original metric criteria we had derived. Some criteria ended up getting narrowed down. For example, we combined various biological markers into a single “biological markers” category. Some criteria were made more numerical and cross-comparable, for example, by translating the 5 domains model into number-based scores, instead of grades. Other elements were given their own category and weighting based on how well they met the top line criteria (for example, death rate). Most criteria were ruled out as redundant or not helpful for our purposes. 

We ended up with 8 criteria with an importance weighting for each. Combined, they add up to a range of +100 (an ideal life) to -100 (a perfectly unideal life), with 0 representing uncertainty about the life being net positive or negative. Each area can have positive or negative welfare scores and is to be rated independently, giving a more robust cluster approach to the overall endline score. The weighting of each factor is different, depending on how well it scored on our original metric criteria. For example, death rate gets a relatively higher weighting (20 welfare points) than our index of other biological markers (4 welfare points) because it is easier to work with and has a clearer relation to direct animal suffering (e.g. we are more confident that very high and painful death rates correlate with a life not worth living than we are for the more abstract biological markers).

Factors we ended up using:

  • Death rate/reason - 20
  • Human preference from behind the veil of ignorance - 20
  • Disease/injury/functional impairment - 17
  • Thirst/hunger/malnutrition - 15
  • Anxiety/fear/pain/distress - 15
  • Environmental challenge - 5
  • Index of Biological markers - 4
  • Behavioral/interactive restriction - 4
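As a rough illustration of how these weights combine, here is a minimal Python sketch. It assumes that each factor is rated directly in welfare points, between minus and plus its weight, so that the eight ratings sum to a total between -100 and +100. The key names, the function name, and the example ratings are hypothetical and are not taken from the spreadsheet.

```python
# Maximum welfare points per factor, as listed above (they sum to 100).
WEIGHTS = {
    "death_rate_reason": 20,
    "veil_of_ignorance_preference": 20,
    "disease_injury_impairment": 17,
    "thirst_hunger_malnutrition": 15,
    "anxiety_fear_pain_distress": 15,
    "environmental_challenge": 5,
    "biological_markers_index": 4,
    "behavioral_interactive_restriction": 4,
}


def total_welfare_points(scores):
    """Sum the per-factor ratings after checking each lies in [-weight, +weight]."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the eight factors")
    for factor, score in scores.items():
        limit = WEIGHTS[factor]
        if not -limit <= score <= limit:
            raise ValueError(f"{factor}: {score} is outside [-{limit}, {limit}]")
    return sum(scores.values())


# Hypothetical ratings for an unnamed animal, for illustration only.
example_scores = {
    "death_rate_reason": -10,
    "veil_of_ignorance_preference": -12,
    "disease_injury_impairment": -8,
    "thirst_hunger_malnutrition": 3,
    "anxiety_fear_pain_distress": -9,
    "environmental_challenge": -2,
    "biological_markers_index": 0,
    "behavioral_interactive_restriction": -3,
}

print(total_welfare_points(example_scores))
```

Because every factor is capped by its weight, a low-weight factor such as the index of biological markers can move the total by at most 4 points in either direction, while death rate can move it by up to 20.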

Our full spreadsheet with factors, scores, and metric criteria scores gives a deeper sense of why different areas were given the weighting they were, as well as a narrative explanation of what a negative, middling, and positive score would look like in each category. 

Overall, we felt like this system gave us a good balance between both the more subjective metrics that could capture more data and the harder metrics that were more objective. We feel that this system could be used across a wide range of both animals and interventions, and lead to cross-comparable results.


I found this post very interesting. Here are some pros and cons I've noted down on your factors, scores, and metric criteria scores:

Pros:

  • Clearly enumerated strengths and weaknesses according to desiderata
  • Compatible with expressing uncertainty (e.g., via ranges)
  • Simple and single-axis
  • Potentially also compatible with rating human welfare

Cons:

  • Leans towards promoting “ease of measurement”, which might miss important but hard-to-measure things
  • Likely to be sensitive to weightings which are not very robustly grounded
  • Unclear how to account for indirect and long-term effects
  • Largely incompatible with rating welfare of artificial beings

(I stumbled onto this post 4 years after its publication while exploring the literature adjacent to The Moral Weight Project Sequence.)

Thank you for tackling a very important problem. But currently I feel I’d be lost when trying to apply this model because more explanation is needed for many factors. For example, how does the cortisol level weigh against the dopamine level? And what levels are good? How do you measure and weight the various listed factors to assess anxiety? Etc.

Some examples of this model being applied would be very helpful for understanding the model. Is that the next step in your research?

Yes indeed, that is the next step. We plan on applying this system to ~15 animal situations and doing a 1-5 hour report on each. This would cover both different animals (e.g. wild rats and factory farmed cows) and different welfare situations for the same animal (e.g. a report each for battery caged laying hens vs enriched cage laying hens).

On biological markers specifically, from the research we have done so far, it's very hard to find any consistent biological markers, not to mention situations where we have a bunch that we can cross compare on the same animal. Generally a good score might look like “some cortisol tests have been done on rats in an ideal living situation vs wild rats and the cortisol levels are about the same” where if the same study was done but the cortisol levels were much higher in the wild rats, that would be an indication of lower wild rat welfare.

I wanted to echo all of Saulius' points (including the thanks for doing this!).

To clarify your response here: all of the rankings are essentially subjective judgements, based on whatever evidence you have available in that category? So in the example above, if those cortisol tests were somehow your only evidence in the "index of biological markers" category, you would just decide a score that you felt represented the appropriate level of badness for the wild rat "index of biological markers" score?

I'm also wondering if you're going to use the method to compare humans to non-human animals? Some of the biological measures we could use fall down when we think about how humans fit in, e.g. neuron count. Including humans in comparative measures seems valuable for reflecting on/testing intuitions we might otherwise have about cross-species comparisons.

Re: biological markers, the ideal situation would be to have multiple markers for the animal in an ideal life vs its current life vs a perfectly unideal life; scores would then be given based on how its current life compares. In practice, sometimes we have found data on a happy life vs a standard life for an animal and can get some sense of how far away these are from each other, but often we have found no applicable data at all for this section. Our reports are very time capped (5 hours or less depending on the importance of the animal), so we do not dive deep into the mechanisms.

Humans from different situations will be ranked as well. I agree having them as a comparative measure for cross-species comparison allows for much easier intuition checks.

Also, I think the link "WAS research had a great summary" does not link to where you intended.

Thanks. Fixed.

Some examples of this model being applied would be very helpful for understanding the model.

We have applied this system to 15 different animals/breeds and recently posted the summary of our research here.

Thanks for providing the examples! A couple of questions:

1) Can I check I've understood: the “Estimated population size” and “Odds of feeling pain” columns are not factored into the "total welfare score" (which is made up of adding together scores from the various criteria which then end up somewhere between -100 and +100) at all; they are to be used separately.

So if you wanted to work out whether sparing 10 broiler chickens or 20 beef cows from existence was more impactful, you’d have to multiply your result by the odds of feeling pain etc. E.g. for chickens: 10 * -56 * 0.7 = -392 units of suffering prevented. For beef cows: 20 * -20 * 0.75 = -300 units of suffering prevented. So sparing the chickens is slightly better by this metric (also note that people might not agree that the rough estimates from the OPP on consciousness mean the same thing as "odds of feeling pain," e.g. if you subscribe to consciousness eliminativism, although I haven't read the OPP report in a while so might be misremembering the specifics).

2) I don’t understand where the “range” figure comes from?

1) As you correctly observed, we didn’t adjust welfare points for population size and odds of feeling pain in this spreadsheet. But we have just published another report summarizing our animal prioritization research, where we aggregated information about baseline welfare points, population size, odds of feeling pain, neglectedness, and amount of suffering caused by a smaller number of specific reasons.

Generally, when we are calculating the cost-effectiveness of a given intervention we take into account the number of welfare points “gained” (baseline welfare points changed counterfactually by the intervention) multiplied by odds of feeling pain and number of animals affected.
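A minimal Python sketch of this multiplication, using the figures quoted in the question above (-56 and -20 welfare points, 0.7 and 0.75 odds of feeling pain, 10 chickens and 20 cows). The function name is hypothetical, and treating the full baseline score as the change attributable to sparing one animal is a simplifying assumption made only for this example.

```python
def adjusted_welfare_points(welfare_points, odds_of_feeling_pain, animals_affected):
    """Welfare points scaled by the odds of feeling pain and the number of animals."""
    if not 0.0 <= odds_of_feeling_pain <= 1.0:
        raise ValueError("odds_of_feeling_pain must be between 0 and 1")
    if animals_affected < 0:
        raise ValueError("animals_affected must be non-negative")
    return welfare_points * odds_of_feeling_pain * animals_affected


# Figures from the question above; rounded for display because of float arithmetic.
chickens = adjusted_welfare_points(-56, 0.7, 10)
cows = adjusted_welfare_points(-20, 0.75, 20)
print(round(chickens, 2), round(cows, 2))
```

This does not yet include the length-of-life adjustment or any cost figures, so it is only one term of a cost-effectiveness estimate.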

We also need to adjust for length of life. For example, suppose the baseline welfare points per year are -20 for a beef cow and -56 for a broiler chicken. A beef cow spends 402 days on a farm, so its WP are multiplied by the fraction of a year it spends there: 402 days / 365 days in a year = 110%. A broiler chicken spends 42 days, so its WP are multiplied by 12%, resulting in:
Cow: -22 welfare points per lifetime of an individual
Broiler chicken: -6.72 welfare points per lifetime of an individual.
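The same adjustment as a short Python sketch. The function name and the `round_percent` flag are hypothetical. Rounding the fraction of a year to a whole percent first reproduces the figures above; without that rounding the results come out slightly different (roughly -22.03 for the cow and -6.44 for the broiler chicken).

```python
DAYS_PER_YEAR = 365


def lifetime_welfare_points(wp_per_year, days_alive, round_percent=True):
    """Scale yearly welfare points by the fraction of a year the animal lives."""
    fraction = days_alive / DAYS_PER_YEAR
    if round_percent:
        # Round to a whole percent, matching the 110% and 12% used in the text.
        fraction = round(fraction * 100) / 100
    return wp_per_year * fraction


for name, wp, days in [("Cow", -20, 402), ("Broiler chicken", -56, 42)]:
    rounded = lifetime_welfare_points(wp, days)
    exact = lifetime_welfare_points(wp, days, round_percent=False)
    print(f"{name}: {rounded:.2f} (rounded percent), {exact:.2f} (unrounded)")
```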

2) The range is the minimum and maximum values of welfare points as rated by our external reviewers. “Total welfare score” (second column) is an average of internal and external reviewer’s ratings.

Great to see this being looked at. Do you have any examples of this method in use? I'd be interested to see various animals and situations ranked using this method - as it could provide a baseline to quantify the benefits of various interventions.

I also attempted to create my own method of comparing animal suffering while I was calculating the value of going vegetarian. I'll provide a quick summary here, and would love to hear if anyone else has tried something similar.

The approach was to create an internally consistent model based upon my naive intuitions and what data I could find. I spent a while tuning the model so that various trade-offs would make sense and didn't lead to incoherent preferences. It is super rough, but was a first step in my self-examination of ethics.

  1. I created a scale of the value of [human/animal] experience from torture (-1000) to self-actualization (+5) with neutral at 0.
  2. I guessed where various animal experiences fell on the scale, averaged over a lifetime. This is a very weak part of the model - and where Joey's method could really come in handy.
  3. I then multiplied the experience by the lifespan of the animal (as a percentage of human life).
  4. Finally, I added a 'cognitive/subjectivity' multiplier based on the animal's intelligence. This is contentious, but helps so I don't value the long-lived cicada (insect) the same as a human. This follows from other ethical considerations in my model, but some people prefer to remove this step.
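The four steps above can be sketched roughly in Python as follows. Only the scale endpoints (-1000 and +5) come from the description; the function name, the reference human lifespan of 80 years, and the example inputs are assumptions for illustration, and the exact way the result is turned into a percentage of a human life is not specified.

```python
SCALE_MIN = -1000  # torture
SCALE_MAX = 5      # self-actualization
HUMAN_LIFESPAN_YEARS = 80  # assumed reference value, not from the comment


def lifetime_value(experience, lifespan_years, cognitive_multiplier=1.0):
    """Average experience x lifespan (as a fraction of a human life) x multiplier."""
    if not SCALE_MIN <= experience <= SCALE_MAX:
        raise ValueError("experience must lie on the -1000 to +5 scale")
    if lifespan_years < 0 or cognitive_multiplier < 0:
        raise ValueError("lifespan and multiplier must be non-negative")
    lifespan_fraction = lifespan_years / HUMAN_LIFESPAN_YEARS
    return experience * lifespan_fraction * cognitive_multiplier


# Hypothetical animal: average experience -2, lives 8 years, multiplier 0.5.
print(lifetime_value(-2, 8, 0.5))
```

Leaving `cognitive_multiplier` at 1.0 corresponds to dropping step 4, as some people prefer.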

The output of this rough model was to value various animal lives as a percentage of human lives - a more salient/comparable measure for me.

This model was built over about 5 hours and is still updating as I have more conversations around animal suffering. Would love to hear if anyone else tried a different strategy!

Examples coming soon. We are currently aiming to have ~15 done and published by 10/7/18. The full goal of this project is to create a consistent, systematic baseline to quantify the benefits of various interventions, which would then allow us to compare specific charity ideas and rank what might be the best few to found within the animal movement.

http://everydayutilitarian.com/essays/how-much-suffering-is-in-the-standard-american-diet/ is the closest thing to calculating the value of going vegetarian that I know.

Please link to the examples here when they are finished, thanks!

Please link to the examples here when they are finished, thanks!

We have applied this system to 15 different animals/breeds and recently posted the summary of our research here.

I tried to do something similar when deciding where to donate. The most significant difference was step 4. I used neuron count as a multiplier. For example, according to http://reflectivedisequilibrium.blogspot.com/2013/09/how-is-brain-mass-distributed-among.html, cows on average have 13.6 times more neurons than chickens. So in my model, one minute of cow's life was 13.6 times more important than one minute of chicken's life of comparable quality. I've seen some people comparing the square root of neuron count instead. http://ethical.diet/ makes it easy to make these kinds of comparisons for farm animals.
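A small Python sketch of the two scalings mentioned here, linear and square root, applied to the 13.6 cow-to-chicken neuron ratio quoted above. The function name and the `scaling` parameter are hypothetical.

```python
import math

COW_TO_CHICKEN_NEURON_RATIO = 13.6  # ratio quoted in the comment above


def relative_weight(neuron_ratio, scaling="linear"):
    """Relative importance of one unit of experience, given a neuron-count ratio."""
    if neuron_ratio <= 0:
        raise ValueError("neuron_ratio must be positive")
    if scaling == "linear":
        return neuron_ratio
    if scaling == "sqrt":
        return math.sqrt(neuron_ratio)
    raise ValueError("scaling must be 'linear' or 'sqrt'")


for scaling in ("linear", "sqrt"):
    weight = relative_weight(COW_TO_CHICKEN_NEURON_RATIO, scaling)
    print(f"{scaling}: one cow-minute counts as {weight:.2f} chicken-minutes")
```

The square-root version shrinks the gap between species considerably, so the choice of scaling can matter as much as the neuron counts themselves.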

This looks promising!

I often find myself second guessing estimations of animal charity effectiveness as it feels like they might have cherry-picked their 'moral metric'. Breaking it down in this way seems like a laudable and structured approach for assessing an issue with quite so many unknown variables.

Things that excited me:

  • I could imagine a report where, for a given intervention, each of these is estimated, confidence weightings given and explanations of evidence, priors and reasonings for each estimation. Reading that would have given me more confidence when I was earlier in my journey re animal suffering.

  • Complex, intuition-challenging problems broken down into smaller, more intuition-friendly problems seems valuable.

  • I'd guess it's likely that making many weighted judgement calls and making gut checks from many angles will result in answers closer aligned with our values.

Glad to see work on this.

It seems to me there are two questions here: (1) what are the average effects of different environments (e.g. wilderness; factory farm) on animal well-being? (2) what is the average hedonic well-being of different species?

It feels like you're attempting to find a method that will give the combined score for any given animal. But maybe it'd be best to focus on each individually. Some of the methods you mentioned (e.g. cortisol levels, behavior anomalies, self-narcotization) seem fairly solid for addressing (1), if you had more data. What's the biggest hurdle to gathering more data? Can you think of any clever ways to gather lots of data cheaply? Basically it seems really useful to try to build an intra-species hedonic comparison first, and worry about inter-species comparisons later.

That said-- on inter-species comparisons, I don't think any of the methods you mention are likely to give a good answer to (2), especially as none deal directly with brain activity. It's possible (although I don't know for sure) that some of QRI's work is relevant here- essentially, we have a method ('CDNS') that could be adapted to estimate the degree to which a given connectome is naturally 'tuned' toward harmony or dissonance. This would face many of the same data & validation challenges you mention for other proxy measures, but essentially I'm skeptical that it's possible to address (2) without something like what QRI is doing, that actually looks at brain activity and doesn't rely on hard-coded assumptions about things that could be species-specific and are probably leaky anyway (e.g., brain region X is associated with pain).

If it checks out, this could give a rough inter-species comparison of natural hedonic set-points between literally any two connectomes-- cows, chickens, rats, grasshoppers, mosquitos, humans. Probably not an end-all-be-all, but a useful tool in the toolbox. More on our 'CDNS' method.

To clarify, are you asserting that wild rats, fish, and bugs have net negative lives, on the order of half of the suffering of a factory farmed animal? That seems like a fairly controversial point, since it suggests that, e.g., habitat destruction is a good thing wherever the damage to the ecosystem would not be catastrophic.

Although you've said that a score of 0 is supposed to represent uncertainty about whether the animal's life is net positive or net negative, it doesn't seem to me that the metrics are well-designed for that. Most of them seem best for capturing negative utility, rather than positive. For instance, when a score of "5 to 15" is assigned to a death with "quick or low pain," I assume that doesn't mean that the act of dying itself has positive utility, so where does the positive utility come from? It seems you'd have to implicitly weigh the suffering from death against the lifespan of the animal and its welfare over the course of its life, but it seems wrong to include all of that in a quality of death metric. For instance, if we had two groups of animals that had the same scores on all of these metrics, including how painful their death was, but one had a much shorter lifespan than the other, then the shorter-lived group would have much more pain, even though their scores under this system would be equal. (This might be captured by the death rate figure – if so, could you explain what a "10%" or "50%" death rate means?)

I find that discussion about wild animal suffering very quickly gets to "but it would be ludicrous to believe X because then we'd have to do Y." I think it's better to focus on "How might we find out if X is true" rather than the drastic consequences that would have.

As a parallel, people in power have often found it convenient to believe that slaves, immigrants, poor people, etc have naturally higher pain tolerance than themselves and thus it's not a problem for them to do hard labor, have inadequate medical care, etc. The fact that changing this belief would have disruptive consequences doesn't have anything to do with its accuracy.

Having read much of Brian Tomasik's work, I think the idea that wild animals have net negative lives is plausible, and I don't think habitat destruction would be ludicrous. However, that does seem to be a more extreme position than most wild animal welfare organizations are willing to commit to, and I suggest that the framework proposed here is not well-suited for answering those sorts of questions.

Yeah, I think there's a bad dynamic where people who have read Tomasik either seriously or jokingly propose "pave everything" and other people find that alarming and want nothing to do with any ideas that could lead in that direction. I spent years intentionally not reading Tomasik because I was afraid it would make me into some kind of fanatic.

Good work! I am including a link to it in my Preparatory Notes for the Measurement of Suffering, where perhaps you will find other useful measuring methods.
