William McAuliffe

705 karma · Joined Sep 2021 · Miami, FL, USA


I am a Senior Research Manager in the Animal Welfare department at Rethink Priorities. The views I express here do not represent Rethink Priorities unless stated otherwise. 

Before working in effective altruism, I completed a Ph.D. in psychology studying the evolution of cooperation in humans, with a concentration in quantitative psychology. After that, I was a postdoctoral fellow studying public health. My main interests now are animal welfare and social science methodology/statistics.


Reducing the influence of impression management on the measurement of prosocial and antisocial traits was the topic of my doctoral research. When I started, I thought that better behavioral paradigms and greater use of open-ended text analysis could meaningfully move the needle. By the time I moved on to other things, I was much more pessimistic that there is low-hanging fruit that can both (a) meaningfully move the needle (here's one example of a failed attempt of mine to improve the measurement of prosocial traits; McAuliffe et al., 2020), and (b) be implemented at scale in a practical context. The general issue is that harder-to-game measures are much noisier than easier-to-game measures (e.g., see Schimmack, 2021 on implicit measures), so the gameable measures tend to be more useful for making individual predictions in spite of their systematic biases. The level of invasiveness required to increase the signal of a non-gameable measure (e.g., scraping all of a person's online text without their permission) would probably be at odds with other goals of the movement. The same probably goes for measures that do not rely on actual evidence of concerning behavior (e.g., polygenic scores).

More fundamentally, I disagree that this is a neglected topic: measuring malevolence and reducing response biases are both mainstream topics within personality psychology, personnel psychology, developmental psychology, behavioral genetics, etc. For example, considerable effort has gone into testing whether multidimensional forced-choice personality questionnaires do a good job of reducing faking (e.g., Wetzel et al., 2020). An academic psychologist who is EA-sympathetic and funded from standard academic sources might have more impact pursuing this topic than whatever else they would have studied instead, but I see limited value in people changing careers or in funding grants that would otherwise have gone to other EA causes. I also do not see a strong case for carrying on the discussion outside of the normal academic outlets, where there is a lot more measurement expertise.

I personally would not make mortality the focus of the marginal research project, but I do think you would get it 'for free' in the sort of project I would prioritize. In my view, the main considerations are: 

1. A lot of uncertainty is an artifact of inconsistent reporting practices. An article arguing for a standardized methodology in an aquaculture magazine signed by a bunch of prestigious researchers (or a presentation at an aquaculture industry event) might do more to reduce uncertainty than more data per se.  

2. A lot of the basic trends are robust to the uncertainty. Cumulative mortality is probably around 50% even in ideal circumstances, more intensive farms have lower mortality, larval mortality is steeper than juvenile mortality, and wild shrimp have higher mortality rates than farmed shrimp. 

3. Hannah's upcoming report, a Monte Carlo model of which welfare issues cause the most harm in aggregate while shrimp are still alive, contains enormous uncertainty due to limitations in the surveys of farms that have been conducted. As a result, the rank-order of the badness of many issues is not robust, an issue that new, higher-quality data could address. Improved surveys would presumably also measure survival, so we would gain clarity on premature mortality even though it was not the main focus. 

4. It would probably be at least as valuable to get larval mortality estimates for the farmed fish species to which we compared farmed shrimp in Figure 4. 

Our next report explores preslaughter mortality in much greater depth, including cobbling together estimates from a wider variety of sources, some of which are broken down by extensive vs. intensive systems. We expect to publish it very soon, so stay tuned!

Interesting idea! I will have to look into whether it has been tried on farmed animals or laboratory animals. I would have a concern similar to the concern I have with the classical conditioning experiments: aversion to the more intense pain might reflect reduced volition rather than welfare maximization. But it does seem plausible that volition is not as much of an issue when the pain is only administered with a low probability. 

I am not familiar with the authors you cite so I will refrain from commenting on their specific proposals until I have read them. I speculate that my comment below is not particularly sensitive to their views; I am a realist about morality and phenomenal consciousness but nevertheless believe that what you are suggesting is a constructive way forward.

So long as it is transparent, I definitely think it would be reasonable to assign relative numerical weights to Welfare Footprint's categories according to how much you yourself value preventing them. The weights you use might be entirely based on moral commitments, or might partly be based on empirical beliefs about their relative cardinal intensities (if you believe they exist), or even animals' preferences (if you believe the cardinal intensities do not exist or believe that preferences are what really matter). Unless one assigns lexical priority to the most severe categories, we have to make a prioritization decision somehow, and assigning weights at least makes the process legible.   
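To make the legibility point concrete, here is a minimal sketch of such a weighting scheme. The category names follow the Welfare Footprint framework, but every number below (both the hours of pain and the relative weights) is invented purely for illustration; anyone applying this approach would substitute their own values and state them openly.

```python
# Hypothetical hours an animal spends in each Welfare Footprint pain
# category over its lifetime. These figures are made up for illustration.
hours = {
    "Annoying": 120.0,
    "Hurtful": 40.0,
    "Disabling": 6.0,
    "Excruciating": 0.05,
}

# Hypothetical relative weights reflecting how much one values preventing
# an hour of each category. A lexical-priority view would instead refuse
# to trade categories off at any finite ratio.
weights = {
    "Annoying": 1.0,
    "Hurtful": 10.0,
    "Disabling": 100.0,
    "Excruciating": 10_000.0,
}

# Aggregate badness as a weighted sum of time spent in each category.
score = sum(hours[c] * weights[c] for c in hours)
print(score)
```

Publishing the weight table alongside the conclusion is what makes the prioritization decision legible: a reader who disagrees can rerun the sum with their own weights.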

Joel Michell argues that the theory of conjoint measurement provides indirect tests of whether psychological constructs are quantitative. I do not yet understand the approach in much detail, or the arguments for alternative approaches.

I like your summary. I feel (slightly) less hopeless because I think...

  • Comparisons that involve multiple dimensions of pain are, in principle, possible. I think I would only regard them as impossible if I came upon evidence that pain severity is, in reality, an ordinal construct. 
    • In one sense, I might be more pessimistic about this topic than many because I think it is plausible that many psychological constructs are ordinal.
  • Behavioral evidence could in theory license cardinal comparisons among different pains. Practical issues of feasibility (and permission from institutions) stand in the way, and I would grant that these will probably never be overcome.
    • Possibly, cardinal differences in severity are explicitly represented in the brain. If so, then in principle we could measure these representations, though I do not think that we ever will.  
  • We may be able to prioritize between relieving severe pain and long-lasting pain without making direct cardinal comparisons, so long as we have a sense of just how many orders of magnitude pain severity can span. Many aspects of pain experience appear conserved across a large number of species. If we find that pain in humans or laboratory animals has a wide range of severity, then there is an above-chance possibility that the same is true of farmed animals. There is also an above-chance possibility that the most severe pains on factory farms are near the extreme end of the negative range, given that it is difficult to see the adaptive value of being able to represent threats more extreme than, say, being boiled alive.  
    • I would agree that the point above is partly grounded in intuition that has only a vague relationship to a well-established theory of the evolution of pain. Hopefully, advances in this area will reduce our reliance on intuitions that are not grounded by a plausible scientific theory.  


Strongly agreed. For those who want exposition on this point, see Ashford's article on demandingness in contractualism vs. utilitarianism: https://doi.org/10.1086/342853

This survey item may represent a circumstance under which YouGov estimates would be biased upwards. My understanding is that YouGov uses quota samples of respondents who have opted in to panel membership through non-random means, such as responses to advertisements and referrals. They do not have access to respondents without internet access, and those who have access but are not internet-savvy are also less likely to opt in. If internet savviness is correlated with item response, then we should expect a bias in the point estimate. I would speculate that internet savviness is positively correlated with worrying about AI risk because internet-savvy respondents understand the issue better (though I could imagine arguments in the opposite direction, e.g., people who are afraid of computers don't use them).

To give a concrete example, Sturgis and Kuha (2022) report that YouGov's estimate of problem gambling in the U.K. was far higher than estimates from firms that used probability sampling that can reach people who don't use the internet, especially when the interviews were conducted in person. The presumed reasons are that online gambling is more addictive and that people at higher risk of problem gambling prefer online gambling to in-person gambling.
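The mechanism behind this kind of bias can be illustrated with a toy simulation. All the numbers here are invented: I assume a true problem-gambling prevalence of 3% and assume (hypothetically) that problem gamblers are three times as likely to opt in to an online panel. The panel's point estimate then overshoots the population prevalence even though every respondent answers honestly.

```python
import random

random.seed(0)
N = 100_000  # simulated population size

population = []
for _ in range(N):
    gambler = random.random() < 0.03  # assumed true prevalence: 3%
    # Hypothetical assumption: problem gamblers are heavier internet
    # users and therefore more likely to join an opt-in online panel.
    p_opt_in = 0.30 if gambler else 0.10
    opted_in = random.random() < p_opt_in
    population.append((gambler, opted_in))

panel = [g for g, o in population if o]
true_prev = sum(g for g, _ in population) / N
panel_est = sum(panel) / len(panel)

print(f"true prevalence: {true_prev:.3f}")
print(f"panel estimate:  {panel_est:.3f}")  # biased upward
```

Under these assumptions the panel estimate lands well above the true 3%, mirroring the direction of the YouGov discrepancy Sturgis and Kuha report; quota weighting on demographics cannot fix this, because the bias runs through the opt-in propensity itself.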
