When it comes to helping nonhuman animals in factory farms, there are a lot of things people can do. But when figuring out what is best, we currently have to rely almost entirely on our intuitions. The hope is that, in the future, empirical research will at least be able to strongly guide those intuitions, improve the effectiveness with which we work, and deepen our understanding of how to achieve victories for animals.

 

Prior work has either had an insufficiently large sample size to find significant effects (e.g., ACE 2013 Humane Education Study, MFA 2016 Online Ads Study), had an insufficient control group size (e.g., ACE 2013 Leafleting Study, 2014 FARM Pay-per-view Study, VO 2015 Leafleting Study; Ferenbach, 2015; Hennessy, 2016), had no control group (e.g., THL 2011 Online Ads Study; THL 2012 Leafleting Study; THL 2014 Leafleting Study; THL 2015 Leafleting Study; VO 2015 Pay-per-view Study; James, 2015; Veganuary 2016 Pledge Study), only measured intent to change diet rather than actual self-reported diet change (e.g., MFA 2016 MTurk Study), or did not attempt to measure differences against a baseline prior to the intervention (e.g., VO 2014 Leafleting Study; CAA 2015 VegFest Study; VO 2016 Leafleting Study; Ardnt, 2016). (This is all the veg research I know of; if you can name any more, I’d be happy to add it to this literature overview.)

 

Now, however, the Animal Welfare Action Lab (AWAL) and Reducetarian Labs have teamed up, with help from me and Kieran Greig, to produce the 2016 AWAL Newspaper Study (see the announcements from the Reducetarian Foundation and from AWAL), and found the first statistically significant effect on meat reduction using an actual, sizable control group. This is an amazing contribution, and I applaud the whole team for doing it well!

 

In this document, I aim to re-analyze the study for myself. I use their data but my own statistical analysis to replicate the effects I care most about. I also recreate all of the analysis in this writeup, rephrasing it in my own words (so that I and others can understand it from slightly different perspectives).

 
 

What is the basic methodology?

 

Participants were recruited through Amazon Mechanical Turk (MTurk), which provides a broadly representative sample of the US population that can be recruited for online surveys at low cost. 3,076 participants took a baseline food frequency questionnaire with a variety of self-reported behavior and attitude questions. For this re-analysis, however, I’m only going to dig into the questions about how many servings of meat participants self-report eating.

 

One week after that, participants were recontacted to view a newspaper article. One third of participants were randomly assigned to see an article advocating vegetarianism, another third to see an article advocating reducetarianism (eating less meat), and the final third to see a control article advocating exercise. Immediately after viewing the article, participants were asked to report their diet.

 

Five weeks after that, participants were recontacted and asked to report their diet again.

 

Exact copies of the newspaper articles, copies of the survey, and full methodology are available in the study writeup.

 
 

What are the headline results?

 
  • Participants in the treatment groups, on average, changed their diets to eat 0.8 fewer servings of turkey, pork, chicken, fish, and beef per week than those in the control group.

 
  • There was a statistically significant difference between the three groups (ANOVA, p = 0.03) and between the pooled treatment groups and the control group (chi-square test, p = 0.001). (A sketch of how these tests can be run appears just after this list.)

 
  • There was no statistically significant difference between the reducetarian message and the vegetarian message on diet change (chi-square test, p = 0.09).

 
  • The presence of a control group is important. Without looking at the control group, there appear to be significant decreases in vegetarian rates, but when taking the control group into account these decreases disappear.
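
To make the first two bullets concrete, here is a minimal sketch in R of how the three-group ANOVA and the pooled treatment-vs-control chi-square test could be run. The data frame, column names, and simulated values below are hypothetical stand-ins; the actual data and analysis code are in the study’s replication materials.

```r
# Hypothetical stand-in data: one row per participant.
#   group  -- "control", "reducetarian", or "vegetarian"
#   change -- endline minus baseline weekly meat servings (negative = reduction)
set.seed(1)
df <- data.frame(
  group  = sample(c("control", "reducetarian", "vegetarian"), 2124, replace = TRUE),
  change = rnorm(2124, mean = -0.5, sd = 7)
)

# Three-group comparison of mean change in servings (the ANOVA above)
summary(aov(change ~ group, data = df))

# Pooled treatment vs. control on whether any reduction occurred at all (chi-square)
df$treated <- df$group != "control"
df$reduced <- df$change < 0
chisq.test(table(df$treated, df$reduced))
```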

 

There were other effects on intentions too, but I prefer to focus on the diet change since it’s a lot more important and exciting to me. For more, feel free to check out the original study.

 
 

How should we cautiously interpret these results?

 

The study reports the treatment groups eating an average of roughly one less serving of meat, but I think it is clearer to split this into three groups: of the 1422 people in the treatment groups, 258 (18.2%) had no change, 538 (37.8%) increased by an average of 6.94 servings of meat per week, and 626 (44.0%) decreased by an average of 7.66 servings per week.

 

For the 702 people in the control group, 138 (19.7%) had no change, 286 (40.7%) increased by an average of 6.90 servings per week, and 278 (39.6%) decreased by an average of 6.34 servings per week.

 

Thus the treatment and control groups differ both in the number of people who ultimately decide to reduce and in the magnitude of the average reduction among those who do reduce. Breaking it up like this, the difference in the number of people who reduce (holding the magnitude of reduction constant) is not statistically significant after controlling for multiple hypothesis testing with the Benjamini-Hochberg procedure (t-test, p = 0.0585), while the magnitude of reduction among those who do reduce (holding the number of reducers constant) is significant (t-test, p = 0.007).
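
Here is a rough sketch of how this decomposition could be coded, again with hypothetical column names and simulated data: classify each participant by direction of change, compare the share of reducers between treatment and control, compare the magnitude of reduction among reducers only, and adjust the resulting p-values with the Benjamini-Hochberg method.

```r
# Hypothetical stand-in data: `treated` marks the pooled treatment groups,
# `change` is endline minus baseline weekly meat servings.
set.seed(2)
df <- data.frame(
  treated = rep(c(TRUE, FALSE), times = c(1422, 702)),
  change  = round(rnorm(2124, mean = -0.4, sd = 7))
)
df$direction <- cut(df$change, breaks = c(-Inf, -1, 0, Inf),
                    labels = c("decrease", "no change", "increase"))

# (1) Share of participants who reduced at all, treatment vs. control
df$any_reduction <- as.numeric(df$direction == "decrease")
p_share <- t.test(any_reduction ~ treated, data = df)$p.value

# (2) Magnitude of reduction among reducers only
reducers <- subset(df, direction == "decrease")
p_magnitude <- t.test(change ~ treated, data = reducers)$p.value

# Benjamini-Hochberg adjustment across the two tests
p.adjust(c(share = p_share, magnitude = p_magnitude), method = "BH")
```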

 

We should keep in mind that the actual magnitude of change is, for many people, a lot more than one serving per week, even if the average across the entire group is about one serving per week.

 

All that being said, it’s still unclear to what degree we can take these numbers literally, since people can’t correctly recall the precise number of servings they have eaten over the past month, and since diets fluctuate a good deal. However, the reduction effect does survive a few different ways of looking at it, as well as my independent re-analysis of the data: it holds whether we look at the magnitude of reduction or at a binary indicator of any reduction; across ANOVA, chi-square, and t-tests; and whether or not the treatment groups are pooled.

 

I also ran an extra sanity check: did the treatment also cause people to reduce their consumption of fruits, nuts, vegetables, beans, and grains? Or perhaps to increase their vegetable consumption due to social desirability bias? The answer is no on the first (chi-square, p = 0.7933) and no on the second (chi-square, p = 0.1778), both of which are good for the results of this study.

 
 

When reducing, are people just shifting away from beef and toward chicken?

 

In a word, no.

 

Among those who reduced beef, chicken consumption also fell, by 0.7 servings per week in the control group and by 2.2 servings per week in the treatment group.


Similarly, among those who reduced chicken, beef consumption also fell, by 1 serving per week in the control group and by 2.3 servings per week in the treatment group.

 

I’m not making any claims of statistical significance between the treatment and control groups here, but it is pretty clear that people aren’t shifting from beef to chicken; rather, they are reducing across the board.

 
 

What did this study not find?

 

Notably to me, while the reduction in the amount of meat eaten is significant, this study found no effect on people cutting out meat entirely (even though the vegetarian appeal suggested doing exactly that). Inferring vegetarianism from people who reported consuming no servings of beef, turkey, chicken, fish, or pork, nine people in the treatment group started vegetarianism and twelve stopped, while in the control group four people started and three stopped. This difference is not significant among the three groups (ANOVA, p = 0.26) or between treatment and control (chi-square test, p = 0.55).

 
 

What implication does this have for our existing strategies?

 

I’m going to speculate pretty wildly here and afterwards I’ll mention appropriate disclaimers that walk back my speculation.

 

According to ACE’s research, which relies on a re-analysis of a 2012 THL study, we would expect 3.3% of people shown a leaflet or online ad to stop eating red meat, 1.6% to stop eating chicken, 1.0% to stop eating fish, 0.4% to stop eating eggs, and 0.6% to stop eating dairy.

 

According to this study, 2.7% of people shown a newspaper article about vegetarianism or reducetarianism stop eating red meat. However, notably, 2% of people shown the article about exercise also stop eating red meat. This means we would estimate the true effect of the newspaper article to be a net change of 0.7 percentage points. (Presumably, the 2% change from the control article comes not from people changing their diet once convinced about the power of exercise, but from a general trend toward vegetarianism over time unrelated to any news story, and/or from social desirability bias, and/or a mix of other factors.)

 

Going down the list, there is a -0.4 percentage point change for eliminating chicken (once the treatment group is compared to the control group), -0.1 percentage points for fish, +2.0 percentage points for eggs (meaning people in the treatment group ate more eggs than people in the control group; this could be a substitution effect or just random noise), and -0.6 percentage points for dairy. These stark departures from the earlier study are a reminder of the importance of including a control group. Also, don’t take these numbers literally, because none of them were statistically significantly different from zero (no change).

 

-

 

I’m not going to plug those numbers into the calculator and call it a day, though, because the reality is that there is no statistically significant difference between groups on individual food items, likely because of the lower sample sizes and high within-item variability. The pattern only emerges at the larger scale of reduction across all foods.

 

It’s also worth noting that a newspaper article on MTurk is very different from a leaflet handed out in person along with an in-person survey. MTurk can be better in some respects: you’re less likely to have nonresponse bias in who answers your survey, since it’s easier to follow up with everyone, and you have much better knowledge of who was actually in your treatment and control groups. On the other hand, a newspaper article is less persuasive than a leaflet or video, and the fact that people are paid to read it and may expect comprehension questions creates an artificial attentiveness that won’t be present in real life.

 

-

 

Taking this in a different direction, we may want to focus on reduction instead of elimination, since that’s what this study was about. However, ACE’s numbers are about how many animals are spared, whereas here we only know the number of servings that were not eaten (and even that may be hard to take literally).

 

To simplify things for myself, I’m going to look just at chicken consumption, since chickens are the vast majority of factory farmed animals. In this study, participants reported eating an average of 4.9 servings of chicken per week at baseline. Given that a serving of chicken is approximately 3 ounces, 4.9 servings per week is about 0.41kg of chicken per week, or roughly 21.3kg per year. As a sanity check, this matches up well with the USDA reporting (p15) that Americans ate 24kg of chicken per year in 2000.
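
Writing out the unit conversions behind this sanity check (the serving size and annualization are as stated above; the exact results vary slightly depending on rounding):

```r
servings_per_week <- 4.9              # baseline self-reported chicken servings per week
kg_per_serving    <- 3 * 0.0283495    # a 3 oz serving expressed in kilograms (~0.085 kg)

kg_per_week <- servings_per_week * kg_per_serving   # ~0.42 kg of chicken per week
kg_per_year <- kg_per_week * 52                     # ~21-22 kg per year, vs. ~24 kg from the USDA
```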

 

A chicken weighs 1.83kg, so taking this survey data literally would mean that survey respondents consume about 11.6 chickens per year. Respondents in the treatment group reduced their consumption by 0.26 servings per week, which, assuming the treatment effect continues to hold without declining (a strong assumption) and projecting it out annually, would be a reduction of roughly 1.1 chickens per year per respondent.

 

Assuming the newspaper article over MTurk performs the same as a leaflet or an online ad (another strong assumption), we could project 1.1 chickens spared per $0.35, or 3.1 chickens spared per dollar, which is quite close to ACE’s estimate of 3.6 chickens spared per dollar.

 

Going off the speculative deep end, a factory farmed broiler chicken lives for 42 days on average, so 3.1 chickens spared per dollar is roughly 130 days of factory farmed suffering averted per dollar, which is about $2.81 per chicken DALY.
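
Here is the same speculative chain laid out as arithmetic, taking the figures above (1.1 chickens per respondent per year, $0.35 per person reached, a 42-day broiler lifespan) at face value:

```r
chickens_per_year <- 1.1     # projected annual reduction per treated respondent (from above)
cost_per_person   <- 0.35    # assumed cost to reach one person with a leaflet/ad
days_per_broiler  <- 42      # average lifespan of a factory farmed broiler chicken

chickens_per_dollar      <- chickens_per_year / cost_per_person     # ~3.1 chickens spared per dollar
days_averted_per_dollar  <- chickens_per_dollar * days_per_broiler  # ~130 days of suffering averted per dollar
dollars_per_chicken_daly <- 365 / days_averted_per_dollar           # ~$2.8 per chicken DALY
```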

 

However, I want to strongly caution against taking these numbers very literally, since there is still a lot we don’t know. The serving counts reported by our sample are not literal figures for a variety of reasons, and they contain a lot of noise and fluctuation. There are also many differences between MTurk and the real world, and MTurk might not accurately reflect how our materials perform with the actual public.

 

-

 

Lastly, it’s also interesting to note that the reducetarian message and the vegetarian message produced roughly the same amount of meat reduction (no statistically significant difference). However, this claim should not be taken too literally, as it could easily be due to the sample size not being large enough to pick up a small difference between the two messages. This is similar to a finding in Ferenbach (2015), where both videos tested produced roughly the same amount of behavior change, though the sample sizes in that study were even smaller.

 

Why did this study work when others haven’t?

 

Earlier I mentioned that all previous studies have been held back by having inadequate sample sizes or not having a control group. Through the magic of a control group and an adequate sample size, this study prevailed. Pretty simple!

 

One way statistically significant effects could be found in a smaller sample was through a randomized block design, which AWAL used to reduce the variance in the outcome measure and thereby increase statistical power.
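
As a rough illustration of why blocking helps (the blocking variable and column names below are hypothetical, not necessarily what AWAL used): if participants are grouped into blocks on a baseline covariate before randomization, the analysis can absorb the between-block variance with block fixed effects, shrinking the residual variance against which the treatment effect is tested.

```r
# Hypothetical illustration: block on baseline meat consumption, randomize within blocks.
set.seed(3)
n        <- 2124
baseline <- rnorm(n, mean = 20, sd = 8)
block    <- cut(baseline, quantile(baseline, probs = seq(0, 1, 0.1)),
                include.lowest = TRUE, labels = FALSE)
treated  <- ave(seq_len(n), block, FUN = function(i) sample(rep_len(0:1, length(i))))
change   <- -0.8 * treated + 0.4 * (baseline - mean(baseline)) + rnorm(n, sd = 5)

# Ignoring blocks: all baseline-driven variance sits in the residual.
summary(lm(change ~ treated))

# Adding block fixed effects absorbs that variance and tightens the treatment estimate.
summary(lm(change ~ treated + factor(block)))
```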

 

This study had a control group of 742 people and a treatment group of 1495 people. Combined, that’s about 25% larger than the previous largest study, the MFA 2016 Online Ads Study, which had a treatment group of 934 people and a control group of 864 people.

 
 

What unanswered questions remain?

 

I’d like to do a more careful power analysis to see exactly how this study surveyed enough people to detect an effect, especially since it’s not that much larger than other studies.
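
For what it’s worth, a crude version of such a power calculation is easy to run. The effect size and change-score standard deviation below are illustrative guesses loosely based on the numbers discussed above, not the study’s actual parameters, and this simple two-sample calculation ignores the extra power gained from blocking.

```r
# Power of a two-sample t-test with ~700 per group, for an effect of 0.8
# servings/week and a change-score SD of 7 (both illustrative guesses)
power.t.test(n = 700, delta = 0.8, sd = 7, sig.level = 0.05)

# Sample size per group needed to reach 80% power for the same effect
power.t.test(power = 0.80, delta = 0.8, sd = 7, sig.level = 0.05)
```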


I’d also be curious to replicate the study analysis without the randomized block design and see whether the effects still hold. It could be that statistical power was only achieved through the block design.

 

It’s also possible the study may have just gotten lucky.

 

-

 

I’d like to look more into the attrition data. Since some people who took the first wave didn’t show up for the second wave, and some people who took the second wave didn’t show up for the third (despite much higher compensation), there’s a worry about nonresponse bias: people predisposed to dislike vegetarianism may drop out instead of filling out a survey showing their lack of change, which would lead us to overstate the amount of vegetarianism.
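
One simple check along these lines (column names hypothetical): compare baseline meat consumption between people who completed the final wave and people who dropped out, and check whether attrition itself differed between treatment and control.

```r
# Hypothetical stand-in data; `completed_wave3` marks third-wave respondents.
set.seed(4)
df <- data.frame(
  treated         = rep(c(TRUE, FALSE), times = c(1495, 742)),
  baseline_meat   = rnorm(2237, mean = 20, sd = 8),
  completed_wave3 = runif(2237) < 0.9
)

# Do dropouts differ from completers on baseline meat consumption?
t.test(baseline_meat ~ completed_wave3, data = df)

# Is attrition related to treatment assignment (differential attrition)?
chisq.test(table(df$treated, df$completed_wave3))
```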

 

-

 

I’d also like to look more deeply at the way the food frequency questionnaire (FFQ) was used. As my friend and ACE intern Kieran Greig pointed out to me, the FFQ asks respondents about their meat intake in discrete, ordinal buckets (zero times per week, 0-1 times per week, 1-6 times per week, 1-3 times per day, 4 or more times per day), and these buckets are then transformed into continuous, numerical data (e.g., “zero times per week” became 0 times per week, “1-6 times per week” became 3.5 times per week, “4 or more times per day” became 28 times per week).
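
As a sketch, this transformation is just a lookup table like the one below. The 0, 3.5, and 28 values are the ones described above; the values for the other two buckets are midpoints I’m assuming for illustration, not necessarily what the study used. Swapping in a different vector here and re-running the analysis is exactly the robustness check described in the next paragraph.

```r
# FFQ frequency buckets mapped to weekly serving counts.
# 0, 3.5, and 28 are stated above; 0.5 and 14 are assumed midpoints (illustrative only).
bucket_to_weekly <- c(
  "zero times per week"     = 0,
  "0-1 times per week"      = 0.5,  # assumed midpoint
  "1-6 times per week"      = 3.5,
  "1-3 times per day"       = 14,   # assumed: 2 per day * 7 days
  "4 or more times per day" = 28
)

# Applying the mapping to a vector of FFQ responses:
responses <- c("zero times per week", "1-6 times per week", "4 or more times per day")
unname(bucket_to_weekly[responses])   # 0.0  3.5 28.0
```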

 

However, there are multiple other ways these buckets could be transformed into continuous data (for example, mapping “4 or more times per day” to exactly 4 per day seems to really underestimate the “or more” part), and it would be quite problematic if the effect held under some transformations but failed under others. I have not yet tested this, but the fact that the study’s effects do hold under a binary variable (any meat reduction at all versus no or negative meat reduction) is encouraging.

 

There are also other methods for analyzing differences in ordinal values between the treatment and control groups and between the baseline and endline waves, such as ordinal logistic regression, which can detect statistically significant effects without the need to pick a particular method of transforming the ordinal data into continuous, numeric data. While the output of such a model would be much harder to interpret in terms of the amount of meat reduced, we would definitely expect a statistically significant effect in this kind of model if the effects of the treatment are real and not just an artifact of the transformation used.
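
A hedged sketch of that approach using MASS::polr (the data and column names are hypothetical stand-ins): regress the ordered endline frequency bucket on treatment assignment and the ordered baseline bucket, then check whether the treatment coefficient is distinguishable from zero.

```r
library(MASS)  # for polr()

levels_ffq <- c("zero times per week", "0-1 times per week", "1-6 times per week",
                "1-3 times per day", "4 or more times per day")

# Hypothetical stand-in data with ordered frequency buckets.
set.seed(5)
df <- data.frame(
  treated  = rep(c(TRUE, FALSE), times = c(1422, 702)),
  baseline = factor(sample(levels_ffq, 2124, replace = TRUE), levels = levels_ffq, ordered = TRUE),
  endline  = factor(sample(levels_ffq, 2124, replace = TRUE), levels = levels_ffq, ordered = TRUE)
)

# Proportional-odds (ordinal logistic) model: no numeric transformation of buckets needed.
fit <- polr(endline ~ treated + baseline, data = df, Hess = TRUE)
summary(fit)

# Approximate p-values from the coefficient t-statistics
coefs <- coef(summary(fit))
cbind(coefs, "p value" = 2 * pnorm(abs(coefs[, "t value"]), lower.tail = FALSE))
```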

 

While the transformation does introduce complications, I’d generally note that using an ordinal FFQ sounds like a good idea. The ordinal buckets may be less precise, but I’d expect respondents to find them much easier to fill out than trying to recall the precise number of servings they ate. Since I wouldn’t really trust the difference between someone self-reporting 4 servings instead of 3, it makes sense to create sizable buckets where we would expect differences to be meaningful.

 

-

 

I’m somewhat curious how well a newspaper article approximates a leaflet, and I’m somewhat curious to replicate the study again with actual leaflets (including a control leaflet), though others I’ve talked to have been less interested in more MTurk studies.

 

-

 

I and many others would also very much like to see a study on a platform other than MTurk, such as a replication of the MFA 2016 Online Ads Study with an even larger sample size. We’d really like to see whether effects found on MTurk hold up elsewhere.

 

-

 

The bottom line is that while these results are encouraging, we know that most studies people try to replicate end up failing to replicate. There are still many unanswered questions here that we’ll only be able to answer as the field of empirical animal rights work continues to evolve.

 

-

 

Disclaimer: I funded 75% of the costs of the study, provided consulting on the study methodology, and continue to be involved in Reducetarian Labs’s empirical work.

 

Thanks to Krystal Caldwell, Kieran Greig, Brian Kateman, Bobbie Macdonald, Justis Mills, Joey Savoie, and Allison Smith for reviewing an advanced copy of this work.

Comments (10)



Thanks for writing this up.

The estimated differences due to treatment are almost certainly overestimates due to the statistical significance filter (http://andrewgelman.com/2011/09/10/the-statistical-significance-filter/) and social desirability bias.

For this reason and the other caveats you gave, it seems like it would be better to frame these as loose upper bounds on the expected effect, rather than point estimates. I get the feeling people often forget the caveats and circulate conclusions like "This study shows that $1 donations to newspaper ads save 3.1 chickens on average".

I continue to question whether these studies are worthwhile. Even if it had not found significant differences between the treatments and the control, it's not as if we would stop spreading pro-animal messages. And it was not powered to detect the treatment differences in which you are interested. So it seems it was unlikely to be action-guiding from the start. And of course there's no way to know how much of the effect is explained by social desirability bias.

A chicken weighs 1.83kg, so taking this survey data literally would mean that survey respondents are consuming 11.6 chickens per year and respondents in the treatment group reduce their consumption by 0.26 servings per week, which assuming treatment effects continue to hold and don’t decline (a strong assumption) and projecting those effects out annually, would be a reduction of roughly 1.1 chickens per year per respondent.

Does your calculation account for the fact that only part of the chicken actually gets converted into meat that is eaten? There are approximately 9 billion chickens slaughtered each year in the United States (a country of roughly 300 million people), so the mean consumption should be around 30 chickens a year.

I did not take that into account; that's a good point. I think further research would be needed to nail down that figure more precisely. However, the entire calculation is already pretty speculative, so I'm not too concerned about some of the figures being loose. Sounds like this oversight would make veg ads look even better, which is nice to hear and could help balance out likely errors in the other direction.

Good write-up.

I find the very long list of badly-designed studies you note in the introduction a cause for consternation, and I'm glad this has been done much better.

However, I couldn't see a power calculation in the study, nor in the pre-registration, so I worry the planned recruitment of 3000 was either plucked from the air or decided on due to budget constraint. Yet performing this calculation given an effect size you'd be interested in is generally preferable to spending money on an underpowered study (which I'm pretty sure this is).

Given the large temporal fluctuations (e.g. the large reduction in the control group) and the pretty modest effects, I remain sceptical - let alone the obvious biases like social desirability etc. Another reanalysis which might reassure would be monte carlo permutation of food groups: if very few random groups show reduction in consumption to a similar magnitude as meat, great (and, of course, vice versa).

Here is the main plot from our power calculations that informed our sample size selection (alongside budget constraints): http://i.imgur.com/aeEYagA.png

I worry the planned recruitment of 3000 was either plucked from the air or decided on due to budget constraint

I want to be careful not to speak for the authors here, but I'm personally pretty sure it was picked by budget constraint, though with an eye to power calculations (that I saw, not sure why they weren't published) suggesting it would be sufficient.

-

Given the large temporal fluctuations (e.g. the large reduction in the control group) and the pretty modest effects, I remain sceptical - let alone the obvious biases like social desirability etc.

Agreed.

-

Another reanalysis which might reassure would be monte carlo permutation of food groups: if very few random groups show reduction in consumption to a similar magnitude as meat, great (and, of course, vice versa).

In my re-analysis, I did make a "bogustarian" label looking at reduction in beans, fruits, nuts, vegetables, and grains and found no statistically significant results (see https://github.com/bnjmacdonald/reducetarian-messaging-study/blob/master/peter-reanalysis/analysis.R#L124-L132). So maybe that's reassuring, but one could extend this to be a true monte carlo method.

Major props to the authors on the study and for this follow up write up as well.

For maximizing power per dollar, another technique to consider is the one outlined here, in which researchers first found a subsample of people more responsive to follow-up surveys (of course, there are concerns about external validity):

http://science.sciencemag.org/content/sci/suppl/2016/04/07/352.6282.220.DC1/Broockman-SM.pdf

I'm excited to see more research like this produced, on this and other topics – are you able (both in terms of permission and in terms of capability) to tell us how much this study cost, both in terms of money and time?

Just checked our MTurk account. Pilot cost $1,189. Main study cost $6,075. Total cost $7,264.

Total time: it took about 11 months from very first conversations to completing the final paper. About 3 months of that was data collection for the pilot and main study. The remaining time was planning, survey development, analyzing data, writing, etc. And several chunks of time were stagnant due to lack of time on our part. Total time certainly could have been cut in half if we had the time to work on it 40 (or even 20) hrs per week.

If I recall correctly, we spent ~$12K on MTurk fees to pay participants. All the labor in developing the study was given voluntarily without pay. I don't know how many total person hours it took to create, implement, and analyze. The start-to-finish wall time was ~8 months, but much of that time was spent waiting.

Edit: Krystal's comment has more accurate numbers
