More Thoughts (and Analysis) on the Mercy For Animals Online Ads Study

Peter Wildeford

May 27, 2016

Comments 15

ClaireZabel

10y

The experimental group reported higher agreement with the claim “that cows, pigs, and chickens are intelligent, emotional individuals with unique personalities”. Does that matter?

Likely no, for similar reasons as discussed earlier. Beliefs and attitudes are nice. They’re certainly better than nothing. Maybe they’ll even help create a societal shift or cause someone to go vegetarian many years down the road. However, they just as well might not.

I'm not sure about this. Some people who are funding online ads want to reduce animal product consumption now. Others are primarily interested in effecting long-term value shifts, and merely use animal product consumption as a weak proxy for this. I'd be pretty independently interested in answering the question "which intervention is most effective at convincing people that cows, pigs, and chickens are intelligent, emotional individuals with unique personalities?"

If I knew which intervention best did that, and which most reduced animal product consumption, and they were different, I'm not sure which I'd be more excited about funding (but I'd be interested if other people have a strong opinion about this).

Cerulean

10y

Interesting point. I suppose that Peter did pre-emptively respond to it when he noted that "it is still premature to do lots of studies on the relative effectiveness of certain vegetarian messaging ... when we don’t even know if the absolute effectiveness is there yet."

Furthermore, this would probably be really difficult to detect, as ads which aim to reduce animal product consumption now might actually be the most potent vector for effecting long-term value shifts - people who start integrating some vegetarianism into their diets are more likely to come across and even spread information showing the intelligence of factory farmed animals.

Given the problems faced by this study, I doubt we'll have a clear answer to the real long-term effectiveness of various interventions any time soon. The best we can do at the moment is to try a combination of methods that appeal to diverse moral intuitions and interests.

CarlShulman

10y

Even though this study had insufficient statistical power to detect an effect, it is easy to understate how much of an improvement this study was on the prior studies that have been conducted.

This study is the largest to date, more than 3x the size of the second largest study. Also, this study is the first of its kind to have an equal-sized control group, 14x larger than any control group previously studied.

Kudos for this long-term effort and the significant improvements.

[ X, Y, Z ] Does that matter?

Given the problems you're having with statistical power you may do better by creating an index outcome variable that takes several signals into account. For example, the GiveDirectly evaluations combine a variety of well-being measures into a single index to replace massive underpowered multiple testing.
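One way such an index could be built, as a rough sketch only: standardize each outcome against the control group's mean and standard deviation, then average the z-scores for each respondent. The function name `zscore_index` and the toy numbers are hypothetical, and the sketch assumes every outcome is coded so that a higher value points in the same direction.

```python
from statistics import mean, stdev

def zscore_index(rows, control_rows):
    """Average of per-outcome z-scores, standardized against the control group.

    rows / control_rows: lists of per-respondent outcome lists, all in the same
    outcome order and all coded so that higher means the same direction.
    """
    n_outcomes = len(control_rows[0])
    mus = [mean(r[j] for r in control_rows) for j in range(n_outcomes)]
    sds = [stdev(r[j] for r in control_rows) for j in range(n_outcomes)]
    return [
        mean((r[j] - mus[j]) / sds[j] for j in range(n_outcomes))
        for r in rows
    ]

# Toy data (made up): two outcomes per respondent.
control = [[1, 10], [2, 20], [3, 30]]
treatment = [[3, 30], [2, 25]]

control_index = zscore_index(control, control)
treatment_index = zscore_index(treatment, control)
print(mean(treatment_index) - mean(control_index))
```

The single index can then be compared across groups with one test instead of one test per survey question.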

Instead, the big problem to me is one of bias -- we’ve already analyzed the results and we’re operating on a motivated continuation where we only continue if we don’t find the effect we’re looking for and only stop once the effect is found. Instead, we need to precommit to a stopping point.

You might want to read this, this and this.

Jeff Kaufman 🔸

10y

The response rate issue seems key to me: if we had known the study would be substantially underpowered we would probably have not run it, or at least figured out how to run it differently.

It would have been awesome if we could have funnelled all 200K people who saw one of the two pages into taking a survey. However, the retargeting required the participants to click on yet another ad advertising the survey (at ~$1 in cost-per-click and the incentive offered per person), and fewer than 2% of our original population did so (the “response rate”).

This low response rate was lower than we expected (despite doing actual piloting of the study to determine a guess at the response rate) and led to a large degree of subjects inadvertently dropping out[6] and we weren’t able to get a large enough sample size despite paying so much money.

What's the right methodology for a response rate pilot?

You're trying to learn what the (cumulative) response rate is as a function of money/time. You need a small enough sample (audience) that you can afford to really probe the dimensions of this space and pull out all the responses you're going to be able to get. So if your full study will have 200k participants, you should have your pilot sample just be ~1k. Then ramp up your spending, and see how many responses you get over time. This tells you the total number of responses you can pull out of a 1k sample, and how much money/time it will probably take to get a given response rate.

(The pilot study in this case didn't actually measure response rate, just response cost, and used CPC ads with a very large sample size in a way that only measured the cost of the first few clicks. Since the first few clicks from a sample are always the cheapest, this wasn't a useful approach.)
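The pilot described above amounts to tracing a curve of cumulative responses against cumulative spend on a small audience. A minimal sketch of that bookkeeping follows; the function name `response_curve` and all the checkpoint numbers are invented for illustration and are not data from the study.

```python
def response_curve(audience_size, checkpoints):
    """Summarize a pilot as cumulative response rate and marginal cost.

    checkpoints: list of (cumulative_spend, cumulative_responses) pairs,
    in increasing order of spend.
    """
    curve = []
    prev_spend, prev_responses = 0.0, 0
    for spend, responses in checkpoints:
        new_responses = responses - prev_responses
        # Cost of each additional response gained since the last checkpoint.
        marginal = (
            (spend - prev_spend) / new_responses
            if new_responses > 0
            else float("inf")
        )
        curve.append({
            "spend": spend,
            "response_rate": responses / audience_size,
            "marginal_cost": marginal,
        })
        prev_spend, prev_responses = spend, responses
    return curve

# Hypothetical pilot on a 1,000-person audience.
for point in response_curve(1000, [(50, 10), (150, 20), (400, 25)]):
    print(point)
```

If the marginal cost climbs steeply while the response rate flattens, that plateau is the ceiling to plan the full study around, rather than the cost of the first few clicks.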

jonathonsmith

10y

Vegan Outreach ran its first annual Leafletting Effectiveness Survey (LES) last fall and we had a dismal response rate as well (around 2%). We were offering $5 incentives for people to take a 2-part survey, where Part 1 was filled out immediately and then an email was sent out two months later to complete Part 2 and claim their gift card. We've been running small response rate studies since then to figure out what kind of incentives we need to hit our targets, but we're seeing significant variation based on what city / state we're operating in. This is making it really difficult to find one incentive level to rule them all.

I wonder if you've looked at the geographical distribution of where your 2% came from? And do you have any theories why your actual response rate differed from your pilot response rate?

Peter Wildeford

10y

I wonder if you've looked at the geographical distribution of where your 2% came from?

I have not. I don't believe we collected geographic data (it's not in the public data set provided), but you could check with Krystal at MFA.

And do you have any theories why your actual response rate differed from your pilot response rate?

It’s hard to say for sure, but I suspect it was because the pilot study was not run for very long, so we inadvertently selected for more enthusiastic people.

zdgroff

10y

This is a very good writeup, thanks for this. Everything strikes me as correct on the merits of the experiment. I think the objection that we don't know how long people watched it misses the mark as you say, since we are interested in the effect of viewing an online ad, not watching an entire video (it can become relevant if we try to extrapolate to contexts where people do watch the entire movie).

As I've said elsewhere, I'm skeptical that the approach to take is to do more such RCTs. I worry about us having to spend extremely large sums of money for such things. Certainly it seems we should compare with other options, like investigations, and not try too hard to find effect sizes that don't dominate those other options.

On this note, what effect size are you using for power calculations? Is it the effect size in the study? You probably want to power it for a smaller effect size - the smallest such effect such that MFA or another org would choose to invest more or less in online ads based on that (so the effect that would determine whether online ads are or are not competitive with investigations and corporate campaigns most likely).
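To make the dependence on effect size concrete, here is a rough sketch of a standard two-proportion sample size calculation (normal approximation, two-sided test). The baseline rate and the candidate effects are hypothetical placeholders, not figures from the study, and `n_per_arm` is a name chosen for this example.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate respondents needed per group to detect p_control vs p_treatment."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance threshold
    z_power = z.inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

# Hypothetical: 10% baseline, and a 2-point versus a 1-point absolute difference.
print(n_per_arm(0.10, 0.12))
print(n_per_arm(0.10, 0.11))
```

Because the required sample scales roughly with the inverse square of the effect, halving the smallest effect worth detecting roughly quadruples the number of respondents needed.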

jonathonsmith

10y

As I've said elsewhere, I'm skeptical that the approach to take is to do more such RCTs. I worry about us having to spend extremely large sums of money for such things.

It's probably a good idea to consider the global amount of money being spent on an AR intervention when evaluating the cost to investigate it. Like how much money is being spent across the different AR orgs on FB ads? If a proper study costs $200K and there is only $500K a year being spent globally, then it's hard to see the value proposition. If the total being spent annually is $20M, then a full-fledged RCT is probably in order.

Does anyone know of estimates of how much the AR movement as a whole is investing in different interventions? This might help prioritize which interventions to study first and how much to pay for those studies.

Joey🔸

10y

I have heard that farm animal welfare as a whole is in the $10m-$100m range, so I would be surprised if something like online ads was $20m a year. That being said, it's worth accounting for long term effects. For example, if online ads were proven not to work for $100k and only $200k gets spent on it a year, the first year might seem like a waste, but if over the next ten years 50% of funding for online ads moves to more effective interventions, this definitely makes it worth it.

Additionally, if something is proven to work, then the amount of total AR funding that goes to it could increase to well past the amount it's getting now. For example, if online ads get strong evidence showing they work, they might get $500k a year instead of $200k and other less proven interventions might get less.
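The arithmetic behind the $100k study example can be sketched as follows. This takes the most generous reading, in which 50% of the $200k is redirected in each of the ten years, and it counts dollars moved rather than the value gained from moving them; both are assumptions of the sketch, and `redirected_funding` is a hypothetical helper.

```python
def redirected_funding(annual_spend, share_moved, years):
    """Total dollars shifted to other interventions over the period."""
    return annual_spend * share_moved * years

study_cost = 100_000
moved = redirected_funding(annual_spend=200_000, share_moved=0.5, years=10)

print(f"Funding redirected: ${moved:,.0f}")
print(f"Redirected dollars per study dollar: {moved / study_cost:.1f}")
```

Whether this makes the study worthwhile still depends on how much better the alternative interventions are per dollar, which the sketch does not model.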

CarlShulman

10y

Not to mention that the study itself is delivering the intervention to the treatment group, so the marginal cost of adding the control group for randomization is only a portion of the nominal outlay.

JesseClifton

10y

I would be especially wary of conducting more studies if we plan on trying to "prove" or "disprove" the effectiveness of ads with so dubious a tool as null hypothesis significance tests.

Even if in a new study we were to reject the null hypothesis of no effect, this would arguably still be pretty weak evidence in favor of the effectiveness of ads.

CarlShulman

10y

What are you worried about here? The same studies will give confidence intervals on effect sizes, which are actionable, and reliable significance at a given sample size indicates an effect of a given magnitude.

JesseClifton

10y

Confidence intervals still don't incorporate prior information and so give undue weight to large effects.

CarlShulman

10y

Sure, one should attend to priors in interpretation, but that doesn't make the experiment useless.

If a pre-registered experiment reliably gives you a severalfold likelihood ratio, you can repeat it or scale it up and overcome significant prior skepticism (although limited by credence in hidden flaws).
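As a sketch of how repeated evidence can overcome a skeptical prior: in odds form, posterior odds equal prior odds times the likelihood ratio, and independent replications multiply. The 5% prior and the likelihood ratio of 3 below are illustrative assumptions, and the sketch does not model the cap from credence in hidden flaws.

```python
def posterior_probability(prior, likelihood_ratio, replications=1):
    """Update a prior probability using independent, equally strong replications."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio ** replications
    return posterior_odds / (1 + posterior_odds)

# Hypothetical skeptic: 5% prior that ads work, each study gives a 3:1 ratio.
for k in range(5):
    print(k, round(posterior_probability(0.05, 3, k), 3))
```

If replications share a flaw they are not independent, so the ratio should not simply be multiplied in that case.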

JesseClifton

10y

I'm not saying any experiment is necessarily useless, but if MFA is going to spend a bunch of resources on another study they should use methods that won't exaggerate effectiveness.

And it's not only that "one should attend to priors in interpretation" - one should specify priors beforehand and explicitly update conditional on the data.

Name	Date	Treatment	Control
Cooney’s FB Study	Fall 2011	104 people	Did not use
Farm Sanctuary’s Leaflet Study	Fall 2012	“nearly 500”	Did not use
ACE’s leafleting study	Fall 2013	123	23 (control leaflet), 477 (no leaflet)
ACE’s Humane Education Study	Fall 2013	169	60
THL’s Leaflet Study	Fall 2013	524	45
Vegan Outreach MTurk Leaflet Study	Winter 2014	404	213
MFA Study	Spring 2016	934	864