Summary
I ran a 20-day pre-registered experiment in which 6 different participants who had not recently talked about Giving What We Can were asked to contact their friends to talk about Giving What We Can. A total of 14 people were contacted, 6 expressed interest, and as of 2/8/2017, 0 of them had taken the Giving What We Can pledge.
There were some unanticipated methodological difficulties, and I do not think you should take the outcome of this experiment too seriously.
What Happened
I originally proposed the experiment here: Can talking about GWWC for 90 minutes actually get somebody to take the Pledge?
(The experimental protocol has not visibly changed since then.)
I proposed to start the experiment if I had at least five interested participants. 7 people expressed interest, so I decided to launch the experiment. I asked the 7 to reconfirm interest; 6 replied.
The 6 participants were each asked to contact 5-20 friends within the next five days to talk about the Giving What We Can pledge. 4 of the 6 initiated contact with at least one person, and together they contacted a total of 14 people within those five days.
At least 6 of the 14 people expressed interest. We then waited 15 days to see if any of them went on to take the pledge or Try Giving. None of them did, which concluded the experiment.
Timeline:
Jan. 8: Experiment first proposed
Jan. 9 - Jan. 15: People emailed or otherwise contacted me expressing interest in participating in the experiment
Jan. 17: Participants informed that the experiment would be launched
Jan. 19: Participants started contacting their friends. A total of 14 people were contacted.
Jan. 24: Participants asked to stop initiating contact with their friends. A total of 6 of the people contacted had expressed interest.
Feb. 8: Experiment wrap-up
Potential Takeaways
- I was originally hoping for at least 25 data points, and ideally closer to 50-100, so this experiment did not really settle the question it set out to settle.
- The biggest lesson for me is that I should definitely anticipate volunteer attrition and set minimum manpower at 2-4x what I would naively expect.
- I should have been more proactive in providing support to the participants. I contacted them an average of 3 times each. If I were to do this again, I (or an assistant) would probably contact the participants daily and provide more explicit scripts for interaction.
- The experiment came right after the Giving What We Can Pledge Drive, and I think there was some level of “talking about GWWC” fatigue, such that most people who wanted to volunteer on things relating to GWWC had already done so during the pledge drive. In the future, I will be careful not to schedule similar volunteering projects too close to each other, unless the goal is explicitly to build on momentum.
- The participants who wound up contacting their friends were people who emailed me to express interest, whereas the participants who didn’t were people I already knew. This suggests that I’ve somewhat saturated my social network in terms of willingness to do additional EA outreach (see above).
- Overall, I consider this more of a failed experiment than a negative result. What I mean is that a genuine negative result would provide strong evidence against an effect, but I think there is not nearly enough information here for that to be the case.
Lessons You Should NOT Take from This
- You should not update significantly towards “casual outreach about EA is ineffective” or “outreach has a very low probability of success”, since the study is FAR too underpowered to detect even large effects. For example, if talking about GWWC to likely candidates has a 10% chance of making them take the pledge in the next 15-20 days, and the 14 people who were contacted are exactly representative of the pool of “likely candidates”, then there was a .9^14 ≈ 23% chance of getting 0 pledges. (A short sketch reproducing these numbers follows this list.)
- If your hypothesis is 1%: a .99^14 ≈ 87% chance of 0 pledges
- 5%: ~49%
- 20%: ~4.4%
- How much you should actually update depends on your distribution of prior probabilities. I’m happy to explain the basic Bayesian statistics further if there’s interest, but I do not want to digress further in this post.
- You should not decrease your trust in the usefulness of volunteer work broadly.
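For readers who want to check the arithmetic above, here is a minimal Python sketch; the 14 contacts and the hypothetical pledge rates are the ones quoted in the bullets, and nothing here is part of the experiment itself.

```python
# Probability of observing 0 pledges out of 14 contacts, for several
# hypothetical per-person pledge rates (the figures quoted above).
contacts = 14
for rate in (0.01, 0.05, 0.10, 0.20):
    p_zero = (1 - rate) ** contacts
    print(f"true pledge rate {rate:.0%}: P(0 pledges out of {contacts}) = {p_zero:.1%}")
```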
Follow-Ups
- I’m interested in running this experiment again with a much greater sample size, and with the acceptance that a decent percentage of volunteers will drop out because of lack of time, etc.
- This would likely have to wait 3-6 months, so that pledge drive fatigue dies down.
- I will also probably advertise using either the EA or GWWC mailing list instead of the EA Forum, which I believe is better suited to presenting intellectual work than to calls to action.
- When I first proposed this experiment, there was interest in doing 6-month and 12-month follow-ups on the people contacted. I have set a calendar reminder to evaluate this in 6 months, but I do not expect any interesting results (and do not plan to publish uninteresting ones).
Actionable Insights
- Running (even very simple) experiments is harder than it looks!
- Be wary of volunteer fatigue.
- For anything that requires volunteer work, consider recruiting significantly more volunteers than you need.
I was one of the volunteers who ended up not being able to contact anyone. I'd like to renew my interest in trying again. While the fault for not contacting anyone is my own, it would be nice to have a longer window in which to initiate contact with people.
It would also have been nice to receive a message from the organizer the day before, explicitly stating the deadline; that might have motivated me to, e.g., stay up extra late contacting people.
I was also one of the volunteers who experienced a personal motivation failure; my apologies, Linch!
I can confirm, having seen and provided some of the data, that it is not nearly enough to provide any meaningful conclusions whatsoever beyond the success/failure of the experiment itself. Still, it is great this experiment was done and written up despite the problems encountered!
Given that it was already unlikely that being put in contact with a GWWC member would have a 10% chance of making them take the pledge, we can now call it very unlikely.
I'm not sure how you're operationalizing the difference between unlikely and very unlikely, but I think we should not be able to make sizable updates from this data unless the prior is REALLY big.
(You probably already understand this, but other people might read your comment as suggesting something more strongly than you're actually referring to, and this is a point that I really wanted to clarify anyway because I expect it to be a fairly common mistake)
Roughly: Unsurprising conclusions from experiments with low sample sizes should not change your mind significantly, regardless of what your prior beliefs are.
This is true (mostly) regardless of the size of your prior. If a null result when you have a high prior wouldn't cause a large update downwards, then a null result on something when you have a low prior shouldn't cause a large shift downwards either.
[Math with made-up numbers below]
As mentioned earlier:
If your hypothesis is 10%: a ~23% probability of the observed result (0 pledges)
If your hypothesis is 1%: ~87%
5%: ~49%
20%: ~4.4%
Say your prior belief is that there's a 70% chance of talking to new people having no effect (or meaningfully close enough to zero that it doesn't matter), a 25% chance that it has a 1% effect, and a 5% chance that it has a 10% effect.
Then by Bayes' Theorem, your posterior probabilities should be:
75.3% chance it has no effect
23.4% chance it has a 1% effect
1.24% chance it has a 10% effect
If, on the other hand, you originally believed that there's a 50% chance of it having no effect and a 50% chance of it having a 10% effect, then your posterior should be:
81.3% chance it has no effect
18.7% chance it has a 10% effect.
Finally, if your prior is that it already has a relatively small effect, this study is far too underpowered to support basically any conclusion at all. For example, if you originally believed that there's a 70% chance of it having no effect, and a 30% chance of it having a .1% effect, then your posterior should be:
70.3% chance of no effect
29.7% chance of a .1% effect.
This is all assuming ideal conditions. Model uncertainty and uncertainty about the quality of my experiment should only decrease the size of your update, not increase it.
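For anyone who wants to reproduce these made-up-number posteriors, here is a minimal Python sketch of the same discrete Bayes update; the priors and the 14-contact likelihood are the ones used above, and the helper name is just illustrative.

```python
# Discrete Bayes update: each hypothesis is a per-person pledge rate,
# and the observed data are 0 pledges out of 14 contacts.
CONTACTS = 14

def posterior(prior):
    """prior: dict mapping pledge rate -> prior probability."""
    likelihood = {rate: (1 - rate) ** CONTACTS for rate in prior}
    unnormalized = {rate: prior[rate] * likelihood[rate] for rate in prior}
    total = sum(unnormalized.values())
    return {rate: mass / total for rate, mass in unnormalized.items()}

# 70% no effect, 25% a 1% effect, 5% a 10% effect -> roughly 75% / 23% / 1%
print(posterior({0.0: 0.70, 0.01: 0.25, 0.10: 0.05}))
# 50% no effect, 50% a 10% effect -> roughly 81% / 19%
print(posterior({0.0: 0.50, 0.10: 0.50}))
# 70% no effect, 30% a 0.1% effect -> roughly 70% / 30%
print(posterior({0.0: 0.70, 0.001: 0.30}))
```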
Do you agree here? If so, do you think I should rephrase the original post to make this clearer?
More quick Bayes: Suppose we have a Beta(0.01, 0.32) prior on the proportion of people who will pledge. I choose this prior because it gives a point-estimate of a ~3% chance of pledging, and a probability of ~95% that the chance of pledging is less than 10%, which seems prima facie reasonable.
Updating on your data using a binomial model yields a Beta(0.01, 0.32 + 14) distribution, which gives a point estimate of < 0.1% and a ~99.9% probability that the true chance of pledging is less than 10%.
I trust that you can explain Bayes' theorem; I'm just adding that we can now be fairly confident that the intervention has less than 10% effectiveness.
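A quick way to check these Beta-Binomial numbers, assuming SciPy's Beta(a, b) parameterization; the prior parameters and the 0-of-14 data are taken from the comment above.

```python
# Conjugate Beta-Binomial update: Beta(0.01, 0.32) prior on the per-person
# pledge rate, updated on 0 pledges out of 14 contacts (0 successes).
from scipy import stats

a, b = 0.01, 0.32          # prior shape parameters from the comment above
pledges, contacts = 0, 14

prior = stats.beta(a, b)
post = stats.beta(a + pledges, b + contacts - pledges)   # Beta(0.01, 14.32)

print(f"prior mean: {prior.mean():.1%}")                  # comment reports ~3%
print(f"prior P(rate < 10%): {prior.cdf(0.10):.0%}")      # comment reports ~95%
print(f"posterior mean: {post.mean():.2%}")               # comment reports < 0.1%
print(f"posterior P(rate < 10%): {post.cdf(0.10):.1%}")   # comment reports ~99.9%
```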
Yeah that makes sense!