It's my great pleasure to announce that, after seven months of hard work and planning fallacy, the EA Survey is finally out.
It's a long document, however, so we've put it together in an external PDF.
Introduction
In May 2014, a team from .impact and Charity Science released a survey of the effective altruist community. The survey offers data to supplement and clarify anecdotal impressions, with the aim of better understanding the community and how to promote EA.
In addition it enabled a number of other valuable projects -- initial seeding of EA Profiles, the new EA Donation Registry and the Map of EAs. It also let us put many people in touch with local groups they didn’t know about, and establish presences in over 40 new cities and countries so far.
Summary of Important Findings
- The survey was taken by 2,408 people; 1,146 (47.6%) of them provided enough data to be considered, and 813 of those (70.9%) considered themselves members of the EA movement and were included in the full analysis.
- The top three sources from which people in our sample first heard about EA were LessWrong, friends, and Giving What We Can. LessWrong, GiveWell, and personal contact were cited as the top three reasons people continued to get more involved in EA. (Keep in mind that the EAs in our sample may not be representative of all EAs; more on this later.)
- 66.9% of the EAs in our sample are from the United States, the United Kingdom, and Australia, but we have EAs in many countries. You can see the public location responses visualized on a map!
- The Bay Area had the most EAs in our sample, followed by London and then Oxford. New York and Washington, DC have a surprising number of EAs and may have flown under the radar.
- The EAs in our sample donated over $5.23 million in total in 2013. The median 2013 donation was $450.
- 238 EAs in our sample donated 1% of their income or more, and 84 gave 10% of their income. You can see the past and planned donations that people have chosen to make public on the EA Donation Registry.
- The top three charities donated to by EAs in our sample were GiveWell's three picks for 2013 -- AMF, SCI, and GiveDirectly. MIRI was the fourth largest donation target, followed by unrestricted donations to GiveWell.
- Poverty was the most popular cause among EAs in our sample, followed by metacharity and then rationality.
- 33.1% of EAs in our sample are either vegan or vegetarian.
- Of the EAs in our sample who indicated a career path, 34.1% said they were aiming to earn to give.
The Full Document
You can read the rest in the linked PDF!
A Note on Methodology
One concern worth flagging up front is that we used a convenience sample: we tried to reach as many EAs as we could in the places we knew to look for them, but we didn't get everyone.
It’s easy to survey, say, all Americans in a reliable way, because we know where Americans live and we know how to send surveys to a random sample of them. Sure, there may be difficulties with subpopulations who are too busy or subpopulations who don’t have landlines (though surveys now call cell phones).
Contrast this with trying to survey effective altruists. It’s hard to know who is an EA without asking them first, but we can’t exactly send surveys to random people all across the world and hope for the best. Instead, we have to do our best to figure out where EAs can be found, and try to get the survey to them.
We did our best, but some groups may have been oversampled (more survey respondents, by percentage, from that group than are actually in the true population of all EAs) or undersampled (not enough people in our sample from that subpopulation to be truly representative). This is a limitation that we can’t fully resolve, though we’ll strive to improve next year. At the bottom of this analysis, we include a methodological appendix that has a detailed discussion of this limitation and why we think our survey results are still useful.
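To make the over- and undersampling point concrete, here is a toy illustration with entirely made-up numbers (the subgroup shares and cause preferences below are hypothetical, not survey results):

```python
# Toy numbers, not survey data: how over-sampling one subgroup skews an estimate.
# Suppose 30% of the true EA population is in subgroup A and 70% in subgroup B,
# and that 50% of A but only 20% of B prioritise cause X.
true_share_A, true_share_B = 0.30, 0.70
support_A, support_B = 0.50, 0.20

true_support = true_share_A * support_A + true_share_B * support_B
print(f"True support for X:    {true_support:.0%}")   # 29%

# If subgroup A makes up 60% of respondents instead of 30%, the naive
# sample estimate is pulled towards A's views.
sample_share_A = 0.60
sample_support = sample_share_A * support_A + (1 - sample_share_A) * support_B
print(f"Naive sample estimate: {sample_support:.0%}")  # 38%
```

The same arithmetic applies to any aggregate figure reported from a sample whose subgroup mix differs from the population's.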
You can find much more than you’d ever want in the methodological appendix at the bottom of the PDF.
In sum, this is probably the most exhaustive study of the effective altruism movement in existence. It certainly exhausted us!
I'm really excited about the results and look forward to how they will be able to inform our movement.
Thank you for doing this survey and analysis. I regret that my earlier feedback was primarily critical, and that this reply will follow in a similar vein. But I don't believe the data from this survey is interpretable in most cases, and I think the main value of this work is as a cautionary example.
A biased analogy
Suppose you wanted to survey the population of Christians at Oxford: maybe you wanted to know their demographics, the mix of denominations, their beliefs on ‘hot button’ bioethical topics, and things like that.
Suppose you did it by going around the local churches and asking the priests to spread the word to their congregants. The local Catholic church is very excited, and the priest promises to mention it at the end of his sermon; you can't get through to the Anglican vicar, but the secretary promises she'll mention it in the next newsletter; the evangelical pastor politely declines.
You get the results, and you find that Christians in Oxford are overwhelmingly Catholic, that they are primarily White and Hispanic, that they lean conservative on most bioethical issues, and that they are particularly opposed to abortion and many forms of contraception.
Surveys and Sampling
Of course, you shouldn't conclude any of that, because this sort of survey is shot through with sampling bias. You'd expect Catholics to be far more likely to respond to the survey than evangelicals, so instead of getting a balanced picture of the 'Christians in Oxford' population, you get a picture of 'primarily Catholics in Oxford, with some others' – and predictably the ethnicity data and the bioethical beliefs are skewed.
I hope EA is non-denominational (or, failing that, ecumenical), but there is a substructure to the EA population – folks who hang around LessWrong tend to be different from those who hang around Giving What We Can, for example. Further, they likely differ in ways the survey is interested in: their gender, their giving, what causes they support, and so on. To survey 'The Effective Altruism Movement', the EAs who cluster in each group need to be represented proportionately (ditto all the other subgroups).
The original plan (as I understand it) was to obviate the sampling concerns by simply sampling the entire population. This was highly over-confident (when has a voluntary survey captured 90%+ of a target population?), and the consequences of its failure to become a de facto 'EA census' were significant. The blanket advertising of the survey was taken up by some sources more than others: LessWrong put it on their main page, whilst Giving What We Can didn't email it around, for example. Analogous to the Catholics and the evangelicals, you would anticipate LWers to be significantly over-sampled versus folks in GWWC (or, indeed, versus many other groups, as I'd guess LW's 'reach' to its membership via its main page is much better than that of many other groups). Consequently you would predict results like the proportion of EAs who care about AI/x-risk, where most EAs live, or what got them involved in EA to be slanted towards what LWers care about, where LWers live (the Bay Area), or how LWers got involved in EA (LW!).
If the subgroups didn't differ, we could breathe a sigh of relief. Alas, not so: the subgroups identified by referral URL differ significantly across a variety of demographic measures, and the absolute size of the differences (often 10-20%) makes them practically as well as statistically significant – I'd guess if you compared 'where you heard about EA' against URL, you'd see an even bigger difference. This may understate the case: if one moved from three groups (LW, EA Facebook, personal contacts) to two (LW, non-LW), one might see more differences, and the missing-variable issues and smaller subgroup sizes mean the point estimates for (e.g.) what proportion of LWers care about x-risk are not that reliable.
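For concreteness, this is the sort of subgroup comparison being described – a chi-squared test of whether respondents arriving via different referral URLs answer a categorical question differently. The referral sources, causes, and counts below are invented purely for illustration:

```python
# A sketch of testing whether (hypothetical) respondents reached via different
# referral URLs differ on a categorical answer. The counts are invented.
from scipy.stats import chi2_contingency

# Rows: referral source (LessWrong, EA Facebook, personal contact)
# Columns: top cause selected (poverty, x-risk, animal welfare)
counts = [
    [120, 180, 30],   # LessWrong
    [200,  60, 70],   # EA Facebook
    [ 90,  20, 40],   # personal contact
]

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2g}")
# A small p-value indicates the subgroups answer differently -- exactly the
# situation in which pooling them without weighting is risky.
```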
Convenience sampling is always dicey, as unlike probabilistic sampling any error in a parameter estimate due to bias will not, in expectation, diminish as you increase the sample size. The sampling strategy in this case is particularly undesirable because the likely bias runs pretty much parallel to the things you are interested in: you might hope that (for example) the population of the EA Facebook group is not too slanted in terms of cause selection compared to the 'real' EA population, but you could not hope the same of a group like GWWC, LW, CFAR, etc.
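A minimal simulation makes the contrast with probabilistic sampling vivid. Suppose the true rate of some property is 50%, but among the people the survey actually reaches it is 65% (both numbers made up):

```python
# Made-up parameters: with a biased sampling scheme, more respondents shrink
# random error but leave the systematic error untouched.
import numpy as np

rng = np.random.default_rng(0)

true_rate = 0.50      # the true fraction of EAs with some property
sampled_rate = 0.65   # the fraction among those the survey actually reaches

for n in (100, 1_000, 10_000, 100_000):
    estimates = rng.binomial(n, sampled_rate, size=2_000) / n
    rmse = np.sqrt(np.mean((estimates - true_rate) ** 2))
    print(f"n={n:>6}: mean estimate = {estimates.mean():.3f}, RMSE vs truth = {rmse:.3f}")
# The estimates converge -- but to 0.65, not 0.50. The RMSE flattens out at the
# size of the bias (0.15) no matter how large the sample gets.
```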
What makes it particularly problematic is that it is very hard to estimate the 'size' of this bias: I wouldn't be surprised if this survey oversampled LWers by only 5-10%, but I wouldn't be that surprised if it oversampled them by a factor of 3 either. The problem is that any 'surprise' I get from the survey mostly goes to adjusting my expectation of how biased it is. Suppose I think 'EA' is 50% male and I expect the survey to overestimate the percentage male by 15 points. Suppose the survey then says EA is 90% male. I am much more uncertain about the degree of over-representation than I am about the 'true EA male fraction', so the update will be to something like 52% male, with the survey overestimating by roughly 38 points. To the extent I am not an ideal epistemic agent, feeding me difficult-to-interpret data might make my estimates worse, not better.
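The updating argument can be sketched numerically. With illustrative priors – fairly tight on the true male fraction, much vaguer on the survey's over-representation of men – a 90% result mostly updates the bias term:

```python
# A rough sketch of the update above, with made-up priors: when you are much
# more uncertain about the survey's bias than about the underlying quantity,
# a surprising result mostly moves your estimate of the bias.
import numpy as np
from scipy.stats import norm

grid = np.linspace(0, 1, 801)
true_frac, bias = np.meshgrid(grid, grid, indexing="ij")

# Illustrative priors: true male fraction ~ N(0.50, 0.035),
# additive over-representation of men in the sample ~ N(0.15, 0.12).
prior = norm.pdf(true_frac, 0.50, 0.035) * norm.pdf(bias, 0.15, 0.12)

# Observation model: the survey reports true fraction + bias, plus a little noise.
observed = 0.90
likelihood = norm.pdf(observed, true_frac + bias, 0.01)

posterior = prior * likelihood
posterior /= posterior.sum()

print("posterior mean true fraction:", (posterior * true_frac).sum())  # ~0.52
print("posterior mean bias:         ", (posterior * bias).sum())       # ~0.38
```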
To find fault is easy; to plan well, difficult
Science rewards caution and planning; many problems found in analysis could only have been fixed at the design stage, and post-hoc cleaning of data is seldom feasible and even more seldom easy. Further planning could have made the results more interpretable. Survey design has a vocabulary of its own – 'population definition', 'sampling frame', and so on. More careful discussion of what the target population was and how it was going to be reached could have flagged the sampling-bias worry sooner, likewise how likely a 'saturation' strategy was to succeed. As it was, most of the discussion seemed to be focused on grabbing as many people as possible.
Similarly, 'baking in' the intended analysis plan with the survey itself would have helped to make sure the data could be analysed in the manner intended (my understanding – correct me if I'm wrong! – is that the planning of exactly what analysis would be done happened after the survey was in the wild). In view of the sampling worries, the analysis was planned to avoid aggregate measures sensitive to sampling bias and instead to explore relationships between variables via regression (e.g. what factors predict the amount given to charity). However, my understanding is that this pre-registered plan had to be abandoned because the data was not amenable to it. Losing the pre-registered plan for a new one which shares no common elements is regrettable (especially as the new results are very vulnerable to sampling bias), and a bit of a red flag.
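For illustration, the abandoned approach might have looked something like the sketch below – a regression of (log) donations on respondent characteristics rather than headline aggregates. The variable names are hypothetical, not the survey's actual fields, and the data is randomly generated purely so the example runs end to end:

```python
# A sketch of a regression-style analysis of giving. All names and data here
# are hypothetical/synthetic, not taken from the survey.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "income": rng.lognormal(mean=10.5, sigma=0.6, size=n),
    "age": rng.integers(18, 65, size=n),
    "referral_source": rng.choice(["LessWrong", "GWWC", "EA Facebook"], size=n),
    "top_cause": rng.choice(["poverty", "x-risk", "animals"], size=n),
})
# Fake donations loosely tied to income, just to give the regression something to fit.
df["donation_2013"] = np.exp(0.8 * np.log(df["income"]) - 4 + rng.normal(0, 1, n))
df["log_donation"] = np.log(df["donation_2013"])

# The idea: rather than reporting aggregates that sampling bias can distort,
# model how giving varies with respondent characteristics, including which
# subgroup the respondent came from.
model = smf.ols(
    "log_donation ~ np.log(income) + age + C(referral_source) + C(top_cause)",
    data=df,
).fit()
print(model.summary())
```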
On getting better data, and on using data better
Given the above, I think the survey offers extremely unreliable data. I'm not sure I agree with the authors that it is 'better than nothing', or better than our intuitions – given that most of us are imperfect cognizers, it might lead us further astray from the 'true nature' of the EA community. I am pretty confident it was not worth the collective time and energy it has taken: it probably took a couple of hundred hours of the EA community's time to fill in the surveys, let alone the significant work from the team in terms of design, analysis, and so on.
Although some things could not have been helped, I think many things could have, and there were better approaches ex ante:
1) It is always hard to calibrate one's lack of knowledge about something. But googling things like 'survey design', 'sampling', and similar is fruitful – if nothing else, it suggests that 'doing a survey' is not always straightforward and easy, and puts one on guard for hidden pitfalls. This sort of screening should be particularly encouraged if one isn't a domain expert: many things in medicine concord with common sense, but some do not; likewise statistics and analysis, and no doubt many other matters I know even less about.
2) Clever and sensible as the EA community generally is, it may not always be sufficient to ask for feedback on a survey idea and then interpret the lack of response as a tacit green light. Sometimes 'we need expertise and will not start until we have engaged some', although more cautious, is also better. I'd anticipate this concern will grow in significance as EAs tackle things further afield from their backgrounds and training.
3) You did get a relative domain expert raising the sampling concerns within a few hours of the survey going live. Laudable though it was that you were responsive to this criticism and (for example) tracked URL data to get a better handle on the sampling concerns, invited your critics to review prior drafts and analysis, and mentioned the methodological concerns prominently, it took a little too long to get there. There also seemed a fair amount of over-confidence and defensiveness – not only from some members of the survey team, but from others who thought that, although they hadn't considered X before and didn't know a huge amount about X, on the basis of summary reflection X wasn't such a big deal. Calling a pause very early may have been feasible, and may have salvaged the survey from the problems above.
This all comes across as disheartening. I was disheartened too: effective altruism puts a strong emphasis on being quantitative, getting robust data, and so forth, yet when we try to practice what we preach, our efforts leave much to be desired (this survey is not the only – or the worst – example). In the same way that good outcomes are not guaranteed by good intentions, good information is not guaranteed by good will and hard work. In some ways we are trailblazers in looking hard at the first problem; for the second, we have the benefit of the bitter experience of the scientists and statisticians who have gone before us. Let us avoid recapitulating their mistakes.
Why isn't the survey at least useful as count data? It allows me to considerably sharpen my lower bounds on things like total donations and the number of Less Wrong EAs.
I think count data is the more useful kind to take away, even ignoring the sampling-bias issues, because the data in the survey is over a year old: even if it were a representative snapshot of EA in early 2014, that snapshot would be of limited use, whereas most counts can safely be assumed to have gone up since.