Please Take the 2020 EA Survey

by Peter_Hurford · 1 min read · 11th Nov 2020 · 28 comments


If you would like to share the EA Survey with others, please share this link: https://www.surveymonkey.com/r/EAS2020Share

The survey will close on the 10th of December at midnight GMT.

-

What is the EA Survey?

The EA Survey provides valuable information about the demographics of the EA community, how people get involved, how they donate, what causes they prioritise, their experiences of EA, and more.

The estimated average completion time for the main section of this year’s survey is 20 minutes. There is also an ‘Extra Credit’ section at the end of the survey, if you are happy to answer some more questions.

 

What's new this year?

There are two important changes regarding privacy and sharing permissions this year:

1) This year, all responses to the survey (including personal information such as name and e-mail address) will be shared with the Centre for Effective Altruism unless you opt out on the first page of the survey.

2) Rethink Priorities will not be making an anonymised data set available to the community this year. We will, however, consider requests for us to provide additional aggregate analyses which are not included in our main series of posts.

 

Also the Centre for Effective Altruism has generously donated a prize of $500 USD that will be awarded to a randomly selected respondent to the EA Survey, for them to donate to any of the organizations listed on EA Funds. Please note that to be eligible, you need to provide a valid e-mail address so that we can contact you.

We would like to express our gratitude to the Centre for Effective Altruism for supporting our work.

Comments

Thanks for organising this! I think the survey is very valuable! I was wondering if you could say more on why you "will not be making an anonymised data set available to the community"? That initially seems to me like an interesting and useful thing for community members to have, and I was wondering whether it was just a lack of resources, or it being difficult, that meant you weren't doing this anymore.

Thanks!

Roughly speaking, there seem to be two main benefits and two main costs to making an anonymised dataset public. The main costs: i) time and ii) people being put off the EA Survey by the belief that their data will be available and identifiable. The main benefits: iii) the community being able to access information (which isn't included in our public reports) and iv) transparency and validation from people being able to replicate our results.

Unfortunately, in order to reduce cost (ii) (while simultaneously increasing cost (i)), the dataset is so heavily anonymised and obscured that it seems impossible for people to replicate many of our analyses even with the public dataset, essentially vitiating (iv). We have considered, and are considering, other options, like producing a simulated dataset for future surveys to allow people to run their own analyses if there were sufficient demand, but this would come at an even higher time cost. Conversely, it seems benefit (iii) can largely be attained without releasing a public dataset, simply by producing additional aggregate analyses on request (where possible).

Of course, we'll see how this system works this year and may revisit it in the future.

To add to that, if there are concerns about data being de-anonymized, there are statistical techniques to mitigate that risk.

Do you or anybody else reading this have experience with differential privacy techniques on relatively small datasets (less than 10k people, say)? 

I've only heard of differential privacy used in the context of machine learning and massive datasets.

Well, I am far from an expert, but my understanding is that differential privacy operates on queries as opposed to individual datapoints. But there are tools such as randomized response which will provide plausible deniability for individual responses.
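To illustrate the idea, here is a minimal sketch of classic randomized response in Python (the function names, the 75% truth-telling probability, and the example are my own, not anything used by the EA Survey): each respondent answers truthfully only with some probability, so no individual answer is reliable, yet the population-level rate can still be recovered by inverting the known noise.

```python
import random

def randomized_response(true_answer: bool, p_truth: float = 0.75) -> bool:
    """With probability p_truth, report the true answer; otherwise report a
    coin flip. Any individual response is plausibly deniable."""
    if random.random() < p_truth:
        return true_answer
    return random.random() < 0.5

def estimate_true_rate(responses, p_truth: float = 0.75) -> float:
    """Invert the noise at the aggregate level:
    observed_rate = p_truth * true_rate + (1 - p_truth) * 0.5"""
    observed = sum(responses) / len(responses)
    return (observed - (1 - p_truth) * 0.5) / p_truth

# Hypothetical example: 1000 respondents, 30% of whom truly answer "yes".
random.seed(42)
answers = [i < 300 for i in range(1000)]
noisy = [randomized_response(a) for a in answers]
estimate = estimate_true_rate(noisy)  # close to 0.3, but no single
                                      # response reveals its respondent
```

The trade-off is statistical power: the smaller the dataset, the noisier the recovered rate, which is one reason these techniques are harder to apply to surveys with only a few thousand respondents.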

Regarding the $500 "prize":

I've seen this sort of prize for a few things recently. I don't really understand how it's supposed to incentivise me to complete the survey. The total money donated is $500 regardless of how many people complete the survey, so unless I think that I'm at least as well informed as the average respondent about where this money should go (which I definitely don't!), then if anything, isn't it an incentive not to complete the survey?

unless I think that I'm at least as well informed as the average respondent about where this money should go

This applies if your ethics are very aligned with the average respondent, but if not, it is a decent incentive. I'd be surprised if almost all of EAs' disagreement on cause prioritization were strictly empirical.

I think my ethics are less considered than the average EA community member, so I think I'd rather defer the decision to them. Doesn't seem especially motivating for me personally.

Presumably part of the idea is that it is somewhat incentivising while also being very cheap: the money goes to places CEA would like to support anyway, and doesn't really motivate non-EAs to take the survey.

A different concern is it is not clear to me how counterfactually valid the donation is.

I agree this won't be an incentive to many EAs. So long as it serves as an incentive to some respondents, it still seems likely to be net positive though. (Of course, it's theoretically possible that offering the prize might crowd out altruistic motivations (1) (2) (3), but we don't have an easy way to test this and my intuition is that the overall effect would still be net positive).

I would hope that concerns about being less well placed to make the donation would not discourage people from taking the EA Survey just so that they don't risk winning the prize and making a sub-optimal donation. If the respondent doesn't feel comfortable just delegating the decision elsewhere, they could always decline the prize, in which case it could be given to another randomly selected respondent.

For what it's worth, I thought it was a nice touch and agree it's likely to incentivise some and unlikely to put off any (/many).

Right, that makes sense, thanks. To clarify, I don't actually think anyone will be put off taking the survey because of this. I will definitely be taking it anyway :)

Thanks for organising, always enjoy filling it in each year! Did questions on religious belief/practice get dropped this year? Or perhaps I just autopiloted through them without noticing. I'm aware that there are lots of pressures to keep the question count low, but to flag: as part of EA for Christians, we always found them helpful for understanding that side of the EA community.

Thanks Alex! Yeah, due to the space constraints you mention, we're planning to run some questions (which mostly stay very similar across multiple years) only every other year. The same thing happened to the questions on politics and diet.

This is, of course, not ideal, since it means that we can't include these variables in our other models or examine, for example, differences in satisfaction with EA among people with different religious stances or politics, every other survey.

Thanks for explicitly mentioning that you found these variables useful. That should help inform discussion in future years about what questions to include.

I was disappointed to see this. I think there is a strong 'What gets measured gets done" effect, so the fact that some demographic questions (race, sexual preference) are recorded while others (politics, diet, religion) are not is significant. In particular, I think it tends to lead to efforts to reach out to groups which the data shows to be under-represented, while those without data are neglected.

Thanks for your feedback! It's very useful for us to receive public feedback about what questions are most valued by the community.

Your concerns seem entirely reasonable. Unfortunately, we face a lot of tough choices where not dropping any particular question means having to drop others instead. (And many people think that the survey is too long anyway, implying that perhaps we should cut more questions as well.)

I think running these particular questions every other year (rather than cutting them outright) may have the potential to provide much of the value of including them every year, given that historically the numbers have not changed significantly across years. I would be less inclined to think this if we could perform additional analyses with these variables (e.g. to see whether people with different politics have lower NPS scores), but unfortunately with only ~3% of respondents being right-of-centre, there's a limit to how much we can do with the variable. (This doesn't apply to the diet measure which actually was informative in some of our models.)

Have you considered running different question sets to different people (randomly assigned)?

It could expand the range of questions you can ask.

Thanks for the suggestion. We have considered it and might implement it in future years for some questions. For a lot of variables, I think we'd rather have most data from almost all respondents every other year, than data from half of respondents every year. This is particularly so for those variables which we want to use in analyses combined with other variables, but applies less in the case of variables like politics where we can't really do that.

Ah yes, that makes sense; I hadn't thought of that.

I think this was the first time I completed it in years! Thank you for organising.

Some notes: I was confused why I was supposed to select only up to three responses for EA entities which might have positively influenced my impact, but had unlimited responses for negative influences. I also thought it was a bit odd that I was asked to estimate my giving for 2020 but not my income.

Hi Denise,

Thanks for taking the survey!

The questions about positive/negative influences were CEA requests (although we did discuss them together): I believe the rationale is that for positive influences, they were interested in the most important influences (and wanted to set a higher bar by preventing people indicating that more than three things had the “largest” influence on them), whereas for the possible negative influences, they were interested in anything which had a negative influence, not merely the largest negative influences.

Regarding donations: historically, we have always asked about the previous year’s income and donations (because these are known quantities) and then planned donations for the year the survey is run (since people likely won’t know this for sure, but it’s still useful to know for broader context). Now that we launch the survey right at the end of the year, the difference between past and planned donations is likely less acute. Naturally, it would be ideal if we could ask for income and donation data for both the previous year and the present year, but we constantly face pressure to include other questions while keeping survey length down, so we have had to leave out a lot of things. (This also explains why we had to cut the questions we had in previous years asking about ‘individual’ and ‘household’ figures, given that many people’s earnings/donations are part of a unit.)

Darn, I donated a lot more in 2019 than 2020. 

That should work well for you this year then: this year you'll report how much you donated in 2019 (and how much you plan to donate in 2020). Next year you'll report how much you actually donated this year and how much you plan to donate overall next year.

The two questions were subtly different for other reasons. If I understand it correctly, the first question asks for positive influences on your impact, and the second question for negative influences on your EA involvement.

So, for example, being persuaded that less involvement in EA would increase your impact would lead you to select the same response for both.

Looking forward to seeing the results!  

Minor comment: there was a question that asked for "up to 1" response in each column, where I wasn't sure if this was a typo (and should have said row); I couldn't really make sense of it either way, with the two columns that were available.

Thanks!

If you are referring to the question I think you're referring to, then we really do mean that people should select up to one option in each column: one column for whichever of the options (if any) was the source of the most important thing you learned and one column for whichever of the options (if any) was the source of the most important new connection you made.

I was curious about the formatting of some of your demographic questions. For example, this question:

28. Your gender:

provides only a free text box, with no standard options. This is often considered poor survey technique, because it can lead to a very broad range of responses, which require a lot of manual work on the backend. You will need to manually determine whether 'woman', 'Female', 'Lady', 'f' etc. are the same thing, and what you want to do with someone who says 'Dude'. Not only is this time consuming but it adds subjectivity to your analysis. It also increases the amount of work required from your respondents - if they are on their iPhone they will have to manipulate the keypad, rather than just pressing once.

Since you are using SurveyMonkey, you have access to their SurveyMonkey Certified Questions:

This certified question was added from our Question Bank. It was written by our methodologists to minimize bias and get the most accurate responses. 

If you edit the wording of this question, it'll no longer be certified, which means it might be subject to bias and accuracy issues.

Most of their accredited gender questions avoid these problems by giving you simple options to click. This will likely be optimal for the vast majority of your respondents, and if you wanted to be politically correct you could always include an 'Other' box!

Strangely, it seems like for the race/ethnicity option you go in the opposite direction, by providing the full list of standard US options for people to select from. This includes 'Native Hawaiian or Other Pacific Islander', even though I think less than 0.1% of the global population fall into this composite category. If you are concerned about space limitations I would have considered removing this category, as well as the Alaskan Native one, implicitly folding them into the 'other' box.

Hi Dale. Thanks for your comment.

The gender question and many of the other demographic questions were selected largely to ensure comparability with other surveys run by CEA.

That aside, I think your claim that open comment gender questions are "considered poor survey technique" is over-stated. The literature discusses pros and cons to both formats. From this recent article in the International Journal of Social Research Methodology:

One of the simplest ways to collect data on gender identity is to use an open text box (see Figure 2) which allows participants the freedom to describe their gender in whatever way they see fit while accommodating changing norms around acceptable terminology. Terms commonly used around gender evolve over time... It would therefore be misguided of researchers to attempt to find the most contemporary terminology and use it to the exclusion of all other terms. Research teams are also likely to find such a process difficult and frustrating (Herman et al., 2012). Thus, an open text box is certainly the most accommodating approach to a range of evolving terms to describe gender identity.

If open text boxes are used for research that intends to analyze by category, however, researchers will still ultimately be categorizing the gender identities in order to define groups for statistical analysis and groups to which the findings might be generalized... These decisions will also need to be made if researchers using a multiple-choice approach choose to provide a long list of as many gender identity terms as possible. This approach is a fine option, but researchers need to be cognisant that terminology that was in common use when a tool was published may no longer be current when research is conducted using that tool... Good arguments can be made for the value of participants being able to see the specific term for their gender identity among a list of possibilities, but even Herman’s and Kuper’s lists, published within the past decade, contain terms that are increasingly considered problematic and do not contain some terms that are more common today.

An approach which provides a smaller number of options for gender identity has benefits and drawbacks. Providing fewer categories inevitably forces gender minority participants to place themselves into categories that the researcher provides, but gives the advantage that the participant, not researcher, chooses the categories in which they will be included.