Summary: The EA movement could do more good by (1) evaluating and comparing whole cause areas (rather than interventions within cause areas), (2) evaluating EA organisations, and (3) doing research on how best to evaluate things. In this post, I make the case that all three things are important, neglected, and (on average) tractable.
This is a quick write-up because I don’t have much time at the moment, but I thought it was still worth sharing given the impact of what I describe. The probabilities I give are therefore meant to give readers something concrete to disagree with and to translate my inside view into Fermi estimates; most of them are not the result of an in-depth analysis.
In this post, I make the case that a fruitful new cause area could be the ‘meta-cause’ of Evaluating Causes and Interventions. It seems to me that comparing and evaluating cause areas is under-prioritised in the EA community, and this leads to a great waste of resources.
We can split this cause area into three parts:
- Comparing different cause areas
- Evaluation of EA organisations
- Research to improve our ability to evaluate
I discuss these three parts separately and do an ITN analysis for each because I think that each part is valuable in itself. However, because they’re related, it also makes sense to consider them together.
Comparing different cause areas
It seems to me that we’re reproducing at the level of causes the very fallacy that the EA movement set out to correct in the first place with GiveWell and other organisations. That is, people who are not familiar with EA often (implicitly) think something like,
‘Charities probably have similar impacts, so as long as I donate to charity, that’s fine: it doesn’t particularly matter which charity I donate to’.
Part of the value of the EA movement is to point out that some charities are orders of magnitude more effective than others. But I think that the EA movement is guilty of a similar fallacy not within cause areas, but between popular EA cause areas. We present all the causes as if they had the same impact, letting people form their own opinions on the relative importance of each cause area. But most people don’t have the time or expertise to develop informed opinions about this.
But if one cause area is orders of magnitude more impactful than the others, then the EA movement should prioritise that area, just as we prioritise the most effective interventions and charities within cause areas. This would affect people’s career choices and the way money is allocated, and we’d probably want to work much harder on the cause that’s most impactful.
It might be true that one or two cause areas are orders of magnitude more impactful than others. We should put lots of thought into this, because if some cause areas are vastly more impactful than others, we could vastly increase our impact by prioritising them. For example, here are three questions that I think are extremely important for inter-cause area prioritization. The answers to these questions could influence billions of dollars of funding:
- What’s the likelihood of an irrecoverable civilizational collapse?
- Are biorisks X-risks? (I’m excited that Open Phil has put out a request for proposals on this)
- Are nuclear risks X-risks?
In my opinion, questions like these are under-explored. Some people have attempted to compare causes, but mostly as individual side projects, rather than as a core focus of the movement. The organisations that are closest to doing something like this publicly are QURI (especially Nuño Sempere) and Open Philanthropy. Open Phil is the main organization that has produced reports that enable us to make progress on cause comparison by answering important questions within causes, and Nuño Sempere has tried to compare cause areas using Fermi estimates (e.g. AI safety & ALLFED). For a comprehensive description of what has been done in the area, see this post, which elaborates more on all this in the ‘Whodunnit’ section. I think that we should compare cause areas much more systematically, and that several organisations could usefully work on this.
Evaluation of this kind would also help us to think about how cause areas interact with each other. For instance, not many people think about the way that AGI safety interacts with other causes. I think the community would make different choices if more people successfully propagated their beliefs from one area into other areas. As I argued in this tweet, most people don’t propagate the consequences of their beliefs about AGI to their other beliefs.
A disadvantage of this sort of research is that it can be emotionally harsh: most people are at least somewhat emotionally attached to cause areas that they have worked in or donated to, and they might find it difficult if those areas are given lower priority. Relatedly, it could lead to conflict. It could also lead to some people not getting involved in EA who otherwise would have. Because I expect that some causes will turn out to be far more impactful than others, I think this is likely to happen, at least somewhat.
Nonetheless, I think this type of research would be very beneficial for the EA movement, because I would expect the impact gained to be far greater than the impact lost. It could increase EA’s overall impact greatly and prevent us wasting resources. It would also ensure that everyone who’s intrinsically motivated to do the most good is not led astray by confusion about what the most impactful areas are.
ITN of ‘comparing cause areas’
Here’s an ITN evaluation of this sub-area:
If some cause areas are orders of magnitude more important than others, we should try to find this out. This will greatly increase the productivity of our resource allocation. I think that it’s 70% likely that one cause is at least one order of magnitude more impactful than others. If this were expressed much more clearly, I’d expect it to increase the number of people working in the most impactful area by 20% among new EAs on average, and by 40% among the most talented people (because they are often very versatile).
On the other hand, I’d expect this to cause 5-15% of EAs who are working in less impactful areas to feel less part of EA or to leave EA. Likewise, I’d expect 5-15% fewer members to enter these cause areas.
I think this is fairly tractable. I have formed strong personal beliefs on this by reading a lot about different cause areas and thinking about their differences in impact; if people can usefully do this on an individual level in their spare time, a fortiori a dedicated organisation could do it more effectively. I also think that there are several arguments that exist and are compelling, but that aren’t generally used to compare to other cause areas:
- There are 10^19 insects, so if they’re suffering significantly, that’s an ongoing moral catastrophe
- AGI will be the main driver of history once it happens, so shaping it correctly is both important for preventing extinction and probably the best way to affect the long-term future.
The biggest bottleneck to tractability would be if only very senior people could do this. Because senior researchers are rare, that could limit the potential to do this kind of research. Nevertheless, it seems to me that more junior researchers supervised by senior researchers could probably do a good job at it. I’d be surprised if we couldn’t make progress with 3 additional FTE senior researchers for a year and 10 additional FTE junior researchers.
A few major questions haven’t found answers yet (see below). It seems that fewer than 5 FTE people are working on this.
Evaluation of EA Organisations
Within some EA cause areas, there are meta-organisations that evaluate other organisations for impact (for example, Animal Charity Evaluators and GiveWell), but there is no such organisation for EA organizations themselves. However, just as everywhere else, EA organizations are likely to differ greatly in impact.
Some organisations might be ineffective or even net negative. For example, I suspect that Fathom Radiant is likely to be net negative, but since I can’t access their private information, I can’t be sufficiently certain about this, and so I can’t make a case strong enough to get them to stop or change their plans, even if their work were in fact deeply net negative. If there were an equivalent of GiveWell for every EA organization, it could help us to recognize when organizations are net-negative or ineffective, and generally let us allocate money much more effectively.
Nuño Sempere has published a valuable shallow evaluation of longtermist organisations, but I think that dozens of people could fruitfully do this, given how much grantmaking EA will do over the next few years.
Again, if we evaluate organisations in this way, it risks hurting the feelings of people who work at organisations that are evaluated as less impactful or as harmful. It might be difficult to find people who are a good fit for this work because they need to not only be competent, but widely trusted. However, I think that this should be one of EA’s highest priorities, given that many organisations are starting right now (and that’s very positive on average!), and some organisations have a very high status but few observable results. And if something becomes known as a high priority and a bottleneck in EA, this will likely draw talented people to do it.
ITN of ‘Evaluating EA organisations’
This could be very impactful, since it could increase the amount of high-quality work that gets funding and support. It will incentivise organisations not to overestimate their future success in grant applications, to do better work, and to help other organisations do better work through feedback and sharing information and methods. I expect this to increase the productivity of grantmaking by ~5% if a one-year evaluation were clearly mentioned in grant applications, and to increase EA orgs’ ex-post productivity by 2-5% thanks to knowledge sharing, by 5-10% thanks to incentives to do better, and by 5% if evaluators provided actionable insights.
I also think it could reduce by 40% the likelihood that an EA org working on AGI would push the timelines by more than 1 year.
This is very tractable. Nuño Sempere & Larks have already produced good shallow evaluations, which proves that even with mostly public information we can achieve something very valuable. It might be harder to evaluate research organizations that have no competitors, because if they’re researching very difficult questions, even if they were good they’d most likely have no results for a while. This is also one reason why competition is beneficial: it enables comparison.
This is very neglected: as far as I know, two people are single-handedly doing this job every year. Now that there are probably over a hundred organisations, it’s worth having at least one organisation dedicated to evaluation.
Research to Improve Our Ability to Evaluate
If we evaluate cause areas or organisations, but do so badly, then it’s far less valuable. Therefore, it could be highly impactful to do research that improves our ability to evaluate things in a limited amount of time. Beyond initiatives such as Squiggle that aim at making probability estimates easier, there are also research questions that could be useful for evaluation. For example, we might consider questions such as:
What is the value of information?
…and how does it depend on what I already know, or the time I’ve already spent researching a topic? How much should I try to learn before switching to optimising for impact based on what I already know?
Grantmakers face this question all the time: they have to decide ‘should I spend one more hour evaluating this very uncertain grant application?’ Maybe we can develop metrics that will allow us to make quick estimates of the value of marginal information (or resilience). We could then use these as a stopping rule: once the value of marginal information falls below a certain threshold, you shouldn’t spend more time evaluating the proposal.
Such metrics would be useful not only for grantmakers, but for everyone who wants to maximise their impact. It’s impossible to know everything about every topic, so it’s useful to develop intuitions or heuristics about ‘when should I stop researching or thinking about this’, in the same way that having an intuition about the value of one’s time can be useful.
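To make the stopping rule concrete, here is a minimal sketch in Python. It is only an illustration under strong assumptions that are mine, not an existing tool: a grant either succeeds or fails, success is worth a known amount, and one more hour of evaluation captures a fixed fraction of the value of perfect information. The function names and every number are hypothetical.

```python
def expected_value_of_perfect_information(p_success: float,
                                          value_if_success: float,
                                          grant_cost: float) -> float:
    """EVPI for a fund / don't-fund decision in a two-outcome toy model."""
    # With current beliefs, fund only if the expected net value is positive.
    ev_without_info = max(p_success * value_if_success - grant_cost, 0.0)
    # With perfect information, we fund exactly when the grant would succeed
    # (and only if success is worth more than the grant costs).
    ev_with_info = p_success * max(value_if_success - grant_cost, 0.0)
    return ev_with_info - ev_without_info


def worth_another_hour(p_success: float, value_if_success: float,
                       grant_cost: float, hourly_cost: float,
                       fraction_resolved: float) -> bool:
    """Crude stopping rule: continue only if the expected gain beats the hour's cost.

    fraction_resolved is an assumed share of the EVPI that one more hour
    of evaluation would capture; it is a made-up modelling parameter.
    """
    evpi = expected_value_of_perfect_information(
        p_success, value_if_success, grant_cost)
    return fraction_resolved * evpi > hourly_cost


if __name__ == "__main__":
    # Hypothetical grant: 30% chance of producing $500k of value for $100k.
    print(expected_value_of_perfect_information(0.3, 500_000, 100_000))
    print(worth_another_hour(0.3, 500_000, 100_000,
                             hourly_cost=100, fraction_resolved=0.01))
```

In this toy model, the value of information is highest when the decision is close to the fund / don’t-fund boundary and shrinks as the estimate becomes more resilient, which matches the intuition that very clear grants deserve little extra evaluation time.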
How should we update on our updates?
This question is especially relevant to AI timelines, but I expect it to be generally useful. If you always update in the same direction, you should probably also update on the fact that you always update in the same direction: it suggests that there’s something wrong with your underlying model, which you should revise. But how much should you update?
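One rough way to notice the problem (though not to answer ‘how much?’) is to check how surprising a run of same-direction updates would be if upward and downward updates were equally likely. The sketch below uses a two-sided sign test. The equal-likelihood assumption is mine and is only a proxy: a well-calibrated forecaster’s expected update is zero, but that doesn’t strictly imply equal numbers of upward and downward moves. The example data are invented.

```python
from math import comb
from typing import Sequence


def same_direction_p_value(updates: Sequence[float]) -> float:
    """Two-sided sign test on the directions of a series of belief updates.

    Returns the probability of seeing an imbalance at least this large
    if each non-zero update were equally likely to go up or down.
    """
    ups = sum(1 for u in updates if u > 0)
    downs = sum(1 for u in updates if u < 0)
    n = ups + downs  # zero-sized updates carry no direction, so ignore them
    if n == 0:
        return 1.0
    k = max(ups, downs)
    # One-sided binomial tail P(X >= k) for X ~ Binomial(n, 1/2).
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Invented example: six successive changes (in years) to a median AGI
    # timeline, all of them shortening it.
    timeline_updates = [-3, -2, -4, -1, -2, -3]
    print(same_direction_p_value(timeline_updates))
```

A small value only flags that the pattern is unlikely to be noise under this simple model; deciding how far to shift the underlying model in response is exactly the open research question.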
ITN of ‘research to improve our ability to evaluate’
I think research of this kind could be highly important because it would make us better at evaluating things and therefore more likely to make the correct evaluations. Our allocation of money might change substantially depending on how we answer the two questions above, and there are probably many more important questions like this.
Figuring out a heuristic for the value of information would save EA grantmakers at least hundreds of hours, and probably thousands. Answering the question about updating could change the community’s AGI timelines by up to 5 years (70% chance of at least a 1-year change, 40% of at least a 3-year change, and 10% of at least a 5-year change), and thus affect the community’s optimal spending, governance plans, etc.
I think it’s quite likely (50%) that the tractability of this cause is very low. That said, I think that attempts to make progress on this question will provide a lot more information about its tractability. The only work related to this that I know of is Greg Lewis’ writings on resilience.
I’d guess that fewer than 5 FTE people are currently working on this, and probably far fewer than that.
If you’re convinced by my ITN analysis of the sub-parts, the meta-cause area of ‘evaluating cause areas and organisations’ seems generally promising. If we focus on this, we can get a clearer sense of what type of work has the greatest impact.
Objections to this Cause Area
Isn’t this just ‘global priorities research’?
I think ‘global priorities research’ makes sense as a name for this cause area, but I don’t think that the Global Priorities Institute (for example) is doing what I’ve described. GPI’s theory of change is focused on making academia evolve; this leads them to produce research that’s far more theoretical than what I have in mind. I’ve almost never updated in a practically useful way (i.e. in a way that changed my view about an organization or a cause area) by reading GPI’s work, whereas I’ve updated a lot based on Nuño’s and Larks’ work. I think this is because GPI’s work is not very empirical in its approach. I find reports like those produced by Luisa Rodriguez and Ajeya Cotra more useful: in-depth investigations on a topic of major importance for the EA community, with empirical Fermi estimates of the relevant parameters and probability estimates. So although what I describe could in principle be called ‘global priorities research’, it’s somewhat different from what people who currently use the label ‘global priorities researcher’ are doing.
Isn’t inter-cause prioritization useless if the bottleneck is who can work where?
In practice, the most talented people can often work anywhere, so it’s a significant waste not to orient them towards the most impactful thing, if it’s an order of magnitude more impactful. When people choose their careers, they make trade-offs between impact and personal preferences. If it’s clear that one cause area is far more impactful than others, this will naturally encourage more people to work in that area because they truly want to have a huge impact.
To solve these problems, I would recommend:
- Help Nuño Sempere scale what he does, i.e. cause comparisons and EA org evaluation. Potentially pair him with a recruiter and an ops person, respectively, because it’s non-trivial to find the right people to do this job and because scaling things is time-consuming.
- Find someone excited about Bayesianism and give them a grant to work on the question of meta-updating, or put out a request for proposals on this question.
- Give a grant to a generalist, ideally with some grantmaking experience (e.g. an FTX regranter), to work on developing a heuristic or a tool to estimate the value of information of a marginal hour spent on evaluating a grant. It would surely make sense for Greg Lewis to be involved in this.
- Make a grant or put out a request for proposals on estimating the likelihood of irrecoverable collapse. Past or current volunteers/staff at ALLFED might be good fits for this question because they are used to thinking about a wide range of catastrophes.
Overall, I’d expect most of these grants to be one-off grants and thus to be extremely cost-effective. I expect the bottleneck to be more on the "finding the right person" side, but my guess is that paying a recruiter is likely to be worth it for the questions I mentioned.
Here’s a BOTEC for the EA org evaluator I mentioned, for which I think that Nuño is a good fit:
- cost from 500k to 5M$/year,
- improve by ~5-20% the productivity of the EA community as a whole,
- EA orgs probably receive ~300M$/year
- overall, this would be equivalent to 15-60M$ of additional grantmaking at the average efficiency of EA grantmaking.
So the ROI would be between 3 and 120, with a mode at 2M$ for the cost and 30M$ for the returns, i.e. an ROI of 15.
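For anyone who wants to plug in their own numbers, the arithmetic of this BOTEC can be written as a short Python sketch. The inputs are the figures from the bullets above; the variable names are illustrative, and the 10% ‘mode’ productivity gain is not stated explicitly in the post but is implied by the 30M$ modal return on ~300M$/year.

```python
# Inputs taken from the BOTEC bullets; names are illustrative only.
FUNDING_PER_YEAR = 300e6          # $ received by EA orgs per year (~300M$)
COST_LOW, COST_MODE, COST_HIGH = 0.5e6, 2e6, 5e6    # evaluator cost, $/year
GAIN_LOW, GAIN_MODE, GAIN_HIGH = 0.05, 0.10, 0.20   # productivity improvement
# GAIN_MODE = 10% is an inferred value: 30M$ returns / 300M$ funding.


def yearly_returns(gain: float, funding: float = FUNDING_PER_YEAR) -> float:
    """Dollar-equivalent value of a productivity gain across EA orgs."""
    return gain * funding


def roi(gain: float, cost: float) -> float:
    """Return on investment: dollar-equivalent returns per dollar of cost."""
    return yearly_returns(gain) / cost


roi_low = roi(GAIN_LOW, COST_HIGH)    # pessimistic: 15M$ / 5M$
roi_high = roi(GAIN_HIGH, COST_LOW)   # optimistic: 60M$ / 0.5M$
roi_mode = roi(GAIN_MODE, COST_MODE)  # modal: 30M$ / 2M$

if __name__ == "__main__":
    print(f"ROI range: {roi_low:.0f} to {roi_high:.0f}, mode {roi_mode:.0f}")
```

Note that the extremes pair the lowest gain with the highest cost and vice versa, so the range of 3 to 120 is wider than it would be if cost and effectiveness were positively correlated.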
I think that these three types of research (comparing different cause areas, evaluating EA organizations, and improving our ability to evaluate) could be very impactful, though not necessarily scalable. I think that they would affect the impact of the entire EA community, and thus that they are a very high priority.
If you’re a generalist who likes digging deep into a wide range of topics and you’d consider an offer to work on this, put your email here. If you’re a funder interested in funding this kind of work, put your email here.
Ideas are from Simeon, but this post was heavily edited by Amber. If you would be interested in working with Amber to help you write up your ideas, fill out this form.