Click the link above to see the full article and charts. Here is a summary I wrote for the latest edition of the 80,000 Hours newsletter, or see the Twitter version.

Is it really true that some ways of solving social problems achieve hundreds of times more, given the same amount of effort?

Back in 2013, Toby Ord pointed out some striking data about global health. He found that the best interventions were:

  • 10,000x better at creating years of healthy life than the worst interventions.
  • 50x better than the median intervention.

He argued this could have radical implications for people who want to do good, namely that a focus on cost-effectiveness is vital.

For instance, it could suggest that by focusing on the best interventions, you might be able to have 50 times more impact than a typical person in the field.

This argument was one of the original inspirations for our work and effective altruism in general.

Now, ten years later, we decided to check how well the pattern in the data holds up and see whether it still applies – especially when extended beyond global health.

We gathered all the datasets we could find to test the hypothesis. We found data covering health in rich and poor countries, education, US social interventions, and climate policy.

If you want to get the full picture on the data and its implications, read the full article (with lots of charts!):

The bottom line is that the pattern Toby found holds up surprisingly well.

This huge variation suggests that once you’ve built some career capital and chosen some problem areas, it’s valuable to think hard about which solutions to any problem you’re working on are most effective and to focus your efforts on those. 

The difficult question, however, is how important this is. I think people interested in effective altruism have sometimes been too quick to conclude that it’s possible to have, say, 1,000 times the impact by using data to compare the best solutions.

First, I think a fairer point of comparison isn’t between best and worst but rather between the best measurable intervention and picking randomly. And if you pick randomly, you expect to get the mean effectiveness (rather than the worst or the median). 

Our data only shows the best interventions are about 10 times better than the mean, rather than 100 or 1,000 times better.
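To see why the best-to-mean ratio comes out so much smaller than the best-to-median ratio, here’s a minimal sketch assuming effectiveness is roughly lognormally distributed – a common model for heavily skewed data like this. The parameters and sample size are made up for illustration, so the exact multiples won’t match our data; the point is just that a heavy right tail pulls the mean well above the median.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical effectiveness values (e.g. years of healthy life per $1,000),
# drawn from a lognormal distribution. Parameters are purely illustrative.
effectiveness = rng.lognormal(mean=0.0, sigma=1.5, size=10_000)

best = effectiveness.max()
print(f"best / median: {best / np.median(effectiveness):,.0f}x")
print(f"best / mean:   {best / effectiveness.mean():,.0f}x")
# The heavy right tail pulls the mean well above the median, so the
# best-to-mean ratio is much smaller than the best-to-median ratio.
```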

Second, these studies will typically overstate the differences between the best and average measurable interventions due to regression to the mean: if a solution seems unusually good, that might be because it really is good, or because you made an error in its favour.

The better something seems, the greater the chance of error. So typically the solutions that seem best are actually closer to the mean. This effect can be large.
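Here’s a minimal simulation of that effect (my own toy numbers, not drawn from any of the datasets): each intervention has a true effectiveness, each measurement adds noise, and we then look at whichever intervention *measured* best. Its true effectiveness is typically well below its measured effectiveness.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 1_000
# Hypothetical true effectiveness values (heavy-tailed, purely illustrative).
true = rng.lognormal(mean=0.0, sigma=1.0, size=n)
# Noisy estimates: each study can overstate or understate the true effect.
measured = true * rng.lognormal(mean=0.0, sigma=1.0, size=n)

apparent_best = np.argmax(measured)  # the intervention that *looks* best
print(f"measured effect of the apparent best: {measured[apparent_best]:.1f}")
print(f"true effect of the apparent best:     {true[apparent_best]:.1f}")
print(f"true effect of the genuinely best:    {true.max():.1f}")
# The apparent best is usually a good-but-lucky draw: once the measurement
# error washes out, its true value sits closer to the mean than it seemed.
```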

Another important downside of a data-driven approach is that it excludes many non-measurable interventions. The history of philanthropy suggests the most effective solutions historically have been things like R&D and advocacy, which can’t be measured ahead of time in randomised trials. This means that restricting yourself to measurable solutions could mean excluding the very best ones.

And since our data shows the very best solutions are far more effective than average, it’s very bad for your impact to exclude them.

In practice, I’m most keen on the “hits-based approach” to choosing solutions. I think it’s possible to find rules of thumb that make a solution more likely to be among the very most effective, such as “does this solution have a chance of solving a lot of the problem?”, “does it offer leverage?”, “does it work at all?”, and “is it neglected?”

Hypothetically, if we could restrict ourselves to solutions that are among the top half and then pick randomly from what remains, we could expect a cost-effectiveness that’s about twice the mean. And I think it’s probably possible to do better than that. Read more in our article on choosing solutions.
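The rough arithmetic behind “about twice the mean” (my reconstruction, assuming the heavy tails mean the better half of solutions accounts for nearly all of the total value V across N options):

```latex
\mathbb{E}[\text{pick randomly from the top half}]
  \;\approx\; \frac{V}{N/2}
  \;=\; 2\cdot\frac{V}{N}
  \;=\; 2 \times \mathbb{E}[\text{pick randomly from everything}]
```

If the top half holds less than all of the value, the multiple is correspondingly below 2, which is why this is only a rough figure.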

So, suppose you use a hits-based approach to carefully pick solutions within an area. How much more impact can you have?

My overall take is something like 10 times more. I feel pretty uncertain, though, so my range is perhaps 3-100 times.

A 10-times increase in impact given the same amount of effort is a big deal. It’s probably underrated by the world at large, though it may be overrated by fans of effective altruism.

A final thought: I think you can increase your impact by significantly more than 10 times by carefully choosing which problem area to focus on in the first place. This is a big reason why we emphasise problem selection in career choice so much at 80,000 Hours. Overall, we’d say to focus on exploring and building career capital first, then start to target some problem areas, and only later focus on choosing solutions.

Comments

I am worried that the cited data does not really inform this question -- since we can always choose solutions that leverage "conjunctions of multipliers" (e.g. advocacy, changing trajectories), the real variance in solutions should also be *much* larger than 10x for any funder within a cause area.

To make this more concrete, when choosing what to fund in climate one would not choose between different policies or lifestyle actions (the evidence for climate presented here), but between fundable opportunities that stack different impact multipliers on top of each other: e.g. advocacy instead of direct action, or supporting policies with large expected long-term consequences (e.g. by accelerating technological change -- whereas, AFAICT, the data from Gillingham and Stock displayed here is their static case, which they describe as focused on current technology and project cost, in contrast to their dynamic case studies, which seem much more likely to be attractive to fund), etc.

So, it seems to me the evidence presented significantly underplays the real variance in solution effectiveness that a funder faces, because it uses data on the variance of single-variable direct interventions as a proxy for the variance in effectiveness of interventions generally -- despite the most effective interventions not usually being direct actions (certainly outside GHD, most EA funding does not buy equivalents of malaria nets for x-risk etc.), nor actions where impact differentials can be easily quantified with certainty (despite being very large in expectation).

This also seems to potentially lead to biased comparisons between solution-level variance and cause-level variance, given how strongly differences in cause-level variance are driven by expected value calculations (value of the future, etc.) that are far more extreme and speculative than what people comparing single interventions would have data on.

Hey, thanks for the comments. Here are some points that might help us get on the same page:

1) I agree this data is missing difficult-to-measure hits-based interventions, like research and advocacy, which means it'll understate the degree of spread.

I discuss that along with other ways it could understate the differences here:

https://80000hours.org/2023/02/how-much-do-solutions-differ-in-effectiveness/#ways-the-data-could-understate-differences-between-the-best-and-typical-interventions

 

2) Aside: I'm not sure conjunction of multipliers is the best way to illustrate this point. Each time you add a multiplier it increases the chance it doesn't work at all. I doubt the optimal degree of leverage in all circumstances is "the most possible", which is why Open Philanthropy supports interventions with a range of degrees of multipliers (including those without any), rather than putting everything into the most multiplied thing possible (research into advocacy into research into malaria...). (Also, if adding multipliers is the right way to think about it, this data still seems relevant, since it tells you the variance of what you're multiplying in the first place.)

 

3) My comparison is between the ex ante returns of top solutions and the mean of the space.

Even if you can pick the top 1% of solutions with certainty, and the other 99% achieve nothing, then your selection is only ~100x the mean. And I'm skeptical we can pick the top 1% in most cause areas, so that seems like an upper bound. E.g. in most cases (especially things like advocacy) I think there's more than a 1% chance of picking something net harmful, which would already take us out of the top 1% in expectation.
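(Spelling out the arithmetic behind this bound -- my reconstruction of the reasoning, not something stated in the data: a slice containing a fraction p of the solutions can hold at most all of the total value V across the N options, so picking from it gives at most 1/p times the overall mean.)

```latex
\mathbb{E}[\text{top fraction } p]
  \;\le\; \frac{V}{pN}
  \;=\; \frac{1}{p}\times\mathbb{E}[\text{random pick}],
\qquad p = 1\% \;\Rightarrow\; \text{at most } 100\times
```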

 

4) There are also major ways the data overstates differences in spread, like regression to the mean.

The data shows the top are ~10x the mean. If you were optimistic about getting a big multiplier on top of those, that could maybe get you to 1,000x. But then, when we take into account regression to the mean, that can easily reduce spread by another 10x, getting us back to something like 100x.

That seems plausible but pretty optimistic to me. My overall estimate for top vs. mean is ~10x, but with a range of 3-100x.

 

5) 

>This also seems to potentially lead to biased comparisons between solution-level variance and cause-level variance, given how strongly differences in cause-level variance are driven by expected value calculations (value of the future, etc.) that are far more extreme and speculative than what people comparing single interventions would have data on.

I agree estimates of cause spread should be regressed more than solution spread. I've tried to take this into account, but could have underestimated it.

In general I think regression to the mean is a very interesting avenue for developing a critique of core EA ideas.

Hey Ben, thanks for the replies -- adding some more to get closer to the same page 🙂

Re your 1), my criticism here is more one of emphasis and of the top-line messaging, as you indeed mention these cases of advocacy and research.

I just think that these cases are rather fundamental and affect the conclusions very significantly -- because we are almost never in a situation where all we can choose from are direct interventions, the solution space (and with it, the likely variance) will almost always look quite different from what is discussed as primary evidence in the article. (That does not mean we will never choose direct interventions, to be sure -- just that the variance of solutions will mostly be one that emerges from the conjunction of impact differentials.)
 

Re your 2), I think this is mostly a misunderstanding -- my comment was also very quickly written, apologies. 

I am not saying we should always choose the most leveraged thing ever, but rather that the solution space will essentially always be structured by conjunctions of multipliers. There are reasons not to only choose the most leveraged solution, as you point out, but I don't think this is enough to argue that the most effective actions will not usually be conjunctive ones.

I agree that the data in the article is useful for specifying the shape of a particular impact differential, I am mostly arguing that it understates the variance of the solution space.  

(I worry that we are mixing expected and realized value here. I am mostly talking about conjunctive strategies affecting what the variance of the solution space looks like in expected value; this does not preclude the realized value sometimes being zero, or risk aversion and other considerations driving us to prefer less leveraged actions.)

Re your 3) & 4), I agree -- my understanding was that these are the factors that lead you to only 10x, and my comment was merely that I think the variance of the direct-intervention space is not that informative with regard to solution selection in most decision contexts.
Aside: I agree with you that advocacy by itself is not a 100x multiplier in expectation.


 

I'd also add it would be great if there were more work to empirically analyse ex ante and ex post spread among hits-based interventions with multiple outcomes. I could imagine it leading to a somewhat different picture, though I think the general thrust will still hold, and I still think looking at spread among measurable interventions can help to inform intuitions about the hits-based case.

One example of work in this area is this piece by OP, where they say they believe they found some 100x and a few 1,000x multipliers on cash transfers to US citizens by e.g. supporting advocacy into land use reform. But this involves an element of cause selection as well as solution selection, cash transfers seem likely to be below the mean, and this was based on BOTECs that will contain a lot of model error and so should be further regressed. Overall I'd say this is consistent with within-cause differences of ~10x from top to mean, and doesn't support >100x differences.

I agree it would be great for this to exist, though it is likely very hard, and the examples that will exist soon will not be the strongest ones (given how effects can become visible over longer time-frames, e.g. how OP discusses the green revolution and other interventions that took many years to have the large effects we can now observe).

 

One small extra data point that might be useful: I made a rough estimate for smallpox eradication in the post, finding it fell in the top 0.1% of the distribution for global health, so it seemed consistent.

Some of these DCP cost-effectiveness estimates are terribly low: a few dollars per QALY, compared to GiveWell's evaluation of their top charities (on the order of $100/QALY).

Even more surprisingly, looking into DCP3, the top 4 interventions had negative cost-effectiveness values.

This seems to me to be mostly because these cost-effectiveness analyses are from a decision-maker's standpoint. Say, a hospital that can choose between different medications (e.g. for malaria, $4/DALY), or a government policy that can reduce overall health costs (e.g. reducing salt intake, a reduction of $1.4k per DALY).

I think it's mostly because these estimates aren't properly adjusted for regression to the mean -- there are a ton of sources of model error, and properly factoring these in will greatly reduce the estimates for the top interventions. There are also other factors, like the top interventions quickly running out of capacity. I discuss this in the article. I put a lot more trust in GiveWell's figures as an estimate of the real marginal cost-effectiveness, though I agree there could be some interventions accessible to policymakers that aren't accessible to GiveWell.

Yeah, I agree with your analyses in the article, though I'd be interested in understanding the relative effects.

"First, I think a fairer point of comparison isn’t between best and worst but rather between the best measurable intervention and picking randomly. And if you pick randomly, you expect to get the mean effectiveness (rather than the worst or the median)."

I'm not sure if this is fair if you're trying to communicate the amount of value that could be created by getting more people to switch strategies.

Let's say everyone picks their strategy randomly. Then they read some information that suggests that some strategies are far more effective than others. Those who are already executing top-10% interventions conclude that they should stick with their current strategies, while some fraction of the other 90% are persuaded to switch. If everyone who switches strategies comes from that bottom-90% group, then the average change in value will look closer to 100x than to 10x -- because if you exclude the positive outliers, the mean will look much lower, and in fact closer to the median.
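A quick sketch of this point with made-up numbers (the distribution and multiples are illustrative, not taken from the article's data): the relevant baseline for switchers drawn from the bottom 90% is the mean of that bottom 90%, which for a heavy-tailed distribution sits far below the overall mean and much closer to the median.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical heavy-tailed effectiveness values (illustrative only).
effectiveness = rng.lognormal(mean=0.0, sigma=2.0, size=100_000)

top = np.quantile(effectiveness, 0.99)  # stand-in for a "top" intervention
cutoff_90 = np.quantile(effectiveness, 0.90)
bottom_90_mean = effectiveness[effectiveness < cutoff_90].mean()

print(f"top vs overall mean:    {top / effectiveness.mean():.0f}x")
print(f"top vs bottom-90% mean: {top / bottom_90_mean:.0f}x")
# Someone switching out of the bottom 90% gains relative to the bottom-90%
# mean, which excludes the tail and so sits much closer to the median.
```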

If you're trying to suggest that choosing the correct cause area is more important than choosing the correct strategy, because there's "only" a 10x value difference in choosing the correct strategy, I think you'd need to show why this mean-over-median approach is correct to apply to strategy selection but incorrect to apply to cause area selection. Couldn't you equally argue that regression to the mean indicates we'll make errors in thinking some cause areas are 1000x more important or neglected than others?

I agree different comparisons are relevant in different situations.

A comparison with the median is also helpful, since it e.g. tells us the gain that the people currently doing the bottom 50% of interventions could get if they switched.

Though I think the comparison to the mean is very relevant (and hasn't had enough attention), since it's the effectiveness of what the average person donates to, supposing we don't know anything about them. Or, alternatively, it's the effectiveness you end up with if you pick without using data.

>I think you'd need to show why this mean-over-median approach is correct to apply to strategy selection but incorrect to apply to cause area selection. Couldn't you equally argue that regression to the mean indicates we'll make errors in thinking some cause areas are 1000x more important or neglected than others?

Yes absolutely.

I think regression to the mean is a bigger issue for cause selection than solution selection. I've tried to take this into account when thinking about between-cause differences, but could have underestimated it.

Basically, I think it's easier to pick the top 1% of causes than the top 1% of solutions, and there's probably also greater variance between causes.

(One way to get an intuition for this is that only <0.001% of world GDP goes into targeted x-risk reduction or ending factory farming, while ~10% of world GDP is spent on addressing social issues in rich countries.)
