## Key Points

• The exponential distribution offers a statistical framework for evaluating the cost-effectiveness of global health interventions
• Shifting donations from interventions near the 50th percentile of effectiveness to groups near the 97th percentile of effectiveness can multiply the impact by at least 5 times

## The Exponential Distribution

I was reading an old essay by Toby Ord when I came across this striking graph that ranked 108 health interventions from the Disease Control Priorities in Developing Countries (DCP2) report:

Most obvious is the implication that interventions differ dramatically in their effectiveness, with some interventions averting orders of magnitude more DALYs (Disability Adjusted Life Years) than others. As Ord points out,

"Moving money from the least effective intervention to the most effective would produce about 15,000 times the benefit, and even moving it from the median intervention to the most effective would produce about 60 times the benefit."

However, what also struck me was the graph's resemblance to a common statistical model: the exponential distribution. The exponential distribution is a model used to describe the probability and impact of events that are usually benign but potentially dramatic, like the costs of natural disasters, the largest single-day declines for the Nasdaq, and the wealth of individuals. In each of these cases, most of the events have values near 0 (most tornados cause very little damage, most single-day stock market drops are small, most people make relatively little money), but a few events have extremely large values (Hurricane Katrina caused \$170 billion in damage, the largest Nasdaq drop was 12.32% in 2020, and Jeff Bezos is worth \$177 billion).

Exponential distributions are fully determined by their averages: the mean is the distribution's only parameter. For example, taking the mean of Ord's data to be 60 DALYs per \$1000 and generating 1000 sample data points according to an exponential distribution with an average of 60, we get this graph:

The blue bars show how many generated values fell within each bar's range (e.g. about 175 of the 1000 sample data points had between 0 and 10 DALYs per \$1000), and the black line shows the expected density at that value based on the exponential distribution with an average of 60. The key here is that just taking the average of Ord's data and generating new data from the corresponding exponential distribution gives a graph that looks quite similar in shape and DALY scale to Ord's actual data from earlier. (Note: Ord's graph plots a single horizontal bar for each data value, while this approach plots a vertical bar whose height depends on how many "interventions" had a certain effect. However, the effect is the same: the few extremely cost-effective interventions stand out to the right while the majority of the not-very-cost-effective interventions clump to the left.)
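The sampling step described above can be reproduced with a few lines of Python. This is only a minimal sketch: the seed, the bin width of 10, and the helper names `observed_count` and `expected_count` are assumptions made for illustration, not details from the original analysis.

```python
import math
import random

MEAN = 60.0       # DALYs per $1000, the mean quoted in the text
N = 1000          # number of simulated "interventions"
BIN_WIDTH = 10.0  # assumed bin width for the histogram

rng = random.Random(0)  # arbitrary seed so the run is repeatable
# expovariate takes the rate (1 / mean), not the mean itself
samples = [rng.expovariate(1.0 / MEAN) for _ in range(N)]


def observed_count(lo, hi):
    """Number of simulated values falling in [lo, hi)."""
    return sum(lo <= x < hi for x in samples)


def expected_count(lo, hi):
    """Count the exponential model predicts for [lo, hi):
    N * (F(hi) - F(lo)), where F(x) = 1 - exp(-x / MEAN)."""
    return N * (math.exp(-lo / MEAN) - math.exp(-hi / MEAN))


sample_mean = sum(samples) / N
print(f"sample mean: {sample_mean:.1f}")
for i in range(6):
    lo, hi = i * BIN_WIDTH, (i + 1) * BIN_WIDTH
    print(f"{lo:5.0f}-{hi:<5.0f} observed {observed_count(lo, hi):4d}   "
          f"expected {expected_count(lo, hi):6.1f}")
```

The observed counts fluctuate from run to run (and from seed to seed), so a bar sitting somewhat above or below the model's expected count is not by itself evidence against the exponential shape.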

A similar pattern emerges in more recent data. Taking the DCP3 (2018) equivalents of Ord's earlier data and plotting them, we see the same trend.

The blue bars represent the number of health interventions (out of the 94 evaluated) whose cost-effectiveness falls in the range of the bar (e.g. 48 interventions avert between 0 and 0.005 DALYs per dollar), while the black line shows the expected distribution of interventions according to the exponential distribution with a mean of 0.016 DALYs per dollar (the mean of the data). Admittedly, the fit is not as nice here, but with only 94 data points (versus the 1000 in the previous graph of example data), more fluctuation from the line is expected.

Interesting comparison, but who cares?

## The Memoryless Property

What makes the exponential distribution so powerful is a result known as the memoryless property. This result says that the data points in an exponential distribution past some cut-off also follow an exponential distribution with the same mean, measured from the cut-off. For example, taking the exponential distribution with a mean of 0.016 from earlier, the proportion of all interventions between 0 and 0.05 DALYs/dollar is the same as the proportion of interventions better than 0.05 DALYs/dollar that are between 0.05 and 0.1 DALYs/dollar. The proportions are the same; the interval is just shifted up in the subset. In other words, every upper tail of an exponential distribution has the same shape as the original distribution.
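The property follows in one line from the exponential distribution's tail probability. As a sketch, write $X$ for an intervention's cost-effectiveness and $\mu$ for the mean, so that $P(X > x) = e^{-x/\mu}$ (the symbols $X$, $\mu$, $s$, and $t$ are notation introduced here for illustration). Then, for any cut-off $s$ and any further distance $t$:

$$
P(X > s + t \mid X > s) = \frac{P(X > s + t)}{P(X > s)} = \frac{e^{-(s+t)/\mu}}{e^{-s/\mu}} = e^{-t/\mu} = P(X > t)
$$

Plugging in the example above, with $\mu = 0.016$ and $s = t = 0.05$:

$$
P(X \le 0.05) = 1 - e^{-0.05/0.016} \approx 0.956, \qquad P(X \le 0.1 \mid X > 0.05) = 1 - e^{-0.05/0.016} \approx 0.956
$$

So, under the model, about 95.6% of all interventions fall below 0.05 DALYs/dollar, and about 95.6% of the interventions above 0.05 DALYs/dollar fall below 0.1 DALYs/dollar.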

Assuming that health intervention effectiveness follows an exponential distribution, this says that shifting your investment from the median intervention (in terms of effectiveness) to the 75th percentile intervention adds as much benefit as shifting your investment from the 75th percentile to the 87.5th percentile and so on. In the real world, you'll eventually run out of interventions to shift your investment to, but until then, the differences can be dramatic. For example, taking the DCP3 data's approximate exponential distribution, shifting donations from groups near the 50th percentile of effectiveness to groups near the 97th percentile of effectiveness can multiply the impact by at least 5 times. Applying the same process to Ord's data shows even more dramatic differences.
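These percentile comparisons can be checked directly from the exponential quantile function, which gives the value below which a fraction p of the distribution falls as -mean × ln(1 - p). The sketch below assumes the fitted exponential model holds exactly; `exp_quantile` is a hypothetical helper name, not something from the original analysis.

```python
import math


def exp_quantile(p, mean):
    """Value below which a fraction p of an exponential distribution falls."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -mean * math.log(1.0 - p)


MEAN = 0.016  # DALYs per dollar, the DCP3 mean quoted in the text

q50 = exp_quantile(0.50, MEAN)
q75 = exp_quantile(0.75, MEAN)
q875 = exp_quantile(0.875, MEAN)
q97 = exp_quantile(0.97, MEAN)

# Under the model, each of these two shifts adds the same absolute amount
print(f"50th -> 75th:   +{q75 - q50:.4f} DALYs/dollar")
print(f"75th -> 87.5th: +{q875 - q75:.4f} DALYs/dollar")

# Multiplier from moving a donation from the 50th to the 97th percentile
ratio = q97 / q50
print(f"97th / 50th percentile: {ratio:.2f}x")
```

Under this model the multiplier between two percentiles equals ln(1 - p_high) / ln(1 - p_low), so it depends only on the percentiles chosen and not on the mean; the mean sets the absolute size of each gain.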

## Caveats

A few caveats worth mentioning:

1. The exponential model seems to fit the DCP2 and DCP3 data pretty well but potentially underestimates how many very ineffective interventions and how many very effective interventions there are.  That is, the exponential model might suggest there are more interventions in the middle than there actually are.  This difference would make shifting investments from low-effectiveness to high-effectiveness even more useful but would limit the use of the exponential model to describe interventions.
2. The underlying data in both DCP reports is somewhat sparse and uneven. For example, some interventions use volunteers and therefore have no labor costs, while others give DALYs only in vague "expert estimates." This leads to seemingly strange results, like voluntary male circumcision being up to seven times more cost-effective than malaria prevention (sprays, nets, and insect control).

## Takeaways

• The exponential distribution can model the cost-effectiveness of health interventions
• Properties of the exponential distribution show that shifting investments (especially between good interventions and very good interventions) can have dramatic effects

## Comments

I like this kind of observation. :) I think the claim that, broadly speaking, the few most cost-effective interventions are much, much more cost-effective than the bulk of 'typical' interventions - also in contexts other than global health - is an important and sometimes underappreciated 'foundational assumption' of EA. See also my notes here.

One question I have: why should I believe that the distribution you're describing "is" an exponential distribution? Indeed, in your introductory paragraph ("However, what also struck me ...") one could swap out 'exponential distribution' with 'log-normal distribution' or 'Pareto distribution'/'power law' everywhere, and it would still be true! :)

In fact, I think a more typical EA reaction to the DCP2 and other data is: look, a log-normal distribution! Or even: look, a power law!

These distributions would suggest even stronger returns to identifying top interventions: as you say, due to the memoryless property, an exponential distribution says that, no matter the 'cutoff', shifting from the median to the 75th percentile is as good as shifting from the 75th percentile to the 87.5th one. But for a log-normal distribution the second shift would have even larger returns than the first one. And that difference would be even more pronounced for a power law. (Or at least I think so - I haven't actually done the maths.)
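A rough numerical check of this claim is possible using only the quantile functions of the three distributions. The sketch below compares the gain from the 75th to the 87.5th percentile against the gain from the 50th to the 75th. The shape parameters (sigma = 1.0 for the log-normal, alpha = 1.5 for the Pareto) are arbitrary assumptions chosen for illustration, not values fitted to the DCP data, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

P1, P2, P3 = 0.50, 0.75, 0.875  # median, 75th, and 87.5th percentiles


def exponential_q(p, mean=1.0):
    return -mean * math.log(1.0 - p)


def lognormal_q(p, mu=0.0, sigma=1.0):
    # Quantile of a log-normal: exp of the matching normal quantile
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


def pareto_q(p, x_min=1.0, alpha=1.5):
    return x_min * (1.0 - p) ** (-1.0 / alpha)


def gain_ratio(quantile):
    """(gain from 75th -> 87.5th) / (gain from 50th -> 75th)."""
    first = quantile(P2) - quantile(P1)
    second = quantile(P3) - quantile(P2)
    return second / first


ratios = {
    "exponential": gain_ratio(exponential_q),
    "log-normal (sigma=1.0)": gain_ratio(lognormal_q),
    "Pareto (alpha=1.5)": gain_ratio(pareto_q),
}
for name, r in ratios.items():
    print(f"{name}: {r:.2f}")
```

The exponential ratio is exactly 1 (the memoryless property), and the Pareto ratio works out to 2^(1/alpha), which is above 1 for any alpha. The log-normal ratio depends on sigma: it is above 1 for the assumed sigma = 1.0 but drops below 1 for a narrow log-normal such as sigma = 0.5, so the comparison holds only for sufficiently heavy-tailed parameter choices, and the ordering between log-normal and Pareto depends on the parameters as well.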

(There are also some subtle complications on how to interpret exponential distributions when modeling only the 'tail' of some distribution beyond some positive cutoff. - Again unless I messed up the maths, which I wasn't super careful about ...)

FWIW, my current impression is:

• For data that looks reasonably 'heavy-tailed', we can get a good fit using a variety of different distributions: exponential, Weibull, log-normal, Pareto/power law, ...
• We need to do somewhat sophisticated statistics to 'distinguish' between these distributions - and often such tests will simply be inconclusive based on 'brute empiricism'/curve fitting alone.
• Therefore, in the absence of a mechanistic theory of the causal origins of the phenomenon we're examining (from which we might be able to derive a particular distribution), it is actually somewhat unclear what we mean when we say "X has an exponential distribution". Often we could just as well say "X looks like a log-normal" etc. If it's just about describing data we've seen we might therefore be better off saying something like "this looks quite heavy-tailed". Claims involving specific distributions have the downside that they're claims both about the data we've seen and about what we should expect when we extrapolate beyond the range of observed data; but often we can't know based on observations how to extrapolate (since it is precisely here where the differences between e.g. exponentials and log-normals become highly relevant). And so we shouldn't imply that we can.

(Some related discussion in this appendix of the research Ben Todd and I discuss here.)

Thanks Max, you make a good point about differentiating between exponentials, Paretos, and log-normals.  It does seem like log-normals are the norm when it comes to these skewed distributions, especially with things like world income.  Still, keeping an open mind as to which of the skewed distributions best fits the data can hopefully make these models more robust.

You mentioned the challenges of distinguishing between these heavy-tailed distributions, and I would only add that this challenge increases when viewing these outcomes as intervals rather than points. For the sake of creating graphable data here, I only used the midpoint of the ranges listed on DCP3, but some of the intervals did span orders of magnitude.

Finally, I'm not sure exactly what you mean about the complications of interpreting exponential distributions beyond some cutoff, but if the question was about applying the memoryless property to exponentials (and the tail only depending on the rate parameter), there's a short derivation on Wikipedia.  Again, not sure if that's what you were getting at, but maybe it'll clear things up.