**Summary:** Cost-effectiveness is __believed__ to follow a heavy-tailed distribution, but we don’t know how heavy the tail is. For example, it could be lognormally distributed (which is quite heavy-tailed) or power law distributed (which is even more heavy-tailed). Knowing this could help in estimating the cost-effectiveness of future interventions and inform the __explore-exploit__ tradeoff. This post delves into:

- The background and motivation for our work
- Our future research plans, including how we plan to leverage concepts such as entropy to gain insight

**About this post**

This post is not a full-fledged piece of SoGive research, but more akin to a research proposal or pre-publication notice. We plan to do this more often with our work so that we can get useful feedback before we have done too much work, to tackle publication bias, and so that when work gets started and isn’t finished, others can be aware that some unfinished thoughts are available.

This post was written with help from GPT-4.

Thanks to other members of the SoGive community for their support on this work, notably Vasco Grilo for modelling the multiplier on the 99.9th percentile and Rebca van de Ven for earlier work on this topic.

**Background**

Recent __work__ by Ben Todd suggests that the way impact is distributed between charities likely follows a heavy-tailed distribution. However, the exact form of this distribution (lognormal, power law, or something else) remains unclear.

**What do we mean by the distribution of impact or cost-effectiveness?**

If you take everything in a large reference class, e.g. charities or impact investing opportunities or people, and assess the cost-effectiveness or impact or talent of each, what does that distribution look like?

**Motivation**

Lognormal and power law distributions look quite different in the tails. The tails matter because EA research seeks to find the very best interventions, not just those in the middle of the distribution.

To express this in more concrete terms, imagine that the EA community has already invested a certain amount of effort in exploring some charitable interventions and we have already found some which look high impact compared to others. How likely do we think it is that further research will yield something that is even better?

Although the previous paragraph referenced the split of impact between charitable interventions, the research we plan to do is abstract enough that it could be applied to other things too, e.g. the split of impact between impact investing opportunities, or the split of __talent__ between people.

When considering the explore-exploit trade-off, two things are needed for the “explore” option (i.e. further research) to be favoured: (1) for the underlying distribution to *actually* be heavy tailed, and (2) for us to be able to *identify* the high impact things based on our research capabilities. Item (1) is about the “territory” and item (2) is about our ability to “map” it. To manage the scope of our research, we are only focusing on (1).

We believe that the __work__ done by Ben Todd adds a lot of value by examining relevant data. However, data, by its nature, is not good at capturing the tails. In order to complete the Bayesian picture, our aim is to capture the priors as well, which we believe should give relevant insight about the tails.

We will focus on the lognormal and power law distributions, since those two distributions seem to have been mentioned the most in the literature that we’ve seen thus far.

The difference between a lognormal and power law distribution is material. To illustrate this, imagine you have a dataset which doesn't include much data about extreme values (as is normal for datasets). If you extrapolate using either a power law or lognormal assumption, the results can differ significantly. Our __calculations__ suggest that if the dataset captured data up to the 99th percentile, then an estimate of the 99.9th percentile value using a power law distribution would be 189 times larger than the estimate of the 99.9th percentile value using a lognormal distribution.^{[1]} This difference arises because the power law distribution has a much heavier tail than the lognormal distribution.
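
The kind of comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the calculation behind the 189× figure: the tail index `ALPHA`, the scale `X_MIN`, the evenly spaced quantile grid and the moment-based lognormal fit are all assumptions made here, since the post does not state them.

```python
import math
from statistics import NormalDist, fmean, pstdev

# Hypothetical parameters (not stated in the post).
ALPHA = 1.0      # assumed Pareto tail index
X_MIN = 1.0      # assumed Pareto scale (minimum value)
N_POINTS = 1000  # number of "observed" points
CUTOFF = 0.99    # the data only reach the 99th percentile


def pareto_quantile(p, alpha=ALPHA, x_min=X_MIN):
    """Inverse CDF of a Pareto distribution."""
    return x_min * (1.0 - p) ** (-1.0 / alpha)


def fit_lognormal(values):
    """Fit a lognormal by matching the mean and std of the logs."""
    logs = [math.log(v) for v in values]
    return fmean(logs), pstdev(logs)


def lognormal_quantile(p, mu, sigma):
    """Inverse CDF of a lognormal distribution."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


# "Observed" data: Pareto quantiles on an even grid below the cutoff.
probs = [CUTOFF * (i + 0.5) / N_POINTS for i in range(N_POINTS)]
observed = [pareto_quantile(p) for p in probs]

# Fit the lognormal to the truncated data, then extrapolate both tails.
mu, sigma = fit_lognormal(observed)
ratios = {
    p: pareto_quantile(p) / lognormal_quantile(p, mu, sigma)
    for p in (0.999, 0.9999, 0.99999)
}

for p, ratio in ratios.items():
    print(f"{p:.3%}: Pareto is {ratio:,.0f}x the fitted lognormal")
```

The exact multiples depend heavily on the assumed tail index and on how the lognormal is fitted, so this sketch should not be expected to reproduce the figures quoted in the post. The qualitative pattern is the point: the gap between the two extrapolations widens the further into the tail you look.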

**Our next steps**

We plan to explore the following:

- Whether the distribution of impact between charities seems more likely to have the features associated with lognormal distributions or power law distributions
  - If something is the product of lots of similar underlying processes multiplied together, then we would expect it to have a lognormal distribution.
  - If we have reason to believe that something should be scale-invariant (i.e. like a fractal, it in some sense “looks the same” no matter how much you zoom in or out), then we would expect it to have a power law distribution.

- How the concept of entropy might help us: a good prior for a distribution is (at least some of the time) one with high entropy. We will review both information theoretic conceptions of entropy and thermodynamic conceptions of entropy.
  - We have encountered a paper which employs a thermodynamics concept of entropy (__Kafri 2016__), and this notion of entropy appears to lead to a power law distribution.
  - However it appears that applying an information-theoretic approach to entropy could lead to a wide range of possible probability distributions, depending on which constraints are most appropriate. This is described in __this paper__ by Keith Conrad and illustrated with the long list of maximum entropy probability distributions listed on the __relevant Wikipedia page__.

- As well as the academic literature mentioned above, we will also be cross-referencing against relevant literature from within the EA community:
  - Benjamin Todd’s aforementioned __article__ that analysed the distribution of the cost-effectiveness of different interventions across several problem areas.
  - Max Daniel and Benjamin Todd’s __post__ on how much performance varies amongst people.
  - A __piece by Stijn__ which also referred to this topic.
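
To make the point about constraints in the entropy item above more concrete, here is a standard textbook-style illustration (not taken from the papers cited, and not a result of this research). It assumes we maximise differential entropy over densities of a positive quantity $X$; the symbols $m$, $s$, $\alpha$ and $x_{\min}$ are notation introduced here for illustration. Writing $Y = \ln X$, a change of variables gives

$$
h(X) = h(Y) + \mathbb{E}[Y],
$$

so once $\mathbb{E}[\ln X]$ is fixed, maximising the entropy of $X$ is the same as maximising the entropy of $Y$. Two different constraint sets then give the two candidate distributions:

$$
\begin{aligned}
X \ge x_{\min},\ \ \mathbb{E}[\ln X] = m
&\;\Longrightarrow\; \ln(X / x_{\min}) \sim \text{Exponential}
&&\Longrightarrow\; p(x) \propto x^{-(\alpha + 1)},\quad \tfrac{1}{\alpha} = m - \ln x_{\min} \quad \text{(power law)} \\
X > 0,\ \ \mathbb{E}[\ln X] = m,\ \ \operatorname{Var}[\ln X] = s^{2}
&\;\Longrightarrow\; \ln X \sim \mathcal{N}(m, s^{2})
&&\Longrightarrow\; p(x) \propto \tfrac{1}{x} \exp\!\left(-\tfrac{(\ln x - m)^{2}}{2 s^{2}}\right) \quad \text{(lognormal)}
\end{aligned}
$$

In this simplified setting, the question of lognormal versus power law becomes a question of which constraints (a lower bound and a mean of the logarithm, or a mean and variance of the logarithm) are the appropriate ones to impose.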

In addition, we are aware of the following relevant literature:

- __Clauset et al 2009__ explores why power law distributions appear in empirical data, and also seems quite relevant to our research question.
- We have skimmed __this Terry Tao blog post__ and found it interesting. It was also useful because the comments to that post led us to __Kafri 2016__, which may end up containing some of the central ideas in our research. We found out about it because Max Daniel referenced it on the EA Forum.
- __Newman 2005__ clearly explains a number of key concepts relating to power law distributions.
- __Sneppen and Newman 1996__ build an interesting model. It consists of N “agents”, which could be grains of sand in a sandpile, or species in an evolving ecosystem, and those agents are subject to “stresses” η(t) at each time-step t. Each agent possesses a threshold of tolerance, above which it dies out or moves on. The paper shows that this model has power law outcomes. It appeared to refer to the concept of __critical phenomena__; however, on more careful reading it appears that the root cause of the power law distributions seen in these models is not related to critical phenomena, so we don’t plan to focus on critical phenomena.
- Max Daniel also has lots of other very useful content in a __shortform__ which predates his work on the distribution of talent, which is essentially a fairly useful linkdump in its own right.
  - This includes multiple useful sources, such as Brian Tomasik’s early __comment__ on the topic, Kokotajlo and Oprea’s 2020 __paper__ (which Vasco has also mentioned on the EA Forum recently), and Tobias Baumann’s suffering-focused __exploration__ of this question.
- We learned from Max’s shortform that Linch __started some work__ on this topic, but (as far as we’re aware) it’s still incomplete. It seems he was leaning away from the power law distribution being likely.
- This article about __Zipf’s law for Atlas models__ appeared, at first glance, to explain why Zipfian or power law distributions are so widespread. However, it now looks like it explains why one of the parameters in a power law distribution is likely to be one, which still raises the question of why it’s a power law in the first place. For this reason, we plan to deprioritise this article.

We would love to hear of any other potentially relevant literature, or ideas that may be worth exploring!

^{^}We generated a Pareto distribution with 10^{3} points up to the 99th percentile. We then fitted a lognormal distribution to the generated Pareto distribution. We then took points at the 99.9th, 99.99th and 99.999th percentile for the Pareto and fitted lognormal distributions. Taking the value of the Pareto as a fraction of the fitted lognormal at these percentiles gives us 189×, 1400× and 10760× respectively.

Nice work, Michael! I think heavy-tailedness of the cost-effectiveness across causes, and across interventions within causes is a core premise of effective altruism, so it is great to see this being analysed.

(Maybe I'm stating the obvious here, but...)

Nice start. This will simply be useful for DOING investigations of causes and interventions, especially new ones with limited credible evidence. It will be helpful to have an informative prior about the impactfulness about that class of interventions.

Ideally, I would use this prior in a Bayesian framework, in conjunction with the evidence presented, to generate a posterior about the impact, and the cost-effectiveness of the proposed intervention.

Without a sense of what the prior should look like, this proves extremely challenging

Really excited about this line of research, and it's also very cool that you are publishing this research plan overview in advance :)

Some random thoughts: