I research technology and innovation in developing countries. I'm always interested in chatting with other people working in global health and wellbeing research!
Alternatively, worldview diversification can be understood as an attempt to approximate expected value given a limited ability to estimate relative values. If so, then the answer might be to notice that worldview diversification is a fairly imperfect approximation to any kind of utilitarian/consequentialist expected value maximization, and to try to more perfectly approximate utilitarian/consequentialist expected value maximization. This would involve estimating the relative values of projects in different areas, and attempting to equalize marginal values across cause areas and across years.
I think this is the interpretation of worldview diversification that makes the most sense, and I see two major difficulties with your alternative:
The total funding amounts are not constant over time, so it doesn't make sense to equalize marginal values across time. In a year where funding decreases (e.g. FTX), all marginal values should be higher than in the previous year. And since there's no clear-cut relationship between total funding and marginal values, I don't see any way to set the marginal values to be consistent across years, at least not with the information we have.
More importantly, a relatively fixed funding pool (or at least a non-shrinking one) is kind of a requirement for a healthy ecosystem of projects and organizations. If the animal welfare funding bucket bounced between $100 million and $20 million and $300 million year to year, it would be really hard to actually get any animal welfare organizations up and running for an extended time.
The central limit theorem is exactly what implies my claim: the noise is not on the log scale, precisely because of the CLT.

Now, if you transform your coefficient onto a log scale, then all bets are off. But that is not happening anywhere in this post, and it's not really what happens in reality either. I don't know why anyone would do it.
In general I think it's not crazy to guess that the standard error of your measurement is proportional to the size of the effect you're trying to measure
Take a hierarchical model for effects. Each intervention $i$ has a true effect $\theta_i$, and all the $\theta_i$ are drawn from a common distribution $T$. Now for each intervention, we run an RCT and estimate $\hat{\theta}_i = \theta_i + \epsilon_i$, where $\epsilon_i$ is experimental noise.
By the CLT, $\epsilon_i \sim N(0, \sigma^2/n)$, where $\sigma^2$ is the inherent sampling variance in your environment and $n$ is the sample size of your RCT. What you're saying is that $\sigma^2$ has the same order of magnitude as the variance of $T$. But even if that's true, the noise variance $\sigma^2/n$ shrinks linearly as your RCT sample size grows, so the two should not be in the same OOM for reasonable values of $n$. I would have to do some simulations to confirm that, though.
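That simulation is quick to sketch. Below is a minimal version, assuming each true effect $\theta_i$ is drawn from a normal distribution $T$ and each RCT estimate adds normal noise with variance $\sigma^2/n$; all parameter values are illustrative, including the deliberately pessimistic choice of setting the per-observation noise sd equal to the sd of true effects:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumption: per-observation noise sd (sigma) equals the sd of
# the true-effect distribution T, i.e. the "same OOM" case discussed above.
sigma = 1.0        # per-observation sampling noise sd
sd_T = 1.0         # sd of the true effects theta_i
n_interventions = 10_000
theta = rng.normal(0.0, sd_T, n_interventions)   # true effects

for n in (30, 300, 3000):
    # RCT estimate of each effect: theta_i plus noise with sd sigma / sqrt(n)
    theta_hat = theta + rng.normal(0.0, sigma / np.sqrt(n), n_interventions)
    print(f"n = {n:>4}: empirical SE = {np.std(theta_hat - theta):.3f} "
          f"vs sd of true effects = {sd_T:.3f}")
```

Even in this worst case, the estimation error falls well below the spread of true effects once the per-RCT sample size is in the hundreds.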
I also don't think it's likely to be true that $\sigma^2$ has the same OOM as the variance of $T$. The factors that cause sampling variance - randomness in how people respond to the intervention, randomness in who gets selected for a trial, etc. - seem roughly comparable across interventions. But intervention qualities are not roughly comparable: we know that the best interventions are OOMs better than the average intervention. I don't think we have any reason to believe that the noisiest interventions are OOMs noisier than the average intervention.
(I think that for something as clean as a well-set-up experiment with independent trials of a representative sample of the real world, you can estimate the standard error well, but I think the real world is sufficiently messy that this is rarely the case.)
I'm not sure what you mean by this; I think any collection of RCTs satisfies the setting I've laid out.
Fun read! A point like this gets made every so often on the Forum, and I feel like a one-trick pony because I always make the same response, which you preempt in your question 1: these results rely heavily on the true spread of intervention quality being of the same order of magnitude as your experimental noise. And when intervention quality has a fat-tailed distribution, that will almost never be true. If the best intervention is 10 SD better than the mean, any normally distributed error will have a negligible effect on our estimates of its quality.
And in general, experimental noise should be normal by the central limit theorem, so I don't know what you mean by "experimental noise likely has fatter tails than a log normal distribution".
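A small simulation makes the point concrete. The setup below is a sketch under assumed parameters, not anything from the post: lognormal (fat-tailed) true intervention quality, plus normal experimental noise sized to match a typical effect.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: fat-tailed (lognormal) true quality, plus normally
# distributed noise comparable to a typical effect. Parameters are assumptions.
n = 1000
true_quality = rng.lognormal(mean=0.0, sigma=2.0, size=n)
noise_sd = np.median(true_quality)                 # noise ~ a typical effect
estimates = true_quality + rng.normal(0.0, noise_sd, size=n)

best = np.argmax(true_quality)
print(f"best true effect:                   {true_quality[best]:.1f}")
print(f"median effect:                      {np.median(true_quality):.1f}")
print(f"estimate for the best intervention: {estimates[best]:.1f}")
```

Because the top of a fat-tailed distribution sits far above typical effects, noise on the scale of a typical effect barely moves the estimate of the best intervention.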
Great argument. My guess for why this isn't common, based on a little experience, is that the decision is usually sequential: first you calculate a sample size based on power requirements, and then you fundraise for that budget (and usually the grantmaker asks for your power calculations, so it does have to be sequential). This doesn't inherently prevent you from factoring intervention cost into the power calculations, but it does mean the budget constraint is not salient.
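For concreteness, here's what that sequential workflow looks like for a standard two-arm difference-in-means power calculation under the normal approximation; all the numbers (effect size, sd, per-subject cost) are hypothetical:

```python
from statistics import NormalDist

def n_per_arm(effect, sd, alpha=0.05, power=0.8):
    """Two-arm sample size for a difference in means (normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return 2 * (z * sd / effect) ** 2

# Step 1: power calculation, done first and sent to the grantmaker.
n = n_per_arm(effect=0.2, sd=1.0)      # ~392 subjects per arm

# Step 2: budgeting, done afterwards; cost enters only at this stage.
cost_per_subject = 30                  # hypothetical per-subject cost
budget = 2 * n * cost_per_subject
print(f"n per arm ~ {n:.0f}, implied budget ~ ${budget:,.0f}")
```

Folding cost into step 1 - e.g. choosing the design that maximizes information per dollar - is possible, but nothing in this workflow prompts it.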
I wouldn't be too surprised ex ante if there were inefficiencies in how we do randomization. This is an area with quite active research; for example, this 2022 paper proposes a really basic shift in randomization procedures and yet demonstrates clear power benefits.
This paper looks like the most promising intervention to improve air quality that I have seen. The authors even have an explicit cost-effectiveness analysis. I would love to see it critically evaluated. (I took a stab at doing so here.)