Aerospace Engineer

Working (0-5 years experience)

I've been a lightly-involved EA for several years now, having taken the path of working as an aerospace engineer and donating a portion of my income. I'm interested in getting more deeply involved in the EA community and in contributing to some of the big ongoing discussions in the movement. I'm fairly cause-agnostic, but I especially enjoy discussing utilitarian ethics, quantitative modeling, and forecasting.

The Case for Funding New Long-Term Randomized Controlled Trials of Deworming

GiveWell's 2021 metrics report is out! Funding distributed to deworming increased greatly last year, from $15,699,622 to $44,124,942. Rerunning the model with the higher 2021 funding levels, the mean estimate of the value created by a replication study increases to approximately 370,000 GiveWell value units per year. This is equivalent to the value created by $18.5 million/year in donations to organizations with a cost-effectiveness of 6x GiveDirectly's.
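As a rough check on the arithmetic above, the quoted figures imply a conversion rate between dollars and value units. The sketch below only rearranges the numbers in this comment. The variable names are illustrative, and the derived per-dollar rates are implied by those numbers rather than quoted from GiveWell.

```python
# Figures quoted in the comment above.
value_units_per_year = 370_000        # mean value of a replication study
equivalent_donations = 18_500_000     # dollars per year at 6x GiveDirectly
multiple_of_givedirectly = 6

funding_before = 15_699_622           # deworming funding before the increase
funding_after = 44_124_942            # deworming funding in the 2021 report

# Implied conversion rates (derived here, not taken from GiveWell directly).
units_per_dollar_at_6x = value_units_per_year / equivalent_donations
units_per_dollar_givedirectly = units_per_dollar_at_6x / multiple_of_givedirectly

# How much the funding level grew between the two reports.
funding_ratio = funding_after / funding_before

print(f"Value units per dollar at 6x GiveDirectly: {units_per_dollar_at_6x:.4f}")
print(f"Implied value units per dollar at 1x:      {units_per_dollar_givedirectly:.5f}")
print(f"Funding growth factor:                     {funding_ratio:.2f}")
```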

In general, as EA-related organizations distribute more money per year, the value of information is naturally going to rise. So this kind of replication work will only get more important.

Targeted Treatment of Anemia in Adolescents in India as a Cause Area

This is really interesting, and something I hadn't thought about before. Doing a quick literature search, there is also previously existing evidence that high levels of dietary iron may impart a diabetes risk. So the effects seen in the paper don't seem crazy, but I did come out of this with a couple of questions/comments.

- Are the new lower estimates of anemia rates in India just due to changing cutoffs, or are they also because existing supplementation/fortification programs are working? If programs switched from mass supplementation to screen-and-treat, would they still end up giving supplements to a large fraction of the population?
- The confidence intervals reported in the preprint seem a tiny bit suspicious to me, given that the lower bounds are all between 1.001 and 1.01. Sometimes that's just what comes out of the analysis, but it's also what you'd expect to see if the authors had been p-hacking.

Grantees: how do you structure your finances & career?

The only real job security is to have marketable skills. Eternal perfect job security is extremely rare in the USA—I can’t think of anyone but tenured professors who have that. If you work at a startup, the startup could go under. If you work at a big firm, there could be layoffs. Etc.

One difference between being an employee vs being a grant recipient/independent contractor is that employees get unemployment insurance in the US, while contractors don't (with the exception being the expanded pandemic unemployment benefits in 2020). While it's true that there's no such thing as perfect job security, you do get more of a built-in cushion as an employee.

The Case for Funding New Long-Term Randomized Controlled Trials of Deworming

Yeah that's a cool idea to have an org that specifically focuses on replication work. I think that if you fleshed out the modeling done here, you could pretty confidently show funders that it would be a cost-effective use of money to do this more widely.

Thanks so much for taking the time to read the post and for really engaging with it. I very much appreciate your comment and I think there are some really good points in it. But based on my understanding of what you wrote, I’m not sure I currently agree with your conclusion. In particular, I think that thinking in terms of minimum detectable effect (MDE) can be a helpful shorthand, but in this case it may mislead more than it helps. We don’t really care about getting statistical significance at p < 0.05 in a replication, especially given that the primary effects seen in Hamory et al. (2021) weren’t significant at that level. Rather, we care about the magnitude of the update we’d make in response to new trial data.

To give a sense of why that’s so different, I want to start off with an oversimplified example. Consider two well-calibrated normal priors, one with a mean effect of 10 and a standard deviation of 0.5, and one with a mean effect of 0.2 and the same standard deviation. By the simplified MDE criterion, detecting the effect at p < 0.05 80% of the time would require a trial with a standard error of no more than about 3.5 in the first case and about 0.07 in the second. But a new trial with a given standard error, and a given difference between its point estimate and our prior mean, would move our estimate of the mean by exactly the same amount in both cases. (The situation for deworming is more complex because the prior distribution is probably truncated at around zero. But I think the basic concept still holds, in that the sample size required to keep the same value of new information wouldn’t grow as fast as the sample size required to keep the same statistical power.)
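The oversimplified example can be sketched in a few lines of Python. This assumes the textbook normal-normal conjugate update and the usual two-sided 5% / 80% power rule for the MDE. The function names and the hypothetical trial (standard error 0.5, point estimate 0.3 above the prior mean) are illustrative choices, not numbers from the post.

```python
from statistics import NormalDist


def required_se_for_power(effect, alpha=0.05, power=0.80):
    """Largest standard error that detects `effect` at two-sided `alpha`
    with the given power (simplified MDE rule: effect = (z_a + z_b) * SE)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return effect / (z_alpha + z_power)


def posterior_mean(prior_mean, prior_sd, trial_estimate, trial_se):
    """Normal-normal conjugate update of the mean."""
    weight_on_trial = prior_sd**2 / (prior_sd**2 + trial_se**2)
    return prior_mean + weight_on_trial * (trial_estimate - prior_mean)


PRIOR_SD = 0.5
TRIAL_SE = 0.5   # hypothetical trial precision
GAP = 0.3        # hypothetical gap between trial estimate and prior mean

for prior_mean in (10.0, 0.2):
    se_needed = required_se_for_power(prior_mean)
    updated = posterior_mean(prior_mean, PRIOR_SD, prior_mean + GAP, TRIAL_SE)
    shift = updated - prior_mean
    print(f"prior mean {prior_mean:>4}: SE needed for 80% power = {se_needed:.3f}, "
          f"shift in mean after trial = {shift:.3f}")
```

The required standard error differs by a factor of 50 between the two priors, while the shift in the mean depends only on the prior standard deviation, the trial standard error, and the gap, so it is identical in both cases.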

Therefore, I don’t think the required sample size is likely to be nearly as big as you estimated in order to get a valuable update to GiveWell’s current cost-effectiveness estimate. However, your point is clearly correct in that the sample size will need to increase to handle the worm burden effect. That was something I hadn’t thought about in the original post, so I really appreciate you bringing it up in your comment. According to GiveWell, the highest-worm-burden regions in which Deworm the World operates (Kenya and Ogun State, Nigeria) have a worm burden adjustment of 20.5%. A replication trial would likely need to be substantially larger to account for that lower burden, but I don’t think that increase would be prohibitively large.

Regarding the replicability adjustment, I’m not sure it implies that a larger sample size would be needed to make a substantial update based on new trial data (separate from the larger sample needed to handle the worm burden effect). The replicability adjustment was arrived at by starting with a prior based on short-term effect data and performing a Bayesian update based on the Miguel and Kremer follow-up results. If a new replication study has the same statistical power as M&K, then the two can be pooled to make the update and they should be given equal weight.
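One way to see the equal-weight claim is through inverse-variance pooling, sketched below. This assumes a simple fixed-effect pooling of two independent estimates and treats "same statistical power" as "same standard error". The effect sizes are made-up placeholders, and none of this is GiveWell's actual procedure.

```python
def pool_estimates(estimates, standard_errors):
    """Fixed-effect (inverse-variance) pooling of independent estimates.

    Returns the pooled estimate, its standard error, and the normalized
    weight given to each input study.
    """
    precisions = [1.0 / se**2 for se in standard_errors]
    total_precision = sum(precisions)
    pooled = sum(p * e for p, e in zip(precisions, estimates)) / total_precision
    pooled_se = (1.0 / total_precision) ** 0.5
    weights = [p / total_precision for p in precisions]
    return pooled, pooled_se, weights


# Hypothetical numbers: original study and replication with equal precision.
original_estimate, replication_estimate = 0.10, 0.02
shared_se = 0.05

pooled, pooled_se, weights = pool_estimates(
    [original_estimate, replication_estimate], [shared_se, shared_se]
)
print(f"weights: {weights}")
print(f"pooled estimate: {pooled:.3f}, pooled SE: {pooled_se:.4f}")
```

With equal standard errors each study gets a weight of one half, and the pooled standard error shrinks by a factor of the square root of two. The pooled result would then feed the Bayesian update in place of the original study alone.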

Thinking about it qualitatively, if a replication trial showed a similar or greater effect size than Hamory et al. (2021) after accounting for the difference in worm burden, I would think that would imply a strong update away from GiveWell’s current replicability adjustment of 0.13. In fact, it might even suggest that deworming works via a different mechanism from the ones considered in the analysis underlying GiveWell’s adjustment. On the flip side, I don’t think that GiveWell would be recommending deworming if the Miguel and Kremer follow-ups had found a point estimate of zero for the relevant effect sizes (the entire cost-effectiveness model starts with the Hamory et al. numbers and adjusts them). So if a replication study came in with a negative point estimate for the effect size, GiveWell should probably update noticeably towards zero.

Zooming out, I think that information on deworming’s effectiveness in the presence of current worm burdens and health conditions would be very valuable. GiveWell has done an admirable job of trying to extrapolate from the Miguel and Kremer trial and its follow-ups to a bunch of extremely different environments, but they’re changing the point estimate by a factor of ~66 in doing so. To me, that implies that there’s really tremendous uncertainty here, and that even imperfect evidence in the current environment would be very useful. Since deworming is so cheap, I’m particularly worried about the case where it’s noticeably more effective than GiveWell is currently estimating, in which case EA donors would be leaving a big opportunity to do good on the table.

Thank you again for taking the time to read the post!