
Abstract: This essay argues that the evidence supporting GiveWell’s top cause area – Seasonal Malaria Chemoprevention, or SMC – is much weaker than it appears at first glance and would benefit from high-quality replication. Specifically, GiveWell’s assertion that every $5,000 spent on SMC saves a life is a stronger claim than the literature warrants on three grounds: 1) the effect size is small and imprecisely estimated; 2) co-interventions delivered simultaneously pose a threat to external validity; and 3) the research lacks the quality markers of the replication/credibility revolution. I conclude by arguing that any replication of SMC should meet the standards of rigor and transparency set by GiveDirectly, whose evaluations clearly demonstrate contemporary best practices in open science.

1. Introduction: the evidence for Seasonal Malaria Chemoprevention

GiveWell currently endorses four top charities, with first place going to the Malaria Consortium, a charity that delivers Seasonal Malaria Chemoprevention (SMC). GiveWell provides more context on its Malaria Consortium – Seasonal Malaria Chemoprevention page and its Seasonal Malaria Chemoprevention intervention report. That report is built around a Cochrane review of seven randomized controlled trials (Meremikwu et al. 2012). GiveWell discounts one of those studies (Dicko et al. 2008) for technical reasons and includes an additional trial published later (Tagbor et al. 2016) in its evidence base. 

No new research has been added since then, and GiveWell’s SMC report was last updated in 2018. It appears as though GiveWell treats the question of “does SMC work?” as effectively settled.

I argue that GiveWell should revisit its conclusions about SMC and should fund and/or oversee a high-quality replication study on the subject. While there is very strong evidence that SMC prevents the majority of malaria episodes, “including severe episodes” (Meremikwu et al. 2012, p. 2), GiveWell’s estimate that every $5,000 of SMC saves a life in expectation is shaky on three grounds related to research quality: 1) the underlying effect size is small, relative to the sample size, and statistically imprecise; 2) SMC is often tested in places receiving other interventions, which threatens external validity because we don’t know which set of interventions best maps onto the target population; and 3) the evidence comes from studies that predate the credibility revolution and therefore lack quality controls such as detailed pre-registration, open code and data, and sufficient statistical power.

2. Three grounds for doubting the relationship between SMC and mortality

2.1 The effect size is small and imprecisely estimated

Across an N of 12,589, Meremikwu et al. record 10 deaths in the combined treatment groups and 16 in the combined control groups. Subtracting the one study that GiveWell discounts and including the one they supplement with, we arrive at 10 deaths for treatment and 15 for control. As the authors note, “the difference was not statistically significant” (p. 12), “and none of the trials were adequately powered to detect an effect on mortality…However, a reduction in death would be consistent with the high quality evidence of a reduction in severe malaria” (p. 4).[1]

Overall, the authors conclude, SMC “probably prevents some deaths,” but “[l]arger trials are necessary to have full confidence in this effect” (p. 4).
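To give a rough sense of what "larger trials" could mean, here is a minimal power-calculation sketch. It is illustrative only and rests on loud assumptions that are not in the review: that the 12,589 children were split evenly between arms, that the observed death rates (10 vs. 16) equal the true rates, and that a simple normal-approximation formula for two proportions is adequate. The function name `n_per_arm` is introduced here for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, p_treat, alpha=0.05, power=0.80):
    """Approximate children needed per arm for a two-sided test comparing
    two proportions (normal approximation, equal arm sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treat) ** 2)


# ASSUMPTION: the pooled N of 12,589 is split evenly between the two arms.
ARM_SIZE = 12_589 / 2
# ASSUMPTION: the observed death rates are the true underlying rates.
p_control = 16 / ARM_SIZE  # 16 deaths in the combined control groups
p_treat = 10 / ARM_SIZE    # 10 deaths in the combined treatment groups

required = n_per_arm(p_control, p_treat)
print(f"children needed per arm: {required:,}")
print(f"multiple of the assumed arm size: {required / ARM_SIZE:.1f}x")
```

Under these assumptions the required sample per arm comes out several times larger than the pooled evidence base, which is consistent with the authors' remark that none of the trials was powered to detect a mortality effect.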

As a benchmark, a recent study on deworming (N = 14,172) estimates that deworming saves 18 lives per 1000 childbirths, versus about 0.4 for SMC.

GiveWell forthrightly acknowledges this "limited evidence" on its SMC page, and explains why it believes SMC reduces mortality to a larger degree than the assembled studies directly suggest. This is laudably transparent, but the question is foundational to all of GiveWell's subsequent analyses of SMC. Especially given the organization's strong funding position, GiveWell should devote resources towards bolstering that limited evidence through replication.

2.2 It’s unclear which studies map directly to the target population

Of the seven studies analyzed by Meremikwu et al., four test SMC in settings where both treatment and control samples are already receiving anti-malaria interventions. Two studies test SMC along with “home-based management of malaria (HMM)” while two others test SMC “alongside ITN [insecticide treated nets] distribution and promotion” (p. 9).

GiveWell’s SMC intervention report notes that SMC + ITN trials found “similar proportional reduction in malaria incidence to trials which did not promote ITNs.” This finding is useful and interesting, but does not self-evidently help us estimate the effects of just SMC on mortality, which is the basis of GiveWell’s cost-benefit analyses. To make the leap between the four studies that include co-interventions and those that don’t, we need an additional identifying assumption about external validity, such as:

  • any interaction effect between SMC and co-interventions is negligible or negative, which makes these estimates minimally or downwardly biased;
  • the target population will have a mix of people receiving ITNs, HMM, or neither, and, therefore, we should aggregate these studies to mirror the target population.

GiveWell does not take a position on this. The SMC intervention report says that the organization has “not carefully considered whether Malaria Consortium’s SMC program is operating in areas where ITN coverage is being expanded.” It does not mention HMM.

If we only look at the two studies that estimate the relationship between just SMC and mortality, we see that five children died in the combined control group (N = 1,139), while four died in the combined treatment group (N = 1,122). Every death of a child is a tragedy; but this difference is not a strong basis for determining where the marginal dollar is most likely to save a life, and we are always triaging.
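To see how little these two SMC-only trials can say about mortality, the counts above can be run through a Fisher exact test. The sketch below is illustrative rather than an analysis taken from Meremikwu et al. or GiveWell: it assumes the pooled arm sizes quoted above are the right denominators and ignores differences between the two trials. The function name `fisher_exact_two_sided` is introduced here for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 events fall in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= observed * (1 + 1e-9))


# Counts from the two SMC-only trials, pooled (a simplifying ASSUMPTION).
treat_deaths, treat_n = 4, 1_122
control_deaths, control_n = 5, 1_139

p_value = fisher_exact_two_sided(
    treat_deaths, treat_n - treat_deaths,
    control_deaths, control_n - control_deaths,
)
risk_diff_per_1000 = 1000 * (control_deaths / control_n - treat_deaths / treat_n)

print(f"risk difference: {risk_diff_per_1000:.2f} deaths per 1,000 children")
print(f"two-sided Fisher exact p-value: {p_value:.3f}")
```

Because four of the nine deaths falling in the treatment arm is the single most likely split under the null hypothesis, the two-sided p-value is essentially 1: in this subset, the data cannot distinguish SMC's effect on mortality from zero.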

This issue merits more careful attention than it currently receives from GiveWell. At a minimum, the SMC intervention page might be amended to note GiveWell’s position on the relationship between co-interventions and external validity. More broadly, an SMC replication could have multiple treatment arms to tease out the effects of both SMC and SMC + co-interventions.

2.3 The provided studies lack the quality markers of the credibility revolution

As Andrew Gelman puts it, “What has happened down here is the winds have changed;” as recently as 2011, “the replication crisis was barely a cloud on the horizon.” In the ten years since Meremikwu et al. was published – as well as in the six years since Tagbor et al. (2016) – we’ve learned a lot about what good research looks like. We’ve also learned, as described in a recent essay by Michael Nielsen and Kanjun Qiu, that studies meeting contemporary best practices – detailed pre-registration, “large samples, and open sharing of code, data and other methodological materials” – are systematically more likely to replicate successfully.

The studies cited by GiveWell in support of SMC do not clearly meet these criteria. 

  • While the original trials are large enough to detect an effect on incidence of malaria, for effects on mortality, “the trials were underpowered to reach statistical significance” (Meremikwu et al. 2012, p. 2).
  • The code, data, and materials are not publicly available (as far as I can tell).
  • These studies were indeed preregistered (e.g. here and here), but not in ways that would meaningfully constrain researcher degrees of freedom.[2] 

This isn’t to say they won’t replicate if we run them again using contemporary best practices. But given the literally hundreds of millions of dollars at stake, let’s verify rather than assume. 

3. Conclusion

One of the unsettling conclusions of the replication revolution is that when studies implement stringent quality standards, they’re more likely to produce null results.[3]

As it happens, GiveDirectly, formerly one of GiveWell’s top charities, already has evaluations that meet the highest standards of credibility. Haushofer and Shapiro (2016), for instance, have a meticulous pre-registration plan, large sample sizes, and publicly available code and data; they also “hired two graduate students to audit the data and code” for reproducibility (p. 1977). A subsequent evaluation by the same authors found more mixed results: some positive, enduring changes but also some negative spillovers within treated communities. But GiveDirectly was much more likely to generate null and contradictory findings because its evaluations were so carefully done.

GiveWell argues that it only recommends charities that are at least 10X as effective as cash. Right now, that comparison is confounded by large differences in research quality between GiveDirectly’s evaluations and those supporting SMC. 

GiveWell can remedy this by funding an equally high-quality replication for SMC – and then, ideally, for each of its top cause areas.

Thanks to Alix Winter and Daniel Waldinger for comments on an early draft.

  1. ^

     In the text, the Ns accompanying the mortality numbers are slightly different because not all studies recorded deaths (for those that did, N = 9533). I am assuming that if any deaths had occurred in those studies, they would have mentioned it as a matter of data collection/attrition.

  2. ^

     This is a complicated subject on which many people have weighed in (e.g. here, here, here, and here). For starters, here is an example of a pre-registration plan that meaningfully constrains its authors.

  3. ^

     I learned this first-hand when contributing to two meta-analyses.

Comments (17)

Thanks for your entry!

Thanks for these thoughts!

A question: How large do you expect the effects of such a replication to be? Maybe you could estimate "a study of cost X would lead to a change in effect size of Y with probability P" for some instances of X, Y, and P. That would help to estimate whether the study would, in expectation, be worth more than one life saved per 5000 dollars.

And an observation: I think it would be very difficult to get ethical approval for such a study. SMC is (according to current knowledge) an amazing intervention. Any controlled trial would require a control group that does not receive SMC, nor other interventions that could act as confounding factors. Think about it... you'd expect the study to cause ~10 additional preventable child deaths in the control group, just so it can measure an effect! It might be more feasible to make comparison studies between different types of SMC, but of course these don't directly answer your question.

I made an attempt to estimate the cost-effectiveness of replicating research on deworming in a previous post. There's especially large uncertainty in deworming's effect size, so I doubt you'd get as big an effect for SMC. But I think a similar Bayesian modeling approach could work for this!

Thanks, I look forward to checking it out! I haven't really followed the worm wars since like 2015 (I was in grad school at the time and a professor in the department wrote something about it that I liked a lot: http://www.columbia.edu/~mh2245/w/worms.html) and I would enjoy jumping back in, time permitting... but I actually just came down with covid so I think it's time to take a rest 😃

Isn't that an objection to any RCT of treatments that have been shown to work in some contexts?

Yes, absolutely.

As far as I can tell, that type of RCT indeed is not being done. I don't know much about research on SMC specifically, but GiveWell reports the following quote from Christian Lengeler, author of the Cochrane review of insecticide-treated bed nets:

To the best of my knowledge there have been no more RCTs with treated nets. There is a very strong consensus that it would not be ethical to do any more. I don't think any committee in the world would grant permission to do such a trial.

That's fascinating, the norm is extremely different in economics and I have never heard of this norm. What is the boundary between a necessary replication and something that would be considered unethical?

Hi, and thanks for giving this a close read!

I considered providing an estimate like the one you suggest, but shied away for two reasons:

  1. I am not a subject matter expert and I don’t have a good sense of what the effect size would be — as GiveWell notes, across all seven studies, mortality in both groups is lower than you expect, so there’s some disconnect between theory and empirics here that I/we lack context on;

  2. the expected value of a new finding hinges on equilibrium effects that I can’t really get a handle on. Let’s say that GiveWell finds smaller effects than they expect and then shifts a different charity to be #1. Is that intervention’s evidence really solid, or should that intervention also be closely re-examined and then replicated? I do not know; if I had had more time I would have liked to do this type of analysis for the other three interventions as well.

My hope is that if I help point GiveWell in the right direction, people who are more experienced at cost-benefit analysis can take it from there. My comparative advantage is reading RCTs and meta-analyses.

As to the ethical concerns — that depends on whether the control group is likely to have received an anti-malaria treatment in the absence of an intervention, i.e. the point I made in section 2. If everybody is receiving bed nets anyway, let's study that population.

That seems fair. I agree that my request for an estimate is a big, maybe even unreasonable, request.

I asked because I am wondering if there really is enough reason to doubt the results of existing SMC trials. If I understand your post correctly, your main worry is not about actual errors in the trials; we don't have concrete reasons to believe they are wrong. Indeed, the trials provide high-quality evidence that SMC reduces malaria cases, including severe cases.

Your worries seem to be that (1) studies are underpowered to quantify reduction in malaria deaths. I'm not sure if that is a big problem, given that there are clear causal links between malaria cases and malaria deaths. (2) The trials did not follow the new best practices that we've identified since they were published. This indeed makes the trials less reliable than we would wish for, but I'm not sure whether the problem extends to a meta-analysis of seven trials.

For all these reasons, I keep wondering: how strongly do you really believe these results are wrong? And by how much? Even some rough answer would be OK here... and I'm sure it would also help GiveWell when they evaluate this post.

Hi Sjlver

I've been thinking about this and I think you're right. I do believe that running this replication trial passes a cost-benefit test, and I should try to explain why.

how strongly do you really believe these results are wrong? And by how much?

I think there's a 50% chance that a perfectly done SMC replication would find mortality effects that are statistically indistinguishable from a null, for two reasons: 1) the documented empirical effects are strange and don't gel with our underlying theory of malaria; 2) our theory also conflicts with the repeated observation that people living in extreme poverty don't seem to take malaria as seriously as outsiders do, which is prima facie evidence that we're misunderstanding something big. 

  • My essay's thesis is that SMC's underlying RCT evidence, which is the foundation of GiveWell's cost-benefit analysis, is weaker than it appears at first glance.  
  • Does the use of meta-analysis somewhat or largely obviate this problem? In my opinion, no, aggregation does not paper over structural issues in the data generation process. 
    • One of the most striking things my co-authors found when meta-analyzing the contact hypothesis literature was the gap in effect size between studies that had a pre-analysis plan (d = 0.016) and those that didn't (d = 0.451).  This obviously isn't dispositive that there's "no there there" with intergroup contact; but when subsequent high-quality studies on the subject found much more mixed results (e.g. here and here), at the very least, we can say we had a warning sign.
  • Can we supplement evidence that SMC reduces malaria cases with other putatively causal[1] evidence that intervening to reduce malaria leads to a sizeable reduction in deaths? 
    • That depends on how seriously we take the argument that most published research findings are false. I myself take this very seriously, and I basically treat all research as provisional until it's been validated through a seriously well-identified study. 
  • I'm not saying that we don't know that malaria causes deaths -- we definitely know that people die of malaria.  But why did the SMC studies find much smaller overall mortality effects, in both treatment and control,  than expected? 
    • Cissé et al. (2006) studied a region "where the mortality rate for children under 5 years of age is 40 deaths per 1000 children per year. Malaria accounts for about a quarter of deaths in those aged 1–5 years" (p. 660). 
    • That study was on the very low end for "entomological inoculation rates (infective bites per person per year)" (Meremikwu et al. 2012, p. 8), which ranged from 10 (Cissé 2006) to 173 bites (Konate 2011).
    • So let's just take Cissé et al.'s estimate as a  conservative baseline, and say that if you study 12,589 children in endemic/hyperendemic regions for a year,[2]  you should expect about 500 deaths overall, with 125 of them attributable to malaria.
    • Instead, we get  26 deaths overall. 
    • To my eyes, this looks like a serious disconnect between theory and empirics, and it's repeated across many settings. Frankly I have no idea what's going on.  Am I misunderstanding something fundamental here or have I made a mistake? What does GiveWell make of this? I take it you work for AMF -- what do you make of it? 
    • Back to my experience with the contact hypothesis:  I treat "something weird that we don't understand in the published findings" as a warning bell. So, personally, I think there is at least a 50% chance that a perfectly run SMC study today would find effects on mortality that are indistinguishable from a null. 
    • Let's say that trial cost $10M to do, and it affected the allocation of hundreds of millions of dollars. I think that passes a cost-benefit test across a wide range of supplementary parameter values.
  • GiveWell cautions us not to take its expected value estimates literally, which is why I don't take its 5K per life saved estimate as a baseline.  
  • They also say that they only want to recommend charities that meet a benchmark of being 10X as cost-effective as cash. 
    • In effect, GiveWell is saying that if you give someone living in extreme poverty $10, the things they spend it on will only give them 10% of the utility that they might have gotten if someone else had chosen their bundle of goods for them. 
    • This is actually an extraordinary claim and thus requires extraordinary evidence.  It would definitely raise your hackles if someone said it about you — that you’re leaving 90% of potential value on the table because you don’t know what‘s best for yourself.
      • Are we saying that they don't have access to the correct bundle of goods for distribution/infrastructure reasons?
      • A few of the SMC studies note that people have bed nets but that they're in bad shape; others have found that "widely distributed ITNs have been repurposed as fishing nets throughout the world" (Larsen et al. 2021). 
      • At face value, this seems like evidence that people in extreme poverty don't value malaria treatments nearly as much as GiveWell does. What's going on there? Is it because they're ignorant or short-sighted? Or is it because Westerners are convinced of a theory that conflicts with people's lived experiences -- and also with the actual empirical evidence found by SMC studies? I don’t know, and I find these theories about equally plausible. 
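The back-of-envelope arithmetic behind the "about 500 deaths" figure in the bullets above can be written out as a short sketch. It is a rough reconstruction under a stated simplification: every child is assumed to be followed for exactly one year, whereas follow-up in the trials varied from six months to two years (footnote 2). The variable names are introduced here for illustration.

```python
CHILDREN = 12_589          # pooled N from Meremikwu et al. (2012)
MORTALITY_PER_1000 = 40    # under-5 deaths per 1,000 per year (Cissé et al. 2006)
MALARIA_SHARE = 0.25       # "about a quarter" of deaths in those aged 1-5
FOLLOW_UP_YEARS = 1.0      # ASSUMPTION: one year of follow-up for every child
OBSERVED_DEATHS = 26       # 10 treatment + 16 control deaths in the review

expected_deaths = CHILDREN * MORTALITY_PER_1000 / 1000 * FOLLOW_UP_YEARS
expected_malaria_deaths = expected_deaths * MALARIA_SHARE
observed_share = OBSERVED_DEATHS / expected_deaths

print(f"expected deaths, all causes: {expected_deaths:.0f}")
print(f"expected deaths, malaria:    {expected_malaria_deaths:.0f}")
print(f"observed / expected:         {observed_share:.1%}")
```

On these inputs the observed deaths are only a small fraction of the baseline expectation, which is the disconnect between theory and empirics described above.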

I know this was all very approximate for a cost-benefit analysis, but IMO, we need a stronger basis for our assumptions about effect sizes than we currently have to be more specific.
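One hedged way to make the $10M cost-benefit intuition concrete is a break-even calculation. The sketch below is a deliberately crude model, not GiveWell's method: it assumes the trial only matters if it returns a null (probability 0.5, as stated above), that a null causes the affected budget to be redirected, and that the benefit is a fractional gain in impact per redirected dollar. The budget figures are hypothetical, since the text says only "hundreds of millions", and `break_even_gain` is a name introduced for illustration.

```python
TRIAL_COST = 10_000_000  # "$10M", the trial cost supposed above
P_NULL = 0.5             # stated probability that a replication finds a null


def break_even_gain(budget, trial_cost=TRIAL_COST, p_change=P_NULL):
    """Fractional gain in impact per redirected dollar at which the trial's
    expected benefit (p_change * budget * gain) equals its cost."""
    return trial_cost / (p_change * budget)


# HYPOTHETICAL budgets standing in for "hundreds of millions of dollars".
for budget in (100_000_000, 300_000_000, 500_000_000):
    gain = break_even_gain(budget)
    print(f"${budget / 1e6:.0f}M affected -> break-even gain of {gain:.1%}")
```

If redirected funds would do better than the break-even percentage, the trial pays for itself in expectation under this model; a fuller analysis would also need to value results other than a clean null.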

  1. ^

    putative because I'm pretty sure it doesn't come from Human Challenge Trials, i.e. malaria was not actually the thing randomly assigned. FWIW I don't think that that trial would pass a cost-benefit test. 

  2. ^

    "The length of follow-up for the included trials varied from six months to two years; with one year being most common" (Meremikwu et al p.  8)

I appreciate the thoughts! I'm going to think about this more thoroughly... but here's a quick guess about the low death numbers:

These trials involved measuring malaria prevalence in children. Presumably, children with a positive result would then get medication or be referred to a health center. Malaria is a curable disease, so this approach would save lives. Unfortunately, it's also quite likely that the child would not receive appropriate treatment in the absence of a diagnosis, due to lack of knowledge of the parents, distance to health facilities, etc.

Anyway, it's just a quick guess. Might be worth checking if the studies describe what happened to children with positive test results.

Looks like I can confirm this. Relevant passages from Cissé et al (2006):

The study was designed to measure Malaria, not deaths:

The primary outcome measure was a comparison of the occurence of clinical malaria between children in the two study groups.

Children with positive malaria tests received treatment:

Malaria morbidity was monitored through home visits every week and by detection of study children who presented at one of three health centres in the study area. At each assessment, axillary temperature was measured, and if it was 37.5C or greater, or if there was a history of fever or vomiting during the previous 24 h, a blood film was prepared. Results of the blood film examination were usually available within 2 h. Antimalarial treatment was given when appropriate according to the national guidelines: chloroquine as firstline treatment, quinine or sulfadoxine-pyrimethamine as second-line treatment in cases of failure of treatment with chloroquine, and injectable quinine for cases with persistent vomiting or severe malaria. Study children received iron supplementation if they presented at a health centre with an illness suggestive of anaemia, pale mucosae, or both.

I'll still think more about this... but here we have at least a lead towards better understanding of low death numbers in SMC trials.

Thank you for looking into it! Definitely interesting.  To recap:

  • GiveWell's cost-benefit calculations hinge on the relationship between SMC and mortality. 
  • The key mediator there is cases of malaria. 
  • In the provided studies, the estimated relationship between cases of malaria and deaths is likely to be downwardly biased because of co-delivered interventions (ITN, HMM, and, as you've identified, just more attentiveness to malaria in general in treated areas).
  • As SMC is rolled out, is it rolled out along with more general medical care, or without? With co-interventions, or without? This seems like the key question we don't have a handle on and that GiveWell's materials don't shine much light on.
    • Let's say it's rolled out along with general medical care. In that case, what's actually doing the work in reducing mortality, SMC or medical care? And which set of costs (SMC, medical care, or the two combined) should factor into the $-per-life-saved calculation?
    • Let's say it's rolled out without that general medical care. In that case, do we really have a good estimate of the expected effects on mortality of just SMC? That seems like the number GiveWell is basing its top charity title on, and at first glance, it's really not clear what percentage of the research actually estimates that directly.
  • So in sum, either SMC is typically going to be rolled out in places/contexts where its effect on mortality is likely to be much lower than broader data about the relationship between malaria and mortality would suggest, which means that our $-per-life-saved metrics might be seriously off-base; or it will be rolled out in places that are very much unlike the settings in which the studies were run, which is a serious external validity problem. 

So all in all, a confusing situation. And given the high stakes, I suggest that GiveWell tap a team with expertise in both the subject matter and RCTs to design and run an intervention that maps directly onto the target population.

Two postscripts: 

  1. Just curious, is this the kind of thing y'all discuss day-to-day at AMF? I'm very curious to hear from practitioners on this kind of thing. I am a total outsider who happened to notice that the evidence in the Cochrane review didn't map very neatly onto GiveWell's analyses. Would love to know the 'insider' perspective a bit more.
  2. DataColada just published something about some structural issues with conventional meta-analysis that might be of interest.   

Thanks for the thoughts!

I think we are getting closer to the core of your question here: the relationship between cases of malaria (or severe malaria more specifically) and deaths. I think that it would indeed be good to know more about the circumstances under which children die from malaria, and how this is affected by various kinds of medical care.

The question might partially touch upon SMC. Besides preventing malaria cases, it could also have an effect on severity (I'm thinking of Covid vaccines as an analogy). That said, the case for SMC (as I understand it) is that it's an excellent way to prevent malaria infections. This is what the RCTs measure, and this is where its value comes from.

To answer the question, I believe it would be more helpful to do research into malaria as an illness, rather than doing an SMC trial replication. I continue to think that the evidence base for SMC is good enough. You have doubts since "most published research findings are false", but "most published research findings" might be the wrong reference class here:

  • It includes observational studies, surveys, and other less reliable methods; here, we have RCTs.
  • It includes all published studies, also those with small samples and effect sizes. Here, we have >7 trials, >12k participants, and the effect (SMC's reduction of malaria episodes) is >6 standard deviations away from zero.
  • It includes studies with effects that are multiple causal steps away from the intervention (e.g., deworming improves income) and have many confounding factors. Here, we are measuring the effect of a malaria medication on malaria, with clearly-understood underlying mechanisms.

You also ask about the settings in which SMC is rolled out. There is no specific answer here, since SMC is often rolled out for entire countries or regions, aiming to fully cover all eligible children. More than 30 million children received SMC last year. In their cost-effectiveness analysis, GiveWell looks at interventions by country and takes a number of relevant factors into account, such as the "mortality rate from malaria for 3-59 month olds".

In general, malaria fatality (deaths per case) is trending downwards a bit, due to factors such as better access to medical care, better diagnosis, better education of parents, and certainly many others. It could make sense to make this explicit when doing a cost-effectiveness analysis.

I'd expect GiveWell to be mindful about these things and to have thought of the most-relevant factors. I don't think additional RCTs would lead to large changes here.


Regarding the post-script about AMF: We are fortunate to have a board of trustees and leaders that think a lot about high-level questions and trends, both those closer to AMF's work (e.g., resistance to insecticides used in nets) and those more peripheral (e.g., the impact of new vaccines). There is also good and regular communication between GiveWell and AMF. As for myself, the day-to-day preoccupations are often much more mundane ;-)

Thanks as always for your careful and helpful read! I was just telling someone yesterday that this exchange is a positive reflection on the EA community and ethos — as a comparison point, it’s been way more constructive and collaborative than any of my experiences with academic peer review.

It sounds like I haven’t changed your mind on the core subject and that’s totally understandable. I speculate that this is something of a (professional) culture difference — the academics I discussed this essay with all started nodding along with the general idea the moment I mentioned “uncertainty about external validity” 😃

And thanks for the insight into AMF, y’all do great work.

The Right-Fit Evidence group provides good resources related to this post. They publish guidance on what types of evidence implementers should collect to demonstrate and monitor the impact of their programs.

Notably, different types of evidence are ideal depending on the stage of a program. In the beginning, when there is lots of uncertainty about an intervention, a randomized controlled trial is great. At a later stage, when the program is scaling to many recipients, it is more important to monitor the program and ensure that the implementation is done well.

In the case of SMC, millions of children receive treatments. A wealth of monitoring data is collected, much more than could be obtained in an RCT. Even though that data isn't randomized or controlled, its quantity might make up for these deficits and allow us to determine whether SMC works with sufficient confidence.

More information can be downloaded on the Right-Fit Evidence website. And here's an introduction to their framework.

Thanks, this is very useful and new to me! (I briefly consulted/worked for IPA in 2015-2016.)