Longer title: Better evaluation of non-pharmaceutical interventions as a possible cause area within pandemic preparedness and response
Many thanks to Aaron Gertler for commenting on a draft of this post.
Here I outline why I think a neglected and important aspect of pandemic preparedness and response is evaluation of the effectiveness of non-pharmaceutical interventions (NPIs). NPIs are also known as behavioural, environmental, social and systems interventions (BESSIs) or non-drug interventions (NDIs). They include interventions like mask wearing, hand-washing, social distancing, quarantining, school openings, and interventions that aim to change behaviour with regards to any of those things.
This post is specifically about the evaluation of NPIs to reduce transmission or severity of disease during a pandemic, rather than interventions to improve “by-products” of pandemics (e.g. mental health due to isolation).
My motivations for writing are i) I could feasibly work on some of the research avenues I’ve suggested, so I would like feedback on whether it is a promising area to explore further and ii) to generate awareness for this area and discussion that might be relevant to others working in health or pandemic research. I will potentially update the post in response to comments and could write up a more in-depth investigation as a paper.
For context, I am a medical science post-doc doing meta-research around drug and device development and evaluation. I am not an expert in pandemics or NPI evaluation.
Consider this a shallow overview from someone who is quite familiar with the COVID-19 research response. I have weak to moderate confidence in the arguments and data that I present and would like to see counter-points or other data sources. It’s definitely possible that I’ve missed some key resources. I’ve relied quite a lot on information available here.
I think there is a fairly strong case that NPI evaluation is neglected (at least compared to drugs and vaccines) and that the scale of its impact could plausibly be quite high. I am less sure about tractability, but I do think there are small additional bits of research that can be done to better assess tractability (so better understanding of tractability is tractable).
I mainly discuss this idea in the context of COVID-19.
- NPIs are always going to be an important part of pandemic response, because effective vaccines and drugs will not be immediately available everywhere
- Very few randomised controlled trials of NPIs have been planned or conducted compared to drug and vaccine trials. There appears to be limited interest and discussion of this in EA so far, with the notable exception of a large ongoing RCT funded by GiveWell
- Many people can make use of NPIs, so although their effects may be expected to be modest (compared to a vaccine, for example), small improvements could have large impacts at the population level
- The fact that so few trials have been done of NPIs suggests that there may be obstacles to conducting them. This, and challenges with generalisability, might limit their tractability for pandemic response/preparedness
- There are several research areas that could be worth investigating further. I list some of them
Feedback that might be useful
- What are some objections to anything I’ve written here?
- Are there other sources of information that I should look into?
- Do you have ideas for other research projects that should be on the list (see “Some possible projects”)?
- Are there good reasons why so few NPI RCTs have been done?
- Which possible projects (if any) seem most important?
What I mean by evaluating NPIs
I mean evaluation in randomised controlled trials in a somewhat real-world setting. My impression is that, for drugs during the pandemic, observational studies have not been able to tell us much reliably about effectiveness, and the most useful findings on effectiveness have come from large RCTs like RECOVERY. Likewise, I am sceptical that the small to moderate effects of many NPIs can be reliably detected in observational studies.
The sorts of questions that I am talking about are:
- What interventions encourage people to wash their hands and what effect does this have on transmission?
- Does wearing a face mask while exercising reduce risk of transmission?
- Are face shields as effective as face masks in real world settings?
- Do air filtration systems in schools protect students from infection, and by how much?
- How should schools be re-opened? How far away should pupils sit from each other? Should all pupils be at school at the same time?
- Does public health messaging around stay-at-home orders help? E.g. do TV adverts about stay-at-home orders work? Which are best?
- What type of mask is most effective in reducing community transmission?
- Does recommending 1m vs. 2m of social distancing make a difference to the risk of infection?
- Does a stay at home order reduce transmission more than a recommendation to not meet other households?
- (some of this list is inspired by the talk here)
The sorts of questions that I am not talking about are:
- Does wearing a cloth mask over a medical mask increase particulate capture in a lab study done on dummy heads? (discussed here)
- Does poster design influence recall of hand-washing steps? (summary here)
- Do texts/infographics influence recall of how to wear a facemask? (summary here)
There are three main strategies for reducing transmission and severity of disease in pandemics: vaccines, drugs, and NPIs.
There are now several effective vaccines being rapidly rolled out for COVID-19. However, we might have been lucky to produce effective vaccines this quickly. Historical success rates for viral vaccines are around 10% from phase II to FDA approval (link), and we have been unable to develop vaccines for some viruses like HIV and Herpes Simplex virus despite considerable effort. It is plausible that in future pandemics we either will not be able to produce an effective vaccine at all, or that it will take longer. In this pandemic, it’s taken just under one year to develop and approve the first vaccines, and it will take a while for them to be distributed globally. When it is possible to develop a vaccine, there will inevitably be a delay from the emergence of the pathogen to widespread availability.
Drugs are a useful tool in reducing deaths from COVID-19, though despite many trials the most effective drugs for COVID-19 found to date have relatively modest effects. Drugs have been somewhat successful in reducing severity of COVID-19, particularly in hospitalised patients, e.g. dexamethasone reduces deaths in hospitalised patients by ~20% and may have saved 1 million lives.
As far as I know there are no widely used drugs that reduce transmission or prevent serious COVID-19 (second statement supported here, although see summary of promising efforts here). In general, ‘cures’ for diseases are extremely rare. Like vaccines, historical success rates for drugs are low (~14% from phase I to approval by one estimate), and approval does not necessarily mean that the drug is very effective. As with vaccines, when effective drugs can be made, there will be a delay while they are not available widely, or are prohibitively expensive in some places.
NPIs have so far been the main contributor to reductions in spread of COVID-19. In any pandemic scenario that I can imagine, they will be the first interventions used to control the spread of the pathogen.
There is work characterising the impact of large government interventions (e.g. this Science paper and this Nature paper, though the latter has been criticised); however, it seems that very little is known about the effectiveness of ‘smaller’ interventions such as different types of mask wearing, social distancing, hand-washing, or different strategies to communicate aspects of these.
I know of one organisation (the BESSI collaboration) focussing specifically on NPIs: they are a small group of researchers doing this in their spare time with (as far as I know) no dedicated funding.
One scorecard of controlled trials finds that 2155 drug trials have been registered compared to 13 NPI trials (source accessed 7th March 2021). For context, there are 196 RCTs registered to study hydroxychloroquine (HCQ) (based on applying the filters: hydroxychloroquine, interventional, randomised, and trial registry record here). In October 2020, there were at least 26 RCTs of HCQ with results available for meta-analysis. There are at least 71 vaccines in clinical trials (so the number of trials will be much greater).
I know of very few published RCTs of NPIs. One exception is the DANMASK-19 study, which enrolled about 6,000 participants for a study of the effect of a mask-wearing recommendation. The study has a number of potential limitations (discussed here), but illustrates that it is at least feasible to assess the impact of mask use in an RCT. Another is the TRAiN study, which randomised 3,764 individuals to have access or not to gyms. It was limited by extremely low prevalence of COVID-19 (1 case in total in the study) but again illustrates that RCTs can be done. Notably, the TRAiN study was done very quickly: the first preprint was published on June 24th 2020.
Trials of 6,000 and 4,000 people are quite large, but they are small compared to the largest drug and vaccine trials. RECOVERY has randomised nearly 40,000 patients; the phase 3 trial of the Oxford-AstraZeneca vaccine alone recruited >17,000 participants.
A notable example of NPIs being studied in a larger scale trial is a GiveWell funded cluster RCT in Bangladesh. It aims to determine the effect of a recommendation to wear facemasks in terms of mask-wearing, COVID-19 spread, and protection from infection. About 600 villages are expected to be randomised, each with a population of ~1,000. Results were expected to be available in March 2021 though I haven’t yet seen them.
Another relevant study that I’ve come across investigates the role of SMS reminders or calls on compliance with self isolation. The trial registration says it is completed but I couldn’t find the results, including by looking through their blog.
I am not sure if amount of funding is a good measure of neglectedness here, because the cost of evaluating NPIs should be much lower than that of developing and evaluating drugs and vaccines. On the other hand, the cost of NPIs to the economy and to people’s personal well-being is probably very high, so you might expect that a lot of funding should go into understanding what is and isn’t effective. I therefore tried to get a brief overview, though have not found particularly good data.
As of 30th August 2020, according to one database, £135m public and philanthropic funding for COVID-19 research had been directed to the very broad category of “behavioural and social sciences research”, compared to $2.2 billion for vaccines and $187m for therapeutics. The numbers for vaccines and therapeutics will be major underestimates of total research spending as they do not include private sector investment. The £135m will likely be an overestimate of the amount spent on NPIs as I expect there is relatively little incentive for private sector investment and because behavioural and social sciences research includes “for example people’s behaviour, attitudes, policy research, and society response.” Regardless, it appears that very little of this funding is translating to conduct of controlled trials if the above figures are correct.
I looked through Open Philanthropy’s grant database at all grants in the focus areas: “Biosecurity and Pandemic Preparedness” and “Global Health and Development” since the beginning of the pandemic. Based on the titles, only one was directly related to NPIs, which was “to support experiments on the decontamination and safe reuse of personal protective equipment for health care workers treating COVID-19 patients”. It does not appear to involve any testing of the effectiveness of the PPE. Their report on Research and Development to Decrease Biosecurity Risks from Viral Pathogens doesn’t cover NPIs.
The GiveWell funded trial mentioned above is also relevant here.
Discussion in EA
I searched the EA forum for articles on this topic and did not find any. Testing NPIs does not seem to come up in COVID-19 focussed posts, e.g. this or this.
The number of people affected by NPIs is several orders of magnitude higher than the number who will receive drugs for COVID-19, and is likely to be greater than the number who receive vaccines.
Since NPIs are cheap and readily available, they can be expected to be the first interventions used in any pandemic, and can be expected to be used extremely widely (e.g. nationally, internationally or globally) in short periods of time (e.g. within weeks). As such, even small improvements in effectiveness could have large impacts at the population level. If future pandemics are similar to COVID-19 in terms of time to make and distribute vaccines, there will be time for these benefits to be realised, particularly in countries that have slower access to vaccines. Those countries are also less likely to be able to afford and access drugs, increasing the importance of understanding NPI effectiveness.
Very large effects of NPIs can probably be reliably detected in observational research (e.g. the effect of lockdowns), but I think it’s likely that there will be smaller differences in some interventions that would be difficult to detect without an RCT and are nevertheless important. For example, it seems plausible to me that some combination of better mask wearing habits and mask materials could reduce the likelihood of transmission between people on the order of 10% compared to how most people wear masks. Since such an intervention could be ‘delivered’ to essentially everyone, the magnitude of benefit could be high at the population level, and could be significant in reducing transmission rates below some critical value.
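To make the "critical value" point concrete, here is a minimal sketch using a simple geometric growth model, in which each generation of cases is the previous one multiplied by the reproduction number R. The baseline R of 1.05, the starting 100 cases and the 20 generations are illustrative assumptions, not estimates from any study; only the 10% relative reduction comes from the paragraph above.

```python
def cases_per_generation(r, generations, initial_cases=100.0):
    """Expected new cases in each generation under simple geometric growth."""
    return [initial_cases * r ** g for g in range(generations + 1)]


# Assumed, illustrative values (not taken from any data source).
baseline_r = 1.05          # hypothetical reproduction number just above 1
relative_reduction = 0.10  # the ~10% improvement discussed in the text

# A 10% relative cut in transmission scales R by 0.9.
improved_r = baseline_r * (1 - relative_reduction)

baseline = cases_per_generation(baseline_r, 20)
improved = cases_per_generation(improved_r, 20)

print(f"R without improvement: {baseline_r:.3f}, cases in generation 20: {baseline[-1]:.0f}")
print(f"R with improvement:    {improved_r:.3f}, cases in generation 20: {improved[-1]:.0f}")
```

Under these assumptions the same 10% change turns slow growth into slow decline. The further R sits above 1, the less a modest improvement achieves on its own, which is one reason the size of the effect matters and is worth measuring.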
The cost of using NPIs that are not necessary is also potentially high: for example, if social distancing of 1m would allow many businesses to open but 2m would not, you would want to know whether 1m distancing is sufficient; if it is safe for individuals to meet outside but not inside, it would be important to know this so people can meet outside to e.g. minimise mental health damage from isolation. Again, since these costs apply at such a large scale (e.g. the global economy or billions of people’s mental health) they could be important.
It appears that decision makers who recommend public health measures are reluctant to engage in probabilistic reasoning with regards to interventions (e.g. mask wearing was initially not recommended I think largely because there was ‘no solid evidence’ of its effectiveness). One way to address this is through encouraging better probabilistic reasoning, but another is to run the trials that provide the ‘gold standard’ evidence to rely on.
Even in a much more severe pandemic scenario, better evaluation of NPIs could be valuable. For example, if it is essential for most people to stay at home, knowing the most effective messaging strategies would be important. Better information about the safest ways for essential workers to travel might similarly be useful.
Finally, I think research here is unlikely to represent an information hazard, whereas other aspects of pandemic related research (e.g. basic biology or vaccinology) potentially do. While the potential upsides of better NPI evaluation may be lower, there may be less risk of major negative consequences.
The fact that so few trials have been done of NPIs suggests that they may be difficult. One very large trial (pdf download) planned in the Netherlands of school openings did not start, ostensibly due to challenges with public support. The GiveWell funded trial mentioned above appears to have been delayed by at least several months while trying to obtain required permissions and approvals (link). The DANMASK-19 and TRAiN studies discussed above provide counterpoints to a certain extent, in that they show that trials can be run, but neither provided particularly useful evidence.
Assuming that they can be done, I think a key challenge is that, compared to drugs and vaccines, findings from NPI trials are likely to be much less generalisable: i.e. they will be relatively specific to the setting and conditions in which the trial was conducted. For example, the effectiveness of a particular social distancing intervention will depend on the community prevalence of the disease, the population density, the proportion of people who need to physically go to work, the local culture, etc. This means that for NPI trials to be most useful they would need to be conducted in many different places and may need to be redone in the same places as conditions change.
If this is true, then for NPI evaluation to be most useful, it would not be a case of funding the occasional definitive trial, but moving towards a culture of ongoing evaluation. Trials would need to be relatively flexible so that they could respond to changing situations and ensure that the results remain useful. Relatively new adaptive trial designs, like that used in RECOVERY, may be relevant here.
I don’t have a good sense of the cost of running RCTs of NPIs. I expect they will be quite expensive compared to e.g. funding philosophy research, but not very expensive compared to e.g. running vaccine trials. This might be roughly in the millions of dollars to run at a large enough scale to provide useful answers. GiveWell have so far granted $3.14 million for the Bangladesh trial, though I don’t know if they are the sole source of funding.
However, I think a lot of funding for setting up and running NPI trials could plausibly come from non-EA funding sources, such as research councils (e.g. MRC UK, NIHR) or research charities (e.g. Wellcome Trust), or directly from governments. The counterfactual use of such money would mostly not be EA-aligned projects, so the opportunity cost might not be as high as implied by looking at the dollar amounts and comparing to other ‘EA uses’ of similar sums. The fact that there is already some interest in the medical research community supports this view (e.g. the BESSI collaboration).
I think EA funding could be useful for more detailed scoping research in this area, possibly specifically considering its relevance to catastrophic biorisks, and funding meetings and small collaborations to help individuals and groups get some momentum and put together ideas for larger grants from other funding bodies.
I can also imagine a scenario where a funding organisation reviews trial protocols outside of a pandemic scenario, and agrees to fund them subject to certain conditions being met with limited additional review, ensuring trials can start rapidly when needed. This is a similar idea to registered reports, in which the introduction and methods of a paper are reviewed and given ‘in principle acceptance’ before any results are collected. Development of many protocols could be funded, each with adaptations for different pandemic scenarios, and these protocols could be peer reviewed before any research is done. EA funding organisations may be better placed to commit to an unconventional funding approach like this.
In the next section, I provide some suggestions for research projects that I think might be useful. Some could be done right away. Others are more speculative, larger scale projects to give an idea of what I think major progress here might look like, assuming that this is indeed a potentially important cause area.
Some possible projects
Immediate/small scale (little/no funding required)
These projects are largely about trying to substantiate and improve the information in this post and establish whether evaluating NPIs is indeed worth focussing on:
- Improve/criticise any information in this post
- Scoping research, including more systematic assessment, to identify and summarise existing research on NPIs, e.g. on school openings
- Find more information on spending on NPI evaluation during COVID-19
- Do literature reviews of public support for different types of evaluation
- Find information on costs of NPI trials
- Identify sources of money to fund this sort of work (EA and non-EA)
- Research whether evaluating NPIs is relevant to GCBRs or only less severe pandemics
Medium scale (some funding needed)
- Develop and prioritise research questions involving NPIs that would be useful to investigate in pandemic scenarios
- Develop a central repository of protocols to address those research questions and get feedback from the scientific community on methods and feasibility. The protocols could act as ‘templates’ that researchers can easily adapt to their specific situation in a pandemic scenario. SOLIDARITY and RECOVERY trials were apparently both based on pre-pandemic protocols
- Conduct surveys of public support for trials of NPIs
Longer term (more funding needed)
- Conduct feasibility studies or ‘dry runs’ of the most promising trials
- Get funders to agree to fund the actual trials of the protocols given certain conditions are met; seek engagement and agreement from all relevant stakeholders ahead of time to ensure they can be done quickly when needed
- Fund research groups tasked with ensuring trial protocols are kept up to date
- Conduct simulation studies to study the value of trials of different interventions on an ongoing basis
- Develop infrastructure to enable easy conduct of trials like RECOVERY but for NPIs during pandemics (i.e. large scale, pragmatic, adaptive)
- Run the trials in a pandemic scenario
I like this post a lot. It's well-written, thoughtful, well-linked and thorough. I especially like the way you've scoped out the different project ideas at the end of the post, and you include several classes of project that I think are important but usually left out of these sorts of lists.
It seems likely to me that this would be a very good use of money for "generalist" (non-GCBR-focused) pandemic prevention. I'd be very interested in hearing counterarguments to that.
For GCBR reduction, I'm less sure how valuable it would be to better evaluate existing NPIs, as opposed to developing (and evaluating) new and improved NPIs targeting key weak points. But I could imagine being persuaded fairly easily on this point, and would like to get other people's takes (though perhaps not in an open forum).
Thanks a lot for the comment. I was a bit nervous to put my first post up so some positive feedback is very much appreciated.
Truly excellent post!
My intuition is that research about the effect of NPIs on behavioural change might be more tractable and therefore more impactful than research where the endpoint is infection. If the endpoint is infection, any study that enrolls the general population will need to have very large sample sizes, as the examples you listed illustrate. I am sure these problems can be overcome, but I assume that one reason we have not seen more of these studies is that it is infeasible to do so without larger coordination.
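As a rough illustration of why an infection endpoint demands large samples, here is a minimal sketch of the standard normal-approximation sample size formula for comparing two proportions. The 2% baseline infection risk, the 10% relative reduction, and the conventional 5% two-sided significance level and 80% power are all assumptions chosen for illustration. The function name `n_per_arm` is hypothetical, and the calculation assumes individual randomisation with no clustering or dropout.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, relative_reduction, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect a relative
    reduction in infection risk (two-sided test, two equal arms)."""
    p_treat = p_control * (1 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treat) ** 2)


# Assumed scenarios: (baseline infection risk, relative reduction).
for p_control, reduction in [(0.02, 0.10), (0.02, 0.30), (0.10, 0.10)]:
    n = n_per_arm(p_control, reduction)
    print(f"baseline {p_control:.0%}, reduction {reduction:.0%}: about {n:,} per arm")
```

Under these assumptions, a small effect on a rare outcome needs tens of thousands of participants per arm, and a cluster design would need more still. A behavioural endpoint that occurs much more often in the study population (for example, whether someone wears a mask) shrinks the required sample considerably.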
While it is unfortunate and truly surprising that we have very little research on e.g. the impact of mask wearing and distancing, we do know that certain behavioural, realistic changes would be completely sufficient to squash the pandemic in many regions.
The change does not have to be large: as the reproductive number R is magically hovering around ~1.1 to ~1.3 in most regions in the Western world, it would be sufficient for people to act just a little bit more carefully to get R below 1. That could mean reducing private meetings by e.g. one third (or moving them outside), widespread adoption of contact tracing apps, placing air filters in schools, or targeting public health messaging towards people who currently are not reached or persuaded. I have seen some research about vaccine hesitancy, but far less about these other areas. At the very least, a randomized study comparing different kinds of public health messaging seems really easy to do and fairly useful. This might look different for the next pandemic though.
More broadly: as you alluded to, fostering and increasing coordination between researchers looking to conduct a study might also be really useful. This probably applies even more to research about drug interventions, where way too much of it is underpowered and badly conducted, and thus pretty much useless before results have even been published. This paper argues that the solutions are already known (e.g. multicenter trials), but not implemented widely due to institutional inertia. Again, it is worth looking into how to facilitate such coordination; I believe that large cash grants by EA aligned institutions conditional on coordination between different trial sites could work.
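The arithmetic behind "just a little bit" can be sketched as follows. If an intervention cuts transmission by a fraction f, the new reproduction number is R × (1 − f), so R falls below 1 once f exceeds 1 − 1/R. The function name `required_reduction` is hypothetical, and the model assumes the reduction applies uniformly to all transmission, which is a simplification.

```python
def required_reduction(r):
    """Smallest proportional cut in overall transmission that brings R down to 1."""
    if r <= 1:
        return 0.0  # already at or below the threshold
    return 1 - 1 / r


# The R values below span the ~1.1 to ~1.3 range mentioned in the comment.
for r in (1.1, 1.2, 1.3):
    print(f"R = {r}: transmission must fall by more than {required_reduction(r):.1%}")
```

For R between 1.1 and 1.3 this works out at roughly 9% to 23% of overall transmission. A one-third cut would be enough for any R up to 1.5 if it applied to all transmission; private meetings account for only part of it, so what a one-third cut in meetings alone achieves depends on their share, which this sketch does not model.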
I liked the post (I have worked a lot on RCTs, and a little on NPIs, but not together alas!). Here's another paper that I didn't see mentioned (although maybe I missed it) which I think roughly falls into the category you're considering? https://www.nber.org/system/files/working_papers/w27496/revisions/w27496.rev0.pdf
I feel like there are also some behavioral econ papers looking at e.g. social distancing in queues, but off the top of my head I'm not certain if there are actual randomized field experiments in that space...
Good find - thanks for sharing that paper which I hadn't included. If I update the post I'll add that.
For future searching, where/how did you come across that paper?
Just saw some recent results from a cool NPI study (examining the impact of many different interventions on mask-wearing in Bangladesh). See this slideshow for a quick summary of results.
The headline: Mask distribution and promotion tripled mask usage in test locations, and led to sustained mask use for ~10 weeks after the promotion ended.
From the slideshow, here's what worked from the intervention:
The cost-effectiveness numbers aren't too impressive, but I'd think the numbers would look better for a deadlier pandemic:
Thanks a lot for sharing this. I need to update the post to add this and other research that has been pointed out to me.
Thanks a lot for this! Like willbradshaw I agree that this post is "well-written, thoughtful, well-linked and thorough!"
If I were to nitpick, I think my biggest objection is that your approach to tackling the problem of NPIs for pandemic preparedness and response appears extremely atheoretical. I think this is fine for a scoping study that tries to estimate the scale of the problem, and fine (perhaps even highly underrated!) for clinical studies. But I think we can get decent results at lower cost with a bit of simple theory.
I believe this because I think the human body in general, and the immune system in particular, is woefully complicated, so it makes sense that we cannot have much faith in biologically plausible mechanisms for treatments, which requires us to place correspondingly greater faith in end-to-end RCTs (and be in a state of radical cluelessness otherwise). But there are other parts of epidemiology that are simpler and better understood, such that for transmission we can be reasonably confident in our ability to dice the problem and isolate it into specific confusing subcomponents.
For example, suppose we are worried about a potential respiratory disease pandemic, and we want to figure out whether intervention X (say installing MERV filters for offices) has a sufficiently large impact on an (un)desired endpoint (eg symptomatic disease, hospitalizations). One approach might just be:
I think this is good, but potentially quite expensive/time-consuming (which is really bad in a fast-moving pandemic!). One way we can potentially do better:
My decomposition isn't particularly interesting, but I think it's reasonably clean. With it, we can
Tackle 1) with human challenge trials where microbe dose/frequency/timing is variable, to understand what are plausible ranges of parameters for how many droplets are needed to be bad.
Tackle 2) with some combination of
Now my decomposition is still quite high level, and I'm not sure that my suggested instrumentalizations here aren't dumb. But hopefully what I'm gesturing at makes sense?
Thanks a lot for the comment. I do think that what you're gesturing at makes sense: if I understand correctly you are saying that certain physical interventions can have more predictable effects than ‘biological’ ones because we have a decent idea of exactly how they work. In some cases this is definitely true: as an extreme example, we don’t need RCTs of aeroplane safety as we have a very good understanding of the physical processes and are able to model them well. If we have an airborne pathogen, it’s hardly necessary to run an RCT to see whether or not there is an effect of a stay at home order: there will be one.
In many of the example questions I gave though, I think the fact that there is a large behavioural component pushes us closer to the situation we have with drugs than to the aeroplane. For example, although it could be demonstrated in a laboratory which of mask or shield is actually more effective at blocking exhaled particles, it would be harder to capture the different effects that each has on how often you touch your face, how often it is removed, or other aspects of compliance. These will differ a lot between people, so you’d need to test it on a large group, and the social setting might influence behaviour. I don’t think that we can decompose the often important behavioural component of these interventions in the same way that we can the physical components.
That said, the air filtration question I posed might not have been well chosen. As you point out, it seems reasonable that we can get a good understanding of whether that is likely to be helpful by applying what we know about the filters and viral transmission. Of the questions I posed, RCTs are likely to be the least useful there and may not be useful at all.
However, I do have some thoughts on why an RCT could still be worthwhile. I’m not saying these because I disagree with your points; I’m just providing some possible counterarguments.
Overall, I think the areas where trials would be most useful are those where we can expect relatively modest effects and where there is a larger behavioural component. The combination of modest effects, if better understood, might be quite important.
There's an additional factor: marketing and public persuasion. It is one thing to say "Based on a theoretical model, air filters work", and a totally different thing to say "We saw that air filters cut transmission by X%". My hope would be that the certainty and the effect estimate could serve to overcome the collective inaction we saw in the pandemic (in that many people agree that e.g. air filters would probably help, but barely anybody installed them in schools).
Good point. This is similar to what I was trying to get at when talking about lack of willingness to engage in probabilistic reasoning.
Do you have thoughts on pandemic prevention NPIs (e.g. vector control)? Many of these things are technically non-pharmaceutical interventions, though of course they look very different from mask mandates or social distancing orders!
Key thought on vector control: vector control is tricky.
Mostly, we care about mosquitoes, where there is tons of work, and about mammalian carriers, where we know that people like farming / hunting and then eating the animals, so vector control is a bit different. There's a lot of work on this, and a literature review for the forum might be a good thing for someone to write.
I haven't thought much about this so can't add anything useful at the moment. If I think of / come across anything I'll reply again.
Strongly agree about many of these points. I think it's worth looking at our earlier post, and the paper we wrote on almost exactly this topic which we worked on in 2019 - which obviously aren't focused on post-COVID-19 ideas.
On objections to trials, there is a large literature about the difficulty of assessing the impact of interventions, from Pearl's fundamental argument, here (pdf), to the entire corpus of work on generalizability and transferability in practice.
I think that further development of the suggested potential projects would be valuable - if you agree, I'd be happy to discuss how to turn them into more concrete proposals. Though in fact, many of these have already been done - a literature review (post is strongly recommended reading!) would probably find many pieces like this one that address many of your points.