In its cost-effectiveness estimate of StrongMinds, Happier Lives Institute (HLI) estimates that most of the benefits accrue not to the women who receive therapy, but to household members.
According to HLI’s estimates, each household member benefits from the intervention ~50% as much as the person receiving therapy. Because there are ~5 non-recipient household members per treated person, this estimate increases the cost-effectiveness estimate by ~250%. i.e. ~70-80% of the benefits of therapy accrue to household members, rather than the program participant.
I don’t think the existing evidence justifies HLI's estimate of 50% household spillovers.
My main disagreements are:
I’d guess the spillover benefit to other household members is more likely to be in the 5-25% range (though this is speculative). That reduces the estimated cost-effectiveness of StrongMinds from 9x to 3-6x cash transfers, which would be below GiveWell’s funding bar of 10x. Caveat in footnote.
I think I also disagree with other parts of HLI’s analysis (including how worried to be about reporting bias; the costs of StrongMinds’ program; and the point on a life satisfaction scale that’s morally equivalent to death). I’d guess, though I’m not certain, that more careful consideration of each of these would reduce StrongMinds’ cost-effectiveness estimate further relative to other opportunities. But I’m going to focus on spillovers in this post because I think it makes the most difference to the bottom line, represents the clearest issue to me, and has received relatively little attention in other critiques.
For context: I wrote the first version of Founders Pledge’s mental health report in 2017 and gave feedback on an early draft of HLI’s report on household spillovers. I’ve spent 5-10 hours digging into the question of household spillovers from therapy specifically. I work at Open Philanthropy but wrote this post in a personal capacity. I’m reasonably confident the main critiques in this post are right, but much less confident in what the true magnitude of household spillovers is. I admire the work StrongMinds is doing and I’m grateful to HLI for their expansive literature reviews and analysis on this question.
Thank you to Joel McGuire, Akhil Bansal, Isabel Arjmand, Alex Cohen, Sjir Hoeijmakers, Josh Rosenberg, and Matt Lerner for their insightful comments. They don’t necessarily endorse the conclusions of this post.
0. How HLI estimates the household spillover rate of therapy
HLI estimates household spillovers of therapy on the basis of the three RCTs on therapy which collected data on the subjective wellbeing of some of the household members of program participants: Mutamba et al. (2018), Swartz et al. (2008), Kemp et al. (2009).
Combining those RCTs in a meta-analysis, HLI estimates household spillover rates of 53% (see the forest plot below; 53% comes from dividing the average household member effect (0.35) by the average recipient effect (0.66)).
HLI assumes StrongMinds’ intervention will have a similar effect on household members. But, I don't think these three RCTs can be used to generate a reliable estimate for the spillovers of StrongMinds' program for three reasons.
1. Two of the three RCTs HLI relies on to estimate spillovers are on interventions specifically intended to benefit household members (unlike StrongMinds’ program, which targets women and adolescents living with depression).
I briefly skimmed each of the RCTs, but I haven’t reviewed them in depth.
- Mutamba et al. (2018) delivered psychotherapy to caregivers of children with nodding syndrome (a neurological illness that includes seizures) and modified the typical therapy guidelines with “addition of [nodding syndrome]-specific content”
- Swartz et al. (2008) delivered psychotherapy to depressed mothers whose children have psychiatric illness. It “uses specific strategies to assist mothers in managing problematic interpersonal relationships with their dependent, psychiatrically ill offspring.”
HLI does note limitations to external validity in its report, but concludes it’s not sure whether these differences would lead to an underestimation or overestimation of treatment effects (relative to StrongMinds).
“A limitation to the external validity of this evidence is that all of the samples were selected based on negative shocks happening to the children in the sample... We are not sure if this would lead to an over or underestimate of the treatment effects, but it is potentially a further deviation from the type of household we are trying to predict the effects of psychotherapy for.” Happiness for the Whole Family, 2022
Generalizing across interventions is difficult. But it seems intuitive to me that we should expect the interventions in those studies would benefit household members more than StrongMinds’ intervention, because (a) they were targeted at caregivers of children with psychiatric or neurological illnesses, who I’d expect to be particularly sensitive to the standard of caregiving, and (b) they were specifically designed to help those children.
This seems like the simplest explanation for why these studies were two of the only studies on psychotherapy to measure the effect on household members: because they were actively trying to help household members and so had a reasonable expectation of finding an effect.
A couple of caveats:
- The control group in Mutamba et al. (2018) may have also received a similar level of training in management of children with nodding syndrome. If this were the case, it’d weaken point (b), though point (a) would still stand.
- StrongMinds’ intervention is interpersonal group therapy, which one might expect to also benefit close relationships (and I do expect there’s some positive effect in expectation). But this still seems quite different from the specificity with which the interventions in these two studies were targeted. Other less targeted studies of interpersonal therapy (e.g. Bolton et al. 2003) didn’t measure effects on household members and I assume part of the reason is they didn’t expect to see a measurable effect.
2. Those RCTs only measure the wellbeing of a subset of household members most likely to benefit from the intervention
In its cost-effectiveness estimate, HLI applies the estimated spillover effect to ~five household members for each person treated for depression. But the spillover effect is estimated based on a single household member who seemed most likely to benefit from the intervention.
- Mutamba et al. (2018) treated caregivers and measured the effects on one child with nodding syndrome.
- Swartz et al. (2008) treated mothers and measured the effects on one child receiving psychiatric treatment.
Extrapolating these benefits to the entire household without discounting seems like a stretch to me for the same reasons as outlined above. It seems likely that these children were the household members most likely to benefit from the intervention which was (a) specifically targeted at their caregivers (b) specifically designed to help those children.
3. The results of the third RCT are inconsistent with HLI’s estimate.
In the forest plot above, HLI reports that Kemp et al. (2009) finds a non-significant 0.35 [-0.43,1.13] standard deviation improvement in mental health for parents of treated children.
But table 2 reports that parents in the treatment groups’ score on the GHQ-12 increased relative to the wait-list group (higher scores on the GHQ-12 indicate more self-reported mental health problems). 
To be clear, I doubt the intervention really did have a negative effect on parents’ mental health; the effect sizes on GHQ are small and far from significant. I looked at GHQ because it seems to be HLI's preferred measure of mental health. But another measure, the Impact of Event Scale (IES) does seem to have declined/improved (non-significantly) more in the treatment than the control group. And another, the General Functioning Scale (GFS) stayed about the same in both groups.
Given the inconsistency of results between different instruments, and the lack of statistical power, I don’t think we can learn much either way about household spillovers from this study.
What would I do if I wanted to get a better estimate of household spillovers from therapy?
My best guess of 5-25% spillovers is very subjective, and largely comes from a combination of adjusting the RCT evidence downwards for some of the concerns above, reflecting on my priors, and a very brief skim of some of some of the observational evidence HLI cites in its report.
If I wanted to get a better estimate, I’d consider:
- More carefully reviewing the observational evidence for associations between familial subjective wellbeing. For example, Das et al. 2008, a cross-sectional study of survey data across five low-and middle income countries, finds “a one standard deviation change in the mental health of household members is associated with a 0.22–0.59 standard deviation change in own mental health,” but notes omitted variable bias probably means this overestimates the causal effect. Other correlational studies find lower associations (5% and 25%), and note that while omitted variable bias might lead to an overestimation of the effect, measurement error might also lead to an underestimation of the effect.
- Looking at evidence for analogous effects. For example, how strong are household spillovers from education?
- Talking to people who participated in StrongMinds’ program and their household members to better understand potential mechanisms for spillover effects.
- Running a larger RCT which measures the impact of therapy on all household members.
“For psychotherapy, we estimate from three studies the spillover ratio to be 53% (95% CI: 11%, 108%)” Happiness for the Whole Family
“We use the non-recipient household size, the household size minus the recipient of the intervention. For StrongMinds, we find a non-recipient household size of 4.85 (95% CI: 1.01, 8.94).” Happiness for the Whole Family
“After including the household spillover effects, we estimate that psychotherapy is 9 times (95% CI: 2, 100) more cost-effective than cash transfers (before it was 12 times).” Happiness for the Whole Family
HLI has a different perspective on the cost-effectiveness of GiveWell’s top charities (which are mostly focused on life-saving interventions), such that it’s not entirely clear how comparable GiveWell’s funding bar relative to cash is with HLI’s implied funding bar relative to cash.
Mutamba et al. (2018), Table 1
Mutamba et al. (2018), Table 1 suggests the intervention included an adaptation of nodding-syndrome specific content. I’d also guess that in a group setting with a shared experience as specific as caring for a child with nodding-syndrome, that would inform much of the focus of the discussion, even if it wasn’t deliberately included as part of the intervention.
On the other hand, the control group received at least some relevant training. From Mutamba et al. 2018, pg 3 “All health workers at HFs in the study sites received training on the medical management of children with NS. Caregivers in the TAU and experimental-arm villages received usual care as provided for in the national management guidelines for NS (Idro et al. 2013). TAU included health education about NS, syndromic management of children with pharmacological agents (typically sodium valproate or carbamazepine), caregiver education and supportive counselling (Idro et al. 2013).”
“A child was included if they had NS and a consenting caregiver with them. In families with more than one child with NS (approximately 30%), caregivers were asked to identify one of their children most affected by NS that would participate in the study” Mutamba et al. (2018), pg 2.
“Children gave informed consent or assent after mothers were deemed eligible for inclusion. If multiple children were eligible, the mothers were asked to designate one child participant.” Swartz et al. (2008)
“Despite random allocation to group, wait-list parents reported higher self-reported health problems on the GHQ – 12” Kemp et al. (2009)
“Kemp et al. (2009)... the parents’ (who did not receive therapy) mental health was measured with the 12-items General Health Questionnaire (GHQ-12). There was a significant difference in GHQ-12 levels between control and treatment group parents, so we used the difference of differences between baseline and post-treatment to adjust for this.” Happiness for the Whole Family
“There are several possible channels for such a correlation. It may reflect omitted household variables, such as household-specific shocks or a lack of health services. It could also reflect unobserved individual traits, if assortative mating leads those with poor mental health to marry and perhaps pass on genetic factors influencing mental health of other family members. But it is also plausible that the presence of one household member with poor mental health creates a poor mental health environment for other household members, a “contagion” effect.” (Das et al. 2008)
Powdthavee and Vignoles 2008, a study in the UK, finds a one standard deviation increase in parents' mental distress in the previous year is associated with 25% of a standard deviation lower life satisfaction in the current year for girls “Given that the mean of LS for girls is 5.738 and its standard deviation is 1.348, a ceteris paribus increase of one standard deviation in either parent’s mental distress level explains around a 25% drop in the standard deviation in the girl’s LS.”
Mendolia et al. 2018, a study in Australia, finds a one standard deviation increase in partner’s life satisfaction is associated with 5% of a standard deviation increase in individual life satisfaction “In Table 7, we begin with the analysis of the impact of the partner’s standardised SF36 mental health score (0-100, where higher values represent higher level of well-being). Increasing this score by one standard deviation increases individual’s life satisfaction by 0.07 points (on a 1-10 scale), which is equivalent to 5% of a standard deviation in life satisfaction. To put this in context, this is similar to the (reversed) effect of becoming unemployed or being victim of a property crime (see Table 10).” Mendolia et al. 2018
Powdthavee 2009: “In addition to the above confounding influences which make it difficult for the true relationship between partners’ well-being to be identified, the estimates of spousal correlation in LS may also suffer from the negative measurement error bias. There may be, for example, a tendency for individuals to misreport their true LS in surveys. The low signal-to-noise ratio caused by misreporting can result in an estimated coefficient on partner LS that is biased towards zero in a large sample. In short, because there are both positive (correlated effects) and negative (measurement error) biases involved, the direction of bias is unclear on a priori ground.”