Derek

Donating money, buying happiness: new meta-analyses comparing the cost-effectiveness of cash transfers and psychotherapy in terms of subjective well-being

There is much to be admired in this report, and I don't find it intuitively implausible that mental health interventions are several times more cost-effective than cash transfers in terms of wellbeing (which I also agree is probably what matters most). That said, I have several concerns/questions about certain aspects of the methodology, most of which have already been raised by others. Here are just a few of them, in roughly ascending order of importance:

1. Outcomes should be time-discounted, for at least two reasons. First, to account for uncertainty as to whether they will obtain, e.g. there could be no counterfactual benefit in 10 years because of social upheaval, catastrophic events (e.g. an AI apocalypse, natural disaster), or the availability of more effective treatments for depression/ill-being/poverty. Second, to account for generally improving circumstances and opportunities for reinvestment: these countries are generally getting richer, people can invest cash transfers, etc. This will be even more important when assessing deworming and other interventions with benefits far in the future. (There is probably no need to discount costs as it seems they are incurred around the time the intervention is delivered in both cases.)
2. I've only skimmed the reports, but it isn't clear to me what exactly is included in the costs for StrongMinds. For example, capital costs (buildings etc.) and overheads such as management salaries and rent are sometimes incorrectly left out of cost-effectiveness analyses. If you haven't already, you might also want to consider any costs to the beneficiaries, e.g. if therapy recipients had to travel, pay for materials, miss work, etc. As you note, most of the difference in the cost-effectiveness is determined by the programmes' costs rather than their consequences, so it's important to get this right (which you may well have done).
3. You note that both interventions are assessed only in terms of their effect on depression. A couple of years ago I summarised the findings of the four available evaluations of GiveDirectly in an unpublished draft post (see Appendix 2.1, copied below, and the "GiveWell" subsection of section 2.2, the relevant part of which is copied below). The studies recorded data on many other indicators of wellbeing, which were sometimes combined into indices of "psychological wellbeing" with up to 10 components (as well as many non-wellbeing outcomes like consumption and education). Apologies if you explain this somewhere, but why did you only use the data on depression? Was it to facilitate an 'apples to apples' comparison, or something like that? If so, I wonder if that was loading the dice a bit: at first blush, it seems unfair to compare two interventions in terms of outcome A when one is aimed solely at improving outcome A and the other is aimed at improving outcomes A, B, C, D, E, F, G and H (at least when B–H are relevant, i.e. indicators of subjective wellbeing).
4. I share others' concerns about the omission of spillovers. In the draft post I linked above (partly copied below), I recorded my impression that the evidence so far, while somewhat lacking, suggests only null or positive spillovers to other households (at least for the current version of the programme, which 'treats' all eligible households in the village). As part of a separate project I did last year (which I'm not allowed to share), I also concluded that non-recipients within the household benefited considerably: "Only about 1.6 members of each household (average size ~4.3) were surveyed to get the wellbeing results, of which only 1 actually received the money. There was no statistically significant wellbeing difference between the recipients and surveyed non-recipient household members, and there is evidence of many benefits to non-recipients other than psychological wellbeing (e.g. education, domestic violence, child labour). Nevertheless, we expect the effects to be a little lower among non-recipients…" Omitting the inter-household spillovers is perhaps reasonable for the primary analysis, but it seems harder to justify ignoring benefits to others within the household.
5. Whatever may be justified for the base case, I don't understand why you haven't done a proper sensitivity analysis. Stochastic uncertainty is captured well by the Monte Carlo simulations, but it is standard practice in many fields (including health economics) to carry out scenario analyses that investigate the effects of contestable structural and methodological assumptions. It should be quite straightforward to adapt the model so as to include/exclude (or vary the values of) spillovers, non-depression data, certain kinds of costs, discount rates, etc. You can present the results of these analyses yourself, but users can also put their own set of assumptions into a well-constructed model to see how that changes things. (Many other analyses are also potentially helpful, especially when the difference in cost-effectiveness between the alternatives is relatively small, e.g. deterministic one-way and two-way analyses that show how the cost-effectiveness ratio changes with high/low values for each parameter; threshold analyses that show what value a parameter must attain for the 'worse' programme to become the more cost-effective; value of information analyses that show how much it would be worth spending on further studies to reduce uncertainty; and, perhaps most usefully in this case, a cost-effectiveness acceptability curve indicating the probability that StrongMinds is cost-effective at a given threshold, such as the 3-8x GiveDirectly that GiveWell is currently using as its bar for new charities. Some examples are here.)
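To make points 1 and 5 concrete, here is a minimal Python sketch of time-discounting a stream of effects and of computing a cost-effectiveness acceptability curve from Monte Carlo draws. It is an illustration only, not the report's model: the function names, the decay pattern, the discount rates and the lognormal distribution of draws are all hypothetical assumptions.

```python
import random


def discounted_total(annual_effects, rate):
    """Sum yearly effects, dividing the effect in year t by (1 + rate) ** t."""
    return sum(e / (1 + rate) ** t for t, e in enumerate(annual_effects))


def ceac(ratio_draws, thresholds):
    """Share of draws whose cost-effectiveness ratio meets or exceeds each threshold."""
    n = len(ratio_draws)
    return {k: sum(r >= k for r in ratio_draws) / n for k in thresholds}


# Hypothetical effect stream: 1.0 unit in year 0, shrinking by 30% each year.
effects = [0.7 ** t for t in range(5)]
for rate in (0.0, 0.04):  # assumed discount rates, for a simple scenario analysis
    print(f"discount rate {rate:.0%}: total effect {discounted_total(effects, rate):.3f}")

# Hypothetical Monte Carlo draws of "cost-effectiveness as a multiple of GiveDirectly".
rng = random.Random(0)
draws = [rng.lognormvariate(1.5, 0.6) for _ in range(10_000)]
for threshold, prob in ceac(draws, [1, 3, 5, 8]).items():
    print(f"P(at least {threshold}x GiveDirectly) = {prob:.2f}")
```

Plotting the `ceac` output against the thresholds would give the acceptability curve; replacing the synthetic draws with a real model's simulation output is the only change needed.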

Topic 2.2: (Re-)prioritising causes and interventions

[…]

GiveWell

[…]

Spillover effects

Secondly, there are also potential issues with ‘spillover effects’ of increased consumption, i.e. the impact on people other than the beneficiaries. This is particularly relevant to GiveDirectly, which provides unconditional cash transfers; but consumption is also, according to GiveWell’s model, the key outcome of deworming (Deworm the World, Sightsavers, the END Fund) and vitamin A supplementation (Helen Keller International). Evidence from multiple contexts suggests that, to some extent, the psychological benefits of wealth are relative: increasing one person’s income improves their SWB, but this is at least partly offset by decreases in the SWB of others in the community, particularly on measures of life satisfaction (e.g. Clark, 2017). If increasing overall wellbeing is the ultimate aim, it seems important to factor these ‘side-effects’ into the cost-effectiveness analysis.

As usual, GiveWell provides a sensible discussion of the relevant evidence. However, it is somewhat out of date and does not fully report the findings most relevant to SWB, so I’ve provided a summary of wellbeing outcomes from the four most relevant papers in Appendix 2.1. In brief:

• All four studies found positive treatment effects, i.e. improvement to the psychological wellbeing of cash recipients, though in two cases this finding was sensitive to particular methodological choices.
• Two studies of GiveDirectly found negative psychological spillovers.
• Two found only null or positive spillovers.

As GiveWell notes, it is hard to aggregate the evidence on spillovers (psychological and otherwise) because of:

• Major differences in study methodology (e.g. components of the psychological wellbeing index, type of control, inclusion/exclusion criteria, follow-up period).
• Major differences in the programs being studied (e.g. size of transfers, proportion of households in a village receiving transfers).
• Absence of key information (e.g. how many non-recipient households are affected by spillover effects for each treated household, how the magnitude of spillovers changes with distance and over time, how they differ among eligible and ineligible households).

Like GiveWell, I suspect the adverse happiness spillovers from GiveDirectly’s current program are fairly small. In order of importance, these are the three main reasons:

• The negative findings were based on within-village analyses, i.e. comparing treated and untreated households in the same village. These may not be relevant to the current GiveDirectly program, which gives money to all eligible households in treated villages (and sometimes all households in the village). The two studies that investigated potential spillovers in untreated villages in the same area as the treated ones found no statistically significant effect.
• Egger et al. (2019) (the “general equilibrium” study), which found only null or positive spillovers, was by far the largest, seems to have had the fewest methodological limitations, and investigated a version of the program most similar to current practice.
• At least one of the ‘negative’ studies, Haushofer & Shapiro (2018), had significant methodological issues, e.g. differential attrition rates and lack of baseline data on across-village controls (though results were fairly robust to authors’ efforts to address these).

In addition, any psychological harm seems to be primarily to life satisfaction rather than hedonic states. As noted in Haushofer, Reisinger, & Shapiro (2019): “This result is intuitive: the wealth of one’s neighbors may plausibly affect one’s overall assessment of life, but have little effect on how many positive emotional experiences one encounters in everyday life. This result complements existing distinctions between these different facets of well-being, e.g. the finding that hedonic well-being has a “satiation point” in income, whereas evaluative well-being may not (Kahneman and Deaton, 2010).” This is reassuring for those of us who tend to think feelings ultimately matter more than cognitive evaluations.

Nevertheless, I’m not extremely confident in the net wellbeing impact of GiveDirectly.

• Non-trivial comparison effects are found in many other contexts, so it is perhaps reasonable to expect them here too. (I haven’t properly looked at that evidence so I’m not sure how strong my prior should be.)
• As with any metric, there are various potential biases in wellbeing measures that could lead to under- or over-estimation of effects. When assessing the actual effect on wellbeing/welfare/utility (rather than on the specific measures of wellbeing used in the study), we should consider the evidence in the context of other findings that I haven’t discussed here.
• Even a negative spillover with a very small effect size, which seems plausible in this case, could offset much or all of the positive impact. For instance, if recipient households gain 1 happiness point from the transfer, but every transfer causes 10 other households to lose 0.1 points for the same duration, the net effect is neutral.
• I have only summarised the relevant papers; I haven’t tried to critique them in detail. GiveWell has also not analysed the latest versions of some of the key studies, which differ considerably from the working papers, so they might uncover some issues that I haven’t spotted.
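The offsetting arithmetic in the third bullet above can be written out as a short Python sketch. The function names are hypothetical, and the sketch assumes that happiness points simply add across households and that gains and losses last equally long, as in the example.

```python
def net_effect(recipient_gain, households_affected, loss_per_household):
    """Net change in happiness points per transfer, assuming simple addition."""
    return recipient_gain - households_affected * loss_per_household


def break_even_loss(recipient_gain, households_affected):
    """Per-household loss at which spillovers exactly cancel the recipient's gain."""
    return recipient_gain / households_affected


# The example from the text: a gain of 1 point, with 10 other households losing 0.1 each.
print(net_effect(1.0, 10, 0.1))
print(break_even_loss(1.0, 10))
```

The more households each transfer affects, the smaller the per-household loss needed to cancel the benefit, which is why the number of affected non-recipient households is such an important missing input.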

A few more notes on interpreting the wellbeing effects of GiveDirectly:

• As with other health and poverty interventions, I suspect the overall, long-run impact will be more sensitive to unmeasured and unmodeled indirect effects (e.g. consumption of factory-farmed meat, population size, CO2 emissions) than to methods for estimating welfare (e.g. SWB instruments vs consumption). But I’m leaving these broader issues with short-termist methodology aside for now.
• The mechanisms of any adverse wellbeing effects have not been established in this case, and may not be pure psychological ‘comparison effects’ (jealousy, reduced status, etc). For instance, they could be mediated through consumption (e.g. poorer households selling goods to richer ones) or through some other, perhaps culture-specific, process.
• Like any metric, SWB measures are imperfect. So even when SWB data are available, an assessment of the SWB effects of an intervention may be improved by taking into account information on other outcomes, plus ‘common sense’ reasoning.

In addition, I would note that the other income-boosting charities reviewed by GiveWell could potentially cause negative psychological spillovers. According to GiveWell’s model, the primary benefit of deworming and vitamin A supplementation is increased earnings later in life, yet no adjustment is made for any adverse effects this could have on other members of the community. As far as I can tell, the issue has not been discussed at all. Perhaps this is because these more ‘natural’ boosts to consumption are considered less likely to impinge on neighbours’ wellbeing than windfalls such as large cash transfers. But I’d like to see this justified using the available evidence.

I make some brief suggestions for improving assessment of psychological spillover effects in the “potential solutions” subsection below.

Appendix 2.1

Four studies investigated psychological impacts of GiveDirectly transfers. Two of these found wellbeing gains for cash recipients (“treatment effects”) and only null or positive psychological spillovers:

• Haushofer & Shapiro (2016) (9-month follow-up)
• 0.26 standard deviation (SD; p<0.01), positive, within-village treatment effect (i.e. comparing treated and untreated households in the same village) on an index of psychological wellbeing with 10 components (Table IV, p. 2011).
• Statistically significant benefits for (in decreasing order of magnitude) Depression, Stress, Life Satisfaction, and Happiness at the 1% level, and Worries at the 10% level. Null effects (at the 10% level) on Cortisol, Trust, Locus of Control, Optimism, and Self-esteem (though point estimates were mostly positive).
• Null, precise, within-village spillover effect on the index of psychological wellbeing; point estimate positive (0.1 SD; Table III, p. 2004).
• Egger et al. (2019) (the “general equilibrium” study)
• 0.09 SD (p<0.01) within-village treatment effect (i.e. assuming all spillovers are contained within a village) on a 4-item index of psychological wellbeing.
• Driven entirely by Life Satisfaction; no effect on Depression, Happiness, or Stress. (See this table, which the authors kindly sent to me on request.)
• 0.12 SD (p<0.1) “total” treatment effect (both within-village and across-village) on psychological wellbeing.
• Driven by Happiness (0.15 SD; p<0.05); no others significant at the 10% level. (See this table.)
• Null, fairly precise “total” spillover effect (combining within- and across-village effects) on the index of psychological wellbeing (and on every individual component); point estimate small and positive (0.08 SD). (See this table.)
• Note: GiveWell reports a positive, statistically significant within-village spillover effect on psychological wellbeing of about 0.1 SD, based on an earlier draft of the paper. I can’t find this in the published paper; perhaps it was cut because of the authors’ stated preference for the “total” specification.

However, two studies are more concerning:

• Haushofer & Shapiro (2018) (3-year follow-up; working paper)
• Within-village 0.16 SD (p<0.01) treatment effect on an 8-component index of psychological wellbeing (Table 3, p. 16).
• Driven primarily by improvements to Depression and Locus of Control (p<0.05), followed by Happiness and Life Satisfaction (p<0.1). No statistically significant (at the 10% level) change in Stress, Trust, Optimism, and Self-esteem. (Table B.7, p. 55)
• Null across-village treatment effect on psychological wellbeing (Table 5, p. 22).
• Approx. -0.2 SD (p<0.01) adverse psychological wellbeing spillover on untreated households in treated villages (Table 7, p. 26).
• Driven by Stress (p<0.01), Depression (p<0.05), Happiness (p<0.1), and Optimism (p<0.1). No statistically significant (at the 10% level) change in Life Satisfaction, Trust, Locus of control, or Self-esteem. (Table B.15, p. 63)
• Haushofer, Reisinger, & Shapiro (2019)
• A 1 SD increase in own wealth causes a 0.13 SD (p<0.01) increase in the psychological well-being index (p.13; Table 3, p. 27).
• At the average change in own wealth of eligible (thatched-roof) households of USD 354, this translates into a treatment effect of 0.09 SD.
• At the average transfer of $709 among treated households, this translates into a treatment effect of 0.18 SD.
• Driven by Happiness and Stress (p<0.01) then Life Satisfaction and Depression (p<0.05). No statistically significant (at the 10% level) effect on Salivary Cortisol. (Table 5, p. 29)
• A 1 SD increase in village mean wealth (i.e. neighbours in one’s own village having a larger average transfer size) causes a decrease of 0.06 SD in psychological well-being over a 15 month period, only significant at the 10% level (p. 14; Table 3, p. 27).
• At the average cross-village change in neighbours’ wealth of $327, this translates into an effect of -0.2 SD.
• Driven entirely by Life Satisfaction (0.14 SD; p<0.01; p. 15; Table 5, p. 29)
• At a change in neighbours’ wealth of $327, this translates into a Life Satisfaction effect of -0.4 SD (which is much larger than the own-wealth benefit, but less precisely estimated).
• Subgroup analysis 1: No statistically significant within-village difference between treated and untreated households in psychological wellbeing effects of a change in neighbours’ wealth. (This suggests that what matters is how much more your neighbours received, not whether you received any transfer.)
• Subgroup analysis 2: No statistically significant within-village difference in the psychological wellbeing effect of a change in neighbours’ wealth between households below versus above the median wealth of their village at baseline. (This suggests poorer households did not suffer more adverse psychological spillovers than wealthier ones.)
• Methodological variations: Broadly similar results using alternative measures of the change in village mean wealth. (See p. 17 and Tables A.9–A.14 for details.)
• No effect of village-level inequality on psychological wellbeing (holding constant one’s own wealth) over any time period and using three alternative measures of inequality.

Note: GiveWell’s review of an earlier version of the paper reports a “statistically significant negative effect on an index of psychological well-being that is larger than the short-term positive effect that the study finds for receiving a transfer, but the negative effect becomes smaller and non-statistically significant when including data from the full 15 months of follow-up… The authors interpret these results as implying that cash transfers have a negative effect on well-being that fades over time.” I’m not sure why the authors removed those analyses from the final version.

Sleep: effective ways to improve it

Is the CO2 accumulation entirely due to human (or I suppose animal) respiration? So it will typically be worse in small houses with lots of people (holding other factors, like ventilation, constant)?

In a modern house, with no open fires, lead paint etc, what "household air pollution" might there be?

Sleep: effective ways to improve it

Thanks - this is useful and I will explore some of the suggestions.

Is there much research comparing immediate vs extended release melatonin? E.g.:

1. Is IR better for speeding sleep onset, as one might expect?
2. Does XR actually improve sleep maintenance/duration more than IR?
3. Do they have the same effect on sleep efficiency?
4. Is the optimal dose the same for each?
5. Dose aside, do combined IR/XR supplements, or taking a bit of each, give you the 'best of both worlds'?

EA-Aligned Impact Investing: Mind Ease Case Study

[Edited on 19 Nov 2021: I removed links to my models and report, as I was asked to do so.]

Just to clarify, our (Derek Foster's/Rethink Priorities') estimated Effect Size of ~0.01–0.02 DALYs averted per paying user assumes a counterfactual of no treatment for anxiety. It is misleading to estimate total DALYs averted without taking into account the proportion of users who would have sought other treatment, such as a different app, and the relative effectiveness of that treatment.

In our Main Model, these inputs are named "Relative impact of Alternative App" and "Proportion of users who would have used Alternative App". The former is by default set at 1, because the other leading apps seem(ed) likely to be at least as effective as Mind Ease, though we didn't look at them in depth independently of Hauke. The second defaults to 0; I suppose this was to get an upper bound of effectiveness, and because of the absence of relevant data, though I don't recall my thought process at the time. (If it's set to 1, the counterfactual impact is of course 0.)

Our summary, copied in a previous comment, also stresses that the estimate is per paying user. I don't remember exactly why, but our report says:

Other elements of the MindEase evaluation (i.e. parts not done by Rethink Priorities) consider a “user” to be a paying user, i.e. someone who has downloaded the app and purchased a monthly or annual plan. For consistency, we will adopt the same definition. (Note that this is a very important assumption, as the average effect size and retention is likely to be many times smaller for those who merely download or install the app.)

As far as I can tell (correct me if I'm wrong), your "Robust, uncertainty-adjusted DALYs averted per user" figure is essentially my theoretical upper-bound estimate with no adjustments for realistic counterfactuals. It seems likely (though I have no evidence as such) that:

• Many users would otherwise use a different app.
• Those apps are roughly as effective as MindEase.
• The users who are least likely to use another app, such as people in developing countries who were given free access, are unlikely to be paying (and therefore perhaps less likely to regularly use/benefit from it) – not to mention issues with translation to different cultures/languages.

So 0.02 DALYs averted per user seems to me like an extremely optimistic average effect size, based on the information we had around the middle of last year.

EA-Aligned Impact Investing: Mind Ease Case Study

[Edited on 19 Nov 2021: I was asked to remove the links.]

For those who are interested, here is the write-up of my per-user impact estimate (which was based in part on statistical analyses by David Moss): [removed]

The Main Model in Guesstimate is here: [removed]

The Effect Size model, which feeds into the Main Model, is here: [removed]

I was asked to compare it to GiveDirectly donations, so results are expressed as such. Here is the top-level summary:

Our analysis suggests that, compared to doing nothing to relieve anxiety, MindEase causes about as much benefit per paying user as donating $40 (90% confidence interval: $10 to $140) to GiveDirectly.
We suspect that other leading apps are similarly effective (perhaps more so), in which case most of the value of MindEase will come from reaching people who would not have accessed alternative treatment.

Due to time constraints and lack of high-quality information, the analysis involved a lot of guesswork and simplifying assumptions. Of the parameters included in our Main Model, the results are most sensitive to the effect sizes of both MindEase and GiveDirectly, the retention of those effects over time, and the choice of outcome metric (DALYs vs WELLBYs). One large, independent study could eliminate much of this uncertainty. Additional factors worth considering include indirect effects (e.g. economic productivity, meat consumption, evidence generation), opportunity costs of team members’ time, and robustness to non-utilitarian worldviews.

Note that this was done around June 2020 so there may be better information on MindEase's effectiveness by now. Also, I think the Happier Lives Institute has since done a more thorough analysis of the wellbeing impact of GiveDirectly, which could potentially be used to update the estimate.
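As a rough illustration of the counterfactual adjustment discussed in the two Mind Ease comments above, the Python sketch below scales the no-treatment effect size by the share of benefit that users would have obtained anyway from an alternative app. The parameter names mirror the two Main Model inputs named above, but the formula itself is an assumption about how they combine, not a copy of the Guesstimate model, and the 0.5 value is purely hypothetical.

```python
def counterfactual_dalys(effect_vs_no_treatment, prop_alternative, relative_impact_alternative):
    """DALYs averted per paying user, net of what an alternative app would have delivered.

    Assumed formula: effect * (1 - proportion * relative impact).
    """
    displaced = prop_alternative * relative_impact_alternative * effect_vs_no_treatment
    return effect_vs_no_treatment - displaced


upper_bound = 0.02  # upper end of the ~0.01-0.02 DALYs per paying user range

# Model defaults described above: nobody would have used an alternative app.
print(counterfactual_dalys(upper_bound, 0.0, 1.0))

# Everyone would have used an equally effective alternative: no counterfactual impact.
print(counterfactual_dalys(upper_bound, 1.0, 1.0))

# Hypothetical middle case: half of users would have used an equally effective app.
print(counterfactual_dalys(upper_bound, 0.5, 1.0))
```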

Health and happiness research topics—Part 1: Background on QALYs and DALYs

Hi Sam,

1. Have you done much stakeholder engagement? No. I discuss this a little bit in this section of Part 2, but I basically just suggest that people look into this and come up with a strategy before spending a huge amount of time on the research. I do know of academics who may be able to advise on this, e.g. people who have developed previous metrics in consultation with NICE etc, but they’re busy and I suspect they wouldn’t want to invest a lot of time into efforts outside academia.

I think they’d reject the assumption that they are “not improving these metrics” and would point to considerable quantities of research in this area. The main issue, I think, is that they want a different kind of metric than what I’m proposing, e.g. they think it’s important that metrics are based on public preferences and are focused on health rather than wellbeing. A lot of resources are going into what I see (perhaps unfairly) as “tinkering around the edges,” e.g. testing variations of the time tradeoff/DCE and different versions of the EQ-5D, rather than addressing the fundamental problems.

As I say in Part 3 with respect to the sHALY (SWB-based HALY):

In my view, the strongest reason not to do this project is the apparent lack of interest among key stakeholders. Clinicians, patients, and major HALY “consumers” such as NICE and IHME seem strongly opposed to a pure SWB measure, even if focused on dimensions of health, and to the use of patient-reported values more broadly. As discussed in previous posts, this is due to a combination of normative concerns, such as the belief that those who pay for healthcare have the right to determine its distribution or that disability has disvalue beyond its effect on wellbeing, and doubts about the practicality of SWB measures in these domains.

So this project may only be worth considering if the sHALY would be useful for non-governmental purposes (e.g., within effective altruism), or in “supplementary” analyses alongside more standard methods (e.g., to highlight how QALYs neglect mental health). Either that, or changing the minds of large numbers of influential stakeholders will have to be a major part of the project—which may not be entirely unrealistic, given the increasing prominence of wellbeing in the public sector. We should also consider the possibility that projects such as this, which offer a viable alternative to the status quo, would themselves help to shift opinion.

That said, there is scope for incremental improvement of current HALYs (see Part 2), and increasing interest in hybrid health/wellbeing measures like the E-QALY and in the use of wellbeing for cross-sector prioritisation. In at least the latter case, you are likely to know more than me about how to effect policy change within governments.

2. Problem 4 - neglect of spillover effects – probably cannot be solved by changing the metric. I discuss spillovers a little in Part 2 and plan to have a separate post on it in Part 6 (but it might be a while before that’s out, and it’s likely to focus on raising questions rather than providing solutions). I’m still unsure what to do about them and would like to see more research on this. I agree changing the metric alone won’t solve the issue, but it may help: knowing the extent to which the metric captures spillovers seems like an important starting point.

3. Who would you recommend to fund if I want to see more work like this? It probably depends what your aims are. If it’s to influence NICE, IHME, etc, it probably has to go via academia or those institutions. If you want to develop a metric for use in EA, funding individual EAs or EA orgs may work—but even then, it’s probably wise to work closely with relevant academics to avoid reinventing the wheel. So I guess if you have a lot of money to throw at this, funding academics or PhD students may be a good bet; there is already some funding available (I’m applying for PhD scholarships in this area at the moment), but it may be hard to get funding for ideas that depart radically from existing approaches. I list some relevant institutions and individuals in Part 2.

4. How is the E-QALY project going? It got very delayed due to COVID-19. I’m not sure what the new timeline is.

Health and happiness research topics—Part 1: Background on QALYs and DALYs

I've made a few edits to address some of these issues, e.g.:

Clearly, there are many possible “wellbeing approaches” to economic evaluation and population health summary, defined both by the unit of value (hedonic states, preferences, objective lists, SWB) and by how they aggregate those units when calculating total value. Indeed, welfarism can be understood as a specific form of desire theory combined with a maximising principle (i.e., simple additive aggregation); and extra-welfarism, in some forms, is just an objective list theory plus equity (i.e., non-additive aggregation).

However, it seems that most advocates for the use of wellbeing in healthcare reject the narrow welfarist conception of utility, while retaining fairly standard, utility-maximising CEA methods, perhaps with some post-hoc adjustments to address particularly pressing distributional issues. So it seems reasonable to consider it a distinct (albeit heterogeneous) perspective.

For the purpose of exposition, I will assume that the objective is to maximise total SWB (remaining agnostic between affect, evaluations, or some combination). This is not because I am confident it’s the right goal; in fact, I think healthcare decision-making should probably, at least in public institutions, give some weight to other conceptions of wellbeing, and perhaps to distributional concerns such as fairness. One reason to do so is normative uncertainty—we can’t be sure that the quasi-utilitarianism implied by that approach is correct—but it’s also a pragmatic response to the diversity of opinions among stakeholders and the challenges of obtaining good SWB measurements, as discussed in later posts.

However, I am fairly confident that SWB-maximization—or indeed any sensible wellbeing-focused strategy—would be an improvement over current practice, so it seems like a reasonable foundation on which to build. Moreover, most of these criticisms should hold considerable force from a welfarist, extra-welfarist, or simply “common sense” perspective. One certainly does not have to be a die-hard utilitarian to appreciate that reform is needed.

Changed the first two problem headings to avoid ambiguity and, in the first case, to focus on the result of the problem rather than the cause, which helps distinguish it from 5.