Sure, happy to elaborate.
Here's figure 4 for reference:
I think each part of this chart rests on assumptions that aren't defensible.
1. I don't think a neutral point higher than 2 is defensible.
You cite three studies in this report. My read on what to conclude about the neutral point from those is:
i) IDinsight 2019 (n=70; representative of GW recipients): you highlight the average answer of 0.56, but this excludes the 1/3 of people who say it's not possible to have a life worse than death. I think counting those as 0 more accurately reflects their preferences, so 2/3 × 0.56 ≈ 0.37/10
ii) Peasgood et al, unpublished (n=75; UK): you say 2/10 and I can't find the study so I’m taking that at face value.
iii) Jamison and Shukla, unpublished (n=600, US, Brazil, China): you highlight the average answer of 25/100. In private communication with the author, I got the impression that 1.8/10 was probably more appropriate because the scale used in this study isn’t comparable to typical life satisfaction scales.
So what to make of this? I think you could reasonably put weight on the largest study (1.8/10), or on the most representative study (0.37/10). I lean towards the latter, because I intuitively find it quite likely that less well-off people will report lower neutral points (I don't feel certain about this, and I'm hoping Jamison & Shukla will have a large enough sample to test it). But either way, I don't see any way of combining these studies to get an answer higher than 2.
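To make that concrete, here's a quick sketch (the weighting schemes are my own illustration, not anything from the report) showing that both sample-size weighting and equal weighting of the three estimates above land well below 2:

```python
# Neutral-point estimates on a 0-10 scale, taken from the three studies above.
# The weighting schemes below are illustrative assumptions, not from the report.
estimates = {
    "IDinsight 2019":   2 / 3 * 0.56,  # ~0.37: counts "no life worse than death" answers as 0
    "Peasgood et al.":  2.0,
    "Jamison & Shukla": 1.8,           # 25/100 rescaled per the author's suggestion
}
sample_sizes = {"IDinsight 2019": 70, "Peasgood et al.": 75, "Jamison & Shukla": 600}

size_weighted = (sum(estimates[k] * sample_sizes[k] for k in estimates)
                 / sum(sample_sizes.values()))
equal_weighted = sum(estimates.values()) / len(estimates)

print(f"size-weighted:  {size_weighted:.2f}")   # ~1.69
print(f"equal-weighted: {equal_weighted:.2f}")  # ~1.39
```

And since every estimate is at most 2, any weighted average of them is at most 2 as well, whatever weights you pick.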
In addition, a neutral point of 5 implies that the average person in over 40 countries would be better off dead, and a neutral point of 2.5 implies that the average person in Afghanistan would be better off dead. I find both of these implications jarring.
HLI’s belief that a neutral point of 5 is within a reasonable range seems to come from Diener et al. 2018. But that article's not explicit about what it means by "neutral point". As far as I can tell from a quick skim, it seems to be defining "neutral" as halfway between 0 and 10.
2. I don't think 38% is a defensible estimate for spillovers, which puts me closer to GiveWell's estimate of StrongMinds than HLI's estimate of StrongMinds.
I wrote this critique of your estimate that household spillovers were 52%. That critique had three parts. The third part pointed out an error, which you corrected, bringing the estimate down to 38%. But I think the first two parts are actually more important: you're deriving a general household spillover effect from studies specifically designed to help household members, which would lead to an overestimate.
I thought you agreed with that from your response here, so I'm confused as to why you’re still defending 38%. Flagging that I'm not saying the studies themselves are weak (though it's true that they're not very highly powered). I'm saying they're estimating a different thing from what you're trying to estimate, and there are good reasons to think the thing they're trying to estimate is higher. So I think your estimate should be lower.
3. I don't think strong epicureanism is a defensible position.
Strong epicureanism (the red line) is the view that death isn't bad for the person who dies. It's logically possible to hold this position as a thought experiment in a philosophy seminar, but I've never met anyone who actually believes it, and I'd be deeply troubled if decision-makers took action on the basis of it. You seem to agree to some extent, but by elevating it to this chart and putting it alongside the claim that "Against Malaria Foundation is less cost-effective than StrongMinds under almost all assumptions", I think you're implying it's a reasonable position to take action on, and I don't think it is.
So I think my version of this chart looks quite different: the x-axis is between 0.4 and 2, the StrongMinds estimate’s quite a bit closer to GiveWell than HLI, and there’s no “epicureanism” line.
What does HLI actually believe?
More broadly, I'm quite confused about how strongly HLI is recommending StrongMinds. In this post, you say (emphasis mine):
We conclude that the Against Malaria Foundation is less cost-effective than StrongMinds under almost all assumptions. We expect this conclusion will similarly apply to the other life-extending charities recommended by GiveWell.
We’re now in a position to confidently recommend StrongMinds as the most effective way we know of to help other people with your money.
But you've said elsewhere:
HLI does not (yet) take a stance on these different philosophical views.
That strikes me as inconsistent. You've defined a range of assumptions you believe are reasonable, then claimed that StrongMinds > AMF on almost all of those assumptions. And then said you don't take a stance on these assumptions. But you have to actually defend the range of assumptions you've defined as reasonable. And in my view, they're not.
"Empirical work on how individuals interpret the scale could be helpful but is extremely limited. A small (n = 75) survey in the UK found that respondents would choose death over life at a life satisfaction level of about 2/10 (Peasgood et al., unpublished, as referenced in Krekel & Frijters, 2021). A survey of people living in poverty in Ghana and Kenya estimated the neutral point as 0.56 (IDinsight, 2019, p. 92; n = 70). There are also preliminary results from a sample of 600 in the USA, Brazil, and China that finds a neutral point of 25/100 (Jamison & Shukla, private communication). At the Happier Lives Institute, we are currently working on our own survey to explore this topic further and hope to share our results soon." Elephant in the bednet
"Approximately one third of respondents stated that it’s not possible to have a life that’s worse than death. These respondents cited deontological frameworks such as the inherent and immeasurable value of life regardless of other factors. The remaining respondents (close to two thirds) indicate that there are points on the ladder where life is worse than death. For these respondents, this point is substantially lower than their current life satisfaction scores –the average point identified was 0.56 on a ladder from 0 to 10, compared to their current average life satisfaction score of 2.21" IDInsight 2019, pg 94
I'm not sharing the full reasoning because it was private correspondence and I haven't asked the authors if they'd be comfortable with me sharing.
"Other wellbeing researchers, such as Diener et al. (2018), appear to treat the midway point on the scale as the neutral point (i.e., 5 on a 0-10 scale)." Elephant in the bednet
"Although what we might call strong Epicureanism, the view that death is not bad at all, has few takers, there may be more sympathy for weak Epicureanism, where death can be bad, but relatively more weight is given to living well than living long" Elephant in the bednet
Thanks Jason, makes sense.
I think I’m more skeptical than you that reasonable alternative assumptions make StrongMinds look more cost effective than AMF. But I agree that StrongMinds seems like it could be a good fit for some donors.
FWIW I don't think GiveDirectly should be "the bar" for being considered one of the most effective organizations in the global health and development space.
I think both 5x and 10x differences are big and meaningful in this domain, and I think there are likely billions of dollars in funding gaps between GiveWell's bar (~10x) and GiveDirectly. I think donors motivated by EA principles would be making a mistake, and leaving a lot of value on the table by donating to GiveDirectly or StrongMinds over GiveWell's recommendations (I say this as someone who's donated to both StrongMinds and GiveDirectly in the past, and hugely respects the work they both do).
I recognize this might be a difference in what we mean by "one of" the most effective, but I wanted to comment because this sentiment feeds into a general worry I have: that a desire for pluralism and positivity within GH&D (both good and important things!) is eroding intensity about prioritization (more important, IMO).
I felt happy reading the nice things your colleagues are saying about you Max, all of which ring true to me. I admire your humility, thoughtfulness and level-headedness, and I'm looking forward to seeing what you get up to next!
Thanks, this looks like a helpful report!
It looks like this estimate comes from the proportion of countries the Bloomberg consortium and World Bank worked in that passed various policies over a decade, without adjusting for the counterfactual chance of those policies changing absent their work.
I’m curious if CE had any luck trying to estimate the counterfactual (e.g. by looking at other countries, trends before BB, or diving deep on individual case studies)?
Fwiw when I looked at this a few years ago (at GiveWell, not OP) I couldn’t find any evidence of a difference in policy change by comparing countries BB worked in more vs less.
I don’t think this is good evidence against impact (policy spillovers, selection, BB’s global work, and the lack of data before BB all make this comparison difficult), but it made me think we unfortunately can’t learn much about the counterfactual from the high-level quantitative comparison, and made me want to fall back on going deep on qualitative case studies instead.
Interesting thoughts Joel. Is the analysis in (9) public / could you point me towards it?
(I work at open phil but only made a tiny contribution to this report; I’m just curious)
(I work at Open Phil on Effective Altruism Community Building: Global Health and Wellbeing)
Our understanding is that only a small proportion of FTXFF’s grantees would be properly classified as global health or animal welfare. Among that subset, there are some grantees who we think might be a good fit for our current focus areas and strategies. We’ve reached out individually to grantees we know of who fit that description.
That being said, it’s possible we’ve missed potential grantees, or work that might contribute across multiple cause areas. If you think that might apply to your project, you can apply through the same form.
>people inflate their self-reports scores generally when they are being given treatment?
Yup, that's what I meant.
>Is there one or more studies you can point me to so I can read up on this, or is this a hypothetical concern?
I'm afraid I don't know the literature on blinding very well, but here are a couple of pointers:
(i) StrongMinds notes "social desirability bias" as a major limitation of their Phase Two impact evaluation, and suggests collecting objective measures to supplement their analysis:
"Develop the means to negate this bias, either by determining a corrective percentage factor to apply or using some other innovative means, such as utilizing saliva cortisol stress testing. By testing the stress levels of depressed participants (proxy for depression), StrongMinds could theoretically determine whether they are being truthful when they indicate in their PHQ-9 responses that they are not depressed." https://strongminds.org/wp-content/uploads/2013/07/StrongMinds-Phase-Two-Impact-Evaluation-Report-July-2015-FINAL.pdf
(ii) GiveWell's discussion of the difference between blinded and non-blinded trials of water quality interventions when outcomes were self-reported. [I work for GiveWell but didn't have any role in that work, and everything I post on this forum is in a personal capacity unless otherwise noted.]
May be best to just chat about this in person but I'll try to put it another way.
Say a single RCT of a cash transfer program in a particular region of Kenya doubled people's consumption for a year, but had no apparent effect on life satisfaction. What should we believe about the likely effect of a future cash transfer program on life satisfaction? (taking it as an assumption for the moment that the wider evidence suggests that increases in consumption generally lead to increases in life satisfaction).
Possibility 1: there's something about cash transfer programs which mean they don't increase life satisfaction as much as other ways to increase consumption.
Possibility 2: this result was a fluke of context; there was something about that region at that time which meant increases in consumption didn't translate to increases in reported life satisfaction, and we wouldn't expect that to be true elsewhere (given the wider evidence base).
If Possibility 2 is true, then it would be more accurate to predict the effect of a future cash transfer program on life satisfaction by using the RCT's effect of cash on consumption, and then extrapolating from the wider evidence base to the likely effect on life satisfaction. If Possibility 1 is true, then we should simply take the measured effect of the RCT on life satisfaction as our prediction.
One way of distinguishing between Possibility 1 and Possibility 2 would be to look at the inter-study variance in the effects of similar programs on life satisfaction. If there's high variance, that should update us towards Possibility 2. If there's low variance, that should update us towards Possibility 1.
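To illustrate why inter-study variance separates the two possibilities, here's a toy simulation (all parameters are made up for illustration; nothing here is real data):

```python
import random
import statistics

# Toy model: under Possibility 1, every cash-transfer program shares one true
# effect on life satisfaction; under Possibility 2, the true effect varies by
# context. Observed effects add sampling noise on top. Parameters are invented.
random.seed(0)

def simulate_study_effects(true_mean, between_sd, within_se, n_studies):
    """Observed effect = context-specific true effect + within-study sampling noise."""
    return [random.gauss(true_mean, between_sd) + random.gauss(0, within_se)
            for _ in range(n_studies)]

# Possibility 1: no between-context variation, only sampling noise.
possibility_1 = simulate_study_effects(0.0, between_sd=0.0, within_se=0.1, n_studies=50)
# Possibility 2: genuine between-context variation dominates.
possibility_2 = simulate_study_effects(0.0, between_sd=0.5, within_se=0.1, n_studies=50)

# Inter-study spread well above what sampling error alone can explain
# points to Possibility 2 (context-dependent effects).
print(statistics.stdev(possibility_1))  # close to the within-study SE
print(statistics.stdev(possibility_2))  # noticeably larger
```

In practice this is what random-effects meta-analyses try to estimate with heterogeneity statistics; the sketch just shows the mechanism.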
I haven't seen this problem discussed before (although I haven't looked very hard). It seems interesting and important to me.
Thanks Jason, mostly agree with paras 4-5, and think para 2 is a good point as well.
I think the basic philosophical perspective is a moral/philosophical judgement. But the neutral point combines that moral judgement with empirical models of what people's lives are actually like, and empirical beliefs about how people respond to surveys.
I wonder if, insofar as we do have different perspectives on this (and I don't think we're particularly far apart, particularly on the object-level question), the crux is how much weight to put on individual donor judgement? Or even whether individual donors hold such judgements at all?
My experience of even EA-minded (or at least GiveWell) donors is that ~none of them have a position on these kinds of questions, and they actively want to defer. My (less confident but based on quite a few conversations) model of EA-minded StrongMinds donors is they want to give to mental health and see an EA-approved charity so give there, rather than because of a quantitative belief on foundational questions like the neutral point. As an aside, I believe that was how StrongMinds first got on EA's radar - as a recommendation for Founders Pledge donors who specifically wanted to give to mental health in an evidence-based way.
It does seem plausible to me that donors who follow HLI recommendations (who I expect are particularly philosophically minded) would be more willing to change their decisions based on these kinds of questions than donors I've talked to.
I'd be interested if someone wanted to stick up for a neutral point of 3 as something they actually believe and that is a crux for where they give, rather than merely something someone could believe or that is plausible. I could be wrong, but I'm starting out skeptical that such a belief would survive contact with "But that implies the world would be better if everyone in Afghanistan died" and "A representative survey of the people whose deaths you'd be preventing finds they think their lives are more valuable than that".
What do you think?