Samuel Dupret

Joined Nov 2021


Comments

Thank you for the feedback! We are keen on that feature too; distributing credences between the views is the next step I am working on.

As my colleagues have mentioned in their responses (Michael's general response, Joel's technical response), the WELLBYs per $1000 that GiveWell put forward for AMF depend on philosophical choices about the badness of death and the neutral point. There is a range of plausible choices, and these can affect the results. HLI does not hold a view.

We've whipped up an R Shiny app so that you, the reader, can play around with these choices and see how your views affect the comparison between StrongMinds and AMF.

Please note that this is a work in progress and was done very quickly. Also, I'm using the free plan for hosting the app so it might be a bit slow/limited in monthly bandwidth.
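Distributing credences between views amounts to taking a weighted average. Here is a minimal sketch in Python of what that looks like; the view names follow the standard positions on the badness of death, but every number below is made up for illustration and is not HLI's or GiveWell's actual estimate:

```python
# Hypothetical illustration of credence-weighting philosophical views.
# All WELLBYs-per-$1000 figures below are invented placeholders.

views = {
    "deprivationism": 80.0,  # death deprives the person of future wellbeing
    "TRIA": 50.0,            # time-relative interest account
    "Epicureanism": 10.0,    # death is not bad for the person who dies
}
credences = {"deprivationism": 0.5, "TRIA": 0.3, "Epicureanism": 0.2}

# Credence-weighted estimate for AMF (hypothetical numbers)
amf_weighted = sum(credences[v] * views[v] for v in views)
strongminds = 60.0  # hypothetical comparison figure

print(f"Credence-weighted AMF estimate: {amf_weighted:.1f} WELLBYs per $1000")
print("AMF higher" if amf_weighted > strongminds else "StrongMinds higher")
```

The app does the same kind of aggregation interactively, letting you set the credences yourself.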

Hi Nick,

Thanks for pointing out both kinds of biases. These biases can cause a failure of comparability. Concretely, if an intervention causes you to give counterfactually higher scores as a matter of ‘courtesy’ to the researcher, then the intervention changed the meaning of each given response category.

I therefore take it that you don’t think that our particular tests of comparability will cover the two biases you mention. If so, I agree. However, my colleague has given reasons for why we might not be as worried about these sorts of biases.

I don’t think this can be tested in our current survey format, but it might be testable in a different design. We are open to suggestions!

Hello Henry,

Thank you for presenting this thought experiment. 

The core question here is whether groups like the Sentinelese, who do not have the same levels of development as others, would report similar levels of SWB. I think the other comments here have done a great job of pointing out possible explanations.

  • if the Sentinelese have ≥ wellbeing
    • maybe their lifestyle works really well for their wellbeing (as Charlie mentions, we might not want to be too quick to dismiss this possibility). It would be a cool area to research.
    • maybe there are issues of interpersonal comparability in scale use, which is what we are exploring with our pilot.
    • maybe they have access to factors (e.g., social ties) that improve wellbeing without being affected as much by reference frames (e.g., the benefit I get from a higher income is relative to other people)  or hedonic adaptation (as pointed out by Alex).
  • if the Sentinelese have < wellbeing
    • As Nick pointed out, countries with lower levels of development tend to have lower levels of LS. [edit: adding this figure here because it represents this well and supports the validity of SWB scales]
    • It might be well worth the investment (especially since, as Vasco pointed out, it can increase the number of humans and life expectancy).
    • It is possible that, whilst our development reached a point where we have higher wellbeing, it did so through inefficient periods, and our task now is to think more carefully about how to do it (and whether there are adverse consequences like climate change, x-risks, etc.).

 

Some brief answers / pointers. Many of these things have been discussed in more detail elsewhere.

  • The estimate for grief is shallow [edit: I want to make this point a bit stronger - this was a quick estimate and it is a tad unfair to compare it to the SM estimate, which represents hundreds of hours of work and meta-analyses] but important in making the difficult comparison of life-saving vs life-improving work accurate. You can see some discussion about it from my colleague. I, personally, wouldn't be too surprised if we found a higher estimate in the future, but there is some reasoning as to why it might not be as big.
  • I don't think there are 'objective' measures of health widely used. DALYs and QALYs are not objective; they rely on human reports - and contrary to SWB, these are reports about things humans are known to struggle to report on. In DALYs, it is people (who do not have the conditions!) making binary judgements about which conditions are more or less healthy. In QALYs, it generally involves people forecasting how their life will be in the future. See To WELLBY or not to WELLBY, where we discuss the strengths and weaknesses of SWB.
  • Health, wealth, and education are all instrumental. Why do we want these? Because they contribute to what we believe is ultimately good for humans. Most charity evaluators give their answer in some form of wellbeing / good, either by measuring SWB as directly as possible (HLI) or by converting lives and income into the moral weights of the evaluators (GW).
  • There is extensive debate about the relationship between SWB and growth through the lens of the Easterlin Paradox.

To caveat this:
If we have higher wellbeing, and they are trying to maximise wellbeing, then all these things align.
But people might not be maximising wellbeing or might not be good at it. 

Hi Nick, A quick comment to thank you for engaging with our work and for your insights. This is super interesting.

Arthritis - same as treating most other pain - large amounts of paracetamol, ibuprofen (and other NSAIDs) and diclofenac gel is what we do for arthritis.

This suggests that this could be really cost-effective, considering the price of NSAIDs! However, wouldn't issues of side effects also occur here? Or is this less of an issue because the gains would be higher?

Thank you for your engagement with our research. Our research team is going on winter break so we might not respond to additional comments until after the holidays.  

The point is to form a view on whether we recommend deworming. What matters to us are the effects on subjective wellbeing. We would like to restate that this is the only SWB data on deworming that we could find, and this is the only SWB analysis that we know of. The effects are non-significant, so this data does not give us good grounds to recommend deworming (see Section 4). The broader literature on outcomes other than SWB (reviewed in Section 1.3) is so mixed that our results do not strike us as particularly surprising – it’s just one null result amongst many.

We could just stop there, but because this is a novel analysis and dealing with non-significant findings isn’t easy, we erred on the side of being overly thorough. For completeness, we discuss many other considerations in Section 2.3 (alternative analyses, Bayes factors, cost-effectiveness, converting GiveWell’s analysis to WELLBYs, etc.). These converging lines of evidence all fail to provide good grounds to recommend deworming. Of course, non-significant findings do not prove there is no effect, and we discuss limitations of the data in Section 5 that highlight key uncertainties about the results. In Section 6, we discuss future research that could help address these limitations.

If strong evidence is produced showing that deworming is more cost-effective than StrongMinds for SWB, then we would change our recommendation. Collecting this data could be expensive, but given the insufficient evidence we think it is necessary for proponents to clarify the effect of deworming on SWB (which we think is what ultimately matters).

Have a nice end of the year and keep doing good, folks.

Hi Dan,

Our main conclusion is that these data don’t demonstrate there is an effect of deworming, as all the point estimates are non-significant (see further discussion in Section 2.3).

We conducted the cost-benefit analysis as an exercise to see what the effects look like. We took the trend in the data at face value because the existing literature is so mixed and doesn’t provide a strong prior.

Hi Nick, 
Thank you for your comment.

About your major point first.
If it was up to us (we didn’t collect this data), we would use a nicer 0-10 scale. However, this is the only SWB data we are aware of. There are other measures of wellbeing in the data (including a 1-10 scale, some 1-6 frequency scales, and some binary scales) but the 3-point scale is the only measure that was collected across all three KLPS rounds. None of the other measures are significant. Some are negative, some are positive. In Appendix A3.1 we conduct an analysis where we use the effect sizes of all the other measures, and we obtain very similar results, which gives us more confidence about this measure.

Imagine that deworming really does increase 1000 recipients' happiness by 2% each. This won't tip them over the line from "happy" to "very happy", so people will report the same level of happiness.

I’m not sure this is completely true. Some people will be tipped over the line; namely, everyone within 2% of a category boundary (e.g., the people who answer ‘not happy’ but are close to answering ‘happy’).
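The thresholding point can be made concrete with a toy Monte Carlo sketch in Python. This is a hypothetical latent-variable model, not the report's analysis: each respondent has a latent happiness level in [0, 1], reports it on a 3-point scale via fixed cutoffs, and a small uniform boost moves only the respondents who were already near a cutoff.

```python
import random

random.seed(0)

# Hypothetical model: "not happy" < 1/3 <= "happy" < 2/3 <= "very happy"
CUTOFFS = (1 / 3, 2 / 3)

def category(latent):
    # Map a latent happiness score to a 3-point response: 0, 1, or 2
    return sum(latent >= c for c in CUTOFFS)

# 1000 respondents with uniformly distributed latent happiness
people = [random.random() for _ in range(1000)]
boost = 0.02  # a small uniform increase in latent happiness

# Count respondents whose *reported* category changes after the boost
moved = sum(category(p + boost) != category(p) for p in people)
print(f"{moved} of 1000 respondents change their reported category")
```

Under this toy model a 2% boost still shifts the respondents sitting just below a cutoff, so the average reported score moves a little even on a coarse scale.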

Where data is as bad as this, it's better to say that there is not enough meaningful data to draw a conclusion, rather than saying "with the best evidence we have (which is bad), conclusion is X". It's better to not use bad evidence at all, and say there is no meaningful evidence available, than to try and draw weak conclusions from it like you do here.  

Both the literature (which does not contain SWB data; Section 1.3) and this SWB data (the only SWB data we could find) fail to give us good grounds to recommend deworming. For completeness, we go through the many considerations that we present in Section 2.3 (alternative analyses, Bayes factors, cost-effectiveness, converting GiveWell’s analysis to WELLBYs, etc.). As we note in the report, this evidence does not prove the effect is zero, but these converging lines of evidence support the conclusion that the effect is zero or very small. If strong SWB evidence that deworming is more cost-effective than StrongMinds is produced, then we would change our minds. Yes, collecting this data will be expensive, but we’d prefer that some of the money going to deworming serve to nail down the effect of deworming on SWB (because we think this is what ultimately matters).

On the bigger WELLBY picture, the more I think about it, the more I think the Happier Lives Institute should prioritise funding a WELLBY-measured RCT of StrongMinds vs. cash transfers A.S.A.P., performed by an independent organisation, to answer 2 questions.

1. Are StrongMinds really superior to cash on the WELLBY front?

2. Can before-and-after wellbeing scores be valid, and not spoiled by the effects I've discussed earlier?

Just to clarify for people reading this (I know you are not saying this; it is just that I know people too often misunderstand this): StrongMinds gives depression treatment to people in LMICs (who happen to also be poor). GiveDirectly transfers cash to people in poverty (some of whom might be depressed). If you want to increase WELLBYs, StrongMinds will produce more WELLBYs per dollar than GiveDirectly. We are not suggesting that psychotherapy treats poverty, nor that psychotherapy should be given to non-depressed poor people. The reasons why psychotherapy can help in LMICs are (1) the lower costs and (2) the counterfactual: there are many people there who need help with mental health, and LMICs rarely have any infrastructure to help (and sometimes, when they do have some, it involves actively hurting people - like chaining people up). There is a bidirectional relationship between poverty and mental health that is complex and fascinating (Ridley et al., 2020).

With that out of the way (sorry, I know this wasn’t your point), the RCT you propose would test whether giving the StrongMinds sessions helps their depressed patients more than giving them the cash equivalent of those sessions. This is, admittedly, very interesting and could provide extra data about the effect of cash on people with depression, using a very special kind of active control. We do not have the funding to do something like that for now. Additionally, I think there might be other areas that are more in need of investigation (e.g., more research on household spillovers). Resources permitting, we would be interested in conducting more RCTs with partners.

 

Thank you for engaging with our work, I hope our answers help.

In brief, we used Harrer et al.’s (2021) R library and their function for the power of a meta-analysis. With this, we computed that the power we had to detect an initial effect (the intercept) of .08 SDs was 97.88%. This uses the average sample sizes and the number of effect sizes.


We selected .08 as our smallest effect size of interest based on the results from our ‘optimistic’ model (see Appendix A1), which assumes the effect of deworming cannot be negative (not a model we endorse). The initial effect was .04 and the total cost-effectiveness was half that of StrongMinds, so the effect would need to be twice as large to equal the cost-effectiveness of StrongMinds.
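For readers curious about the shape of the calculation, here is a rough Python sketch of the standard Valentine & Pigott-style approximation for the power of a (fixed-effect) meta-analysis, which is the general approach functions like this implement. The per-group sample sizes and number of effect sizes below are placeholders I made up, not the report's actual inputs, so this will not reproduce the 97.88% figure:

```python
from math import sqrt, erf

def phi(x):
    # Standard normal CDF
    return 0.5 * (1 + erf(x / sqrt(2)))

def meta_power(d, n1, n2, k, heterogeneity=0.0, alpha_z=1.96):
    """Approximate power to detect a pooled effect of size d (Cohen's d)
    in a meta-analysis of k studies with average per-group sizes n1, n2.
    heterogeneity adds extra variance for a random-effects flavour
    (0 = fixed effect). Sketch of the Valentine & Pigott approximation."""
    # Sampling variance of a single study's standardised mean difference
    v = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    # Variance of the pooled estimate across k studies
    v_pooled = v * (1 + heterogeneity) / k
    lam = d / sqrt(v_pooled)  # non-centrality parameter
    # Two-sided test at alpha = .05
    return 1 - phi(alpha_z - lam) + phi(-alpha_z - lam)

# Placeholder inputs (NOT the report's actual sample sizes / k):
power = meta_power(d=0.08, n1=1500, n2=1500, k=10)
print(f"approximate power: {power:.4f}")
```

Halving `d` to .04 (the 'optimistic' model's initial effect) visibly lowers the power, which is why the doubled value was the natural smallest effect size of interest to test.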
