Samuel Dupret

Hello Blake. There are lots of questions to ask and answer here, but I will give some brief pointers on one aspect I think is important:

How do you think I should judge success? QALYs? How would I even judge QALY improvement?… In the end, I want to be able to say $5k to a family of 5 reduces suffering by a factor of X. I hope to compare these measures to EA giving opportunities.

This is a question about how to define and measure the 'good' that an action will do. Different charity and intervention evaluators have recently presented how they do so here on the forum.

At the Happier Lives Institute - where I am a research analyst - we use subjective wellbeing / WELLBYs. This captures 'good' better and more broadly than QALYs or other measures. See here for our case for the WELLBY. We've reviewed AMF, deworming, GiveDirectly, and StrongMinds using subjective wellbeing (see our 2022 recommendations here). We've even considered household spillovers for cash transfers and psychotherapy.

I think this measure - or, at the very least, our work and other work on subjective wellbeing - will be very relevant to your project. I'm happy to chat more about this.

Hi Jack,

And read this as you planning to continue evaluating everything in WELLBYs, which in turn I thought meant ruling out evaluating research - because it isn't clear to me how you evaluate something like psychedelics research using WELLBYs.

I'll try to give a general answer to your concern; I hope this helps.

Whilst there might be some aspects of research that can be evaluated without looking at WELLBYs (e.g., how costly psychedelic treatment is), the core point is still that wellbeing is what matters. More research will tell us that something is worth it if it does 'good'; namely, if it increases WELLBYs (cost-effectively).

We hope to obtain wellbeing data (life satisfaction, affect, etc.) for each area we evaluate. If it is lacking, then we hope to encourage more data collection and/or to evaluate pathways (e.g., if psychedelics affect variable X, do we know the relationship between X and wellbeing?).

Hello Karthik. Thank you for your comment. Apologies - it seems we missed your comment at the time of posting, so we’re providing our responses now.

I worry that they over-weight the immediate effects of interventions, and underweight the long-term effects.

This is not an issue with the measures themselves, but rather with how much data we can collect for them.

The problem is that SWB data have to be collected with a much higher frequency than income/health data. By their nature, SWB data are reliable when reporting on current state: all the studies of SWB validity I've seen are showing the validity when people introspect on their state of life now, not their state of life a year ago. I think it's very likely that people recalling SWB in the past would be highly biased by their current SWB. In contrast, income/health are more objective for people to recall, and they can also be collected from administrative data. So I don't think WELLBYs in practice could adequately measure effects with primarily long-term benefits and little to no short-term benefits.

If you measure someone’s life satisfaction at point t, it is just like measuring someone’s income at point t: both are taken at a single point in time. If you want to analyse the effects of an intervention over time, it doesn’t matter whether it is income or life satisfaction; you need to measure the effect across time.

The advantage of income is that you’re more likely to have written records of it (bank statements, etc.) compared to reports of your subjective wellbeing. However, if a researcher didn’t record income/health/etc. (e.g., they failed to record it at a certain point in their study), they face the same issue: they have to rely on people’s memory (for past information) or predictions (for future information).

Health outcomes are not ‘objective’ when it comes to measures of quality of health. You can remember having a disease and use the DALY score for that disease, but then you are relying on surveys of people who were asked (without having the disease themselves) how ‘healthy’ or not it is to have that disease. Note that having had a disease is potentially ‘easy’ to recall not so much because of ‘objectivity’ but likely because of its ‘granularity of detail’: a disease is often a binary state - you have COVID or you don’t - rather than a numerical score out of 10. Either way, these questions about memory are the realm of empirical psychology, and my point is that even if health is easier to recall, it is still not a great measure.

Also note that countries like the UK already collect wellbeing measures as part of their administrative data.

To answer your specific examples (quoted below): the general answer is “we need to measure the outcome in the long run”.

Alice is a teen targeted by an education intervention that increases her test scores dramatically but also requires her to put in more effort. Alice likes getting good grades, but it's a very small part of her subjective wellbeing as a teenager, and it's also offset by the annoyance of having to spend more time on schoolwork, so she reports essentially the same SWB on her survey. Did the education intervention have zero value?

The education intervention might lead to better wellbeing in the future, and wellbeing measures would capture all the potential impacts of the intervention. If you collect income or health at this very moment, you also get no difference. Why is increasing test scores good? Because it increases x or y later. Why is x or y good? Ultimately, because it increases wellbeing.

Bob is a farm laborer who gets a free bus ticket to migrate to the city and work there. He earns higher income in the city and sends much of it back to his family. But being alone in the city is lonely and difficult. He is happy that he can provide for his family, but they are far away, and the difficulty of being a migrant is much more salient to him on any given day. He reports a reduced SWB on the survey. Was migration a harmful intervention?

You would need to measure the effect on the SWB of Bob's family as well and take everything into account. Just because the intervention increased income (while potentially harming social relationships) does not mean it was a good intervention.

Chris lives in a generally polluted city. He dislikes pollution, but it's usually not so bad that he notices it very saliently on a day-to-day basis. Unbeknownst to him, an air-quality intervention reduces pollution by 10%, reducing his risk of respiratory disease over twenty years. But he wasn't aware of it, or even if he was, he wasn't thinking about risks twenty years from now, so he reports the same SWB as before. Did the air-pollution intervention have zero value?

Counterfactually, 20 years from now he would rate his SWB higher. The same holds for income or health: the effect only occurs 20 years from now (in this scenario). With health measures, one could use previous data to predict that “respiratory diseases cause X DALYs”. But we could equally look at data relating SWB and respiratory diseases and see that “respiratory diseases decrease life satisfaction by X”. The same principle applies to income.
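To make that concrete, here is a minimal sketch (Python, with entirely made-up placeholder numbers, not estimates from HLI or the literature) of how one could predict the long-run WELLBY effect from a known SWB-disease relationship:

```python
# All numbers are hypothetical placeholders for illustration only.
baseline_risk = 0.05      # assumed lifetime risk of respiratory disease
risk_reduction = 0.10     # the intervention cuts pollution, reducing risk by 10%
ls_penalty = 0.5          # assumed life-satisfaction drop (0-10 scale) while ill
years_ill = 5             # assumed average duration of the illness

# Expected WELLBYs gained per person = averted risk * LS penalty * duration
wellbys_per_person = baseline_risk * risk_reduction * ls_penalty * years_ill
print(f"{wellbys_per_person:.4f} WELLBYs per person")  # 0.0125
```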

A very interesting read. I didn't know much about BSD.

I was wondering if you had a spreadsheet or something equivalent where you put all the numbers you mention together? Prevalence, loss due to misdiagnoses, false positives, costs, etc. It could help with understanding the predicted cost-effectiveness of the different actions you mention. Apologies if I missed it in the core text.

Thank you for your comment!

Indeed, we took the average of the logs instead of the log of the averages. This doesn’t change the start and end points, so it wouldn’t change the overall decay rate we estimate. We could do more complex modelling in which effects between KLPS2 and KLPS3 see small growth and effects between KLPS3 and KLPS4 see large decay. I think this shows that the overall results are sensitive to how we model the effect across time.
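For what it's worth, here is a minimal sketch (Python, with hypothetical treatment/control ratios, not the actual KLPS estimates) of the difference between the two aggregations, and why it is usually small:

```python
import numpy as np

# Hypothetical treatment/control consumption ratios at three follow-ups
# (placeholders, not the actual KLPS2-KLPS4 estimates).
ratios = np.array([1.15, 1.18, 1.05])

avg_of_logs = np.mean(np.log(ratios))   # what we did
log_of_avgs = np.log(np.mean(ratios))   # the alternative

# Jensen's inequality: mean of logs <= log of mean, but the gap is small here
print(avg_of_logs, log_of_avgs)  # ~0.118 vs ~0.119
```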

See Figure 4 of the appendix, which shows that, whether in earnings or in consumption, the relative gains (as shown by the log difference) decrease over time.

We used the pooled data because that is what GiveWell does. In the appendix we note that the consumption and earnings data look different. So perhaps a more principled way would be to look at the decay within earnings and within consumption separately. The decay within earnings (84%) and within consumption (81%) is in both cases stronger (i.e., would lead to smaller effects) than the 88% pooled decay.
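As a rough illustration of why the retention rate matters (Python, with hypothetical effect sizes rather than the KLPS data):

```python
# Hypothetical effect sizes measured t years apart (not the KLPS data).
effect_start, effect_end, t = 0.20, 0.05, 10

# Annual retention implied by effect_end = effect_start * retention ** t
retention = (effect_end / effect_start) ** (1 / t)
print(f"implied annual retention ~ {retention:.2f}")  # ~0.87

# Total effect summed over 20 years under the three retention rates above
for r in (0.81, 0.84, 0.88):  # consumption, earnings, pooled
    total = sum(effect_start * r**year for year in range(20))
    print(f"retention {r:.2f}: total effect {total:.2f}")
# Lower retention (stronger decay) -> noticeably smaller cumulative effects.
```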

Thank you for sharing this and those links. It would be useful to build a quantitative and qualitative summary of how and when early childhood interventions lead to long-term gains. An intervention can have a positive effect later in life and still show decay (or growth, or a constant effect, or a mix). In our case, we are particularly interested in effects on subjective wellbeing rather than income alone.

One (small) reason one might start with a larger prior on the constant effects model is to favor simplicity

I am a bit rusty on Bayesian model comparison, but - translating from my frequentist knowledge - I think the question isn’t so much whether the model is simpler, but how much error adding a parameter reduces. The decay model probably fits the data better.
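A minimal sketch of the kind of comparison I have in mind (Python, with made-up effect sizes; BIC as a rough frequentist stand-in for a full Bayesian model comparison), where the extra decay parameter is only 'worth it' if it reduces the residual error by more than the complexity penalty:

```python
import numpy as np

# Hypothetical effects at four follow-up years (placeholders, not the KLPS data).
years = np.array([0, 5, 10, 15])
effects = np.array([0.20, 0.13, 0.08, 0.05])

def bic(residuals, k):
    """BIC under Gaussian errors: n * log(RSS / n) + k * log(n)."""
    n = len(residuals)
    rss = np.sum(residuals ** 2)
    return n * np.log(rss / n) + k * np.log(n)

# Model 1: constant effect (1 parameter, the mean)
const_fit = np.full_like(effects, effects.mean())

# Model 2: exponential decay, fit as a straight line in log space (2 parameters)
slope, intercept = np.polyfit(years, np.log(effects), 1)
decay_fit = np.exp(intercept + slope * years)

print("BIC constant:", bic(effects - const_fit, k=1))
print("BIC decay:   ", bic(effects - decay_fit, k=2))  # lower BIC wins
```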

Hello, 
I am a research analyst at the Happier Lives Institute, where we do cost-effectiveness analyses (CEAs) of charities and interventions such as StrongMinds and GiveDirectly.

I am not aware of a document that collects all of the groups who do CEAs and compares and contrasts their methods; I would be quite interested in seeing one. Such a document could take the form of a table with evaluating groups (GiveWell, OpenPhil, SoGive, Founders Pledge, Charity Entrepreneurship, Happier Lives Institute, etc.) as rows and important methodological questions as columns. This could be useful because it would allow evaluators to learn from each other's methodology and allow people to see the different assumptions under which different evaluators are operating.

This raises the question: what are the important elements/questions/assumptions to consider? I will give my list based on the thinking and methods at the Happier Lives Institute.

In a CEA, you first need to think about the effect. In EA, we are concerned with doing (the most) good. This leads to the first big question: what is ‘good’? A lot of people might agree that income or health are not intrinsic goods that we want to increase for their own sake; rather, they are instrumental - they are good because of what they do for people. An intrinsic good (one that is good in and of itself) would be wellbeing. There are different theories of wellbeing, and I am not going to get into them here, but you can read some summaries here and here.

Say that, like the Happier Lives Institute, we think wellbeing is the good we want to maximise, and that we take a subjective wellbeing interpretation of this: what is good are people’s mental states - their evaluations of their lives and the positive feelings (and lack of negative feelings) they experience. Great, we have defined the good/the effect. Note that you don’t have to agree 100% with an evaluator for their evaluation to be useful: you might value elements outside of human wellbeing and still find someone’s evaluation in terms of wellbeing very useful for prioritising between charities.

Next step: how do we measure this good/effect? This isn’t trivial, as we can agree on the good we want and still measure it in very different ways. For example, we - as evaluators - could make a list of things we believe improve people’s subjective wellbeing (e.g., income and health) and make inferences about how much they improve wellbeing (e.g., for every 1 DALY prevented, X units of wellbeing are gained). Alternatively - and this is the method we use at the Happier Lives Institute - we could use the most direct measures of subjective wellbeing: answers to questions about how happy people are, how satisfied with their lives they are, and how many positive or negative feelings they have been experiencing.
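To illustrate the contrast (Python; every number below is a hypothetical placeholder, not an HLI estimate), recall that a WELLBY is one life-satisfaction point (on a 0-10 scale) sustained for one year:

```python
# Two ways to express an intervention's effect in wellbeing units.
# All numbers are hypothetical placeholders.

# (a) Indirect: infer wellbeing from an instrumental measure
dalys_averted = 2.0
wellbys_per_daly = 1.3            # assumed conversion factor
indirect_wellbys = dalys_averted * wellbys_per_daly

# (b) Direct: use reported life satisfaction before/after the intervention
ls_gain = 0.4                     # gain in life-satisfaction points (0-10)
years_sustained = 3               # how long the gain lasts
direct_wellbys = ls_gain * years_sustained

print(indirect_wellbys, direct_wellbys)  # 2.6 vs 1.2
```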

Once we know what we are measuring and how, there are a bunch of methodological questions that can be asked, at different levels of precision. For example: What sort of data do we value - only RCTs, or any sort of empirical data? If we combine multiple studies, what kind of technique do we use (what kind of meta-analysis)? Do we use predictions from experts? Do we combine the priors (subjective beliefs) of evaluators - which might or might not be informed by empirical data - with the data? Should evaluators give subjective estimates for variables for which there isn’t any data? Should we apply subjective discounts for the quality of the data? [At the Happier Lives Institute we favour meta-analyses of multiple studies, RCTs where possible, and we bring in some reasoning about causal mechanisms, but we try to avoid subjective estimates as best we can.]

I’d like to mention some bonus - but quite important - considerations about how we handle effects at the Happier Lives Institute. We are not just interested in the effect on people when they receive the intervention (e.g., when they receive a cash transfer), but also in the long-term effect: How long does the effect last? Does it decay? Does it grow? How quickly does it change? We also want to think about the effects on people beyond the recipients of the intervention: spillovers. How much does helping one individual affect their household or their community?
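A hedged sketch of how those pieces combine (Python; the decay rate, horizon, household size, and spillover share are all hypothetical placeholders, not HLI estimates):

```python
# Total WELLBYs for one recipient, with decay and household spillovers.
# All numbers are hypothetical placeholders.
initial_effect = 0.5   # life-satisfaction points gained at delivery
retention = 0.85       # fraction of the effect retained each year
horizon = 10           # years over which we track the effect

recipient_wellbys = sum(initial_effect * retention**t for t in range(horizon))

household_members = 4  # other people in the recipient's household
spillover_share = 0.3  # assumed fraction of the effect each member receives
household_wellbys = household_members * spillover_share * recipient_wellbys

total_wellbys = recipient_wellbys + household_wellbys
print(f"{total_wellbys:.2f} WELLBYs per recipient household")  # ~5.89
```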

Finally, there’s the cost. I think this can be simple for charities at the point of delivery: the money they receive divided by the number of interventions they deliver gives you the cost of delivering the intervention to one person. But it can be more complicated in other situations. For example, what is the cost of influencing policy?
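And the cost side, under the simple point-of-delivery assumption above (again with hypothetical numbers), combined with the effect side to get cost-effectiveness:

```python
# Cost per intervention at the point of delivery (hypothetical numbers).
money_received = 1_000_000        # charity's budget in dollars
interventions_delivered = 20_000
cost_per_person = money_received / interventions_delivered  # $50

# Combine with the effect side (e.g., from the sketch above):
total_wellbys = 5.89
wellbys_per_1000_usd = total_wellbys / cost_per_person * 1000
print(f"${cost_per_person:.0f} per person, "
      f"{wellbys_per_1000_usd:.0f} WELLBYs per $1,000 donated")  # ~118
```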

A meta-consideration when comparing evaluators: how much time and how many people were involved in the analysis? Is it the work of 2 weeks or 2 months?

These are all really interesting questions, and getting an idea of how different evaluators answer them is, IMHO, useful for the public, for current evaluators, and for future evaluators trying to learn how to do CEAs.