Why does EA use QALYs instead of experience sampling?

by Milan_Griffes, 24th Apr 2019, 16 comments


Background: Earlier this year, I attended a great presentation by Natália Mendoça about experience sampling. Here's the deck from her presentation.

A takeaway from the presentation was that QALYs are constructed in a way that skews cause prioritization towards particular causes. Alternative metrics have different skews, so using an alternative metric could lead to very different cause prioritization.

For example, under the QALY framework, one year with "some problems walking about" is considered to be about as bad as one year with "moderate anxiety or depression."

For anyone who's had some experience with depression or anxiety, as well as with "some problems walking about," it should be obvious that moderate depression or anxiety is (much) worse than moderate mobility problems, pound for pound. (Please reach out if you disagree with this; I want to pick your brain if you do.)

An alternative metric to QALYs is called experience sampling. Last month, Natália posted about experience sampling on the Forum. The post was moderately upvoted, though no one commented on it.

A takeaway from that post is that rolling out an experience-sampling framework seems very tractable.

This research direction seems like plausibly a high priority for EA, as basing cause prioritization on a different metric could lead to notably different priority causes.

In particular, experience sampling appears to give a higher weight to mental health disorders than QALYs do, so it's plausible that under an experience-sampling framework, mental health interventions would be higher priority than global health interventions.

Given the potential magnitude of this delta in prioritization (between the experience-sampling & QALY frameworks), it's surprising to me that there's not been more interest in investigating alternatives to the QALY in the EA community.

To be clear, I'm not claiming that the experience-sampling method is superior to QALYs. I'm claiming that it is constructed in an equally plausible way to the QALY, and that it probably results in drastically different cause prioritization. One potentially robust path forward could be to split the difference between the prioritization implied by QALYs and the prioritization implied by experience sampling.

[Disclosure: In February 2019, I corresponded about the experience-sampling idea with Alex Foster of the EA Meta Fund. He said my points were "certainly quite compelling," but the correspondence fell off.

I heard later from another source that the EA Meta Fund didn't end up getting excited about the idea, though they didn't say why not.]


Three thoughts. First, it's not really the case that EAs use QALYs/DALYs. GWWC and GiveWell used to use them, but GWWC no longer exists as an independent entity and GiveWell now use their own metric. 80k mostly focus on the far future and so QALYs/DALYs aren't of primary interest. Have I missed someone? I think Founders Pledge do use them. Not sure what goes on 'under the hood' for The Life You Can Save's recommendations.

Second, even if you wanted to use the experience sampling method (ESM) as your measure of wellbeing, you couldn't because there isn't enough data on it. There are only two academic projects which have tried to collect data en masse - trackyourhappiness and mappiness. The former is now defunct (Killingsworth works for Microsoft now, I believe) and the latter isn't actively being used (I spoke to the creator, George MacKerron, a couple of months ago). I discuss this in a previous forum post. The best I think we can do, if we want to use a subjective wellbeing (SWB) measure, is life satisfaction.

Third, I think ESM is the theoretically ideal measure of happiness and thus EA - indeed, everyone - should use it as the outcome measure of impact (I assume wellbeing consists in happiness). What follows is that ESM is superior to all other measures of wellbeing, including QALYs/DALYs, wealth, etc. I'm hoping to do some research using ESM at some point in the future if I can.

I don't really see ESM as being in opposition to QALYs. It seems like it's a method that you would use as an input in QALY weight determinations. Wikipedia lists some of the current methods for deriving QALY weights as:

Time-trade-off (TTO): Respondents are asked to choose between remaining in a state of ill health for a period of time, or being restored to perfect health but having a shorter life expectancy.
Standard gamble (SG): Respondents are asked to choose between remaining in a state of ill health for a period of time, or choosing a medical intervention which has a chance of either restoring them to perfect health, or killing them.
Visual analogue scale (VAS): Respondents are asked to rate a state of ill health on a scale from 0 to 100, with 0 representing being dead and 100 representing perfect health. This method has the advantage of being the easiest to ask, but is the most subjective.
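
To make the three elicitation methods above concrete, here is a rough sketch of how each response is conventionally turned into a weight between 0 (dead) and 1 (perfect health), and how a weight then becomes QALYs. The function names and the example numbers are my own illustrative assumptions, not taken from Wikipedia or from any particular valuation study, and real studies add refinements (states worse than death, discounting, rescaling of VAS scores) that are left out here.

```python
# Illustrative only: textbook conversions from survey responses to QALY weights.
# All function names and example numbers are hypothetical.

def tto_weight(healthy_years: float, ill_years: float) -> float:
    """Time-trade-off: the respondent is indifferent between `ill_years`
    in the ill state and `healthy_years` in perfect health."""
    if ill_years <= 0 or not 0 <= healthy_years <= ill_years:
        raise ValueError("need 0 <= healthy_years <= ill_years and ill_years > 0")
    return healthy_years / ill_years


def sg_weight(p_cure_at_indifference: float) -> float:
    """Standard gamble: the respondent is indifferent between staying in the
    ill state for certain and a treatment that restores perfect health with
    probability p (and kills them with probability 1 - p). The weight is p."""
    if not 0 <= p_cure_at_indifference <= 1:
        raise ValueError("probability must be between 0 and 1")
    return p_cure_at_indifference


def vas_weight(score: float) -> float:
    """Visual analogue scale: a 0-100 rating rescaled to 0-1."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    return score / 100


def qalys(weight: float, years: float) -> float:
    """QALYs accrued by living `years` in a state with the given weight."""
    return weight * years


if __name__ == "__main__":
    # Hypothetical respondent: 10 years in the ill state is judged
    # equivalent to 8 years in perfect health.
    w = tto_weight(healthy_years=8, ill_years=10)
    print(w, qalys(w, years=10))
```

An ESM-based weight would presumably slot into the same `qalys` calculation in place of `w`; the difference lies in how the weight is measured, not in how it is used.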

There's also the "day reconstruction method" (DRM). The Oxford Handbook of Happiness talks about ESM, DRM and other relevant measurement approaches at various points.

I'd guess the trouble with using ESM, DRM and some other methods like them for QALY weights is that it's hard to isolate the causal effect of particular conditions using these methods.

I suspect experience sampling is much more costly and time-consuming to get data on than alternatives, and there's probably much less data. Life satisfaction or other simple survey questions about subjective wellbeing might be good enough proxies, and there's already a lot of available data out there.

Here's a pretty comprehensive post on using subjective wellbeing:

A Happiness Manifesto: Why and How Effective Altruism Should Rethink its Approach to Maximising Human Welfare by Michael Plant

Another good place to read more about this is https://whatworkswellbeing.org/our-work/measuring-evaluating/