About two months ago, Michael Plant, Joel McGuire and Clare Donaldson from the Happier Lives Institute posted on this forum about Using Subjective Well-Being to Estimate the Moral Weights of Averting Deaths and Reducing Poverty. I wrote some feedback on the post and Michael asked me to share it here.
These are my own views but I'm very grateful to my colleagues on the Founders Pledge research and advisory teams for comments and discussion.
The trade-off between saving lives and increasing consumption is really important to prioritisation in global health and development and I think subjective well-being (SWB) offers potentially really useful tools for making trade-offs like this, so I’m really excited to see this post. This post functions very well as an initial proof of concept of using SWB in cost-effectiveness analyses and I think it’s a strong early step in applying well-being analysis. The analysis impressively comes to reasonable conclusions, often with only poor quality data available. I’m excited to see this methodology develop further.
The post and guesstimate model are clear and easy to understand. I was especially impressed by the presentation of the guesstimate model. These easily get unwieldy but this is extremely clear and easy to follow.
I have one major comment and a few minor comments. Most of the minor comments are best interpreted as things to think about in future work rather than direct criticisms of the post.
One major comment
Beware the difference between an expected value of a ratio and a ratio of expected values. You calculate the expected value of a ratio but I think you should instead look at the ratio of expected values. The ratio of expected values is about 40% lower than the expected ratio.
A simple illustrative example
Suppose there are two options, A and B, and two scenarios, 1 and 2. You are uncertain about which scenario is actual but you know that they are equally likely. Suppose that you can get the following pay-offs by choosing option A or B in scenario 1 or 2:
Which option should we choose? Ratios are helpful: if x/y > 1, then we know that x > y (as long as x and y are positive). Here, $E(A/B) > 1$, so should we expect A to be better than B and choose A over B? No. Note that $E(B/A)$ is also greater than 1, so the expected ratio is misleading. We care about the higher expected value and $E(B) > E(A)$, so choose B.
Similarly, we could look at the ratio of expected values: $E(A)/E(B) < 1$. This is the right ratio to look at when choosing between A and B.
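To make the pattern concrete, here is a minimal sketch with hypothetical pay-offs. They are chosen only to reproduce the qualitative behaviour described above and are not the figures from the original example. Both expected ratios exceed 1, yet B has the higher expected value.

```python
from fractions import Fraction

# Hypothetical pay-offs (illustrative only): each list holds the pay-off
# in scenario 1 and in scenario 2, which are equally likely.
payoff_a = [Fraction(3), Fraction(1)]
payoff_b = [Fraction(1), Fraction(4)]


def mean(values):
    """Expected value when every scenario is equally likely."""
    return sum(values) / len(values)


expected_a = mean(payoff_a)
expected_b = mean(payoff_b)
expected_ratio_ab = mean([a / b for a, b in zip(payoff_a, payoff_b)])
expected_ratio_ba = mean([b / a for a, b in zip(payoff_a, payoff_b)])
ratio_of_expectations = expected_a / expected_b

print("E(A) =", expected_a, " E(B) =", expected_b)
print("E(A/B) =", expected_ratio_ab, " E(B/A) =", expected_ratio_ba)
print("E(A)/E(B) =", ratio_of_expectations)
```

Exact fractions are used so that the comparison is not clouded by floating-point rounding.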
In the context of this post
In the case of this post, we’re not actually choosing between two options, we’re trying to determine how to weigh two outcomes against each other. So perhaps it’s not immediately obvious that this criticism applies here, but it does. Suppose there are two acts:
- Act C, that doubles n people’s consumption for one year
- Act L, that saves m lives of children under the age of 5
Ideally, in choosing between these, we’d build probabilistic models like this one that incorporate the distributions here for the value of doubling consumption for one person for one year (call this c) and the value of saving one life of a child under the age of 5 (call this l). We look at the expected value of performing C and L and choose whichever is greater. The relevant ratio here is $E(C)/E(L)$ (rather than $E(C/L)$). We might approximate the expected value of C as the expected number of income doublings multiplied by the expected value of doubling one person’s income, i.e. $E(C) \approx E(n)\,E(c)$ (and similarly, $E(L) \approx E(m)\,E(l)$). Then the relevant ratio for comparing C and L is:
$$\frac{E(C)}{E(L)} \approx \frac{E(n)\,E(c)}{E(m)\,E(l)}$$
So the relevant ratio for comparing income doublings and saving lives is the ratio of their expected values, not the expected value of their ratio. I don’t see a use for $E(l/c)$ in this analysis, so this shouldn’t be the headline figure.
Another way to see that the expected ratio is misleading is to consider inverses. Suppose the expected ratio told us how many times saving a life is better than doubling one person’s consumption. One could just as easily have calculated the ratio the other way round – i.e. the expected ratio of doubling consumption over saving a life, and this would tell us how many times better doubling consumption is than saving a life. If that’s the case, then the reciprocal of this expected ratio should also tell us how many times better saving a life is than doubling one person’s consumption (e.g. if we think doubling consumption is x times as good as saving a life, then we think saving a life is 1/x times as good as doubling consumption). But this gives a different answer to the expected ratio of saving a life over doubling consumption:
1. Expected ratio of saving a life/doubling consumption = 240 (your model)
2. Expected ratio of doubling consumption/saving a life = 0.0085 (can calculate in your model)
3. 1/(expected ratio of doubling consumption/saving a life) = 1/0.0085 ≈ 120 ≠ 240 (simple calculation)
But 1 and 3 purport to describe the same thing, i.e. how many times better saving a life is than doubling consumption, so they should be equal. Therefore, at least one of them fails to do this and since we have no principled reason to choose one over the other, we should reject both.
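The mismatch between the expected ratio and the reciprocal of the inverse expected ratio can be reproduced with a small simulation. This is only a sketch: the lognormal distributions and their parameters are hypothetical stand-ins, not the distributions in the Guesstimate model.

```python
import random
from statistics import fmean

random.seed(0)
N_SAMPLES = 5000

# Hypothetical, independent distributions for the value of saving a life (l)
# and of doubling one person's consumption for a year (c).
value_life = [random.lognormvariate(5.0, 0.8) for _ in range(N_SAMPLES)]
value_doubling = [random.lognormvariate(0.2, 0.6) for _ in range(N_SAMPLES)]

# E(l/c): the expected ratio.
expected_ratio = fmean([l / c for l, c in zip(value_life, value_doubling)])
# 1/E(c/l): the reciprocal of the expected ratio taken the other way round.
reciprocal_of_inverse = 1 / fmean([c / l for l, c in zip(value_life, value_doubling)])
# E(l)/E(c): the ratio of expected values.
ratio_of_means = fmean(value_life) / fmean(value_doubling)

print(f"E(l/c)    = {expected_ratio:.1f}")
print(f"1/E(c/l)  = {reciprocal_of_inverse:.1f}")
print(f"E(l)/E(c) = {ratio_of_means:.1f}")
```

For any set of positive samples that are not all identical, the first figure exceeds the second, so the two "expected ratio" readings cannot both be right. The ratio of means has no such ambiguity, because inverting it gives exactly $E(c)/E(l)$.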
Calculate the expected value of doubling consumption for one person for one year in your model (about 1.5), then read off the relevant expected values and divide:
- Deprivationism: 210/1.5 = 140 (rather than 240)
- TRIA: 45/1.5 = 30 (rather than 50)
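The arithmetic above, and the size of the gap relative to the expected ratios, can be checked in a few lines. The inputs are the approximate figures quoted in this comment; the variable names are just labels for this sketch.

```python
# Approximate expected values quoted above.
expected_value_doubling = 1.5
expected_value_life = {"Deprivationism": 210, "TRIA": 45}
# Expected ratios that the ratios of expected values are compared against.
expected_ratio = {"Deprivationism": 240, "TRIA": 50}

ratio_of_expected_values = {
    view: value / expected_value_doubling
    for view, value in expected_value_life.items()
}

for view, ratio in ratio_of_expected_values.items():
    reduction = 1 - ratio / expected_ratio[view]
    print(f"{view}: {ratio:.0f} (vs {expected_ratio[view]}), {reduction:.0%} lower")
```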
Totalism and births averted per life saved
As you develop this methodology further, I think it’s important that you account for other moral views, most notably totalism. As you’re aware, totalism is a popular view (especially in EA) and, depending on how we ought to respond to moral uncertainty, we might think that totalism (or something similar) dominates our decision calculus when acting under moral uncertainty (Greaves and Ord 2017). I think it would be valuable to know what a similar totalist analysis yields.
As you mention, the value of saving lives, according to totalism will be sensitive to the effect that saving a child’s life has on fertility. You write:
A report written for GiveWell estimated that in some areas where it recommends charities the number of births averted per life saved is as large as 1:1, a ratio at which population size and growth are left effectively unchanged by saving lives. For totalists, the value of saving lives in a 1:1 context would be very small (compared to one where there was no fertility reduction) as the value of saving one life is ‘negated’ by the disvalue of causing one less life to be created.
This wasn’t my reading of Roodman’s report (though this isn’t my area of expertise + my memory of the report is a little hazy). I understood the conclusion to be that the 1:1 ratio is mainly in countries with low infant mortality rates, in which parents choose the number of children they want to have but that saving lives in a high infant mortality rate environment could be very different. Areas in which we’d aim to save lives of young children, almost by definition, have relatively high infant mortality rates, so saving lives is more likely to lead to additional lives. E.g. in Kenya, infant mortality rate is about 3%, compared to 0.4% in the UK. Many parents in the UK choose the size of their family, so saving a child’s life here often means an additional child won’t be born. I don’t know the effect of saving an infant’s life in Kenya on fertility but the large difference in infant mortality rate suggests that the effect on fertility could be much smaller.
I think a valuable improvement would be to investigate the effect of saving lives on fertility rates in the relevant contexts further and use this to run a totalist analysis. As you note, totalism raises difficult questions about the extent to which the world/the relevant regions are under or overpopulated, as these bear on the extent to which adding extra people affects average well-being. Even just side-stepping these for now by holding average well-being constant could be informative but further investigation of these questions could also be valuable.
Wide bounds for parameters of exponential decay model
The exponential decay model does well to make reasonable estimates from your data. The 90% confidence interval displayed in Figure 1A is really wide though and, importantly, the areas under the lower and upper bounds of the confidence interval are very different. Guesstimate does a good job of incorporating this uncertainty but it seems like it would be valuable to reduce the uncertainty with longer-term follow-up data.
Moral value as a linear function of well-being
What we really care about is moral value. We’re looking at life satisfaction here as a proxy for that. Maybe moral value is a function of life satisfaction. In this post, you treat moral value as a linear function of life satisfaction because you treat a point gain in life satisfaction as equally valuable, no matter one’s starting life satisfaction. I know it’s standard to assume this but how well-grounded is it? Is a jump from 0 to 1 life satisfaction really as valuable as a jump from 9 to 10? This seems like something that could be tested through revealed and/or stated preferences (but of course that would then partially undermine the reason to look at life satisfaction in the first place). This could be understood in prioritarian terms but I don’t intend it like that: the difference in well-being between 0 and 1 life satisfaction could be larger than the difference in well-being between 9 and 10.
Perhaps, the (near) logarithmic relationship between income and life satisfaction provides indirect evidence for a linear relationship between moral value and life satisfaction because this shows that life satisfaction has diminishing returns to income and surely so does moral value. But this is only a very weak argument: it could be like saying “y and z both display diminishing returns in x, so maybe y and z are roughly linear.” But y and z could be related exponentially (e.g. if $y = \log x$ and $z = \sqrt{x}$, then $z = e^{y/2}$), so this really isn’t a linear relationship at all! (I don't intend to imply that you make this argument, I'm just highlighting that diminishing returns can look very different.)
Life expectancy
A few small points here:
- How much variation in life expectancy is there across and within counties? I would expect the life expectancy of those whose lives are saved by GiveWell charities to be among the very lowest and if there’s lots of variation in life expectancy, then the relevant life expectancy could be significantly lower than at the county or national level.
- Your bounds seem too high to me, given the evidence you cite. There being 2 counties with life expectancy less than 60 seems fairly strong reason to believe that it’s more than 5% likely that the relevant life expectancy is less than 62. Even if life expectancy in every county were above 60, say, it’s possible that we could still expect the relevant life expectancy to be less than 60 if there’s lots of variation within counties (e.g. even if life expectancy across a county is 60, life expectancy of the relevant subgroup within the county could be less and this is more likely if there’s high variation in length of lifetime within counties).
- Technically, we should look at life expectancy given the current age rather than life expectancy at birth, and this increases as we survive more years (in practice). I would guess that life expectancy at 5 (or 2.5) is pretty similar to life expectancy at birth, so I doubt this matters much here.
- If you wanted to, you could estimate life expectancy at 5 (or 2.5 etc.) using the infant mortality rate. A simple version: if proportion p of children survive to age n and life expectancy at birth is E, then (assuming those who die before age n die at age n/2 on average) we have $E = p\,E_n + (1 - p)\,n/2$, where $E_n$ is life expectancy at age n (the expected age at death of those who survive to n). Just rearrange to find $E_n$.
- You cite median life expectancy for a one-year old (i.e. the median age at which a one-year old will die?), which is relevant, but it’s not obvious to me how much weight to put on this. What proportion of infants in Kenya are potential recipients of life-saving interventions? If it’s much less than 50%, the median might not be very informative
- I’m not entirely sure how life expectancy is calculated but I think that ideally, you’d also account for life expectancy trends over time. Current life expectancy might not predict how long we can expect current infants to live (it might instead predict how long we can expect infants born N years ago to live)
- Is it possible to get any data on the distribution of the length of lifetimes? This would actually be better than life expectancy. Some people live much longer or shorter than their country/county’s life expectancy, so estimating the life expectancy understates the uncertainty in your model. Ideally, you’d estimate how long people will live, which would be a distribution with life expectancy as the mean but with more uncertainty. I doubt this makes a big difference to the end results though
- Historically, Japanese life expectancy data has been used in DALY calculations across the world so that the lives of people living in countries with low life expectancy aren’t weighted less than the lives of people living in countries with high life expectancy
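The simple estimate of life expectancy at age n from the list above can be sketched as follows. It assumes $E = p\,E_n + (1 - p)\,n/2$, with $E_n$ read as the expected age at death of those who survive to age n; the function name and the example inputs are hypothetical and are not Kenyan data.

```python
def life_expectancy_at_age(n, p, e_birth):
    """Expected age at death for those who survive to age n.

    Rearranges e_birth = p * e_n + (1 - p) * n / 2, which assumes that
    those who die before age n die at age n / 2 on average.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be a proportion in (0, 1]")
    return (e_birth - (1 - p) * n / 2) / p


# Hypothetical inputs for illustration only.
e_at_5 = life_expectancy_at_age(n=5, p=0.95, e_birth=66.0)
print(f"Life expectancy at age 5: {e_at_5:.1f}")
```

With a survival proportion close to 1 the adjustment is small, which fits the guess above that life expectancy at 5 is similar to life expectancy at birth.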
Bias in life satisfaction studies
- Psychology and economics studies (among many others) often overstate effect sizes, so the effect of cash transfers on life satisfaction might be overstated here
- However, the samples are very large, so maybe this isn’t too concerning
- This applies less to the average life satisfaction studies because there’s no particular reason to expect these to be biased one way or the other (there’s no pressure to find an effect)
- This could overstate the benefits of doubling consumption relative to saving lives
Indirect/spillover effects are hard to quantify and this is a nice start, with effects on other household members. Wider effects are hard to account for but perhaps a notable exception is the economic effects of cash transfers on non-recipients. You note the possibility of negative spillovers, but the larger, more relevant Egger et al. (2019) study that you cite suggests large positive spillovers on non-recipient households and firms. Well-being analysis could be well placed to account for such spillovers (though I do fear that such an investigation could quickly get tangled up in the Easterlin Paradox, in which case it might not be that tractable after all).
Variation in consumption
The use of the linear-log model to adjust from consumption increase to consumption doubling makes sense. In theory though, just looking at the mean pre-intervention consumption and mean cash transfer to estimate the mean effect on life satisfaction could be misleading if there’s lots of variation in these quantities. If there’s lots of variation in consumption, then the mean pre-intervention (and post-intervention) life satisfaction could be much lower than the life satisfaction corresponding to mean pre-intervention (and post-intervention) consumption – by Jensen’s inequality, $E[\log X] \le \log E[X]$, where X is consumption. I haven’t thought much about the implications for estimating the effect size but in principle, this could make a difference so it might be worth looking into.
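A quick numerical illustration of the Jensen's inequality point, assuming life satisfaction is proportional to the log of consumption. The consumption figures are hypothetical and chosen to be widely spread.

```python
import math
from statistics import fmean

# Hypothetical consumption levels for four households (arbitrary units).
consumption = [100, 200, 400, 1600]

# Mean of log-consumption: tracks mean life satisfaction under a
# linear-log model.
mean_of_log = fmean([math.log(x) for x in consumption])
# Log of mean consumption: what one gets by plugging mean consumption
# into the same model.
log_of_mean = math.log(fmean(consumption))

print(f"E[log X]   = {mean_of_log:.3f}")
print(f"log(E[X])  = {log_of_mean:.3f}")
print(f"difference = {log_of_mean - mean_of_log:.3f}")
```

The gap shrinks as the spread in consumption narrows and vanishes when every household has the same consumption.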
Moral vs prudential value
It might be helpful to distinguish between moral and prudential value since we could hold different views about the moral and prudential badness of death. For example, some version of TRIA strikes me as a very plausible account of the badness of my death for me (i.e. its prudential value) but less plausible as an account of the moral badness of my death (because it seems to depend on some form of person-affecting view and/or a preference-based theory of well-being). In the context of doing good, the moral value is what matters rather than the prudential value and we wouldn’t want decision-makers to erroneously give too much weight to some moral views because they mistake such views for their more plausible prudential counterparts. This isn’t really a criticism of the post (indeed, introducing this distinction might just have been distracting, confusing and unnecessary) but it could be important when it comes to putting this framework into practice.
Sensitivity analysis on TRIA representation
Caveats: (1) I’m only very vaguely familiar with the TRIA literature, so could be way off here. These comments are based on my initial thoughts and could be largely disconnected from or simply repeating the literature. (2) these concerns plausibly apply less to the badness of death of young children compared to adults, as most plausible TRIAs will be in relatively close agreement that young children aren’t strongly psychologically connected to their future adult selves. However, as I’ll elaborate later, I can see these concerns being relevant for evaluating the badness of death for infants too.
As you note, there are many possible ways of formalising TRIA. The method you’ve chosen is simple and intuitive and seems fine for a proof of concept. I have two concerns going forward though: (i) it seems plausible that the results could vary quite significantly on different versions of TRIA, (ii) it’s not obvious to me that your version is the best. Combined, these concerns suggest that the choice of representation matters (important differences between different representations) and you’ve chosen the wrong one.
I’ll start by motivating (ii). Is it standard to assume that we reach 'full psychological connectedness' at around age 10-21 and maintain that until death? Is that supposed to be a realistic or a simplifying assumption? It's clearly false for some people, e.g. with dementia, and intuitively, this seems to me to be probably false for most people. Memories fade, personalities change etc. over time and we gradually become more and more psychologically disconnected from our earlier selves. I would expect the peak TRIA discount factor to be less than 1 because even at the peak, I wouldn’t expect people to be fully psychologically connected to their entire future selves (so death isn’t as bad for them as on the deprivationist account). I can also imagine the peak coming later than 21 (but for sure, there should be a rapid increase between birth and age 10-21). I would also expect the discount to drop off again after the peak (but that’s probably not important here). I’m not trying to argue for a specific TRIA here, just to note that the TRIA used in your model doesn’t strike me as the most plausible version. Or at least, it’s unclear enough that it’s worth considering other representations in future work.
Now onto (i). In the most extreme case, one could have an eliminativist view about personal identity, according to which persons don’t persist over time and so in each moment, I am different to all the other mes at different times. Combined with TRIA, death isn’t bad for someone who dies on this view because no one that they are psychologically connected to loses well-being. In between this extreme and the view you propose, we might hold a view according to which periods of psychological connectedness are relatively short, lasting a few years or small number of decades. Each of these views could have widely varying results for the badness of death, so it might be important to run a sensitivity analysis on this.
Returning briefly to (ii), the third view I sketched above seems more intuitively plausible to me than the one you’ve used. But I haven’t thought about this much and don’t know what the rest of the literature looks like here.
Even though this more obviously applies to the badness of death for adults, it could still have consequences for the badness of death for infants. You’ve modelled TRIA by increasing the TRIA discount factor linearly from 0 at birth to 1 at age 15 (90% CI: 10-21). If the discount factor peaks below 1, then a similar linear increase would be shallower so the relevant discount factor would be lower. I expect that, if anything, this is probably an argument for a non-linear discount factor rather than a significantly lower one. In any case, I’d be interested in more of a sensitivity analysis here.
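A minimal sketch of the kind of sensitivity check meant here, assuming the linear ramp described above (0 at birth, reaching its peak at some age between 10 and 21). The function name, the `peak` parameter and the example ages are hypothetical; this is not the post's Guesstimate implementation.

```python
def tria_discount(age, full_connection_age=15.0, peak=1.0):
    """Linear TRIA discount factor: 0 at birth, `peak` from
    `full_connection_age` onwards."""
    if full_connection_age <= 0:
        raise ValueError("full_connection_age must be positive")
    ramp = min(max(age, 0.0) / full_connection_age, 1.0)
    return peak * ramp


# Discount factor for a child aged 2.5 under different assumptions.
for peak in (1.0, 0.8):
    for full_age in (10.0, 15.0, 21.0):
        factor = tria_discount(2.5, full_connection_age=full_age, peak=peak)
        print(f"peak={peak}, full connection at {full_age:.0f}: {factor:.3f}")
```

Under this linear form, lowering the peak scales the discount factor for young children down proportionally, which is the "shallower increase" effect described above. A non-linear ramp would need a different function.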
Comparability of SWB measures across different income settings
Similar to Jack Malde’s comment, I think there are legitimate concerns about the comparability of SWB measures across different income settings due to the possibility of people using the scales differently in different contexts. I look forward to seeing your paper on this, Michael!
This might not be too significant a concern here – if the SWB measures are comparable across (e.g.) GiveDirectly recipients and (e.g.) children saved from malaria, then we’re all good. This could be much more important when trying to use SWB measures to make comparisons across more strongly differing contexts.
Location of results in guesstimate model
As mentioned above, I think the guesstimate model is really well presented. The only slight improvement in presentation (for me) would be to have the results at the top. It's good to repeatedly refresh guesstimate models to check how variable the results are over multiple runs and it's a little bit easier to do this if you don't have to scroll each time you refresh.
On moral value as a linear function of well-being and comparability of SWB measures across different income settings
As you allude to, there are two issues here. If you think person A going from 0/10 to 1/10 life satisfaction has greater moral value than B going from 9/10 to 10/10, that might be because (1) you think each has the same increase in well-being, but you want to give extra weight to the worse off. This is the prioritarian point you say you are not making.
The alternative, (2), is that you think A really has had a bigger increase in well-being than B even though both have reported a 1-unit change in life satisfaction. (2) raises a concern about whether the subjective scales are cardinally comparable. This isn’t a moral problem, so much as a scientific one of measurement. Technically, the issue is whether numerical scores from subjective self-reports are cardinally comparable. I’ve got a working paper on this topic (not public apart from this link) where I delve into this and conclude subjective scales are likely cardinally comparable. The basic issue here, I think, is about how people use language when interpreting survey questions; not much seems to have been written about it. With regards to your point about “comparability of SWB measures across different income settings”, the document I linked to provides a rationale for why I suspect they are comparable.
This is a great summary of what I was and wasn't saying :)
Thanks for the link - looking forward to reading. Might return to this after reading
On totalism and births averted per life saved
I agree it’s important to see the value of our actions is sensitive to concerns about population ethics, especially in this case where it seems it could make such a difference. A few comments.
First, it’s worth noting all views of population ethics will be somewhat sensitive to the issue of how saving lives affects total population size. This is because whether there are more or fewer people now has, arguably, an impact on the well-being of everyone else (present and future). Many people seem to think the Earth is overpopulated, in the sense that adding people now is overall worse. There are a few different ways of thinking about this but one general practical implication is that the worse it is to add people (because you want a smaller population) the worse it will also be to save lives. See Greaves’ (2015) analysis and Plant (2019, chapter 2), which is an extension of Greaves’ paper.
Second, I agree that if you’re thinking about how mortality rates affect fertility, this will be particularly important on totalism in this context, because totalism gives so much weight to creating new lives, although it will apply to other views of population ethics too.
Third, when trying to understand what the “lives saved:births averted” ratio is, what’s relevant is not just mortality or fertility rates by themselves, but the combination of them. If parents are trying to have a set number of children (who survive to adulthood) then reducing mortality might not change the total number of future people much, because parents adjust fertility. I think this is a topic for further work and I don’t claim expertise on the population dynamics in any particular context.
Agreed. I didn't mean to imply that totalism is the only view sensitive to the mortality-fertility relationship - just that the results could be fairly different on totalism and that it's especially important to see the results on totalism and that it makes sense to look at totalism before other population ethical views not yet considered. Exploring other population ethical views would be good too!
I think my concern here was that the post suggested that saving lives might not be very valuable on totalism due to a high fertility adjustment:
Roodman's report (if I recall correctly) suggested that this likely happens to a lower degree in areas where infant mortality is high (i.e. parents adjust fertility less in high infant mortality settings) so saving lives in these settings is plausibly still very valuable according to totalism.
Okay, we're on the same page on all of this. :) A further specific empirical project would involve trying to understand population dynamics in the locations EAs are considering.
On life expectancy
That’s a good point - something we could look into more next time. (In general we spent more time on the decisions specific to using SWB rather than general technicalities, but of course, if people are going to use the results then these are important too.)
Yes that’s true - we mentioned this in a previous version which got dropped, which is my omission. From WHO life tables, the life expectancy for 0-1 year olds is 64.4 and for 1-4 year olds is 66.1 (for Kenyan boys, in 2016 - for girls it’s 68.9 and 70.3), so not a huge difference, although this could be tightened up in future.
You’re right. Our World in Data provides a helpful explanation of the different types of life expectancy. We used the UN’s projected life expectancy for 2020-2025, so this should predict how long we can expect babies born today to live. You can see graphs for Kenya here and here (I haven’t figured out the exact methodology and the differences between their ‘standard’ and ‘probabilistic’ projections).
Great, sounds like you're on top of all of this!
Thanks a lot for your detailed comments Aidan, and others at Founders Pledge! We really appreciate the feedback and think our future work on this will benefit from it a great deal. Michael, Joel and I will reply to your comments individually.
On the major comment: This is a great point and something I hadn’t thought of before. Your explanation is very helpful. It’s really striking how large a difference it makes. We will update the post and Guesstimate model soon to correct this.
One thing I can’t quite get my head round - if we divide E(C) by E(L) then don’t we lose all the information about the uncertainty in each estimate? Are we able to say that the value of averting a death is somewhere between X and Y times that of doubling consumption (within 90% confidence)?
You're very welcome! I really enjoyed reading and commenting on the post :)
Good question, I've also wondered this and I'm not sure. In principle, I feel like something like the standard error of the mean (the standard deviation of the sample divided by the square root of the sample size) should be useful here. But applying it naively doesn't seem to give plausible results because guesstimate uses 5000 samples, so we end up with very small standard errors. I don't have a super strong stats background though - maybe someone who does can help you more here
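For what it's worth, the standard error arithmetic described above looks like this in a small sketch. The samples are hypothetical lognormal draws rather than output from the Guesstimate model.

```python
import math
import random
from statistics import stdev

random.seed(1)
N_SAMPLES = 5000  # the number of samples mentioned above

# Hypothetical draws standing in for one model output.
samples = [random.lognormvariate(0.0, 1.0) for _ in range(N_SAMPLES)]


def standard_error_of_mean(values):
    """Sample standard deviation divided by the square root of the sample size."""
    return stdev(values) / math.sqrt(len(values))


sd = stdev(samples)
sem = standard_error_of_mean(samples)
print(f"standard deviation = {sd:.3f}")
print(f"standard error     = {sem:.4f}")
```

With 5000 samples the standard error is roughly 1/70 of the standard deviation, which is why applying it naively gives such narrow intervals.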