If I understand correctly, it sounds like we now agree on the math of my post, and on my arguments about which coefficients from cross-sectional vs longitudinal regressions seem to match? But I think we still disagree about whether the impacts of a gradual increase in GDP over time should be compared to cross-sectional differences?
My first thought on our disagreement is that an income doubling is a fairly arbitrary metric. I think it would be equally reasonable to zoom in on the cross-sectional graph and look at the impact of a 1% increase in income. We can imagine country Y on the cross-sectional graph, which lies a little higher than Ethiopia on the regression line in my post. This country would have $1010 per capita GDP and a SWB of 4+1*.007=4.007, versus Ethiopia at $1000 and 4. If we compare this to what we would expect from a .007 coefficient in one of your alternative regressions, it looks like it’s exactly what we would expect from one year of 1% growth vs the counterfactual for Ethiopia? In this case we don’t need to worry about the amount of time it takes to double income, and TS and CS become more intuitively comparable?
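A quick way to check the country Y comparison is to compute both numbers side by side. This is a minimal sketch under stylized assumptions (not anyone's actual regression): the cross-sectional line is taken to pass through ($1000, 4.0) and ($2000, 4.5), i.e. 0.5 points per doubling, and the time-series coefficient is read as .007 SWB points per percentage point of growth per year. The function names are hypothetical.

```python
import math

# Stylized assumptions, not estimates: the cross-sectional line passes through
# ($1000, 4.0) and ($2000, 4.5), i.e. 0.5 SWB points per doubling of GDP.
CS_POINTS_PER_DOUBLING = 0.5
TS_COEF_PER_PP = 0.007  # SWB points per pp of growth per year (table 3, col. 5 reading)

def cs_predicted_swb(gdp, base_gdp=1000.0, base_swb=4.0):
    """SWB implied by the stylized cross-sectional line at a given GDP per capita."""
    return base_swb + CS_POINTS_PER_DOUBLING * math.log2(gdp / base_gdp)

def ts_predicted_swb(growth_pp, years=1, base_swb=4.0):
    """SWB after `years` of extra growth, using a per-pp time-series coefficient."""
    return base_swb + TS_COEF_PER_PP * growth_pp * years

cs = cs_predicted_swb(1010.0)       # country Y: 1% richer than Ethiopia
ts = ts_predicted_swb(growth_pp=1)  # Ethiopia after one year of 1pp extra growth
print(f"cross-section: {cs:.4f}, time series: {ts:.4f}")
```

Because 0.5 × log2(1.01) is about .007, the two predictions agree to within a fraction of a thousandth of a point, which is the sense in which a 1% step makes TS and CS directly comparable.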
My second thought is that if we assume that TS results are not comparable to CS results because they take a long time, wouldn’t that make the existence of the Easterlin Paradox irrelevant for making any judgements about the world? Isn’t the Easterlin Paradox a paradox precisely because we expect the coefficients to match between CS and TS, but they don’t seem to in some specifications?
“we are talking about the Gallup results and ignoring the EVS/WVS results. They are preferred for long-run periods.”
Agreed. I haven’t looked at the EVS/WVS results at all, so there is a good chance that they are less sensitive to the kinds of alternative specifications I tried for the Gallup results.
“It’s possible that many people on the lower end of the income distribution benefit greatly – indeed many economists, even happiness ones, believe this in their bones. We just need more evidence at scale.”
I share the same intuition, and find this an interesting area for further exploration. I would be curious to hear your thoughts on why the “Growth X LDC” coefficients in all of your regressions are negative (which is a surprise to me). This seems to imply that people lower down the income distribution are actually benefiting less from % income increases? Re-running your regressions on just the less-developed countries in your Gallup dataset, I also get smaller coefficients than those for the whole dataset.
Thanks again for the response!
Yes, I am definitely talking about WELLBYs. I meant to say that there are two ways of looking at both income and SWB: a level at a point in time, and the sum of the levels across years (we can think of this as the area under the curve plotted across time). We can call the summed versions INCYs and WELLBYs, and the point-in-time estimates Income and SWB. So I think in year 13.5, we can say that we get .2 WELLBYs for 1 INCY. Or alternatively, we can say that we get .027 SWB points for a 14% Income gain. I don't think that we should be comparing SWB (a point-in-time estimate) to INCYs (a summed estimate).
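To make the point-in-time side of this concrete, here is a small sketch of the two counterfactual gaps in a single year, assuming 1pp of extra annual growth and reading the 0.002 coefficient as SWB points per pp of extra growth per year of sustained growth. The summed versions (INCYs and WELLBYs) would be the totals of these gaps over all the years up to that point; the helper name is hypothetical.

```python
COEF = 0.002         # SWB points per pp of extra growth per year (the thread's coefficient)
EXTRA_GROWTH = 0.01  # one percentage point of extra annual growth

def point_in_time(year):
    """Counterfactual gaps in a single year: (SWB points, proportional income gain)."""
    swb_gap = COEF * 1 * year                        # 1pp sustained for `year` years
    income_gap = (1 + EXTRA_GROWTH) ** year - 1      # compounded income gain
    return swb_gap, income_gap

swb_gap, income_gap = point_in_time(13.5)
print(f"year 13.5: +{swb_gap:.3f} SWB, +{income_gap:.1%} income")
```

These are the ".027 SWB points for a 14% Income gain" figures; neither is a sum over years.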
To illustrate, I’ll go back to the example of boosting Ethiopia’s growth by 1pp, using your coefficient of 0.002. For simplicity, let’s say that Ethiopia starts with a per capita GDP of $1000, a SWB of 4, and a real growth rate of 0%. It seems like we agree that “The population in year 13.5 reports .027 greater SWB points after an increase in growth by one percent.” So if we boost growth to 1%, I think we agree that in year 71 Ethiopia would have a per capita GDP of roughly $2000 (versus the counterfactual $1000) and a SWB = 4 + 71*.002 = 4.14.
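The year-71 figures can be reproduced with a short projection. This is a sketch of the stylized example only: GDP compounds at the extra growth rate, while SWB rises linearly by the coefficient for each year of 1pp extra growth. The `project` helper is hypothetical.

```python
def project(coef, extra_growth_pp=1.0, years=71, base_gdp=1000.0, base_swb=4.0):
    """Stylized projection for Ethiopia vs a 0%-growth counterfactual.

    GDP compounds at `extra_growth_pp` percent per year; SWB rises linearly by
    `coef` points per pp of extra growth per year of sustained growth.
    """
    gdp = base_gdp * (1 + extra_growth_pp / 100) ** years
    swb = base_swb + coef * extra_growth_pp * years
    return gdp, swb

gdp71, swb71 = project(coef=0.002)
print(f"year 71: GDP ≈ ${gdp71:,.0f}, SWB ≈ {swb71:.3f}")
```

Compounding 1% for 71 years gives slightly more than a doubling (about $2027), which is why "roughly $2000" is the right way to state it.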
Now, to address our discussion of (3) in the thread below: you say, "As you point out, our results include larger coefficient estimates using different specifications, yet we still argue they are not economically significant," and then in response to my comment that "those coefficients seem to be close to what we would expect from the cross-sectional data," you comment "I don’t agree that the results are similar in size."
Let’s assume we accept the coefficient from your regression in table 3, column 5: 0.007. That would imply that in year 71 Ethiopia would have roughly twice the GDP it would have had counterfactually (compared to the 0% growth world), and a SWB = 4+71*.007 ≈ 4.5. This is 0.5 points higher than the counterfactual.
Now let's imagine that in the cross-sectional regression Ethiopia and country X are both exactly on our regression line. Ethiopia is at $1000 and a SWB of 4, and country X is at $2000 and a SWB of 4.5 (that is roughly where the cross-sectional regression lines fall, as I argue in my post and as you can see from the graph I include). If there were no Easterlin Paradox, we would expect that if Ethiopia gradually got to $2000 GDP, it would move up the regression line to where country X currently is. But it seems like that is exactly what the .007 regression coefficient implies in the preceding paragraph? If so, is this at odds with your response on discussion (3) in the thread below?
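One way to check the "moving up the regression line" claim is to put the time-series endpoint and the cross-sectional line at the same GDP. A minimal sketch, assuming the stylized 0.5-points-per-doubling line through ($1000, 4) and ($2000, 4.5), and reading 0.007 as SWB points per pp of growth per year; the helper name is hypothetical.

```python
import math

CS_POINTS_PER_DOUBLING = 0.5  # stylized cross-sectional slope: $1000 -> 4.0, $2000 -> 4.5

def cs_line(gdp, base_gdp=1000.0, base_swb=4.0):
    """Where the stylized cross-sectional line puts a country with this GDP."""
    return base_swb + CS_POINTS_PER_DOUBLING * math.log2(gdp / base_gdp)

years, coef = 71, 0.007
gdp_ts = 1000.0 * 1.01 ** years   # Ethiopia after 71 years of 1pp extra growth
swb_ts = 4.0 + coef * 1 * years   # time-series path with the 0.007 coefficient
swb_cs = cs_line(gdp_ts)          # cross-sectional prediction at that same GDP
print(f"GDP ≈ ${gdp_ts:,.0f}: time series {swb_ts:.3f} vs cross-section {swb_cs:.3f}")
```

Under these assumptions the two land within about 0.013 points of each other, i.e. Ethiopia ends up roughly where country X sits today.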
Alternatively, don’t the coefficients from Sacks, Stevenson, and Wolfers (2012) roughly correspond to the larger coefficient estimates in your regressions (since both include 10-year short-term fluctuations)? So if Sacks et al. convincingly reran their analysis to focus on the same countries and longer time series that you use, and got the same coefficients they did in their paper, would that not update us towards thinking that longitudinal and cross-sectional results might be similar?
I think we could also use a similar argument about the Ethiopian counterfactual SWB = 4 + 71*.002=4.14 to argue that it matches the cash transfer results that I cite in my post.
“In your spreadsheet, you multiplied 0.002 by the number of years, assuming a larger increase in SWB per year (i.e., 0.004 in year 2), which is not correct.”
I meant the .004 to represent how much happier a person is after two years of faster growth than they would have been counterfactually (if growth had been 1pp lower). Since their annual change in SWB would have been .002 higher, they would have gotten .004 better off by year 2.
In other words, I think your formulas (4)-(3) represent the impact of additional growth (versus the counterfactual) on life satisfaction at time t (SWB_t). So in year 2, using your formula: 0.002*(∆G)_t = .002*2 = .004 happier than the counterfactual. This is only .002 happier than the person was after 1 year, but .004 happier than the counterfactual with no additional growth at all. So since the person was .002 happier in year 1 and .004 happier in year 2, I would consider that a cumulative .006 happier across the two years.
I think for the cumulative life satisfaction gain to be .027, you would have to expect the person in year 13.5 to only be .002 happier than he would have been without the additional growth (that way he would only be .002 happier each year, for a total of .027 life satisfaction points summed across the 13.5 years). But that would imply that our SWB measure wasn’t annualized, and that it shouldn’t matter whether you’ve been growing for one year or 1000, you would still be happier by the same amount?
Perhaps our difference is in how we are using the word cumulative? By cumulative, I mean actually summing across the counterfactual SWB gains in each of the 13.5 years. I think this is the correct thing to look at if we are comparing it to the income gains in each year summed across the 13.5 years. Perhaps by cumulative you meant just the total counterfactual impact on life satisfaction in year 13.5? But then it seems like we need to add the counterfactual impacts at each of the preceding years?
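The two readings of "cumulative" can be written out explicitly. This is a minimal sketch, assuming a 0.002 coefficient and 1pp of extra growth: in the first reading the counterfactual gap grows by 0.002 each year and we sum the yearly gaps; in the second the gap stays at a constant 0.002 per year. Both function names are hypothetical.

```python
def annual_gaps(years, coef=0.002):
    """Reading 1: the counterfactual SWB gap grows by `coef` each year (whole years)."""
    return [coef * t for t in range(1, years + 1)]

def summed_growing(years, coef=0.002):
    """Sum of the year-by-year gaps under reading 1 (area under the curve)."""
    return sum(annual_gaps(years, coef))

def summed_constant(years, coef=0.002):
    """Reading 2: a constant `coef` gap every year, summed over `years`."""
    return coef * years

two_year_growing = summed_growing(2)      # .002 in year 1 plus .004 in year 2
constant_13_5 = summed_constant(13.5)     # .002 per year for 13.5 years
print(two_year_growing, constant_13_5)
```

Only the second reading produces a .027 total over 13.5 years, and it implies the gap never widens however long faster growth continues.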
Perhaps one useful intuition pump would be to compress the whole income doubling into 1 year. Let's say annual growth increases by 100pp. Then we counterfactually double income in the first year. The impact on SWB is 100*.002 = .2 life satisfaction points, which is a bit higher than the estimates from cash transfers.
Thanks so much for taking the time to engage in this discussion! I am going to try to reply to where we have interesting areas of disagreement, and to number the points for easier response.
1. “For alternative policies that similarly cover a long period of time, see recent work by me and Easterlin, ‘Explaining happiness trends in Europe.’”
Thank you for sending this. It’s encouraging that we may have levers that have larger impacts than economic growth. It definitely updates me away from believing the results of the social safety net regression I outline in my post (although, as I mentioned in the post, those results were never that compelling). I used Our World in Data’s “Adequacy of Social Safety Net Programs” dataset. There were only 30 countries, and they were mostly LMICs, so I am not surprised that the results differ from yours. I would be curious what you think of that dataset, and whether the data you use looks like it avoids some of the noise in mine. I would definitely love to see more analysis on this with bigger datasets than either of ours, if there are any ways to create them. I wonder if the implication might be something like: large social safety nets are effective in European states, which have a lot of state capacity to deliver services, and less effective in the LMICs in my dataset.
2. “Fundamentally, you cannot compare doubling one’s income at a point of time (e.g., due to lottery and investment returns or cash transfers) to doubling one’s income in 71 years… Empirically, the growth-happiness relation depends upon the time horizon; it gets smaller as the duration increases. We discuss this in the paper conceptually and in reference to the two data sets we use. The longer period in the WVS/EVS data results in lower growth-subjective well-being relations.”
I think this is an interesting point. If we believe in hedonic adaptation, then we would expect the results of cash transfer RCTs to be much higher than the results over 14 or 40 years in the two datasets you use in your paper. So the fact that the implied impacts seem to be very similar seems to be (very weak) evidence against adaptation in this context? Am I right in thinking that the results in your two sets of regressions implicitly factor in adaptation, since those countries became wealthier slowly? If so, I think we should be comfortable applying the results, with 14-40 years (Gallup to WVS/EVS) of adaptation factored in, to an estimate that looks at benefits spanning 1-40 years for Ethiopia?
3. “Your replication / robustness tests are not so surprising. As you point out, our results include larger coefficient estimates using different specifications, yet we still argue they are not economically significant, implying we would argue your alternative results are still too small to prioritize growth.”
But those coefficients seem to be close to what we would expect from the cross-sectional data? If that is the case, are you suggesting that even if the Easterlin paradox turned out to not hold, we would not update towards thinking more of economic growth? That would imply that a low income country could increase their life satisfaction from around 4 to around 6 if they could figure out a way to enable the kind of catch-up growth that some East-Asian countries have managed.
4. “I’m reasonably assured you can find much more effective policies for short-run gains. See Table 1 of P. Frijters, A. E. Clark, C. Krekel, R. Layard, A happy choice: Wellbeing as the goal of government. Behav. Public Policy, 1–40 (2020).”
Thank you for sending this. Reducing fear of violent crime stands out as especially promising to me as a potential intervention. However, it does look like doubling income is still one of the larger results here, and is not obviously harder to achieve than some of the other large-effect interventions. I definitely hope that there are more tractable interventions than boosting growth that we can find. Also, even if we don’t, I think we can probably find ways to do a lot of good by just saving lives, rather than boosting well-being.
5. “Your robustness test results do not overturn our results; they fall within the range we estimate and only apply to one data set, indeed the one that is based on a shorter period, which is less preferred for reasons explained in the text.”
I agree. I only meant to try a couple of easy alternative specifications to see how sensitive the results are to them. The Gallup World Poll Data had more countries and was easier to download so I just decided to look at that dataset. If my results are correct, they are not meant to invalidate the Easterlin Paradox. I just think we should be aware that it seems sensitive to specification (even after accepting the exclusion of transition economies, and countries with less than 12 years of data).
6. “Perhaps you can explain to me how the GiveWell team determined the ‘Value assigned to increasing ln(consumption) by one unit for one person for one year’ and why this is used in determining the value of subjective well-being benefits.”
GiveWell assigns one unit of value to an income doubling, so boosting ln(consumption) by one unit is simply worth 1/ln(2) ≈ 1.44 units. They then try to estimate the value of saving a life relative to an income doubling by looking at surveys of recipients, Global Burden of Disease estimates, value of statistical life approaches, internal surveys, and other sources. For the purposes of my estimation, you wouldn’t need to accept any of their assumptions except that it is difficult to find ways to help people that are more than ten times more cost-effective than simply giving cash to the very poorest people in the world.
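The 1/ln(2) conversion follows from the fact that raising ln(consumption) by one unit multiplies consumption by e, while a doubling multiplies it by 2. A minimal sketch (the `doublings` helper is hypothetical, and the one-unit-per-doubling convention is as described above):

```python
import math

def doublings(multiplier):
    """Number of income doublings represented by multiplying consumption by `multiplier`."""
    return math.log(multiplier) / math.log(2)

# Raising ln(consumption) by one unit multiplies consumption by e.
value_per_ln_unit = doublings(math.e)
print(f"1 unit of ln(consumption) ≈ {value_per_ln_unit:.3f} doublings")
```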
Thanks again for the interesting exchange.
Thanks again for the discussion!
I agree that it’s very reasonable to look at the cumulative “cost” in terms of income doublings, rather than just the final number. But I think then you also need to look at the cumulative well-being gains. You don’t just get the life satisfaction gain of the doubling on your 71st year, you also get smaller gains every year before that, just like you do for the costs.
I’ve set up a spreadsheet based on your example of looking at the first 13.5 years to see when one cumulative income doubling has occurred. In that case, on the first year you get .002 life satisfaction points, on the second .004, until the 13.5th when you get .027. When you sum them you get a total of 0.2 life satisfaction points. You get those at a cost of one income doubling. This is actually larger than would be predicted by my approach of multiplying .002 by 71, which would imply 0.142 life satisfaction points. (The reason it’s larger is that when we are looking at time horizons like 13.5 years, the income doublings don’t really benefit that much from compounding yet, so the cost hasn’t grown quickly enough to get to the .142 threshold, which I believe happens closer to the 100 year mark).
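The spreadsheet logic can be sketched as a year-by-year loop that sums both the income gaps and the SWB gaps until one cumulative income doubling has accrued. This assumes (my reading, not a definition from either paper) that an INCY is the sum over years of the proportional income gain, so a 100% gain held for one year counts as one INCY; the function name is hypothetical.

```python
COEF = 0.002  # SWB points per pp of extra growth per year
G = 0.01      # one percentage point of extra annual growth

def first_year_reaching_one_incy(max_years=100):
    """Sum the counterfactual gaps year by year until one INCY has accrued.

    Assumption: an INCY is the sum over years of the proportional income gain.
    Returns (year, cumulative INCYs, cumulative SWB points).
    """
    cum_income = cum_swb = 0.0
    for t in range(1, max_years + 1):
        cum_income += (1 + G) ** t - 1  # income gap in year t, as a share
        cum_swb += COEF * t             # SWB gap in year t
        if cum_income >= 1.0:
            return t, cum_income, cum_swb
    raise ValueError("one INCY not reached within max_years")

year, incys, swb = first_year_reaching_one_incy()
print(f"year {year}: {incys:.2f} INCYs, {swb:.3f} summed SWB points")
```

With whole-year steps the threshold is crossed in year 14 rather than 13.5, and the summed SWB is about 0.2, compared with the 0.142 implied by multiplying .002 by 71.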
I agree that we have very little evidence so far about the tractability of economic growth interventions. I just think that Easterlin and O’Connor’s work should not make us think that economic growth interventions are any less useful than we would have otherwise thought. Since these sorts of regressions seem to show smaller impacts for health and pollution than GDP, maybe they should (very very slightly) update us towards thinking a little more of economic growth interventions than whatever our prior beliefs were.
I agree that none of the increases in regression coefficients are that large in some absolute sense, and that they are in some sense luck. But the increases do seem to be large enough to flip us towards rejecting rather than accepting the Easterlin Paradox. This is statistical luck in some sense, but that just seems to show that the results are very sensitive to that sort of luck. So, as we both seem to agree, we don’t really have enough data to say if the Easterlin paradox holds.
I would love to see more work around estimating the expected costs and impacts of national health, pollution, social safety net, and growth policy on life satisfaction. I suspect that these sorts of change-on-change regressions would not end up being a large part of the evidence on which we based these estimates. Since there is so little data here, we might end up having to rely on judgements about individual policies’ chances of success. My point in the post was simply that Easterlin and O’Connor’s analysis does not seem to give us any evidence to suggest that GDP is likely to be less impactful than health or pollution.
Michael, thanks so much for really engaging with the post. I think we are now very close in our big-picture views on the subject, but would love to continue the discussion on the more interesting areas of disagreement (I will respond to those points below). I agree that we don’t have enough data to say if the Easterlin paradox holds. I am also somewhat hesitant about prioritizing economic growth as an intervention, although my concerns are less about effect sizes directly, and more about whether generating growth is tractable, and whether potential interventions carry large risks.
I agree with Stephen Clare’s response that we can try to be more Bayesian here. I think it’s reasonable to start with a prior based on the very statistically significant cross-sectional correlation between a country’s GDP and its well-being. In order to believe that this correlation does not generalize to changes in one country across time, we would need to believe that Ethiopia could grow to have the current US GDP but remain as unhappy as a low-income country. That would make it an extreme outlier in the cross-sectional data, and would imply that there was some kind of idiosyncratic problem with the country (and I don't think the argument about people comparing themselves to peers deals with this problem). So I think there is some burden of proof on providing evidence that there actually is a paradox. If we start with a prior based on the cross-sectional data, we would initially expect a 0.5 life satisfaction point increase for an income doubling. Then we can update on HLI’s meta-analysis results, which suggest that cash transfers have only about a quarter of that impact. So now we would believe that the impact is somewhere between those two values. Then we get Easterlin and O’Connor’s regression results, which are not in themselves statistically significant. However, they are pretty much the same as the HLI results, so there is no reason to move below the range we believed the effect to be in before. It does not seem to make sense to update all the way to 0 based on results that are non-zero. So even though Easterlin and O’Connor’s regressions do not in themselves have enough statistical power to provide evidence of an impact of growth on happiness, the coefficients they provide should not update us away from what we believed to be the effects of income doubling before.
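The updating argument can be illustrated with a textbook conjugate normal-normal model, where the posterior mean is a precision-weighted average of prior and data. This is only a sketch: the normality assumption is mine, and all standard deviations below are hypothetical values chosen for illustration, not estimates from the cross-sectional data, HLI, or Easterlin and O’Connor.

```python
def normal_update(prior_mean, prior_sd, obs_mean, obs_sd):
    """Conjugate normal-normal update: precision-weighted average of prior and data."""
    w_prior, w_obs = 1 / prior_sd**2, 1 / obs_sd**2
    post_mean = (w_prior * prior_mean + w_obs * obs_mean) / (w_prior + w_obs)
    post_sd = (w_prior + w_obs) ** -0.5
    return post_mean, post_sd

# Means follow the comment (points per doubling); SDs are HYPOTHETICAL.
prior = (0.5, 0.3)   # cross-sectional prior, ~0.5 points per doubling
hli = (0.125, 0.1)   # cash-transfer meta-analysis, ~a quarter of that
step1 = normal_update(*prior, *hli)

# A noisy, non-significant estimate near the HLI value (hypothetical SD 0.4)
# nudges the mean only slightly and never pulls it below that value.
eo = (0.125, 0.4)
step2 = normal_update(*step1, *eo)
print(f"after HLI: {step1[0]:.3f} ± {step1[1]:.3f}; after E&O: {step2[0]:.3f} ± {step2[1]:.3f}")
```

In this model the posterior always lies between the prior and the new estimate, so a non-zero estimate cannot move us to 0, and a very noisy one moves us very little.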
That being said, we have very small datasets here, the individual countries are correlated to each other (making the amount of independent information we have even smaller than it seems), and all of this is simply correlation. We have not done anything here to control for omitted variables, to try to run lagged regressions, or to try quasi-experimental designs. So overall I agree that we should not expect to learn very much about causal impacts from these types of regressions.
I agree with this. And I think the amount of data we would really need would be much higher than it initially seems. Since Easterlin and O’Connor are running multiple different statistical tests (deciding exactly how many years of data a country needs before it counts as full-cycle, and separately deciding which countries are transition countries), we would need even more data to make up for the multiple hypotheses.
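A rough sense of why multiple specification choices demand more data comes from the family-wise error rate. This sketch assumes the tests are independent, which real specification choices are not (they are correlated, so this overstates the inflation), and the counts of tests are hypothetical.

```python
def familywise_error(alpha, k):
    """P(at least one false positive) across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

def bonferroni_threshold(alpha, k):
    """Per-test significance level that keeps the family-wise error at most alpha."""
    return alpha / k

for k in (1, 4, 10):  # hypothetical numbers of specification choices
    print(k, round(familywise_error(0.05, k), 3), bonferroni_threshold(0.05, k))
```

Holding each test to a stricter per-test threshold, as the Bonferroni correction does, requires more data to detect the same effect.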
If we accept the results from the 2020 data, or alternatively assign a probability of 50% to there being no Easterlin paradox, then it would really only be 3-4 doublings to get an additional point of life satisfaction. If we accept the results from HLI’s analysis, I believe it would be about 6 income doublings (starting with 0.1 standard deviations, converting to life satisfaction points, and then discounting for decreased benefits for non-recipient household members)? A country like Ethiopia could have about 6 GDP doublings before getting to United States GDP levels. I would like to thank Matt Lerner for pointing this out.
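The doublings arithmetic can be sketched in two small steps: doublings needed for one life satisfaction point at a given effect per doubling, and doublings between two GDP levels. The $1000 starting point is the stylized Ethiopia figure used earlier in this thread; $64,000 is a hypothetical round number for US GDP per capita, not a data point. Function names are hypothetical.

```python
import math

def doublings_between(start_gdp, end_gdp):
    """How many times GDP per capita must double to go from start to end."""
    return math.log2(end_gdp / start_gdp)

def doublings_per_point(points_per_doubling):
    """Doublings needed for a one-point rise in life satisfaction."""
    return 1 / points_per_doubling

# 0.5 points per doubling (cross-sectional), with 50% weight on "no paradox"
expected_effect = 0.5 * 0.5
print(doublings_per_point(expected_effect))
print(doublings_between(1_000, 64_000))  # hypothetical round US figure
```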
I think we are on the same page here. I was just using "improve life satisfaction" as shorthand for "improve average life satisfaction across the whole population."
Thanks so much for the kind words.
I think the question of the plausible range for tractability is an interesting one. I suspect that most global health interventions seriously considered by EA fall within a 100x range. But I would guess that the reason this is true is that the only interventions with enough evidence are already in the process of solving more than 0.5% of the problem. At the other end of the spectrum, I suspect interventions trying to influence the very long-term trajectory of human culture might fall into a range that spans at least 6 orders of magnitude. There are probably plenty of interventions we could consider that we should expect to have much less than a one in a million chance of solving 10% of the problem. Because there is little evidence and feedback for what would work in this context, we should not expect most things we consider to have a non-tiny chance of working.
I am also a little skeptical of how much information we get out of neglectedness when working with these sorts of problems. I think something being neglected might often be a sign that experts in the space don't consider the approach plausible, or that some experts have tried it and given up on it. If that is the case, then that effect may swamp the diminishing marginal returns we might expect. Additionally, diminishing marginal returns might not be as common in fields where it's not obvious what the next good thing to do is (because there are poor feedback mechanisms).
That is entirely fair. It's reasonable not to accept the cross-sectional results as having any information value for your prior. So I should have said that we can start with a prior from the HLI meta-analysis results (which, if I remember correctly, are pretty statistically significant). Then, when we get the information from the Easterlin and O'Connor paper, where the results are the same as our prior but not statistically significant, we can say that the new information does not shift our prior at all. So even though the Easterlin and O'Connor paper does not give us much information one way or the other, it still seems reasonable to say there is no reason to think that the results are likely to be much lower than the HLI results?