Acronyms for Happiness, part 1: how countries try to measure economic well-being

Alexander de Vries

This is a linkpost for https://2ndhandecon.substack.com/p/acronyms-for-happiness-part-1?r=6u57o&s=w&utm_campaign=post&utm_medium=web

[Definitions Warning: in this post, I use “welfare” and “well-being” interchangeably, to mean something holistic like “quality of life”. I also use “happiness” and “subjective well-being” interchangeably, to mean something like “positive (or absence of negative) subjective experiences / qualia”. I am sincerely sorry for the confusion; all I can say in my defense is that the well-being literature’s terminology was already hopelessly confused before I read it and I’m not strong enough to solve it on my own.]

[Epistemic Status: Spent quite a while reading the papers and critiques and counter-critiques and counter-counter-critiques, so I’m fairly certain that everything in this post is true, except for the bits of my opinion. However, there is some room for misinterpretation, because some economists absolutely refuse to be clear about how they intend a given metric to be used (I’m looking at you, Nordhaus & Tobin!). Obligatory reminder that I’m not an expert, just an econ student with too much free time.]

You are, presumably, a citizen of a country. If so, you might have heard your fellow citizens complain once in a while; comparing your country to others, or to some potential better version of itself. You might even indulge in such complaints yourself! I know I certainly do.

What are these things, then, which the citizens of a modern liberal democracy might complain about? There is education quality, of course (best in Finland, or so I hear); household income (Norway and the US are frontrunners); equality (Slovenia has the most equal incomes, apparently). Environment and traffic and food and stability and democratic participation … the list goes on and on.

The question, then, is: what do all of these have in common? Answer: they’re all (mostly) about quality of life. Over time, people have come to realize that: a) it is possible to have continuous growth in quality of life, b) other countries might be doing this better than we are, and c) this is probably at least somewhat because they are making different societal choices than we are.

Obvious conclusion: we should compare ourselves to our past selves, to see if we have it better than them. We should also compare ourselves to other countries, to see who is doing best and what we can learn from them. Just make a list of all the countries in the world, ranked by well-being, and start with whoever’s at the top of the list. It’s that easy!

It’s not that easy

This, of course, is what experts call “the hard part”. Well-being is really, really subjective, which not only means that it’s difficult to measure, but also that everyone with an incentive to fudge the measurements (read: governments) can do so very easily, with plausible deniability.

What our society did instead, for a while, is sidestep this entirely by not really trying to measure well-being at all. Rather, we took one “objective” measure of something quite different and went around optimizing our economies for that. I am talking, of course, about Gross Domestic Product (per capita).

[Sources on GDP: Diane Coyle’s 2014 book “GDP: A Brief but Affectionate History”, Moshe Syrquin’s 2016 critical review thereof, and my macroeconomics professor.]

GDP is not a measure of welfare. It was never intended to be a measure of welfare. It isn’t even a measure of economic welfare. Yet many people, not just politicians or businessmen but normal voting citizens, treat raising their country’s GDP as a main goal of government. Why?

To answer that, we need to look at what GDP actually is. Strictly speaking, it is the amount a country produces, or the total income of people working in that country, or the total spending on goods & services from that country (these are all equivalent). In other words, GDP represents the output of an economy per year. It’s used for various economic calculations (debt-to-GDP ratio, NGDP targeting …) and to analyze (potential) conflicts between nations, like a trade war or even a real war. ¹ For these kinds of things, it is a very good macroeconomic indicator.

So why not just divide the GDP by the number of people in the economy? Then you can see how much material wealth is being produced for each person in the country. Surely that’ll be roughly the same as well-being, right? (I’m not sure I have ever actually heard someone make this argument, but plenty of people act as if it is true.)

Bobby Kennedy answered better than I ever could:

Too much and for too long, we seemed to have surrendered personal excellence and community values in the mere accumulation of material things. Our Gross National Product, now, is over $800 billion dollars a year, but that Gross National Product - if we judge the United States of America by that - that Gross National Product counts air pollution and cigarette advertising, and ambulances to clear our highways of carnage.
It counts special locks for our doors and the jails for the people who break them. It counts the destruction of the redwood and the loss of our natural wonder in chaotic sprawl.
It counts napalm and counts nuclear warheads and armored cars for the police to fight the riots in our cities. It counts Whitman's rifle and Speck's knife, and the television programs which glorify violence in order to sell toys to our children.
Yet the gross national product does not allow for the health of our children, the quality of their education or the joy of their play. It does not include the beauty of our poetry or the strength of our marriages, the intelligence of our public debate or the integrity of our public officials.
It measures neither our wit nor our courage, neither our wisdom nor our learning, neither our compassion nor our devotion to our country, it measures everything in short, except that which makes life worthwhile.
And it can tell us everything about America except why we are proud that we are Americans.

If this is true here at home, so it is true elsewhere in world.

Perhaps the younger Kennedy overdoes it a bit on the rhetoric (does anyone really believe that nothing worthwhile is measured by GDP?), but he hits on almost all the most important critiques of GDP as a welfare measure: it neglects non-economic value² (health, joy, beauty) while including potentially harmful products (cigarettes, advertising, guns) and indicators of failure elsewhere in society (ambulances, locks, jails).

One more essential critique not mentioned in the speech: inequality. The GDP/capita of a highly unequal country can be the same as that of a very equal country, even though the equal country will probably be happier (due to diminishing returns & the fact that we compare our incomes to those of our peers), more stable, and more ‘fair’.
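To make the diminishing-returns half of that argument concrete, here is a minimal sketch. It assumes, purely for illustration, that each person’s well-being is the natural log of their income (a common textbook stand-in, not something any of the sources above commit to), and both income lists are made up.

```python
import math

def gdp_per_capita(incomes):
    """Mean income: all that GDP per capita can see."""
    return sum(incomes) / len(incomes)

def mean_log_utility(incomes):
    """Average well-being under an ASSUMED log utility of income."""
    return sum(math.log(y) for y in incomes) / len(incomes)

# Hypothetical four-person economies with the same total income.
equal_country = [50_000, 50_000, 50_000, 50_000]
unequal_country = [10_000, 10_000, 10_000, 170_000]

for name, incomes in [("equal", equal_country), ("unequal", unequal_country)]:
    print(name, gdp_per_capita(incomes), round(mean_log_utility(incomes), 3))
```

Both countries print the same GDP per capita, but the equal one gets the higher average utility. That holds for any concave utility function, not just the log, so the conclusion does not hinge on the particular assumption.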

And finally, GDP’s main promise - its objectivity - doesn’t even bear out. There are a thousand ways to fudge the numbers: do we count domestic labor or the black market? How do we calculate inflation, or the value of financial and government services? All these degrees of freedom leave tons of room for any government to manipulate the big number behind the dollar sign. Of course, many other measures face the same problem, but objectivity was GDP’s claim to fame - without that, GDP just becomes one of many welfare measures, better suited as a component indicator than a headline number.³

Beyond GDP

Plenty of people realized this, of course. Only a decade after JFK promised and achieved 5% GDP growth per year, and four years after RFK made his famous anti-GDP speech (the Kennedy brothers must have had interesting dinner table conversations), the economists William Nordhaus and James Tobin published “Is Growth Obsolete?”, a short treatise on growth in the modern (70s) economy.

In this work, they discuss the environmentalist movement and its objections to economic growth - a forerunner of the modern ‘degrowth’ movement - giving us some insightful quotes in the process:

Growth measures nearly always involve diversions of current resources from other uses, sacrifices of current consumption for the benefit of succeeding generations of consumers. Enthusiasts for faster growth are advocates of the future against the present.

Thus:

[B]oth growth men and antigrowth men invoke the interests of future generations. The issue between them is not whether and how much provision must be made for future generations, but in what form it should be made. The growth man emphasizes reproducible capital and education. The conservationist emphasizes exhaustible resources— minerals in the ground, open space, virgin land.

And:

The mistake of the antigrowth men is to blame economic growth per se for the misdirection of economic growth. The misdirection is due to a defect of the pricing system — a serious but by no means irreparable defect and one which would in any case be present in a stationary economy. […] The proper remedy is to correct the price system so as to discourage these [polluting] technologies. Zero economic growth is a blunt instrument for cleaner air, prodigiously expensive and probably ineffectual.

Aside from their still-relevant comments on economic growth, Nordhaus and Tobin have something even more valuable to give us: a first serious attempt at measuring economic welfare. They creatively call it the Measure of Economic Welfare, or MEW.

[Note that at this point nobody’s trying to measure total welfare (i.e. including non-economic factors) yet; in the 70s, science didn’t have the proper tools to even attempt such an undertaking.]

N&T create their MEW by, essentially, taking GDP and shaking it really hard until all the loose screws fall out. “Regrettables” is their term for government spending - “made for reasons of national security, prestige, or diplomacy” - that doesn’t increase anyone’s economic welfare. We are measuring welfare, so all of it must be scrapped. Defense spending is an obvious regrettable, which only exists because of the set of Prisoner’s Dilemmas we call ‘geopolitics’. Scrapped. Expensive new furniture for the White House? Scrapped. Fuel costs of the Qatari ambassador’s private jet? Scrapped.

(Weirdly, they don’t seem to have an equivalent to regrettables for private spending, despite the existence of advertising, which also provides zero value to households. Maybe they’re assuming that advertising provides the service of information about products? It seems naïve to believe that nowadays, but ads were different back then and behavioral econ hadn’t been invented yet, so I’ll cut them some slack.)

N&T make a few other changes, like shifting commuting costs from consumption to intermediate input; and education, medical and durable goods spending from consumption to investment. The “disamenities of urban life” are adjusted for: “pollution, litter, congestion, noise, insecurity, buildings and advertisements offensive to taste, etc.”⁴ They also consider adjusting for natural resource depletion and environmental damage, but end up not doing so, because they don’t have accurate cost estimates. ⁵

That’s all small fry, though. The MEW includes three very important things: household labor, leisure time, and sustainability.

There’s a quote by Arthur Cecil Pigou which must, apparently, be repeated whenever one talks about household labor. I’m not one to ignore tradition, so here it is: "If a man marries his housekeeper or his cook, the national dividend is diminished."

It really is a clever quote (though implicitly a tad sexist; it dates from the early twentieth century, after all). What Pigou means, precisely, is that while the man is paying his housekeeper, this is counted in GDP, but presumably he won’t be (directly) paying his spouse to continue cleaning the floor and washing the dishes, so once they marry, the same labor is no longer counted in GDP. Nevertheless, floor-cleaning and dish-washing continue to be valuable services.
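Pigou’s point is really just bookkeeping, which a toy example can show. All figures and names below are hypothetical; the only rule borrowed from the text is that GDP counts a service only when it is paid for.

```python
# Hypothetical yearly value of everything produced in a tiny economy.
services = {"other_output": 80_000, "housekeeping": 20_000}

def measured_gdp(services, paid):
    """GDP only counts the services that are paid for in a market."""
    return sum(value for name, value in services.items() if name in paid)

def total_value(services):
    """Value of everything produced, paid or not."""
    return sum(services.values())

paid_before_marriage = {"other_output", "housekeeping"}
paid_after_marriage = {"other_output"}  # same chores, now unpaid

print("GDP before:", measured_gdp(services, paid_before_marriage))
print("GDP after: ", measured_gdp(services, paid_after_marriage))
print("Value produced in both cases:", total_value(services))
```

Measured GDP drops by the housekeeper’s wage even though exactly the same floors get cleaned.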

Pretty much the only reason not to count household labor in GDP is that it’s hard to value for lack of a market price. Economists are rather wary of imputing values for non-market goods. To this day, household labor is excluded from standard GDP measures.

This exclusion has some negative consequences, however. First, it disproportionately ignores women’s contributions to the economy (in the US, women do 1.6x as much housework as men, according to the 2018 American Time Use Survey). And second, it deflates the GDP of countries where household labor makes up more of total labor (usually poorer countries).

Leisure is even trickier in this respect. Clearly, it would be unfair to say that a country where everyone performs backbreaking labor 12 hours a day has the same welfare as one where people lie on the beach most of the day, even if the people of both countries have the same income. The second country is far more productive, but its citizens simply prefer leisure to material goods. In GDP, however, both countries will be regarded as equal. Again, for the purposes of GDP as an economy-size-measure, this is perfectly logical! For a welfare measure, it is nonsense.

So N&T set to work trying to estimate the values of household labor and of leisure. They do so with a set of mathematical equations my brain refuses to even try to understand, so I have no idea if their estimates are any good. Using three different calculation methods, they give us a range of 65%-74% of 1965 GDP for the value of household labor, and 158%-181% of GDP for leisure time. These are huge numbers, way higher than any of the later estimates, which leads me to believe N&T are doing something wrong, or at least very different.⁶
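To get a feel for how large these imputations are, here is a back-of-the-envelope sketch using only the ranges quoted above. It assumes a naive total of GDP plus the two imputed values; the real MEW also subtracts regrettables, disamenities and capital costs, so treat the result as an order of magnitude and nothing more.

```python
# Ranges quoted in the text, as fractions of 1965 GDP.
HOUSEHOLD_LABOR = (0.65, 0.74)
LEISURE = (1.58, 1.81)

def imputed_share(household, leisure, gdp=1.0):
    """Share of (GDP + imputations) that comes from the imputations.

    ASSUMPTION: a naive sum; N&T's actual MEW applies further adjustments.
    """
    imputed = household + leisure
    return imputed / (gdp + imputed)

low = imputed_share(HOUSEHOLD_LABOR[0], LEISURE[0])
high = imputed_share(HOUSEHOLD_LABOR[1], LEISURE[1])
print(f"imputed share of the naive total: {low:.0%} to {high:.0%}")
```

On these numbers the imputed, non-market components outweigh market GDP by more than two to one, so the choice of valuation method moves the headline figure far more than any of the smaller adjustments.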

Their precise estimates don’t really matter, though. What’s important is that they introduced the concept of trying to include all this previously ignored value, and in doing so, influenced the next generation of economists to follow their lead. This is doubly true for their final and most influential contribution: sustainability.

Can we keep this pace?

The great question of sustainability is this: can we continue doing what we’re doing now, indefinitely? Nowadays, this is mostly about environmental degradation - pollution of all kinds, the burning of forests for farmland, global warming - but in the 70s, those concerns were on almost nobody’s minds. The key question was use of capital. Sure, this included natural capital like timber, coal, and water, but thinking about that had only just begun (cf. the Club of Rome).

No, this was capital in the classic sense: stored labor used to produce goods & services, usually in the form of machines or infrastructure. Conveyor belts wear out and asphalt cracks, therefore they must be repaired. The sustainability viewpoint is that we shouldn’t count the repairs and replacements as newly created welfare when they are in fact only holding our welfare constant.

As such, in actual MEW (MEW-A, the MEW we’ve been talking about so far), all of these final costs related to capital investment are removed. To find out whether this path is sustainable, then, we need another metric: the MEW-S, or sustainable MEW. Where MEW-A represents the amount of consumption actually occurring, MEW-S is the maximum amount of consumption with which the economy can retain a growth rate equal to the trend rate of technological progress.

In other words, actual MEW is how much we are currently consuming and sustainable MEW is the most we ‘should’ consume. When MEW-A is less than MEW-S, that not only means we’re consuming less, but also that we’re investing more than just the amount needed to repair & replace all depreciated capital. That increases the amount of capital available (capital-output ratio) and thereby puts the economy on a higher growth trajectory. MEW-A greater than MEW-S implies, of course, the reverse: it means that we’re over-consuming and under-investing, so less new capital is created than the amount that breaks down in a year. This means our consumption cannot be sustained - is not ‘sustainable’ - and leads to lower economic growth.
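The mechanics can be sketched in a few lines. This is a deliberately stripped-down model and not N&T’s actual calculation: it assumes no population growth and no technological progress, so “sustainable” consumption is simply output minus the cost of replacing worn-out capital. The function names and all numbers are hypothetical.

```python
def sustainable_consumption(output, capital, depreciation_rate):
    """Stand-in for MEW-S: the most we can consume while still replacing
    all capital that wears out. ASSUMES zero technological progress."""
    return output - depreciation_rate * capital

def next_capital(capital, output, consumption, depreciation_rate):
    """Everything not consumed is invested; depreciated capital is lost."""
    investment = output - consumption
    return capital + investment - depreciation_rate * capital

output, capital, rate = 100.0, 300.0, 0.05
limit = sustainable_consumption(output, capital, rate)
print("sustainable consumption:", limit)

for consumption in (80.0, 90.0):  # below and above the limit
    k_next = next_capital(capital, output, consumption, rate)
    trend = "grows" if k_next > capital else "shrinks"
    print(f"consuming {consumption}: capital {trend} to {k_next:.1f}")
```

Consuming less than the limit leaves net investment positive and the capital stock grows; consuming more than the limit runs the stock down.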

This method is associated with the idea of weak sustainability, which assumes that overconsumption/underinvestment in one area can be compensated by investing more in other areas - say, increasing the amount of train tracks in an area to make up for deteriorating canals. Summing up sustainability in dollar amounts is only possible under an assumption of weak sustainability, because money implies fungibility - that is, at the margin, you can exchange anything with a dollar value for anything else with the same dollar value.

This assumption fails once we start taking natural capital and environmental degradation into account; because these things are often not substitutable, we’ll have to switch to strong sustainability. After all, if you’re running out of iron, you can’t exactly throw money down the iron mine in exchange for more - there’s simply a limited amount of iron on this earth. There are also a lot of things which are only sort of substitutable. Having burned down a part of the Amazon, you can reforest it, but it’ll take time, not just money, and the original biodiversity probably won’t return within our lifetimes.
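The difference between the two notions fits in two small functions. This is a simplified sketch: the stock names echo the examples above, the dollar figures are invented, and the set of “non-substitutable” stocks is an assumption the analyst has to supply.

```python
def weakly_sustainable(capital_changes):
    """Weak sustainability: only the total dollar change has to be >= 0."""
    return sum(capital_changes.values()) >= 0

def strongly_sustainable(capital_changes, non_substitutable):
    """Strong sustainability (simplified): no non-substitutable stock may shrink."""
    return all(capital_changes[name] >= 0 for name in non_substitutable)

# Hypothetical yearly changes in the value of each capital stock.
changes = {"train_tracks": 30, "canals": -10, "rainforest": -15}

print("weak:  ", weakly_sustainable(changes))
print("strong:", strongly_sustainable(changes, {"rainforest"}))
```

The new train tracks make up for the canals and the forest in dollar terms, so the weak test passes. The strong test fails, because no amount of track brings the forest back.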

Mistakes Were Made

After the creation of the MEW(-S), it took a while for the rest of the academic world to catch on. Besides a few outliers like the Greek economist/central banker Zolotas, who wrote a book about his own welfare measure in 1981, interest in such things was essentially nonexistent until 1989, when ecological economists Herman Daly and John Cobb Jr. wrote For the Common Good: Redirecting the Economy toward Community, the Environment, and a Sustainable Future.

In this book, Cobb & Daly propose the Index of Sustainable Economic Welfare (ISEW), which is still frequently used today. The exact implementation depends on the study, but Wikipedia gives us a rough formula:

ISEW = personal consumption
+ public non-defensive expenditures
- private defensive expenditures
+ capital formation
+ services from domestic labour
- costs of environmental degradation
- depreciation of natural capital

(“Defensive expenditures”, here, include but are not limited to what N&T called “regrettables”. Defensive expenditures are all costs of limiting potential damage to oneself, like insurance payments or water treatment for pollution. Critics like Neumayer (1999b) point out that in most studies, defensive spending is defined super arbitrarily and could just as easily include food (defense against starvation), leisure (defense against burnout), etc. Hamilton (1996) uses economic models to show that defensive expenditures shouldn’t be excluded from welfare measures.)
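Written out as code, the rough formula is just a signed sum. The sketch below mirrors the seven terms listed above one for one; the class name, field names and example figures are my own placeholders, and real studies differ in how they define and estimate each term.

```python
from dataclasses import dataclass

@dataclass
class IsewInputs:
    personal_consumption: float
    public_non_defensive_expenditures: float
    private_defensive_expenditures: float
    capital_formation: float
    services_from_domestic_labour: float
    costs_of_environmental_degradation: float
    depreciation_of_natural_capital: float

def isew(x: IsewInputs) -> float:
    """Signed sum of the seven terms in the rough formula."""
    return (
        x.personal_consumption
        + x.public_non_defensive_expenditures
        - x.private_defensive_expenditures
        + x.capital_formation
        + x.services_from_domestic_labour
        - x.costs_of_environmental_degradation
        - x.depreciation_of_natural_capital
    )

# Hypothetical figures in arbitrary currency units.
example = IsewInputs(100.0, 20.0, 5.0, 10.0, 30.0, 15.0, 10.0)
print(isew(example))
```

Note that welfare terms (consumption, domestic labour) and sustainability terms (degradation, natural capital depreciation) end up in the same sum.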

The idea here is to combine welfare and sustainability, previously measured separately - for example with the MEW and MEW-S - into one number, the ISEW. This was a huge mistake. Because I am utterly incapable of doing anything without analogies, here’s an analogy to explain why.

Imagine the economy as a car, which has a certain velocity (level of consumption) and different kinds of fuels (economic capital, natural capital, human capital …). Most of these fuels are being refilled constantly but you don’t want to use them up faster than they’re being refilled. MEW is a measure of your velocity. Various indicators like water pollution, carbon footprint, etc. tell you how much of each fuel you have left and how much you’re using. MEW-S is a measure of how fast you should be going if you don’t want to use up all your fuels.

ISEW, then, is some kind of convoluted sum of velocity and fuel use - of multiple different, non-substitutable fuels! - averaged into one number. This makes no sense! A half-blind grandma going 20 with a full tank could see the same number as a menace to society going 210 on a German highway and about to run out of fuel. If you want to know what the welfare is in a given year and whether you can sustain it, just use different numbers. It really isn’t that hard.
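The grandma-versus-speeder problem is easy to reproduce. The sketch below collapses an ISEW-style index into two stylised parts, current welfare and a sustainability deduction. That split is my simplification of the formula, and the numbers are invented.

```python
from typing import NamedTuple

class Economy(NamedTuple):
    name: str
    current_welfare: float      # the "velocity"
    sustainability_cost: float  # fuel burned faster than it is refilled

def single_index(e: Economy) -> float:
    """ISEW-style headline: welfare and sustainability netted into one number."""
    return e.current_welfare - e.sustainability_cost

def dashboard(e: Economy) -> tuple:
    """The alternative: report the two numbers side by side."""
    return (e.current_welfare, e.sustainability_cost)

grandma = Economy("grandma", current_welfare=100.0, sustainability_cost=0.0)
speeder = Economy("speeder", current_welfare=160.0, sustainability_cost=60.0)

for e in (grandma, speeder):
    print(e.name, single_index(e), dashboard(e))
```

Both economies show the same single index, while the two-number dashboard tells them apart immediately.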

I’ve searched the literature for some kind of justification for this, but found very little - papers which respond to critiques of the ISEW (or the very similar Genuine Progress Indicator (GPI)), like Lawn (2003) and Talberth, Cobb, & Slattery (2007), only briefly mention this critique, never actually rebutting it, before moving on to what they see as more substantial critiques like the abstractness of values for natural capital / environmental degradation (which is, indeed, also a huge problem).

I sincerely hope I’m missing something here, or misinterpreting it. To me, the ISEW (and GPI) looks like a deeply, deeply flawed metric, to the point where it’s borderline useless. But I’m not an expert on this subject (yet), so if you’re iffy about my reasoning, you shouldn’t just take my word for it. Let’s use our heuristics.

First, as I just mentioned, papers responding to critiques never seem to take the time to answer this one, which either means it’s unanswerable or it’s not as important as I think.

Second, while ISEW-style measurement does still have its proponents - since 2018, there have been some 1800 studies featuring ISEW and 1700 featuring GPI, according to Google Scholar - it’s fallen out of favor compared to the two newest methods, happiness measurement and social indicators, which have 39600 and 25700 studies respectively since 2018.

Third, it’s not unusual for some researchers to hold on to a bad idea for a decade or two, especially when the issue has a political dimension, as this one certainly does. To quote Donald Renner of the Human Economy Center:

We believe that the GNP is a faulty economic measuring rod that has political significance. Adding secondary subsidiary accounts, to account for environmental degradation is no real solution. Academically it may help but politically it will be ignored.

This heavily implies that while the best actual way of measuring well-being is with multiple metrics, it might be more politically advantageous for the environmental movement to have one single metric to replace GDP with in the public eye. Clearly, there is political incentive to hold on to this idea, regardless of its real merit.

Fourth, the Stiglitz-Sen-Fitoussi report agrees with me:

Contrary to what their authors think, the ISEW cannot at the same time function both as an indicator of current welfare and an indicator of sustainability, i.e. the capacity to provide non-declining welfare over time. This is because the ISEW consists or should ideally consist of items that should only be included in an indicator of welfare or an indicator of sustainability.

I’ll go over what makes this report such a big deal in part 2. For now, just know that this report has a lot of social scientists at the top of their fields on its coauthor list, which means that much of the academic establishment endorses this critique.

All of this, plus a few other critiques which I don’t have the room to cover in this post, leads me to conclude that the ISEW and its little brother GPI are not very useful, and we should either return to the MEW/MEW-S combination or move on to some new metrics. In part 2, we’ll take a look at some of these new metrics which have popped up recently and see if they’re any better.

Summary

There is a lot of disagreement about how to measure well-being at a state level. In this post, I’ve covered a few potential measures of economic well-being, i.e. the well-being that comes from material consumption of goods and services.

GDP was not originally intended as a measure of economic well-being, and its appropriation as such by governments and the general public alike has not made it any better at this job. I explained how GDP includes spending on harmful and useless things, neglects household labor & leisure, and ignores the adverse effects of inequality.

MEW was the first attempt to rectify this. It excludes ‘regrettable’ government spending, adjusts for urban disamenities, and tries to value household labor & leisure. Because household labor and leisure are a large part of the economic value produced in any given year and are very difficult to measure, they tend to dominate the MEW and change the number a lot depending on the valuation technique.

MEW-S is a companion to MEW. It attempts to determine the amount of economic welfare a country can have in a given year while maintaining a rate of economic growth equal to the rate of technological progress (in other words: while keeping the amount of capital roughly constant). The MEW-S, in measuring all capital spending in one dollar amount, assumes weak sustainability - that is, it assumes that all forms of capital can be substituted for each other.

ISEW and GPI attempt to combine sustainability and economic welfare in one dollar number. They do this by subtracting the imputed values of capital depreciation and environmental degradation from an MEW-like measure. This seems to be, plainly, a grave mistake, because the amount of welfare and its sustainability are two separate and incommensurable values. There are also significant technical problems with the attempted valuation of natural capital.

Part 2 will look at the two methods which are currently most popular: social indicators and happiness surveys. See you then!

References

Beça, P., & Santos, R. (2010). Measuring sustainable welfare: A new approach to the ISEW. Ecological Economics, 69(4), 810–819. https://doi.org/10.1016/j.ecolecon.2009.11.031

Bureau of Labor Statistics. (2018). American Time Use Survey 2018. https://www.bls.gov/news.release/archives/atus_06192019.pdf

Coyle, D. (2015). GDP: A Brief but Affectionate History - Revised and expanded Edition (Revised ed.). Princeton University Press.

Daly, H. E., & Cobb, J. B., Jr. (1994). For The Common Good: Redirecting the Economy toward Community, the Environment, and a Sustainable Future (2nd, updated ed.). Beacon Press.

Field, A. (2016). British Economic Growth: 1270–1870. By Stephen Broadberry, Bruce M.S. Campbell, Alexander Klein, Mark Overton, and Bas van Leeuwen Cambridge: Cambridge University Press, 2015. Pp. 461. The Journal of Economic History, 76(1), 236-238. doi:10.1017/S002205071600005X

Hamilton, K. (1996). Pollution and pollution abatement in the national accounts. Review of Income and Wealth, 42(1), 13–33. https://doi.org/10.1111/j.1475-4991.1996.tb00143.x

Lawn, P. A. (2003). A theoretical foundation to support the Index of Sustainable Economic Welfare (ISEW), Genuine Progress Indicator (GPI), and other related indexes. Ecological Economics, 44(1), 105–118. https://doi.org/10.1016/s0921-8009(02)00258-6

Neumayer, E. (1999). The ISEW: Not an Index of Sustainable Economic Welfare. Social Indicators Research, 48(1), 77–101. http://www.jstor.org/stable/27522403

Nordhaus, W. D., & Tobin, J. (1981). Is Growth Obsolete? Amsterdam University Press.

Syrquin, M. (2016). A Review Essay on “GDP: A Brief but Affectionate History” by Diane Coyle [Review of GDP: A Brief but Affectionate History, by D. Coyle]. Journal of Economic Literature, 54(2), 573–588. http://www.jstor.org/stable/43966745

Talberth, J., Cobb, C., & Slattery, N. (2007, February). The Genuine Progress Indicator 2006. Redefining Progress. https://d3pcsg2wjq9izr.cloudfront.net/files/24200/articles/12128/GPI202006.pdf

Zolotas, X. (1981). Economic Growth and Declining Social Welfare (First Edition). Bank Of Greece.

Background & Extra Reading

Coyle, D. (2021). Cogs and Monsters: What Economics Is, and What It Should Be. Princeton University Press. Highly recommended!

Ellis, H. S. (1985). [Review of Economic Growth and Declining Social Welfare., by X. Zolotas]. The Journal of Economic History, 45(3), 770–772. http://www.jstor.org/stable/2121806

Ferrer-i-Carbonell, A. (2012). Happiness economics. SERIEs, 4(1), 35–60. https://doi.org/10.1007/s13209-012-0086-7

Fitoussi, J. P., Sen, A., & Stiglitz, J. (2009). Report by the Commission on the Measurement of Economic Performance and Social Progress. https://ec.europa.eu/eurostat/documents/8131721/8131772/Stiglitz-Sen-Fitoussi-Commission-report.pdf

MacKerron, G. (2011). Happiness economics from 35 000 feet. Journal of Economic Surveys, 26(4), 705–735. https://doi.org/10.1111/j.1467-6419.2010.00672.x

Max-Neef, M. (1995). Economic growth and quality of life: a threshold hypothesis. Ecological Economics, 15(2), 115–118. https://doi.org/10.1016/0921-8009(95)00064-x

Neumayer, E. (1999). Global warming: discounting is not the issue, but substitutability is. Energy Policy, 27(1), 33–43. https://doi.org/10.1016/s0301-4215(98)00063-9

Bianchi, S. M., Sayer, L. C., Milkie, M. A., & Robinson, J. P. (2012). Housework: Who Did, Does or Will Do It, and How Much Does It Matter? Social Forces, 91(1), 55–63. https://doi.org/10.1093/sf/sos120
