Research scholar @ FHI and assistant to Toby Ord. Philosophy student before that. I do a podcast about EA called Hear This Idea.
Ok, got it. I'm curious — how do you see people using ITN in practice? (If not for making and comparing estimates of $\frac{\text{good done}}{\text{additional resources}}$?)
Also this post may be relevant!
That's a good point. It is the case that preferences can be about an indefinite number of things. But I suppose there is still a sense in which a preference satisfaction account is monistic, namely in essentially valuing only the satisfaction of preferences (whatever they are about); and there is no equivalent sense in which objective list theories (with more than one item) are monistic. Also note that objective list theories can contain something like the satisfaction of preferences, and as such can be at least as complex and ecumenical as preference satisfaction views.
Thanks, this is a good post. A half-baked thought about a related but (I think) distinct reason for this phenomenon: I wonder if we tend to (re)define the scale of problems such that they are mostly unsolved at present (but also not so vast that we obviously couldn't make a dent). For instance, it's not natural to think that the problem of 'eradicating global undernourishment' is more than 90% solved, because fewer than 10% of people in the world are undernourished. As long as problems are (re)defined in this way to be smaller in absolute terms, tractability is going to (appear to) increase proportionally, as a countervailing factor to diminishing returns from extra investment of resources. A nice feature of ITN is that (re)defining the scale of a problem such that it is always mostly unsolved at present doesn't affect the bottom line of utility per marginal dollar, because (utility / % of problem solved) increases as (% of problem solved / marginal dollar) decreases. To the extent this is a real phenomenon, it could emphasise the importance of not reading too much into direct comparisons of tractability across causes.
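To make the invariance concrete, here is a minimal Python sketch with made-up numbers (the function names and figures are hypothetical, chosen only for illustration). It computes good done per unit of marginal spending as (utility / % solved) × (% solved / spending), once for a broadly defined problem and once after redefining the problem as only its currently unsolved 10%.

```python
import math


def good_per_spend(utility_of_full_solution: float, pct_solved_per_spend: float) -> float:
    """Good done per unit of marginal spending = (utility / % solved) * (% solved / spending)."""
    utility_per_pct = utility_of_full_solution / 100  # value of one percentage point of the problem
    return utility_per_pct * pct_solved_per_spend


def redefine(utility_of_full_solution: float, pct_solved_per_spend: float, share_kept: float):
    """Redefine the problem as a fraction `share_kept` of the original (e.g. only the unsolved part).

    Fully solving the narrower problem is worth proportionally less, but each unit of
    spending now solves a proportionally larger percentage of it.
    """
    return utility_of_full_solution * share_kept, pct_solved_per_spend / share_kept


# Hypothetical: the broad problem is worth 1000 units if fully solved,
# and a marginal $1M solves 0.01% of it.
broad = good_per_spend(1000, 0.01)

# Narrow redefinition: only the remaining (unsolved) 10% counts as "the problem".
narrow = good_per_spend(*redefine(1000, 0.01, share_kept=0.10))

print(broad, narrow)  # equal up to floating-point rounding
```

The two factors move in opposite directions by the same proportion, so their product (the bottom line) is unchanged by how the problem is scoped.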
I think it would be very valuable if more reports of this kind were citable in contexts where people are sensitive to signs of credibility and prestige. In other words, there are contexts where, if this existed as a report on SSRN or even arXiv, or on the website of an established institution, it could be cited and would be valuable as such. Currently I don't think it could be cited (or taken seriously if cited). So if there are low-cost ways of publishing this or similar reports in a more polished form, I think that would be great.
Caveats that (i) maybe you have done this and I missed it; (ii) this comment isn't really specific to this post, but it's been on my mind and this is the most recent post where it is applicable; and (iii) on balance it does nonetheless seem likely that the work required to turn this into a 'polished' report means doing so is not (close to) worthwhile.
That said: this is an excellent post and I'm very grateful for these forecasts.
Thanks for writing this — I'm curious about approaches like this, and your post felt unusually comprehensive. I also don't yet feel like I could faithfully represent your view to someone else, possibly because I read this fairly quickly.
Some scattered thoughts / questions below, written in a rush. I expect some or many of them are fairly confused! NNTR.
Thanks for writing this Rose, I love it.
Small note: my (not fully confident) understanding is that a typical day still does not involve a launch to orbit. My cached number is something like 2 or 3 launches / week in the world, or ~100–150 days / year with a launch. This is the best cite I can find. Launches often bring multiple 'objects' (satellites) into orbit, which is why the average number of objects launched into space each day can exceed 1. So maybe the claim that "humans launch 5 objects into space" is somewhat misleading, despite being true on average. (This is ignorable pedantry!)
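As a rough sanity check on the launch-days figure, here is a minimal Python sketch. It assumes (a simplification I haven't checked against real data) that launches arrive independently as a Poisson process; real launches cluster on some days, so treat the output as ballpark only.

```python
import math


def expected_launch_days_per_year(launches_per_week: float) -> float:
    """Expected number of days per year with at least one launch.

    Assumes launches follow a Poisson process, so the probability of
    no launch on a given day is exp(-daily_rate).
    """
    daily_rate = launches_per_week / 7
    p_at_least_one_launch = 1 - math.exp(-daily_rate)
    return 365 * p_at_least_one_launch


for launches_per_week in (2, 3):
    days = expected_launch_days_per_year(launches_per_week)
    print(f"{launches_per_week} launches/week -> ~{days:.0f} days/year with a launch")
```

Under these assumptions the answer comes out at roughly 90 to 130 days a year, so most days would have no launch at all, even though the average number of objects launched per day can still exceed 1.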
Thanks for writing this! What I took from it (with some of my own thoughts added):
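The ITN framework is a way of breaking down $\frac{\text{good done}}{\text{additional resources}}$ into three components:

$$\frac{\text{good done}}{\text{additional resources}} = \frac{\text{good done}}{\text{\% of the problem solved}} \times \frac{\text{\% of the problem solved}}{\text{\% increase in resources}} \times \frac{\text{\% increase in resources}}{\text{additional resources}}$$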
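A minimal Python sketch of this decomposition, with hypothetical numbers chosen only for illustration (the function name and figures are my own). It shows that the three factors telescope: multiplying importance, tractability and neglectedness gives back good done per additional resources.

```python
import math


def itn_factors(good_done, pct_problem_solved, additional_resources, existing_resources):
    """Return (importance, tractability, neglectedness) for one hypothetical intervention."""
    pct_increase_in_resources = 100 * additional_resources / existing_resources
    importance = good_done / pct_problem_solved  # good done per % of the problem solved
    tractability = pct_problem_solved / pct_increase_in_resources  # % solved per % increase in resources
    neglectedness = pct_increase_in_resources / additional_resources  # % increase per extra $ of resources
    return importance, tractability, neglectedness


# Hypothetical: adding $1M to a cause that already receives $100M (a 1% increase)
# solves 0.5% of the problem and produces 50 units of good.
importance, tractability, neglectedness = itn_factors(
    good_done=50, pct_problem_solved=0.5, additional_resources=1e6, existing_resources=100e6
)

# The product equals good done / additional resources, computed directly.
print(importance * tractability * neglectedness, 50 / 1e6)
```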
As such ITN is one way of estimating $\frac{\text{good done}}{\text{additional resources}}$. But you might sometimes prefer other ways to break it down, because:
I should add that I find the "Fermi estimates vs ITN" framing potentially misleading. Maybe "ITN isn't the only way to do Fermi estimates of impact" is a clearer framing?
Anyway, curious if this all lines up with what you had in mind.
Thanks Dwarkesh, really enjoyed this.
This section stood out to me:
Instead, task a specific, identifiable agency with enforcing posterity impact statements. If their judgements are unreasonable, contradictory, or inconsistent, then there is a specific agency head that can be fired and replaced instead of a vast and unmanageable judiciary.
I've noticed this distinction become relevant a few times now: between wide, department-spanning regulation / initiatives on one hand, and focused offices / people / agencies / departments with a narrow, specific remit on the other. I have in mind that the 'wide' category involves checking for compliance with some desiderata, and stopping or modifying existing plans if they don't comply; while the 'focused' category involves figuring out how to proactively achieve some goal, sometimes by building something new in the world.
Examples of the 'wide' category are NEPA (and other laws / regulation where basically anyone can sue); or new impact assessments required for a wide range of projects, such as the 'future generations impact assessment' proposal from the Wellbeing of Future Generations Bill (page 7 of this PDF).
Examples of the 'focused' category are the Office of Technology Assessment, the Spaceguard Survey Report, or something like the American Pandemic Preparedness Plan (even without the funding it deserves).
I think my examples show a bias towards the 'focused and proactive' category, but the 'wide regulation' category is obviously sometimes very useful, even necessary. Maybe one thought is that concrete projects should often precede wide regulation, and wide regulation often does best when it's specific and legible (e.g. requiring that a specific safety-promoting technology is installed in new builds). We don't mind regulation that requires smoke alarms and sprinklers, because they work and they are worth the money. It's possible to imagine focused projects to drive down the costs of e.g. sequencing and sterilisation tech, followed by regulation which requires that specific tech be installed to clear standards, enforced by a specific agency.
Thanks very much for writing this — I'm inclined to agree that results from the happiness literature are often surprising and underrated for finding promising neartermist interventions and thinking about the value of economic growth. I also enjoyed hearing this talk in person!
The "aren't people's scales adjusting over time?" story ('scale norming') is the most compelling to me, and I'm less sure that we can rule it out. For instance — if I'm reading you right, you suggest that one reason to be skeptical that people are adjusting their scales over time is that people mostly agree on which adjectives like "good" correspond with which numerical scores of wellbeing. This doesn't strike me as strong evidence that people are not scale norming, since I wouldn't be surprised if people adjust the rough meaning of adjectives in line with the numbers.
If people thought this task was meaningless, they’d answer at random, and the lines would be flat.
I don't think "people use the same scales across time and context for both numbers and adjectives" and "people view this task as meaningless" are the only two options.
You also suggest a story about what people are doing when they come up with SWB scores, which, if true, leaves little room for scale norming/adjustment. And since (again, if I'm reading you right) this story seems independently plausible, we have an independently plausible reason to be skeptical that scale norming is occurring. Here's the story:
the way we intuitively use 0 to 10 scales is by taking 10 to be the highest realistic level (i.e. the happiest a person can realistically be) and 0 as the lowest (i.e. the least happy a person could realistically be) (Plant 2020). We do this, I claim, so that [...] we can use the same scales as other people and over time. If we didn’t do this, it would make it very difficult for our answers to be understood.
I don't find this line of argument super compelling, though not because I strongly disagree with that excerpt. Rather, the excerpt underdetermines which function you use to project from an extremely wide space onto a bounded scale, and there is no obvious 'Schelling' function (I don't even know what it would mean for your function to be linear). Indeed, people could change functions over time while keeping the 0 and 10 pegs fixed. Another thing that could be going on is that people might be considering how to make their score informationally valuable, which might involve imagining what kind of function would give a relatively even spread across 0–10 when used population-wide. I don't think this is primarily what is going on, but to the extent that it is, such a consideration would make a person's scale more relative to the population they understand themselves to be part of, and so liable to re-adjust over time.
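To illustrate the underdetermination point, here is a minimal Python sketch. The 'latent wellbeing' range (0 to 100) and both mapping functions are hypothetical: each keeps the same pegs (the lowest realistic level maps to 0, the highest to 10), yet they assign quite different scores to the same person in between.

```python
def linear_score(x: float, lowest: float, highest: float) -> float:
    """Map latent wellbeing x in [lowest, highest] onto 0-10 linearly."""
    return 10 * (x - lowest) / (highest - lowest)


def concave_score(x: float, lowest: float, highest: float, power: float = 0.5) -> float:
    """Same pegs as linear_score, but stretches the bottom of the range (power < 1)."""
    return 10 * ((x - lowest) / (highest - lowest)) ** power


# Both functions agree at the pegs...
for f in (linear_score, concave_score):
    print(f.__name__, f(0, 0, 100), f(100, 0, 100))

# ...but disagree in between: a person a quarter of the way up gets 2.5 vs 5.0.
print(linear_score(25, 0, 100), concave_score(25, 0, 100))
```

Nothing about the pegs rules out someone drifting from one such function to another over time, which is the room left for scale norming.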
Two extra things: (i) in general I strongly agree that this question (about how people's SWB scales adjust across time or contexts) is important and understudied, and (ii) having spoken with you and read your work, I've become relatively less confident in scale norming as a primary explanation of these findings.
I would be more fully persuaded that scale norming is not occurring if I saw evidence that experience-sampling type measures of affect also did not change over the course of decades as countries became wealthier (and gained more leisure time, etc.). I'd also change my mind if I saw an experiment where people were asked to rate how their lives were going in relation to some shared reference point(s), such as other people's lives described in a good amount of detail, and where those relative ratings also didn't change as countries became significantly wealthier.
(Caveat to all of above that I'm writing in a hurry!)
If almost everyone falls between 6–7 on the widest scale I can imagine, maybe the scale I actually use should significantly zoom in on that region.
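One hypothetical way to 'zoom in' like this is to score yourself by your rank in the population (an empirical-CDF mapping). The minimal Python sketch below assumes exactly that, purely for illustration, with made-up numbers. It also shows why such a scale would re-adjust over time: if everyone's latent wellbeing rises by the same amount, every score stays the same.

```python
from bisect import bisect_left


def percentile_score(x: float, population: list[float]) -> float:
    """Score x on 0-10 by the share of the population strictly below it."""
    ranked = sorted(population)
    return 10 * bisect_left(ranked, x) / len(ranked)


# Hypothetical population, all between 6 and 7 on the widest imaginable 0-10 scale,
# which the rank-based scale spreads out across 0-8.
population = [6.0, 6.2, 6.4, 6.6, 6.8]
person = population[2]

# Decades later, suppose everyone is 1 point better off on that widest scale.
later_population = [p + 1 for p in population]
later_person = person + 1

print(percentile_score(person, population), percentile_score(later_person, later_population))
```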