Research Scholar @ Future of Humanity Institute
Working (0-5 years experience)
1592 · Oxford, UK · Joined Apr 2019


Research scholar @ FHI and assistant to Toby Ord. Philosophy student before that. I do a podcast about EA called Hear This Idea.


Ok, got it. I'm curious: how do you see people using ITN in practice? (If not for making and comparing estimates of good done per additional unit of resources?)

Also this post may be relevant!

That's a good point. It is the case that preferences can be about an indefinite number of things. But I suppose there is still a sense in which a preference satisfaction account is monistic, namely in essentially valuing only the satisfaction of preferences (whatever they are about); and there is no equivalent sense in which objective list theories (with more than one item) are monistic. Also note that objective list theories can contain something like the satisfaction of preferences, and as such can be at least as complex and ecumenical as preference satisfaction views. 

Thanks, this is a good post. A half-baked thought about a related but (I think) distinct reason for this phenomenon: I wonder if we tend to (re)define the scale of problems such that they are mostly unsolved at present (but also not so vast that we obviously couldn't make a dent). For instance, it's not natural to think that the problem of 'eradicating global undernourishment' is more than 90% solved just because fewer than 10% of people in the world are undernourished. As long as problems are (re)defined in this way to be smaller in absolute terms, tractability is going to (appear to) increase proportionally, as a countervailing factor to diminishing returns from extra investment of resources. A nice feature of ITN is that (re)defining the scale of a problem such that it is always mostly unsolved at present doesn't affect the bottom line of utility per marginal dollar, because (utility / % of problem solved) increases as (% of problem solved / marginal dollar) decreases. To the extent this is a real phenomenon, it could emphasise the importance of not reading too much into direct comparisons of tractability across causes.
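
A minimal numerical sketch of that cancellation, with made-up numbers. The utility figure, the per-dollar fraction, and the function name are all illustrative assumptions rather than estimates for any real cause:

```python
import math


def utility_per_dollar(utility_if_fully_solved, fraction_solved_per_dollar):
    """Bottom line = (utility / fraction of problem solved) * (fraction solved / dollar)."""
    return utility_if_fully_solved * fraction_solved_per_dollar


# Framing A: the "whole" problem, most of which is already solved.
total_utility_a = 1000.0        # hypothetical utility from solving 100% of problem A
fraction_per_dollar_a = 1e-6    # hypothetical share of problem A solved by one marginal dollar

# Framing B: redefine the problem as only the remaining 10% of framing A.
remaining_share = 0.10
total_utility_b = total_utility_a * remaining_share               # scale shrinks 10x
fraction_per_dollar_b = fraction_per_dollar_a / remaining_share   # tractability appears 10x higher

per_dollar_a = utility_per_dollar(total_utility_a, fraction_per_dollar_a)
per_dollar_b = utility_per_dollar(total_utility_b, fraction_per_dollar_b)

# The two framings disagree on scale and tractability but agree on the bottom line.
print(per_dollar_a, per_dollar_b, math.isclose(per_dollar_a, per_dollar_b))
```

Comparing `fraction_per_dollar_a` with `fraction_per_dollar_b` on its own would suggest framing B is ten times more tractable, which is the kind of cross-cause comparison the paragraph above warns against.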

I think it would be very valuable if more reports of this kind were citable in contexts where people are sensitive to signs of credibility and prestige. In other words, there are contexts where, if this existed as a report on SSRN or even arXiv, or on the website of an established institution, it could be cited and would be valuable as such. Currently I don't think it could be cited (or taken seriously if cited). So if there are low-cost ways of publishing this or similar reports in a more polished way, I think that would be great.

Caveats that (i) maybe you have done this and I missed it; (ii) this comment isn't really specific to this post but it's been on my mind and this is the most recent post where it is applicable; and (iii) on balance it does nonetheless seem likely that the work required to turn this into a 'polished' report means doing so is not (close to) worthwhile.

That said: this is an excellent post and I'm very grateful for these forecasts.

Thanks for writing this — I'm curious about approaches like this, and your post felt unusually comprehensive. I also don't yet feel like I could faithfully represent your view to someone else, possibly because I read this fairly quickly.

Some scattered thoughts / questions below, written in a rush. I expect some or many of them are fairly confused! NNTR.

  • On this framework, on what grounds can someone not "defensibly ignore" another's complaint? Am I right in thinking this is because ignoring some complaints means frustrating others' goals or preferences, and frustrating others' goals or preferences is indefensible, as long as we care about getting along/cooperating at all (minimal morality)?
  • You say: "The exact reach of minimal morality is fuzzy/under-defined. How much is entailed by 'don't be a jerk'?" This seems important. For instance, you might see 'drowning child' framings as (compelling) efforts to move charitable giving within the purview of "you're a jerk if you don't do this when you comfortably could." Especially given the size of the stakes, could you imagine certain longtermist causes like "protecting future generations" similarly being framed as a component of minimal morality?
    • One speculative way you could do this: you described 'minimal morality' as “contractualist” or “cooperation-focused” in spirit. Certainly some acts seem wrong because they just massively undermine the potential for many people living at the same time with many different goals to cooperate on whatever their goals are. But maybe there are some ways in which we collaborate/cooperate/make contracts across (large stretches of) time. Maybe this could ground obligations to future people in minimal morality terms.
  • I understand the difference  in emphasis between saying that the moral significance of people's well-being is derivative of its contribution to valuable states of affairs, as contrasted with saying that what makes states of affairs valuable just is people's well-being (or something to that effect). But I'm curious what this means in a decision-relevant sense?
    • Here's an analogy: my daily walk isn't important because it increases the counter on my pedometer; rather the counter matters because it says something about how much I've walked (and walking is the thing I really care about). To see this, consider that intervening on the counter without actually walking does not matter at all.
    • But unlike this analogy, fans of axiology might say that "the value of a state of affairs" is not a measure of what matters (actual people and their well-being) that can be manipulated independently of those things; rather it is defined in terms of what you say actually matters, so there is no substantial disagreement beyond one of emphasis (this is why I don't think I'm on board with 'further thought' complaints against aggregative consequentialism). Curious what I'm missing here, though I realise this is maybe also a distraction.
  • I found the "court hearing analogy" and the overall discussion of population ethics in terms of the anticipated complaints/appeals/preferences of future people a bit confusing (because, as you point out, it's not clear how it makes sense in light of the non-identity problem). In particular your tentative solution of talking about the interests of 'interest groups' seems like it's kind of veering into the axiological territory that you wanted to avoid, no? As in: groups don't literally have desires or preferences or goals or interests above and beyond the individuals that make them up. But we can't compare across individuals here, so it's not clear how we can meaningfully compare the interests of groups in this sense. So what are we comparing? Well, groups can be said to have different kinds of intrinsic value, and while that value could be manifested/realised/determined only by individuals, you can comfortably compare value across groups with different sets of individuals.
  • Am I right in thinking that in order to creatively duck things like the RP, pinprick argument, arguments against asymmetry (etc) you are rejecting that there is a meaningful "better than" relation between certain states of affairs in population ethics contexts? If so this seems somewhat implausible because there do seem to be some cases where one state of affairs is better than another, and views which say "sure, some comparisons are clear, but others are vague or subjective" seem complicated. Do you just need to opt out of the entire game of "some states of affairs are better than other states of affairs (discontinuous with our own world)"? Curious how you frame this in your own mind.
  • I had an overall sense that you are both explaining the broad themes of an alternative to population ethics grounded in axiology; and then building your own richer view on top of that (with the court hearing analogy, distinction between minimal and ambitious morality, etc), such that your own view is like a plausible instance of this broad family of alternatives, but doesn't obviously follow from the original motivation for an alternative? Is that roughly right?
  • I also had a sense that you could have written a similar post just focused on simpler kinds of aggregative consequentialism (maybe you have in other posts, afraid I haven't read them all); in some sense you picked an especially ambitious challenge in (i) developing a perspective on ethics that can be applied broadly; and then (ii) applying it to an especially complex part of ethics. So double props I guess!

Thanks for writing this Rose, I love it.

Small note: my (not fully confident) understanding is that a typical day still does not involve a launch to orbit. My cached number is something like 2 or 3 launches / week in the world; or ~100–150 days / year with a launch. This is the best cite I can find. Launches often bring multiple 'objects' (satellites) into orbit, which is why it can be true that the average number of objects launched into space each day can exceed 1. So maybe the claim that "humans launch 5 objects into space" is somewhat misleading, despite being true on average. (This is ignorable pedantry!)
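
The arithmetic behind that note can be checked roughly as follows. This is a sketch that uses only the cached figures quoted above (2 to 3 launches per week, 5 objects per day on average) and assumes, generously, that no two launches ever share a day:

```python
DAYS_PER_YEAR = 365
WEEKS_PER_YEAR = 52

launches_per_week = (2, 3)   # cached range from the comment, not verified data
avg_objects_per_day = 5      # the quoted claim, taken as a yearly average

# Launches per year at the low and high end of the cached range.
launches_per_year = tuple(n * WEEKS_PER_YEAR for n in launches_per_week)

# Upper bound on the share of days with a launch (assumes one launch per launch-day).
max_share_of_days_with_launch = tuple(n / DAYS_PER_YEAR for n in launches_per_year)

# Objects per launch needed for the daily average to hold.
objects_per_year = avg_objects_per_day * DAYS_PER_YEAR
implied_objects_per_launch = tuple(objects_per_year / n for n in launches_per_year)

print(launches_per_year)
print(max_share_of_days_with_launch)
print(implied_objects_per_launch)
```

Under these assumptions fewer than half of all days have a launch, so the typical (median) day has none, while the average number of objects per day can still be well above 1 because each launch carries many objects.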

Thanks for writing this! What I took from it (with some of my own thoughts added):

The ITN framework is a way of breaking down good done per additional unit of resources into three components: importance, tractability, and neglectedness.
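
One common way of writing the factorisation is the 80,000 Hours version, assumed here; the post's exact notation may differ:

```latex
\[
\frac{\text{good done}}{\text{extra resources}}
=
\underbrace{\frac{\text{good done}}{\%\ \text{of problem solved}}}_{\text{importance}}
\times
\underbrace{\frac{\%\ \text{of problem solved}}{\%\ \text{increase in resources}}}_{\text{tractability}}
\times
\underbrace{\frac{\%\ \text{increase in resources}}{\text{extra resources}}}_{\text{neglectedness}}
\]
```

The intermediate terms cancel, which is why the product only equals the left-hand side if each factor uses the same units for "problem solved" and "resources".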

As such ITN is one way of estimating good done per additional unit of resources. But you might sometimes prefer other ways to break it down, because:

  • Sometimes the units for I, T, or N are ambiguous, and that can lead to unit inconsistencies in the same argument, e.g. by equivocating between "effort" and "money". These inconsistencies can mislead.
  • The neat factorisation might blind us to the fact that the meaning of 'good done' is underspecified, so it could lead us into thinking it is easier or more straightforward than it actually is to compare across disparate causes. Having more specific units for 'good done' can make it clearer when you are comparing apples and oranges.
  • ITN invites marginal thinking (you're being asked to estimate derivatives), but sometimes marginal thinking can mislead, when 'good done' is concave with resources.
  • Maybe most important of all: sometimes there are just much clearer/neater ways to factor the problem, which better carves it at its joints. Let's not constrain ourselves to one factorisation at the cost of more natural ones!

I should add that I find the "Fermi estimates vs ITN" framing potentially misleading. Maybe "ITN isn't the only way to do Fermi estimates of impact" is a clearer framing?

Anyway, curious if this all lines up with what you had in mind.

Thanks Dwarkesh, really enjoyed this.

This section stood out to me:

Instead, task a specific, identifiable agency with enforcing posterity impact statements. If their judgements are unreasonable, contradictory, or inconsistent, then there is a specific agency head that can be fired and replaced instead of a vast and unmanageable judiciary.

I've noticed this distinction become relevant a few times now: between wide, department-spanning regulation / initiatives on one hand; and focused offices / people / agencies / departments with a narrow, specific remit on the other. I have in mind that the 'wide' category involves checking for compliance with some desiderata, and stopping or modifying existing plans if they don't; while the 'focused' category involves figuring out how to proactively achieve some goal, sometimes by building something new in the world.

Examples of the 'wide' category are NEPA (and other laws / regulation where basically anyone can sue); or new impact assessments required for a wide range of projects, such as the 'future generations impact assessment' proposal from the Wellbeing of Future Generations Bill (page 7 of this PDF).

Examples of the 'focused' category are the Office of Technology Assessment, the Spaceguard Survey Report, or something like the American Pandemic Preparedness Plan (even without the funding it deserves).

I think my examples show a bias towards the 'focused and proactive' category but the 'wide regulation' category obviously is sometimes very useful; even necessary. Maybe one thought is that concrete projects should often precede wide regulation, and wide regulation often does best when it's specific and legible (e.g. requiring that a specific safety-promoting technology is installed in new builds). We don't mind regulation that requires smoke alarms and sprinklers, because they work and they are worth the money. It's possible to imagine focused projects to drive down costs of e.g. sequencing and sterilisation tech, and then maybe following up with regulation which requires specific tech be installed to clear standards, enforced by a specific agency.

Thanks very much for writing this — I'm inclined to agree that results from the happiness literature are often surprising and underrated for finding promising neartermist interventions and thinking about the value of economic growth. I also enjoyed hearing this talk in person!

The "aren't people's scales adjusting over time?" story ('scale norming') is most compelling to me, and I think I'm less sure that we can rule it out. For instance — if I'm reading you right, you suggest that one reason to be skeptical that people are adjusting their scales over time is that people mostly agree on which adjectives like "good" correspond with which numerical scores of wellbeing. This doesn't strike me as strong evidence that people are not scale norming, since I wouldn't be surprised if people adjust the rough meaning of adjectives roughly in line with numbers.

"If people thought this task was meaningless, they’d answer at random, and the lines would be flat."

I don't see a dichotomy between "people use the same scales across time and context for both numbers and adjectives" and "people view this task as meaningless".

You also suggest a story about what people are doing when they come up with SWB scores, which if true leaves little room for scale norming/adjustment. And since (again, if I'm reading you right) this story seems independently plausible, we have an independently plausible reason to be skeptical that scale norming is occurring. Here's the story:

the way we intuitively use 0 to 10 scales is by taking 10 to be the highest realistic level (i.e. the happiest a person can realistically be) and 0 as the lowest (i.e. the least happy a person could realistically be) (Plant 2020). We do this, I claim, so that [...] we can use the same scales as other people and over time. If we didn’t do this, it would make it very difficult for our answers to be understood.

I think I don't find this line of argument super compelling, and not even because I strongly disagree with that excerpt. Rather: the excerpt underdetermines what function you use to project from an extremely wide space onto a bounded scale, and there is no obvious such 'Schelling' function (i.e. I don't even know what it would mean for your function to be linear). And indeed people could change functions over time while keeping those 0 and 10 pegs fixed. Another thing that could be going on is that people might be considering how to make their score informationally valuable, which might involve imagining what kind of function would give a relatively even spread across 0–10 when used population-wide. I don't think this is primarily what is going on, but to the extent that it is, such a consideration would make a person's scale more relative to the population they understand themselves to be part of[1], and as such to re-adjust over time.
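
A toy illustration of the underdetermination point. The 0 to 100 'underlying wellbeing' axis and both mapping functions are hypothetical assumptions made up for this sketch, not anything from the post:

```python
def linear_scale(w, lo=0.0, hi=100.0):
    """Map underlying wellbeing w in [lo, hi] linearly onto a 0-10 score."""
    return 10 * (w - lo) / (hi - lo)


def zoomed_scale(w, lo=0.0, hi=100.0, power=3.0):
    """Same 0 and 10 pegs, but a convex curve that spreads out the upper range."""
    return 10 * ((w - lo) / (hi - lo)) ** power


# Both functions agree at the lowest and highest realistic levels...
print(linear_scale(0), zoomed_scale(0))
print(linear_scale(100), zoomed_scale(100))

# ...but give different scores for the same underlying level in between.
print(linear_scale(80), zoomed_scale(80))
```

Someone could switch from one function to the other over time while keeping the 0 and 10 pegs fixed, so fixed endpoints alone do not rule out scale norming.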

Two extra things: (i) in general I strongly agree that this question (about how people's SWB scales adjust across time or contexts) is important and understudied, and (ii) having spoken with you and read your stuff I've become relatively less confident in scale-norming as a primary explanation of all this stuff.

I would change my mind more fully that scale norming is not occurring if I saw evidence that experience-sampling type measures of affect also did not change over the course of decades as countries become/became wealthier (and earned more leisure time etc). I'd also change my mind if I saw some experiment where people were asked to rate how their lives were going in relation to some shared reference point(s), such as other people's lives described in a good amount of detail, and where people's ratings of how their lives were going relative to those reference points also didn't change as countries became significantly wealthier.

(Caveat to all of above that I'm writing in a hurry!)

  1. If almost everyone falls between 6–7 on the widest scale I can imagine, maybe the scale I actually use should significantly zoom in on that region.
