• I developed a system that tries to handle moral uncertainty regarding the moral status of "potential people," which is a key crux in the debate between so-called "average utilitarians" and "total utilitarians."
  • To simplify the system's underlying principle: when comparing two decision options that lead to population sizes differing by X, an evaluator should treat the world with the smaller population as also containing X additional people with wellbeing equal to zero ("non-existent potential people"), weight that adjustment by the evaluator's probability estimate for the claim that "potential people matter," and then compare the resulting averages of the two worlds.
  • I think the reasoning behind the system seems facially intuitive/agreeable for many average utilitarians, which I feel somewhat qualified to believe given that I leaned more towards average utilitarianism up until ~1 year ago.
  • However, the system I developed can be gamed in a way that leads to illogical conclusions (e.g., concluding A < B < C < A).
  • I'm unsure whether this problem is fixable with minor tweaks, but I'm open to the possibility.
  • It's plausible but definitely not obvious that this could be a legitimate argument (perhaps even a "self-trap"[1]) against the idea of focusing on averages as opposed to aggregates. However, I do not know whether future research will show that a similar system can be set up to fool total utilitarian reasoning.

Summary of Post Goals

In this post, I am seeking to:

  1. Solicit responses that point me towards other attempts to account for moral uncertainty between average and total utilitarianism, including systems which may not actually be better but are better known or more widely used. I am especially interested to know if there is already something similar to what I describe.
  2. Describe one method/system that I thought of which initially seemed reasonable but does not seem to work, since it leads to apparent self-contradictions. Even so, it appears to occasionally be more reasonable/effective than the alternatives I am currently aware of (e.g., “completely assume one or the other framework is correct,” “go with your gut/intuition”), at least as a minimal test rather than a sufficient criterion.
  3. Get initial feedback on the described system, if only for obvious flaws that I may have overlooked.
  4. Highlight the self-contradiction point as a possible argument against “average utilitarianism”[2] and see if people have quick thoughts about the validity/impact of this argument, with the disclaimer that I haven’t explored it very deeply on my own because (a) I want to first get a better sense of the literature on moral uncertainty that already exists (so that I avoid completely reinventing the wheel here, which I admittedly may have already partially done), and (b) I would not be surprised to find that the system I developed has important flaws which render this discussion moot.



Over the past two years, I’ve updated my beliefs away from average utilitarianism (“averagism”) and towards total utilitarianism (“totalism”).[3] In this post, I’ll simplify the differences between the two frameworks to just focus on the idea of “potential people”; for example, “is a world with 100 people who on average experience ‘10 utils’ better than a world with 1000 people who experience 5 utils on average?”

For most everyday decisions, this is not very important, but when thinking about some issues (especially those relating to longtermism/x-risks/s-risks)[4] it becomes a potentially very important distinction. However, I am still unsure how to balance the two frameworks given my residual uncertainty, and I also want to have some advice/framework to give to people who might also be unsure of how to deal with this moral uncertainty, at least as a way to challenge extremely one-sided reasoning in favor of smaller populations (e.g., “if I think that averagism is >50% likely to be valid, I should just ignore all concerns from totalism and maximize the average wellbeing no matter how much supposed opportunity cost this involves from the standpoint of total wellbeing”). I am especially motivated to find such a system since I think that one can make compelling arguments (e.g., references to repugnant conclusions and reverse repugnant conclusions) that it is unjustified to assume a high degree of confidence (e.g., >95%) in one framework or the other without doing deeper analysis.

A few months ago, I decided that I would try to develop a system that helps to handle this moral uncertainty at least better than “this is what my gut tells me,” or “I’ll make decisions per totalism 25% of the time, and averagism 75% of the time,” or “I’ll try to make decisions where the sum of the percentage increases in average wellbeing and total wellbeing is net positive (e.g., ‘25% increase in average wellbeing with only a 10% decrease in total wellbeing’).”[5]


The Conceptual Reasoning

The system I developed is based on the reasoning behind the (non-Rawlsian) veil of ignorance, where the goal is to maximize wellbeing among “people that matter.” The problem is that averagism contends that “non-existent potential people” don’t matter for this calculation whereas totalism contends such people do matter, and it is unclear how to compare aggregates to averages. So, I wondered, “what if you just resolve the disagreement by using the common metric of ‘average wellbeing among people that matter’—but assume that if totalism is valid then when comparing two options (World A and World B) additional ‘potential people’ who would exist in one world but don’t in the other should be treated as experiencing zero utils in the latter?”

One could perhaps conceptualize this proposal by assuming that the veil of ignorance predates the question of whether people exist—as if one is trying to maximize average wellbeing among “potential people” in a “pre-existence waiting room,” where only a limited number of “people” come into existence while the rest never exist: totalism says that these non-existent people should be treated as if they have zero wellbeing, while averagism says that these non-existent people should be completely left out of the average. (For now I will just set aside the objection that this waiting room could somehow have “infinite” potential people, and simply focus on the differences between actual options, e.g., 1 billion vs. 10 billion people.)

The following conceptual diagram may or may not help, but in the next section I give a concrete example with numbers and eventually even a spreadsheet with example cases.

The following is a simplified illustration of the model, still without concrete values. It may look like a lot, but at its core it's basically just a 2x2 matrix: whether potential people matter, crossed with option A or option B. The apparent complexity mainly enters when one has to weight each case by one's credence in the different ethical positions. (Disclaimer: there may be errors in this, as I added it somewhat hastily after initially posting the article.)


A Concrete/Quantitative Example

To use concrete numbers, suppose you could take some action that would lead to a world (World A) with 1 billion people and an average utility of 10 utils per person, or you could take an action which leads to a different world (World B) with 10 billion people and an average utility of 5 utils per person. Following the reasoning I describe, World B is preferable if one assumes that totalism is 100% valid, whereas World A is preferable if one assumes that averagism is 100% valid. But what if you’re somewhat uncertain between the two—for example, if you put 60% credence on averagism (non-existent potential people don’t matter) and 40% credence on totalism (non-existent potential people’s wellbeing should be treated as 0)? What if the average utilities among existing people change? The screenshot below shows some example calculations.
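As a rough illustration of the arithmetic, here is a minimal Python sketch. It assumes one particular reading of the system: each world's score is a credence-weighted mix of its plain average (the averagism case) and its average after padding the smaller world with zero-wellbeing "potential people" (the totalism case). The function names `credence_weighted_average` and `compare` are hypothetical and only for illustration.

```python
def credence_weighted_average(pop, avg_utility, larger_pop, p_totalism):
    """Score one world in a pairwise comparison (assumed reading of the system).

    With probability (1 - p_totalism), averagism holds and the score is simply
    the world's average utility. With probability p_totalism, totalism holds and
    the (larger_pop - pop) missing "potential people" are counted as zeros,
    which dilutes the average to avg_utility * pop / larger_pop.
    """
    diluted = avg_utility * pop / larger_pop
    return (1 - p_totalism) * avg_utility + p_totalism * diluted


def compare(world_a, world_b, p_totalism):
    """Return (score_a, score_b) for two worlds given as (population, avg_utility)."""
    larger_pop = max(world_a[0], world_b[0])
    score_a = credence_weighted_average(*world_a, larger_pop, p_totalism)
    score_b = credence_weighted_average(*world_b, larger_pop, p_totalism)
    return score_a, score_b


world_a = (1_000_000_000, 10)   # 1 billion people, 10 utils on average
world_b = (10_000_000_000, 5)   # 10 billion people, 5 utils on average

# 0.0 = pure averagism, 0.4 = the 60/40 split from the text, 1.0 = pure totalism
for p_totalism in (0.0, 0.4, 1.0):
    score_a, score_b = compare(world_a, world_b, p_totalism)
    winner = "World A" if score_a > score_b else "World B"
    print(f"p(totalism)={p_totalism:.1f}: A={score_a:.2f}, B={score_b:.2f} -> {winner}")
```

Under this reading, the 60/40 split scores World A at roughly 6.4 and World B at 5, so World A is still preferred; World B only wins once credence in totalism is high enough.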


Key Observations

It seems intuitively appealing and occasionally better than alternatives I know of

I realized fairly quickly that this probably had some logical flaws; after all, it is simple and plays fast and loose with reasoning about averages. However, one of my key observations is that this feels more reasonable than the alternatives I was aware of, and it at least sometimes moderates dubiously extreme choices, such as “my credence in averagism is 80% and World A has one single person with 100 utility, whereas World B has one billion people with 95 utility; I choose World A because it has the higher average utility and I think averagism is more likely to be correct.”
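The quoted case can be checked with a short Python sketch. It assumes the system scores each world as a credence-weighted mix of its plain average and its zero-padded average; the `score` helper is a hypothetical name, not something from the spreadsheet.

```python
def score(pop, avg_utility, larger_pop, p_totalism):
    # Assumed reading: mix the plain average (averagism) with the average after
    # padding the world with zero-wellbeing potential people (totalism).
    padded_average = avg_utility * pop / larger_pop
    return (1 - p_totalism) * avg_utility + p_totalism * padded_average


p_totalism = 0.2                   # i.e., 80% credence in averagism
world_a = (1, 100)                 # one person with 100 utility
world_b = (1_000_000_000, 95)      # one billion people with 95 utility
larger_pop = max(world_a[0], world_b[0])

score_a = score(*world_a, larger_pop, p_totalism)
score_b = score(*world_b, larger_pop, p_totalism)

# "Just follow the likelier framework" compares raw averages.
naive_choice = "World A" if world_a[1] > world_b[1] else "World B"
# The credence-weighted comparison picks whichever score is higher.
system_choice = "World A" if score_a > score_b else "World B"
print(naive_choice, system_choice, round(score_a, 2), round(score_b, 2))
```

In this sketch World A scores about 80 against World B's 95, so the credence-weighted comparison favors World B even though the raw averages favor World A.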

Additionally, I probably could have been persuaded to accept this compromise even when I preferred averagism: I was supportive of the reasoning behind the (non-Rawlsian) veil of ignorance, but I wasn’t sure how to factor in “non-existent potential people” and I was skeptical of totalism in light of arguments such as the repugnant conclusion. However, if someone highlighted that there’s a chance that non-existent potential people should be treated as zeros in the average, I might have accepted that in my reasoning, especially since it still appeals to the “maximizing expected wellbeing as if you didn’t know who you are (or could have been)” aspect of the veil of ignorance. 

But it still seems to have problems with scale-insensitivity

Admittedly, I also found this system somewhat unsatisfying because it seems fairly scale-insensitive and still seemingly endorses extreme choices such as “my credence in averagism is 51% and World A has one single person with 100 utility, whereas World B has ten trillion people with 50 utility; I should choose World A.” (If I understood and remember correctly, if one’s ratio of credence in averagism vs. totalism exceeds the ratio of average utility in World A vs. World B, it automatically rules in favor of World A no matter how large the disparity in population size is between the two worlds, which just seems… wrong?) 
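The scale-insensitivity can be made explicit with a short derivation. This is only a sketch under an assumed reading of the system, in which World A (the smaller world, with population N_A and positive average utility u_A) gets a credence-weighted score and World B (population N_B, average u_B) keeps its plain average. The symbols p_avg and p_tot (credences in averagism and totalism, summing to 1) are notation introduced here for illustration.

```latex
\begin{aligned}
S(A) &= p_{\text{avg}}\, u_A + p_{\text{tot}}\, u_A \frac{N_A}{N_B},
\qquad S(B) = u_B, \\
S(A) &> p_{\text{avg}}\, u_A
\quad \text{for every } N_B > N_A \text{ (given } u_A > 0 \text{ and } p_{\text{tot}} > 0\text{)}, \\
p_{\text{avg}}\, u_A \ge u_B
&\;\Longrightarrow\; S(A) > S(B) \text{ regardless of } N_B .
\end{aligned}
```

In the example above, 0.51 × 100 = 51 exceeds 50, so World A wins however large World B's population is. If this reading is right, the threshold compares the credence in averagism itself (rather than the averagism-to-totalism ratio) against the ratio of World B's average utility to World A's.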

In some ways this could make the system worse than non-explicit alternatives such as “going with one’s intuition.” Still, it at least seems to be a more-agreeable floor than something like “just operate with the assumptions of whichever framework you assign a higher credence to.” In other words, it seems that passing this criterion may not be sufficient to justify a decision in favor of a world with a smaller population, but that failing this criterion should be strong evidence against a decision that favors a world with a smaller population. (To be clear, I still have a lot of uncertainty regarding these claims.)

And it seems to produce self-contradictory results

Another key observation is that this system seems to break down from logical self-contradiction (see below), and I’m not sure (1) whether this can be easily resolved with modifications, or (2) whether this is an indictment of the idea of using “average wellbeing.”

The apparent self-contradiction comes when one breaks down an individual decision into multiple, partial steps, as shown in the spreadsheet below: the system in test case 9-1-A states that World A is preferable to World B in that scenario, yet by breaking this one decision down into two steps/decisions through test cases 9-1-B and 9-1-C (which both rule in favor of their respective World B), you cause the system to seemingly prefer the original World B in test case 9-1-A. In other words, the system basically claims that A is preferable to B, B is preferable to C, and C is preferable to A (or in a shorter format, A > B > C > A), which is a cyclic, intransitive ordering rather than a consistent ranking. Moreover, as shown with test case group 9-2, it produces the reverse cycle (A < B < C < A) when the average utility is set to a negative value. (The bright red, orange, and yellow cell colors below each correspond to a world that recurs across the test cases.)
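To make the cycle concrete, here is a minimal Python sketch. It assumes the system scores each world in a pairwise comparison as a credence-weighted mix of its plain average and its zero-padded average. The three worlds and the 40% credence in totalism are hypothetical numbers chosen for illustration; they are not the values from test cases 9-1-A through 9-1-C, and the helper names are likewise made up.

```python
def score(pop, avg_utility, larger_pop, p_totalism):
    """Credence-weighted average for one world in a pairwise comparison (assumed reading)."""
    padded_average = avg_utility * pop / larger_pop
    return (1 - p_totalism) * avg_utility + p_totalism * padded_average


def preferred(name_x, name_y, worlds, p_totalism):
    """Return the name of whichever world scores higher head-to-head."""
    larger_pop = max(worlds[name_x][0], worlds[name_y][0])
    score_x = score(*worlds[name_x], larger_pop, p_totalism)
    score_y = score(*worlds[name_y], larger_pop, p_totalism)
    return name_x if score_x > score_y else name_y


# Hypothetical worlds as (population, average utility); not the spreadsheet's values.
worlds = {"A": (1, 100), "B": (2, 85), "C": (1000, 55)}
p_totalism = 0.4

for x, y in (("A", "B"), ("B", "C"), ("A", "C")):
    print(f"{x} vs {y}: prefers {preferred(x, y, worlds, p_totalism)}")
```

With these numbers the sketch prefers B over A, C over B, and yet A over C. In this sketch the cycle appears to arise because the number of padded zeros depends on which pair of worlds is being compared, so each world's score changes from one comparison to the next.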

It seems entirely plausible that I’m just doing something illogical with the averages and/or that some minor modifications to the logic would resolve the apparent contradiction, but I’ve not been able to quickly figure out what I might be doing wrong. Moreover, the original chain of reasoning still seems intuitively appealing: “perhaps I should incorporate the possibilities that I wouldn’t exist in World A and that such non-existence is morally relevant.”


Is this contradiction an argument against using average wellbeing?

I have yet to devise a similar system that uses the common metric of aggregate wellbeing instead of average wellbeing, and more generally I have not explored the contradiction in substantial depth. The reason for this is that, as hinted in the summary, I am concerned that all of my efforts in creating this system and writing this post will be obliterated within 20 minutes of someone else reading and saying “oh, something similar was already covered in XYZ introduction” or “you have a flawed assumption in your reasoning when you say ABC.” Of course, it’ll be a good lesson for the future, but given this possibility I think I should refrain from deeper investigation, given the opportunity costs.

That caveat aside, I ask this question because the apparent contradiction I highlight might indicate a flaw in the reasoning behind averagism: If

  1. Someone who supports averagism concedes that you should incorporate the possibility that non-existent potential people matter, and 
  2. The conceptual reasoning I describe above is a legitimate extension/interpretation of averagism attempting to incorporate such possibility, yet
  3. The reasoning does contain, and cannot escape, self-contradiction,

Then: The whole system seems to have a defect which could stem from the reliance on averagism.

As I initially developed this system, I found myself agreeing with the first two points (and probably would have agreed with them back when I leaned towards averagism), but I discovered the apparent contradiction and now agree with the third point without finding a reason to change my views on the first two points. I suspect that the flaw lies with my own reasoning rather than an entire ethical framework, but pending further insights I cannot pinpoint where the flaw lies, or whether this whole thing is a fool’s errand.


Concluding Remarks/Questions

I struggled to quickly find a system that was remotely similar to this when I did some research (although I realize I probably should have tried harder earlier on). Thus, if anyone could point me in the right direction for similar attempts at resolving this question of moral uncertainty, I would be quite appreciative. Moreover, I would love to hear if anyone has feedback for the system or reasoning I describe, including whether there is something to explore with regard to the self-contradiction undermining averagism.


  1. I couldn't quickly think of a less negative phrase, but this is just meant to say "an argument which leads someone to realize contradictions in their own position."

  2. More precisely, by “average utilitarianism” I am referring to the position that “non-existent potential people” (as described later) should not be counted as part of the average wellbeing of a given world.

  3. If I had to put numbers on it, I would probably say I was like 60-40 in favor of average utilitarianism a few years ago (but never really spent that much time thinking about it), whereas now I’m probably more than 80% in favor of total utilitarianism.

  4. It actually may not be important for many relevant decisions regarding x-risk; i.e., averagism might conclude something like “the average person in the future will be extremely happy, and thus it is worth achieving that future to raise the total average across time.” However, one might not think that future people will be extremely happy (as opposed to being numerous). Alternatively, if an averagism-leaning CEO of a company that developed aligned AGI wanted to impose their own moral view on reality, they might choose to optimize average wellbeing at extremely high costs to total wellbeing (which I think some people would consider an s-risk or existential catastrophe).

  5. To be clear, I accept that there may be some situations where this last approach is more effective/efficient if only as a way to resolve disagreement between decision-makers.