Arguments for moral indefinability

by richard_ngo, 8th Feb 2019, 8 comments

Epistemic status: I endorse the core intuitions behind this post, but am only moderately confident in the specific claims made. Also, while I do have a degree in philosophy, I am not a professional ethicist, and I’d appreciate feedback on how these ideas relate to existing literature.

Moral indefinability is the term I use for the idea that there is no ethical theory which provides acceptable solutions to all moral dilemmas, and which also has the theoretical virtues (such as simplicity, precision and non-arbitrariness) that we currently desire. I think this is an important and true perspective on ethics, and in this post I will explain why I hold it, with the caveat that I'm focusing more on airing these ideas than on constructing a watertight argument.

Here’s another way of explaining moral indefinability: let’s think of ethical theories as procedures which, in response to a moral claim, either endorse it, reject it, or do neither. Moral philosophy is an attempt to find the theory whose answers best match our intuitions about what answers ethical theories should give us (e.g. don’t cause unnecessary suffering), and whose procedure for generating answers best matches our meta-level intuitions about what ethical theories should look like (e.g. they should consistently apply impartial principles rather than using ad-hoc, selfish or random criteria). None of these desiderata are set in stone, though - in particular, we sometimes change our intuitions when it’s clear that the only theories which match those intuitions violate our meta-level intuitions. My claim is that eventually we will also need to change our meta-level intuitions in important ways, because it will become clear that the only theories which match them violate key object-level intuitions. In particular, this might lead us to accept theories which occasionally evince properties such as:

  • Incompleteness: for some claim A, the theory neither endorses nor rejects either A or ~A, even though we believe that the choice between A and ~A is morally important.
  • Vagueness: the theory endorses an imprecise claim A, but rejects every way of making it precise.
  • Contradiction: the theory endorses both A and ~A (note that this is a somewhat provocative way of framing this property, since we can always add arbitrary ad-hoc exceptions to remove the contradictions. So perhaps a better term is arbitrariness of scope: when we have both a strong argument for A and a strong argument for ~A, the theory can specify in which situations each conclusion should apply, based on criteria which we would consider arbitrary and unprincipled. Example: when there are fewer than N lives at stake, use one set of principles; otherwise use a different set).
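To make the "theories as procedures" framing concrete, here is a minimal Python sketch, assuming a toy encoding in which a theory maps a claim (plus one piece of context, the number of lives at stake) to one of three verdicts. The claims, the `toy_theory` function and the threshold `N` are all hypothetical illustrations rather than any real ethical theory; the sketch models incompleteness and arbitrariness of scope only, not vagueness.

```python
from enum import Enum


class Verdict(Enum):
    ENDORSE = "endorse"
    REJECT = "reject"
    NEITHER = "neither"  # the theory is silent on the claim


# Hypothetical scale threshold for the "arbitrariness of scope" property.
N = 1000


def toy_theory(claim: str, lives_at_stake: int) -> Verdict:
    """A hypothetical theory: a procedure from (claim, context) to a verdict."""
    if claim == "aggregate welfare across people":
        # Arbitrariness of scope: below N lives one principle applies,
        # at or above N the opposite one does.
        return Verdict.REJECT if lives_at_stake < N else Verdict.ENDORSE
    if claim == "avoid unnecessary suffering":
        return Verdict.ENDORSE
    # Incompleteness: every claim not listed above gets no verdict at all.
    return Verdict.NEITHER


def is_incomplete_on(theory, claim: str, negation: str, lives_at_stake: int) -> bool:
    """True if the theory neither endorses nor rejects either A or ~A."""
    return (
        theory(claim, lives_at_stake) is Verdict.NEITHER
        and theory(negation, lives_at_stake) is Verdict.NEITHER
    )


print(toy_theory("aggregate welfare across people", 10).value)         # reject
print(toy_theory("aggregate welfare across people", 1_000_000).value)  # endorse
print(is_incomplete_on(toy_theory, "create new universes",
                       "do not create new universes", 10))             # True
```

The cutoff `N` is deliberately arbitrary: nothing inside the procedure explains why the switch happens at 1000 rather than 999, which is exactly the kind of unprincipled criterion the "arbitrariness of scope" bullet describes.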

Why take moral indefinability seriously? The main reason is that ethics evolved to help us coordinate in our ancestral environment, and did so not by giving us a complete decision procedure to implement, but rather by ingraining intuitive responses to certain types of events and situations. There were many different and sometimes contradictory selection pressures driving the formation of these intuitions - and so, when we construct generalisable principles based on our intuitions, we shouldn't expect those principles to automatically give useful or even consistent answers to very novel problems. Unfortunately, the moral dilemmas which we grapple with today have in fact "scaled up" drastically in at least two ways. Some are much greater in scope than any problems humans have dealt with until very recently. And some feature much more extreme tradeoffs than ever come up in our normal lives, e.g. because they have been constructed as thought experiments to probe the edges of our principles.

Of course, we're able to adjust our principles so that we are more satisfied with their performance on novel moral dilemmas. But I claim that in some cases this comes at the cost of those principles conflicting with the intuitions which make sense on the scales of our normal lives. And even when it's possible to avoid that, there may be many ways to make such adjustments whose relative merits are so divorced from our standard moral intuitions that we have no good reason to favour one over the other. I'll give some examples shortly.

A second reason to believe in moral indefinability is the fact that human concepts tend to be open-textured: there is often no unique "correct" way to rigorously define them. For example, we all know roughly what a table is, but it doesn’t seem like there’s an objective definition which gives us a sharp cutoff between tables and desks and benches and a chair that you eat off and a big flat rock on stilts. A less trivial example is our inability to rigorously define what entities qualify as being "alive": edge cases include viruses, fires, AIs and embryos. So when moral intuitions are based on these sorts of concepts, trying to come up with an exact definition is probably futile. This is particularly true when it comes to very complicated systems in which tiny details matter a lot to us - like human brains and minds. It seems implausible that we’ll ever discover precise criteria for when someone is experiencing contentment, or boredom, or many of the other experiences that we find morally significant.

I would guess that many anti-realists are sympathetic to the arguments I’ve made above, but still believe that we can make morality precise without changing our meta-level intuitions much - for example, by grounding our ethical beliefs in what idealised versions of ourselves would agree with, after long reflection. My main objection to this view is, broadly speaking, that there is no canonical “idealised version” of a person, and different interpretations of that term could lead to a very wide range of ethical beliefs. I explore this objection in much more detail in this post. (In fact, the more general idea that humans aren’t really “utility maximisers”, even approximately, is another good argument for moral indefinability.) And even if idealised reflection is a coherent concept, it simply passes the buck to your idealised self, who might then believe my arguments and decide to change their meta-level intuitions.

So what are some pairs of moral intuitions which might not be simultaneously satisfiable under our current meta-level intuitions? Here’s a non-exhaustive list - the general pattern being clashes between small-scale perspectives, large-scale perspectives, and the meta-level intuition that they should be determined by the same principles:

  • Person-affecting views versus non-person-affecting views. Small-scale views: killing children is terrible, but not having children is fine, even when those two options lead to roughly the same outcome. Large-scale view: extinction is terrible, regardless of whether it comes about from people dying or people not being born.
  • The mere addition paradox, aka the repugnant conclusion. Small-scale views: adding happy people and making people more equal can't make things worse. Large-scale view: a world consisting only of people whose lives are barely worth living is deeply suboptimal. (Note also Arrhenius' impossibility theorems, which show that you can't avoid the repugnant conclusion without making even greater concessions.)
  • Weighing theories under moral uncertainty. I personally find OpenPhil's work on cause prioritisation under moral uncertainty very cool, and the fundamental intuitions behind it seem reasonable, but some of it (e.g. variance normalisation) has reached a level of abstraction where I feel almost no moral force from their arguments, and aside from an instinct towards definability I'm not sure why I should care.
  • Infinite and relativistic ethics. Same as above. See also this LessWrong post arguing against applying the “linear utility hypothesis” at vast scales.
  • Whether we should force future generations to have our values. On one hand, we should be very glad that past generations couldn't do this. But on the other, the future will probably disgust us, like our present would disgust our ancestors. And along with "moral progress" there'll also be value drift in arbitrary ways - in fact, I don't think there's any clear distinction between the two.
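The mere addition paradox in the list above can be walked through with a few lines of arithmetic. This is a minimal sketch, assuming one common way of formalising the two small-scale views: "mere addition" leaves existing groups untouched and adds people with positive welfare, and an "equalising gain" makes everyone equal while raising total welfare. Summing welfare is used here only as bookkeeping, which is itself an assumption, and every population size and welfare level is a made-up illustrative number, not data.

```python
from typing import List, Tuple

# A toy world: groups of (number of people, welfare per person).
# All numbers are illustrative, not empirical.
World = List[Tuple[int, int]]


def total(world: World) -> int:
    """Total welfare, used here only as bookkeeping (an assumption)."""
    return sum(n * w for n, w in world)


def is_mere_addition(before: World, after: World) -> bool:
    """Existing groups unchanged, plus at least one new group with lives worth living."""
    extra = list(after)
    for group in before:
        if group not in extra:
            return False
        extra.remove(group)
    return len(extra) > 0 and all(w > 0 for _, w in extra)


def is_equalising_gain(before: World, after: World) -> bool:
    """Everyone ends up at the same welfare level, and the total goes up."""
    return len({w for _, w in after}) == 1 and total(after) > total(before)


A: World = [(1_000, 100)]
A_plus: World = [(1_000, 100), (1_000, 40)]
B: World = [(2_000, 75)]
Z: World = [(1_000_000, 1)]  # in the paradox, reached by repeating the two steps

print(is_mere_addition(A, A_plus))    # True
print(is_equalising_gain(A_plus, B))  # True
print(total(A), total(Z))             # 100000 1000000
```

Each step passes its small-scale test, yet chaining such steps ends in a world like `Z`, which the large-scale view judges deeply suboptimal.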

I suspect that many readers share my sense that it'll be very difficult to resolve all of the dilemmas above in a satisfactory way, but also have a meta-level intuition that they need to be resolved somehow, because it's important for moral theories to be definable. But perhaps at some point it's this very urge towards definability which will turn out to be the weakest link. I do take seriously Parfit's idea that secular ethics is still young, and there's much progress yet to be made, but I don't see any principled reason why we should be able to complete ethics, except by raising future generations without whichever moral intuitions are standing in the way of its completion (and isn't that a horrifying thought?). From an anti-realist perspective, I claim that perpetual indefinability would be better. That may be a little more difficult to swallow from a realist perspective, of course. My guess is that the core disagreement is whether moral claims are more like facts, or more like preferences or tastes - if the latter, moral indefinability would be analogous to the claim that there’s no (principled, simple, etc) theory which specifies exactly which foods I enjoy.

There are two more plausible candidates for moral indefinability which were the original inspiration for this post, and which I think are some of the most important examples:

  • Whether to define welfare in terms of preference satisfaction or hedonic states.
  • The problem of "maximisation" in utilitarianism.

I've been torn for some time over the first question, slowly shifting towards hedonic utilitarianism as problems with formalising preferences piled up. While this isn't the right place to enumerate those problems (see here for a previous relevant post), I've now become persuaded that any precise definition of which preferences it is morally good to satisfy will lead to conclusions which I find unacceptable. After making this update, I can either reject a preference-based account of welfare entirely (in favour of a hedonic account), or else endorse a "vague" version of it which I think will never be specified precisely.

The former may seem the obvious choice, until we take into account the problem of maximisation. Consider that a true (non-person-affecting) hedonic utilitarian would kill everyone who wasn't maximally happy if they could replace them with people who were (see here for a comprehensive discussion of this argument), and that for any precise definition of welfare, they would search for edge cases where they could push it to extreme values. In fact, reasoning about a "true utilitarian" feels remarkably like reasoning about an unsafe AGI. I don't think that's a coincidence: psychologically, humans just aren't built to be maximisers, and so a true maximiser would be fundamentally adversarial. And yet many of us also have strong intuitions that there are some good things, that it's always better for there to be more good things, and that it’s best if there are the most good things.

How to reconcile these problems? My answer is that utilitarianism is pointing in the right direction, which is “lots of good things”, and in general we can move in that direction without moving maximally in that direction. What are those good things? I use a vague conception of welfare that balances preferences and hedonic experiences and some of my own parochial criteria - importantly, without feeling like it's necessary to find a perfect solution (although of course there will be ways in which my current position can be improved). In general, I think that we can often do well enough without solving fundamental moral issues - see, for example, this LessWrong post arguing that we’re unlikely to ever face the true repugnant dilemma, because of empirical facts about psychology.

To be clear, this still means that almost everyone should focus much more on utilitarian ideas, like the enormous value of the far future, because in order to reject those ideas it seems like we’d need to sacrifice important object- or meta-level moral intuitions to a much greater extent than I advocate above. We simply shouldn’t rely on the idea that such value is precisely definable, nor that we can ever identify an ethical theory which meets all the criteria we care about.

Comments

Here is a vaguely related rough project proposal I once wrote (apologies for the academic philosophy jargon):

"Implications of evaluative indeterminacy and ‘ontological crises’ for AI alignment

There is a broadly Wittgensteinian/Quinean view which says that we just can’t make meaningful judgments about situations we’re too unfamiliar with (could apply to epistemic judgments, evaluative judgments, or both). E.g. Parfit briefly mentions (and tries to rebut) this in Reasons and Persons before discussing the more science-fiction-y thought experiments about personal identity.

A more moderate variant would be the claim that such judgments at least are underdetermined; e.g. perhaps there are adequacy conditions (say, consistency, knowing all relevant facts, ...) on the process by which the judgment is being made, but the judgment’s content is sensitive to initial conditions or details of the process that are left open.

One reason to believe in such views could be the anticipation that some of our current concepts will be replaced in the future. E.g. perhaps ‘folk psychology’ will be replaced by some sort of scientific theory of consciousness. In LW terminology, this is known as ‘ontological crisis’.

Due to its unusual spatiotemporal and conceptual scope, the question of how to best shape the long-term future – and so by extension AI alignment – depends on many judgments that seem likely candidates for being un(der)determined if one of those views is true, in areas such as: Population ethics on astronomical scales; consciousness and moral patienthood of novel kinds of minds; axiological value of alien or posthuman civilizations; potential divergence of currently contingently relatively convergent axiologies (e.g. desire satisfaction vs. hedonism); ethical implications of creating new universes; ethical implications of ‘acausal effects’ of our actions across the multiverse, etc. (Some of these things might not even make sense upon scrutiny.)

Some of these issues might be about metaethics; some might be theoretical challenges to consequentialism or other specific theoretical views; some might have practical implications for AI alignment or EA efforts to shape the long-term future more broadly."

This post really turned on a lightbulb for me, and I have thought about it consistently in the months since I read it.

Multiple terminal values will always lead to irreconcilable conflicts.

(1) Do you hold suffering to be a terminal value motivating its minimization for its own sake?

(2) Do you also hold that there is some positive maximand that does not ultimately derive its value from its instrumental usefulness for minimizing suffering?

Anyone who answers yes to both (1) and (2) is not a unified entity playing one infinite game with one common currency (infinite optimand), but contains at least two infinite optimands, because with limited resources, we will never fully satisfy any single terminal value (e.g., its probability of being minimized or maximized throughout space & time).

I’ve been working on a compassion-centric motivation unification (as an improvement of existing formulations of negative utilitarianism) because I find it the most consistent & psychologically realistic theory that solves all these theoretical problems with no ultimately unacceptable implications. To arrive at practical answers from thought experiments, we want to account for all possibly relevant externalities of our scenarios. For example, the practical situations of {killing children} vs. {not having children} do not have “roughly the same outcome” in any scenario I can think of (due to all kinds of inescapable interdependencies). Similarly, compassion for all sentient beings does not necessarily imply attempting to end Earth (do you see Buddhists researching that?), because technocivilization might reach more & more exoplanets the longer it survives, or at least want to remain to ensure that suffering won’t re-evolve here.

To further explore intuitions between terminal value monism vs. terminal value pluralism, can you order the following motivations by your certainty of holding them as absolutes?

(A) You want to minimize suffering moments.

(B) You want to minimize the risk of extinction (i.e., prolong the survival of life/consciousness).

(C) You want to maximize happy moments.

I sometimes imagine I’m a Dyson sphere of near-infinite resources splitting my budget between these goals. I find that B is often instrumental for A on a cosmic scale, but that C derives its budget entirely from the degree to which it helps A: equanimity, resilience, growth, learning, awe, gratitude, and other phenomena of positive psychology are wonderful tools that compassionate actors can use to minimize suffering, but I would not copy/boost them beyond the degree to which they actually help minimize suffering. In other words, I could not justify to my Exoplanet Rescue Mission Department why I wanted to spend their resources on creating more ecstatic meditators on Mars, because A is only interested in instruments for minimizing suffering. Besides, I wouldn’t undergo surgery without anaesthesia for any number of meditators on Mars, because they wouldn’t help my suffering; in a world where the opportunity cost of anaesthetics is a monastery on Mars, what would you do? Is “outweighing” between terminal values an actual physical computation taking place anywhere outside an ethicist’s head, or a fiction?

Multiple terminal values will always lead to irreconcilable conflicts.

This is not the case when there's a well-defined procedure for resolving such conflicts. For example, you can map several terminal values onto a numerical "utility" scale.
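A weighted sum is one standard way to do such a mapping. The sketch below assumes one; the value names, outcome scores and weights are all hypothetical. It also shows that the resulting ranking depends on the choice of weights, which the procedure itself does not determine.

```python
from typing import Dict

# Hypothetical terminal values, scored per outcome (higher is better).
Scores = Dict[str, float]


def utility(outcome: Scores, weights: Scores) -> float:
    """Collapse several terminal values into one number with a weighted sum."""
    return sum(weights[value] * outcome.get(value, 0.0) for value in weights)


outcome_x: Scores = {"less_suffering": 5.0, "lower_xrisk": 1.0, "more_happiness": 1.0}
outcome_y: Scores = {"less_suffering": 1.0, "lower_xrisk": 4.0, "more_happiness": 3.0}

# Two equally hypothetical weightings; nothing in the code says which is right.
weights_1: Scores = {"less_suffering": 3.0, "lower_xrisk": 1.0, "more_happiness": 1.0}
weights_2: Scores = {"less_suffering": 1.0, "lower_xrisk": 2.0, "more_happiness": 1.0}

print(utility(outcome_x, weights_1), utility(outcome_y, weights_1))  # 17.0 10.0
print(utility(outcome_x, weights_2), utility(outcome_y, weights_2))  # 8.0 12.0
```

Under `weights_1` outcome X ranks higher; under `weights_2` outcome Y does. The mapping is well-defined once the weights are fixed, but choosing them is a separate step.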

This is not the case when there's a well-defined procedure for resolving such conflicts.

Yes, but there isn’t. The theoretical case for terminal value monism is strong because monism doesn’t need such a procedure. All forms of terminal value pluralism run into the problem of incommensurability; monism doesn’t. With monism, we can evaluate and compare x-risks and positive psychology states & traits (vs. suffering) by their instrumental effects for minimizing suffering (which may be empirically difficult, but not theoretically impossible). What more do we want from a unified theory?

Do we want the slightest (epsilon) increase in x-risk to end up weighing more than any suffering? If this is our pre-decided definition, are we going to give up on suffering as a terminal value, caring about suffering only if it increases x-risk?

For example, you can map several terminal values onto a numerical "utility" scale.

I can’t. Who can? (In a non-arbitrary way we could agree on from behind the veil of ignorance.) The analogy of a scale requires a common dimension by which the comparator can sort the two (mass, in the analogy of a scale). What is a common dimension for intercomparing suffering & x-risk, or suffering & positive states? An arbitrary numerical assignment?

To arrive at an impartial theory that doesn’t sanctify our self-serving intuitions, we’d want to formulate our terminal value pluralism, “behind the veil”, by agreeing on independent numerical utility values for different terminal value-grounded currencies, such as:

(1) +epsilon probability of human extinction

(2) +epsilon probability that someone undergoes, e.g., a cluster headache episode (or equivalent)

(3) +epsilon probability that someone instantiates/deepens a positive psychology state [requiring a common dimension for “positivity”, unlike monism]

With monism, we don’t need to agree on definitions & values for multiple such currencies. Instead, we want to ground the (dis)value of other values in their relationships to extreme suffering, which everyone already finds terminally motivating in their own case (unlike x-risk reduction or positivity-production, worth noting). I wouldn’t agree to any theory where extreme suffering can be outweighed by enough positivity elsewhere, because outweighing “does not compute”: positivity can only outweigh suffering if it reduces even more suffering, but not by itself, because a positive fantasy of infinite utility is not an antidote to suffering, because the aggregate terminal positivity physically exists only as a fantasy [an imaginary spreadsheet cell] that never interacts with our terminal suffering.

Without monism, how do we agree on the pluralist numerical values if we could end up undergoing the cluster headache-equivalent suffering ourselves (i.e., simulating an impartial compassion)? Are we to trust that cluster headaches aren’t so bad, when they’re outweighed according to a formula that some (not all) people agreed on?

This is great – thank you for taking the time to write it up with such care.

I see overlap with consequentialist cluelessness (perhaps unsurprising as that's been a hobbyhorse of mine lately).

My vague impression is that this is referred to as pluralism in the philosophy literature, and there are a few philosophers at GPI who subscribe to this view.

From skimming the SEP article on pluralism, it doesn't quite seem like what I'm talking about. Pluralism + incomparability comes closer, but still seems like a subset of my position, since there are other ways that indefinability could be true (e.g. there's only one type of value, but it's intrinsically vague)