14-word summary: What if weird stuff matters a lot but no one cares about weird stuff?
In the long term future, how much of a mismatch will there be between what matters most and what agents value? The hypothesis I want to consider is that the mismatch will be enormous, such that a tiny fraction of value is captured, perhaps as small a fraction as 10^-20.[1]
The reason is that moral value seems to be a power law. A few things are extremely valuable and most things are not. More specifically, over all of the different things that humans like and find morally appealing (or would, if those things were spelled out in detail), a very small subset are many orders of magnitude more important than the others.
Suppose furthermore that it is not likely that the moral arc of the universe bends towards justice, such that our descendants ultimately converge on valuing the right things in the right ways.
Then there is a powerful argument that the expected value of the future is very low. If morality is profoundly power-law distributed across human motivations, then we’re likely to attain only a very small amount of total value if human motivations shape the long-term future. Given what people want to create, promote, and preserve, we should expect the vast majority of what exists to be of minimal value.[2]
But if it’s not very power-law distributed, we should expect to attain most long-run value if human motivations shape the long-term future, since most of what humans want to promote is of roughly the same value.
The value of the future, relative to what is possible, is a direct function of the shape of the underlying distribution of value across state space.
Let’s formalise the hypothesis slightly further:
Moral Power Laws (MPL): Across all of the states of the world that humans like intrinsically and want to promote and preserve (weighted by the strength of human desires to do so), moral value is a power law.
To get the intuition for what this means, you can imagine a two-dimensional line plot.[3] On the x-axis are properties that humans like. These properties get more space on the x-axis the more humans like them, and they’re arranged in order of moral value. On the y-axis is moral value. In linear scale, the line plotted is a hockey stick.
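To make the picture concrete, here is a minimal sketch in Python of the kind of plot I have in mind. Every property, weight, and value in it is invented purely to illustrate the shape of the hypothesis; none of the numbers are estimates of anything.

```python
import numpy as np
import matplotlib.pyplot as plt

# All properties, weights, and values below are made up purely to illustrate
# the hockey-stick shape; none of them are estimates of anything.
properties   = ["cathedrals", "art", "wild nature", "human welfare", "animal welfare", "hedonium"]
care_weights = np.array([0.05, 0.10, 0.15, 0.55, 0.14, 0.01])  # share of human caring (sums to 1)
moral_value  = np.array([1e0,  1e1,  1e2,  1e6,  1e8,  1e20])  # value per resource-equivalent unit

# Arrange properties left to right in order of moral value, as described above.
order   = np.argsort(moral_value)
widths  = care_weights[order]   # bar width  = how much humans care about the property
heights = moral_value[order]    # bar height = the property's moral value
labels  = np.array(properties)[order]

lefts = np.concatenate(([0.0], np.cumsum(widths)[:-1]))
plt.bar(lefts, heights, width=widths, align="edge", edgecolor="black")
plt.xticks(lefts + widths / 2, labels, rotation=45, ha="right")
plt.xlabel("Properties humans value (bar width = strength of human caring)")
plt.ylabel("Moral value (linear scale)")
plt.title("Moral Power Laws: a hockey stick")
plt.tight_layout()
# On a linear scale only the thin rightmost bar is visible -- which is the point.
plt.show()
```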
Here are some intuition pumps for buying the Moral Power Laws hypothesis:
- There are a lot of ways to arrange 86 billion neurons. You could give them to one human, to 430 rats, or to 86 billion nematodes. Intuitions differ on the direction of the effect, but probably one of these is vastly more valuable than the other two.
- Beyond biology, there are probably many ways to create conscious awareness, but some of these ways are vastly more efficient than others. Neurons are likely an inefficient medium for awareness.
- For classical utilitarians, “hedonium” is likely many orders of magnitude more valuable than human brains (or the equivalent instantiated in silico). Bostrom calculates that, at the limit, computers could use matter about 10^20 times as efficiently as brains for whatever computations they do. And beyond that there is wire-heading, i.e. cutting away all of the computations in human brains inessential for pleasure. Given how many functions the brain performs, and how infrequently the pleasure centres are turned on, this should increase the total amount of pleasure at least a few orders of magnitude.
Benthamite utilitarians aren’t the only ones who look like they might want to buy into the Moral Power Laws hypothesis. Lots of non-utilitarian moral theories are fragile,[4] where seemingly tiny and unimportant differences make massive differences in moral importance.
- On views where diversity is valued intrinsically, the value of a thousand identical spheres is (say) 1/1,000 of the value of those spheres with a single electron varied in each.
- Given various kinds of lexicality (e.g. Millian superiorities), a tiny amount of one object will be vastly better than a lot of that object slightly modified. (Consider the precise borderline between Mill’s higher and lower pleasures.)
- If you value pretty much anything (e.g. consciousness, desire satisfaction), there’s likely to be a sharp line in phase space where a tiny change to the property makes an all-or-nothing difference to value. And these are ill-defined notions in folk theory, meaning that it may be very difficult to identify precisely the border where the phase change happens, such that we could get it wrong.
For moral theories where value is extremely fragile in this way, you could easily miss out on most of the value from getting small, seemingly unimportant details wrong.
So with the intuition pumps out of the way let’s evaluate the hypothesis.
Is the Moral Power Laws hypothesis true?
Let’s assess whether MPL seems true today and how that will change in the future, before moving on to specific objections. The key ideas here are twofold:
- In the future we’re going to have numerous options and pursue a wider variety of possible goals as a consequence, leading to more divergence between humanity’s values and what matters.
- From a number of moral viewpoints, it looks like there are going to be things that matter many orders of magnitude more than anything else, and which almost no one is very interested in making happen.
Return to the semi-metaphorical line plot constructed in the opening. The x-axis is what humans value, arranged by their importance, with more space going to things that humans value more. The y-axis is value. How hockey-stick shaped is the line that we would plot on that graph today, and how will that plot change in the future as technology makes more things possible?
Whether MPL holds today depends substantially on how much you value nonhuman animals. If you don’t value nonhuman animals at all, then you might think human values are pretty well correlated with what matters. Sure, humans intrinsically value and want to promote much that isn’t intrinsically very valuable (like Notre Dame — sorry), and could care more about their outgroup. But in general we put most of our moral weight on human welfare, and in a fairly universalist way, which is where just about all of the value is. We don’t care that much about pleasure, and could desire to optimize it more, but we are pretty constrained by technology and nature in what we can do, so we wouldn’t gain that much more value if we did.
But if nonhuman animals are valued significantly, the graph quickly becomes a hockey stick. Humans in the round care little about farmed animals or wild animals (let alone insects), who may account for around 99.99999% of total value.
So, depending on your values, you might think that our graph is a sloped line or a sideways L. Even if it’s a sideways L, though, the fact that humans care fairly universally about human welfare is a good start. If animals account for around 99.9999% of total value, then human values are still correlated with about a millionth of total value, which could be worse.
The direction of history seems overall pretty good regardless of your perspective: however much suffering there is in the world today, we care a lot more than we did in years past about outsiders’ welfare, including that of nonhuman animals.
However, in the future I expect this trend to reverse, and for value to be much more power-law distributed as our option space changes drastically. As humanity wins victory over nature, the world will increasingly be shaped much less by biological necessity and much more by human desires and physical law. In this situation we won’t be constrained to a relatively small range of valuable objects (animals) but will instead be able to create any conceivable thing in value space.
So the option space is going to get much bigger. We’ll have to make choices about whether to preserve habitable solar systems for biological life or turn them into Dyson swarms; or about whether to turn each Dyson swarm into insentient digital slaves to learn and accrue power, 10^20 human minds in silico, a single God-like mind fully consuming all of the star’s energy, or 10^40 disintegrated momentary pleasure blips — and thousands of nearby computational variants of each of these.
In addition to increasing option space, I expect there to be increasing possibilities for variation in what we value. In a slow takeoff scenario, where there is no value lock-in and humans gradually settle the stars and develop their own local cultures and tastes, I’d expect humans to come to value a very wide variety of things. If you think the value differences between Islam, Hinduism, Catholicism, Republicans, leftists, environmentalists, rationalists, transhumanists, utilitarians, Kantians, and Aristotelians are large, they’d grow vastly larger again with a much wider set of possibilities for us to actualize as well as intellectual and aesthetic currents to explore. In this world I would expect massively greater divergence by default.
(I don’t know if my folk history is accurate, but my understanding is that early religions and cultures had a lot in common with each other — e.g. big gods, animism — because these groups had a lot in common economically and geographically. Things seem vastly more ideologically varied today due to some combination of time, information, and possibility.)
While I don’t expect slow takeoff like this (I instead expect eternity in 6 hours), the slow takeoff scenario is an intuition pump for how much divergence we could see in different fast takeoff futures. If a slow takeoff would see this much divergence across space, then at a first pass one would expect to see this much divergence as well across alternative future trajectories in lock-in worlds.
So in the future we’re going to have numerous options and pursue a wider variety of possible goals as a consequence.
A key question, then — echoing back to our dilemma around nonhuman animals — is whether these widely diverging goals are fairly similar in moral value, or whether some of these goals are vastly more important than the others.
The simplest way to argue that MPL will hold over our descendants’ future option space is to point to some things that humans won’t want to create a lot of but which are overwhelmingly valuable. Then MPL follows.
On classical utilitarianism this is straightforward. If what you want to maximize is the aggregate amount of total pleasure in the universe, you are going to have to pursue something wholly bizarre. Let’s go back to 86 billion neurons. There’s currently very little that can convincingly be said for or against allocating those neurons to one human, 430 rats, or 86 billion nematodes. In this Earthly case, we might already find that the utilitarian is going to make some odd judgments that fly in the face of what most humans care about. Now let’s ratchet up the neuron count to 86 trillion. Will the utilitarian want to create 1,000 humans, or instead create a much larger mind consisting of all of these neurons? It would be very surprising if, between a single archangel’s brain, 1,000 human brains, 430,000 rat brains, and 86 trillion nematode brains, the function from neurons to value was such that the human brains sat directly in the moral optimum.[5] The human brain is probably not the global optimum for hedonic value. But that’s unfortunate, because humans really like human brains, and don’t much care for hedonium.
Suppose you find it boring to launch the same hedonic shockwave over and over again, and instead value diversity intrinsically. Then you might want to take great care to ensure that we have a future mirroring Thomas Aquinas’s ontology,[6] where every creature that is possible under Heaven has been created, and ensure that we don't lock in a small set of possibilities, destroying most value in the process.
And this argument generalises. Probably our evolved biological substrates are nowhere near the global optimum for anything. So if due to our ancestral environment we place disproportionate weight on the Earthly properties that we nostalgically hold dear, we’ll miss a large amount of value.
Overall, I think there is a good prima facie case that morality will be power-law distributed in the distant future on a range of ethical views.
I want to now consider a couple of specific objections from metaethics and ethics, and then finally consider a realistic future in which MPL is somewhat less relevant to the case for moral trajectory change than it may seem.
How value might not be power-law distributed part I: Metaethics
Here’s a dilemma for the Moral Power Laws hypothesis:
- Either moral realism is true or moral antirealism is true.[7]
- If moral realism is true, MPL is false.
- If moral antirealism is true, MPL is false.
- Therefore, MPL is false.
The intuition is this: if moral realism is true, we’ll converge on the best world. If moral antirealism is true, then bizarre edge cases like hedonium don’t matter, because humans don’t care about bizarre edge cases like hedonium.
My main reply is that this is much too quick, and it depends on a lot of choice points. The key reply to the moral realist is that morality is not like gravity, some universal attraction that pulls everything together over infinite time. Whether to expect convergence on the best outcomes is highly sensitive to initial conditions. The key reply to the moral antirealist is that valuers can greatly value some strange things, even if these things aren’t written in the stars. But both objections have some bite.
Moral realism
Philosophers are generally attracted to three kinds of moral realism.[8]
- Non-naturalist moral realism
- Naturalist moral realism
- Kantian constructivism
On non-naturalist moral realism, morality is like a Platonic Form. There’s a truthmaker for moral statements independent of any physical property or any human desire. However, for non-naturalists morality has no causal power. The standard view is that humans just happened to get lucky by stumbling upon the right moral view. Morality doesn’t pull us towards it; it’s just out there, and some of us happen to be lucky enough to agree with it. If this view is true, then we should very simply not expect convergence by default, since morality is not an attractor state.
On naturalist moral realism, morality is a natural property like mass or charge which we detect with our senses. For example, some naturalists think we detect moral properties through empathy, or through introspection, rather than, say, through vision and touch. If this is true then we should expect some convergence towards morality, since humans can’t help but sense it. But we had better hope that our descendants share the same humanistic senses; otherwise they will not themselves be attracted towards it.
On Kantian constructivism, there are attractor states in reasoning. Once you get in a certain kind of reasoning game, you’re forced by your own lights to a particular moral conclusion. Here again, we need to ask what sorts of reasoning properties you need to access this space of moral reasons. Do next-token prediction algorithms have the right architecture and foundational normative concepts to find their way into this attractor state? If not, then we should worry that our digital descendants will not find their way to the best world.
So on all of these popular realist metaethical views, there’s an open question whether to expect convergence, or how much convergence to expect. All realist metaethical views seem variously sensitive to the initial conditions of moral deliberation. Which makes sense — if you’re detecting a property you need to make sure your property detector is in good order.
Moral antirealism
Moral antirealism as I am articulating it here is the view that moral realism is false — faultless disagreement among ideal moral reasoners is possible. Classically, antirealists tend to think that morality is reducible to our values.
There are two types of objection the moral antirealist might make to MPL. One is more convincing than the other.
First, the antirealist might complain that MPL is false by definition — things can’t be more valuable than how much we value them, since morality reduces to our values. So we’ll value things exactly as much as we should.
This objection falters for two reasons. One is moral disagreement. Perhaps in aggregate humans value things exactly as much as we should, but we don’t value hedonium (say) as much as you do. So you have a strong reason to change that fact. The second is that antirealism allows for moral error and learning. Someone who isn’t deeply in touch with what they care about can come to be better in touch with what they care about and thereby come to value things as much as they, in an important sense, should. So the analytic objection doesn’t work.
Second, the antirealist might doubt that anyone cares about very strange properties like maximal hedonic value. All of us grew up on Earth and evolved from Earthlings, so we care about Earthly things. Some of us might think we value heavenly things like hedonium, but if we thought carefully enough and learned enough cognitive science we would realize that we actually don’t have values that are an exponential function of aggregate pleasure. Perhaps we’d find that we are indifferent between lots of different ways of organizing matter, and value hedonium and shmedonium (its close cousin) and all of its other variants equally, even if they have vastly different computational properties.
I think this objection is a good one. If your values are just not very determinate over things not of this Earth, then you might not care much at all whether we turn every Dyson swarm into a single God-like mind or 10^40 momentary pleasure blips — or any of their nearby computational variants. And if you encountered all of the arguments for and against each alternative, and learned everything you could about the physical world, you still wouldn’t move an inch, or sway slightly one way or the other.
I think it’s a good objection, but I just don’t know if this is true. Certainly, I find myself swayed towards specific views on how much to value various insect minds. I came around to thinking that the whole of insect life is intrinsically more valuable than the whole of the human species. Is this just bad faith, and I should really be speciesist or indifferent, or is this getting in touch with what really matters to me? I find it difficult to say. But the objection is clearly not a knock-down, nor can it be dismissed.
Summing up
Moral realists still have to be worried about missing out on convergence, and moral antirealists can seemingly still value some strange things highly. So whichever horn you take on the dilemma, you should be open (at least on this basis) to the possibility that moral value is power-law distributed.
How value might not be power-law distributed part II: Ethics
There do however seem to be ethical views on which MPL is false. Here are a few:
- You could accept diminishing returns to value in utility (with average utilitarianism as the most extreme version of this) so that happiness doesn’t matter much after you’ve already got a lot of it — but you’re unlikely to be a longtermist, laser focused on extinction risk if you do.
- You could think that we are just on a razor-thin lucky edge where human brains are really good for making value, or that an axiologically serendipitous form of functionalism is true where the most computationally efficient world is also the happiest world.
- You could accept deep incommensurability in value, as with the antirealist considered above, such that many futures can simply not be compared to one another, and you’d be indifferent towards them.
These views seem striking in their unattractiveness, but there are probably other, more attractive ethical views that do not imply MPL, or imply it less extremely.
A lower bound? How we might still capture most value given MPL
I’ve argued that MPL is quite plausible. A natural inference to draw is that there’s a good chance that our descendants attain only a power-law-small amount of total value, since by implication their values will be mostly uncorrelated with what matters most. But there are some arguments that we could nonetheless attain a large portion of total value, even if human values in the round are broadly uncorrelated with what matters.
The best argument I’ve heard for how we might capture a large portion of value, even if MPL is true, is the implicit argument in Will MacAskill’s afterword to What We Owe the Future. (Spoilers ahead; you could first read the short story here.) In the story, humanity develops ASI and plans to settle the universe. Before doing so, they split up all of the stars in the affectable universe evenly amongst all of humanity, and allow people to trade and bargain for stars with each other. The environmentalists trade for the habitable star systems (a small portion of the cosmos), so that they can preserve biological life. The fictional stand-ins for classical utilitarians trade for a large number of the very far-away star systems — as they are patient and can wait a long time to promote moral value — and eventually use them to make hedonium. As a result, the environmentalists capture most of moral value by their lights, preserving all of the habitable planets. And the classical utilitarians also capture most moral value by their lights, bargaining for an outsized proportion of the universe and the majority of usable stars, and maximizing utility there.
If the universe is settled via ideal trade and positive-sum cooperation, then most people might be able to capture most of what they value, by their own lights, even if morality is power-law distributed and even if moral disagreement is widespread. While I very much don’t expect this to happen by default, it shows the possibility of attaining most moral value even under MPL. Even if people’s values are in general highly uncorrelated with what matters most, mechanisms like trade and cooperation can still allow most value to be attained. I think this is a very powerful argument for trade-based futures, and in general for advancing futures where the cardinality of humanity’s utility function is well preserved by judgment aggregation, rather than by coarser aggregation methods like first-past-the-post voting.
If MPL is true, then the likelihood that an ideal form of trade and cooperation happens, multiplied by the amount of resources allocated to agents whose values are aligned with the best outcomes, forms one kind of lower bound for the value of the future.
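To make that arithmetic explicit, here is a toy version of the lower bound in Python. Both inputs are invented placeholders rather than estimates.

```python
# A toy version of the lower bound described above. Both inputs are invented
# placeholders, not estimates.
p_ideal_trade = 0.01   # probability the future is settled via ideal, positive-sum trade
share_aligned = 1e-4   # share of resources ending up with agents whose values track the best outcomes

# Treat the attainable ceiling as 1, so the result is a fraction of possible value.
lower_bound = p_ideal_trade * share_aligned
print(f"Lower bound on expected fraction of attainable value: {lower_bound:.0e}")
# -> 1e-06 with these made-up inputs
```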
On the other hand, if we don’t achieve ideal trade and positive-sum cooperation, or you can’t personally steer the world in the direction of your values, then you should take the MPL hypothesis as an argument that the expected total value of the future is extremely low compared to its possible ceiling.
On the positive side, it may also imply that there is still a role for effective altruism in 202500.
- ^
I assume throughout that the value of the future is greater than zero. This is a simplification, but it is one I mostly stand by.
- ^
Compare also Shulman 2012:
hedonistic utilitarians could approximate the net pleasure generated in our galaxy by colonization as the expected production of hedonium, multiplied by the "hedons per joule" or "hedons per computation" of hedonium (call this H), minus the expected production of dolorium, multiplied by "dolors per joule" or "dolors per computation"
- ^
The plot is somewhat metaphorical but if you like we can make it more rigorous. For example, the x-axis can represent a computationally or resource-equivalent amount of each property to ensure that each property takes the same amount of resources to instantiate and so we are comparing apples to apples.
- ^
Thanks to Fin Moorhouse for helping me frame this point.
- ^
This is less surprising given certain versions of irrealism about consciousness and ethics. More on that in the next section.
- ^
“we must say that the distinction and multitude of things come from the intention of the first agent, who is God. For He brought things into being in order that His goodness might be communicated to creatures, and be represented by them; and because His goodness could not be adequately represented by one creature alone, He produced many and diverse creatures, that what was wanting to one in the representation of the divine goodness might be supplied by another. For goodness, which in God is simple and uniform, in creatures is manifold and divided and hence the whole universe together participates in the divine goodness more perfectly, and represents it better than any single creature whatever.” Summa Theologica I:47:1
- ^
One way to take issue with this argument is to think that moral realism is so poorly formulated that it isn’t even true or false, and so (1) is false, since antirealism is commonly just the negation of realism. If you have this complaint, so do I, and I’m sure I can come up with a reformulation that satisfies you.
- ^
For simplicity I’m conflating realism and objectivity, with apologies to Kantians.
I really like this piece, and I think I share in a lot of these views. Just on some fairly minor points:
I can imagine views (they do run into non-identity, but maybe there are ways of formulating them that don't) for which this would be a real problem. For example, imagine a view that holds that simulated human existence is the best form of life, but is indifferent between that and non-existence. As such, they won't care whether we leave the universe insentient, but faced with a pairwise choice between hedonium and simulated humans, they will take the simulated humans every time. So they don't care much if we do go extinct, but do care if the hedonistic utilitarians win. Indeed, these views may be even less willing to take trades than many views that care about quantity. I imagine many religions, particularly universalist religions like Christianity and Islam, may actually fall into this category.
I think some more discussion of the 'kinetics' vs 'equilibrium' point you sort of allude to would be pretty interesting. You could reasonably hold the view that rational (or sensing, or whatever other sort of) beings converge to moral correctness in infinite time. But we are likely not waiting infinite time before locking in decisions that cannot be reversed. Thus, because irreversible moral decisions could occur at a faster rate than correct moral convergence (i.e. the kinetics of the process matter more than the equilibrium), we shouldn't expect the equilibrium to dominate. I think you gesture towards this, but exploring the ordering further would be very interesting.
I also wonder if views that are pluralist rather than monist about value may make the MPL problem worse or better. I think I could see arguments either way, depending on exactly how those views are formulated, but would be interesting to explore.
Very interesting piece anyway, thanks a lot; it really resonates with a lot of what I've been thinking about.
I'm sure I'll have a few more comments at some point as I revisit the essay.
(Edited to elaborate and for clarity.)
Thomas (2019) calls these sorts of person-affecting views "wide". I think "narrow" person-affecting views can be more liberal (due to incommensurability) about what kinds of beings are brought about.
And narrow asymmetric person-affecting views, as in Thomas, 2019 and Pummer, 2024, can still tell you to prevent "bad" lives or bads in lives, but, contrary to antinatalist views, "good" lives and goods in lives can still offset the bad. Pummer (2024) solves a special case of the Nonidentity problem this way, by looking at goods and bads in lives.
But these asymmetric views may be less liberal than strict/symmetric narrow person-affecting views, because they could be inclined to prevent the sorts of lives of which many are bad in favour of better average lives. Or more liberal, depending on how you think of liberalism. If someone would have a horrible life to which they would object, it seems illiberal to force them to have it.
I think these papers have made some pretty important progress in further developing person-affecting views.[1]
I think they need to be better adapted to choices between more than 2 options, in order to avoid the Repugnant Conclusion and replacement (St. Jules, 2024). I've been working on this and have a tentative solution, but I'm struggling to find anyone interested in reading my draft.
Thanks a lot, @Gideon Futerman! Good additions, which all seem right to me.
Another question I've had on my mind is how much MPL is related to additive separability. You might at a first pass think that moral atomism makes you more likely to buy MPL, since you have so many different spaces of value to optimise. But holistic views can in principle lead to even sharper differences in the value of worlds — for example, you could have a view that says that you need to align all of the stars forever or you don't capture any value.
I'd like to have a clearer view on when moral viewpoints will tend to end up at MPL, but I don't have one yet.
Some nice and insightful comments from Anders Sandberg on X:
Nice! Consolidating some comments I had on a draft of this piece, many of them fairly pedantic:
These are excellent comments, and unfortunately they all have the virtue of being perspicuous and true so I don't have that much to say about them.
Is the core idea here that human desires and the values people reach on deliberation come apart? That makes sense, though it also leaves open how much deliberation our descendants will actually do / how much their values will be based on a deliberative process. I guess I'll just state my view without defending it: after a decade in philosophy, I have become pretty pessimistic about convergence happening through deliberation, and instead expect more divergence as more choice points are uncovered and reasoners either think they have a good loss function or just choose not to do backpropagation.
Thanks for the post, very interesting; it definitely resonates with the empirical EA view of power-law returns, which I was surprised you didn't mention.
A couple issues:
1. The version of non-naturalist moral realism on which the divergence rests seems both very strong and strange to me. It assumes that the true moral code is unlike mathematics under mathematical realism, where truths are accessible with reflection and would be a natural conclusion for those who cared.
2. “You could accept diminishing returns to value in utility... but you’re unlikely to be a longtermist, laser focused on extinction risk if you do.” I think this is false under the view of near-term extinction risk held by most of those who seem concerned about AI extinction risk, or even under varieties of the hinge-of-history view whereby longtermist concerns bear on the near term.
Thanks a lot @Davidmanheim!
Thanks, I meant to mention this but I think it got cut when I revised the second section for clarity, alas.
Thanks for this. I'd love to hear non-naturalist moral realists talk about how they think moral facts are epistemically accessible, if it's not just luck. (Some philosophers do explicitly assume it's luck.) I think the problem here is extremely hard, including for mathematics, and my own view on mathematics is closest to Millian empiricism (we learn basic math e.g. arithmetic by observing physics and for more advanced mathematics we freely choose axioms that we test against reality for usefulness and explanatory power).
The best philosopher writing on the epistemology of both mathematics and ethics is Justin Clarke-Doane, who combines a form of pluralist realism about mathematics with expressivism about ethics.
However we access moral facts, I'd expect my points about dependency on initial conditions to generalise.
True, you could accept this moral view and also accept that:
And then you'd avoid the objection.
Or you could think utility doesn't matter much after you've got, say, 10^12 humans, and then x-risk still looks good but making sure the right future happens looks less important.
In general I think I was too quick here, good catch.
I think averageists may actually also care about the long-term future a lot, and it may still have an MPL if they don't hold (rapid) diminishing returns to utility WITHIN lives (i.e. it is possible for the average life to be a lot worse or a lot better than today's). Indeed, given (potentially) plausible views on interspecies welfare comparisons, and how bad the lives of lots of non-humans seem today, this just does seem to be true. Now, it's not clear they shouldn't be at least a little more sympathetic to us converging on the 'right' world (since it seems easier), but it doesn't seem like they get out of much of the argument either.
Nice point. I shouldn't have picked averageism as the most extreme version of this view. It would have been more apt to pick a "capped" model where the value on additional utility (or utility of a specific type) becomes zero after enough of it has been achieved.
Yeah, I might be wrong, but something like Larry Temkin's model might work best here (it's been a while since I read it, so I may be getting it wrong).
The lack of an answer to that is a lot of the reason I discount the view as either irrelevant or not effectively different from moral non-realism.
Thanks!
And as I noted on the other post, I think there's a coherent argument that if we care about distinct moral experiences in some way, rather than just the sum, we get something like a limited effective utility, not at 10^12 people specifically, but plausibly somewhere far less than a galaxy full.
I think I agree with the Moral Power Laws hypothesis, but it might be irrelevant to the question of whether to try to improve the value of the future or work on extinction risk.
My thought is this: the best future is probably a convergence of many things going well, such as people being happy on average, there being many people, the future lasting a long time, and maybe some empirical/moral uncertainty stuff. Each of these things plausibly has a variety of components, creating a long tail. Yet you'd need expansive, simultaneous efforts on many fronts to get there. In practice, even a moderately sized group of people is only going to make a moderate to small push on a single front, or very small pushes on many fronts. This means the value we could plausibly affect, obviously quite loosely speaking, does not follow a power law.
Thanks, @zdgroff! I think MPL is most important if you think that there are going to be some agents shaping things, these agents' motivations are decisive for what outcomes are achieved, and you might (today) be able to align these agents with tail-valuable outcomes. Then aligning these agents with your moral values is wildly important. And by contrast marginal improvements to the agents' motivations are relatively unimportant.
You're right that if you don't have any chance of optimizing any part of the universe, then MPL doesn't matter as much. Do you think that there won't be agents (even groups of them) with decisive control over what outcomes are achieved in (even parts of) the world?
It seems to me in the worst case we could at least ask Dustin to try to buy one star and then eventually turn it into computronium.