(Probably the most important post of this sequence.)
Summary: Some values are less adapted to the “biggest potential futures” than others (see my previous post), in the sense that they may constrain how one should go about colonizing space, making them less competitive in a space-expansion race. The preference for reducing suffering is one example of a preference that seems particularly likely to be unadapted and selected against. It forces the suffering-concerned agents to make trade-offs between preventing suffering and increasing their ability to create more of what they value. Meanwhile, those who don’t care about suffering don’t face this trade-off and can focus on optimizing for what they value without worrying about the suffering they might (in)directly cause. Therefore, we should – all else equal – expect the “grabbiest” civilizations/agents, including humanity (if it becomes grabby), to have relatively low levels of concern for suffering. Call this the Upside-focused Colonist Curse (UCC). In this post, I explain this UCC dynamic in more detail using an example. Then, I argue that the more significant this dynamic is (relative to other, competing dynamics), the more we should prioritize s-risks over other long-term risks, and soon.
The humane values, the positive utilitarians, and the disvalue penalty
Consider the concept of disvalue penalty: the (subjective) amount of disvalue a given agent would have to be responsible for in order to bring about the highest (subjective) amount of value they can. The story below should make what it means more intuitive.
Say there are only two types of agents:
- those endorsing “humane values” (the HVs) who disvalue suffering and value things like pleasure;
- the “positive utilitarians” (the PUs) who value things like pleasure but disvalue nothing.
These two groups are in competition to control their shared planet, or solar system, or light cone, or whatever.
The HVs estimate that they could colonize a maximum of [some high number] of stars and fill those with a maximum of [some high number] units of value. However, they also know that increasing their civilization’s ability to create value also increases s-risks (in absolute terms). They therefore face a trade-off between maximizing value and preventing suffering, which incentivizes them to be cautious about how they colonize space. If they were to purely optimize for more value without watching for the suffering they might (directly or indirectly) become responsible for, they predict they would cause x units of suffering for every 10 units of value they create. This is the HVs’ disvalue penalty: x/10 (which is a ratio; a high ratio means a heavy penalty).
The PUs, however, do not care about the suffering they might be responsible for. They don’t face the trade-off the HVs face and have no incentive to be cautious like them. They can – right away – start colonizing as many stars as possible to eventually fill them with value, without worrying about anything else. The PUs’ disvalue penalty is 0.
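To make the ratio concrete, here is a minimal Python sketch of the disvalue penalty as described above. The `Agent` class, the function names, and the number chosen for x are illustrative assumptions rather than anything from the story itself; the sketch also assumes that 1 unit of value subjectively offsets 1 unit of suffering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    # Hypothetical "x": units of suffering the agent expects to cause per
    # 10 units of value if it purely optimizes for value.
    suffering_per_10_value: float
    # Whether the agent counts suffering as disvalue at all.
    disvalues_suffering: bool


def disvalue_penalty(agent: Agent) -> float:
    """Return x/10 for agents who disvalue suffering, and 0 otherwise."""
    if not agent.disvalues_suffering:
        return 0.0
    return agent.suffering_per_10_value / 10


def net_subjective_utility(agent: Agent, value_created: float) -> float:
    """Value created minus the disvalue the agent counts against itself,
    assuming 1 unit of value offsets 1 unit of suffering."""
    return value_created * (1 - disvalue_penalty(agent))


# Both agents would cause the same suffering; only the HVs count it.
hv = Agent("HV", suffering_per_10_value=5.0, disvalues_suffering=True)
pu = Agent("PU", suffering_per_10_value=5.0, disvalues_suffering=False)

for agent in (hv, pu):
    print(agent.name, disvalue_penalty(agent), net_subjective_utility(agent, 100))
```

With these made-up numbers, the same reckless expansion is worth half as much to the HVs as to the PUs by their own lights, which is why only the HVs have a reason to slow down.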
Image 1: Niander Wallace, a character from Blade Runner 2049 who can be thought of as a particularly baddy PU.
Because they have a higher disvalue penalty (incentivizing them to be more cautious), the humane values are less “grabby” than those of the PUs. While the PUs can happily spread without fearing any downside, the HVs would want to spend some time and resources thinking about how to avoid causing too much suffering while colonizing space (and about whether it’s worth colonizing at all), since suffering would hurt their total utility. This means, according to the Grabby Values Selection Thesis, that we should – all else equal – expect PU-ish values to be selected over HV-ish values in the space colonization race. Obviously, if there were values prioritizing suffering reduction more than the HVs, these would be selected against even more strongly. This is the Upside-focused Colonist Curse (UCC).
The concept of disvalue penalty is somewhat similar to that of alignment tax. The heavier it is, the harder it is for you to win the race. The situation the PUs and the HVs are in, in my story, is analogous to the situation the AI capability and AI safety people are currently in, in the real world.
This UCC selection effect can occur both within a civilization (intra-civ selection, where HVs and PUs are part of the same civilization) and between different civilizations (inter-civ selection, where HVs and PUs are competing civilizations).
Addressing obvious objections
(Mostly relevant to the inter-civ context) But why don’t the suffering-concerned agents simply prioritize colonizing space and actually think about how to best maximize their utility function later, in order not to lose the race against the PUs?
This is what my previous post refers to as the convergent preemptive colonization argument. We'll see that this argument does not apply well to our present example. (See Shulman 2012 for an example in which the argument works better.) It does seem like the HVs could postpone some tasks, such as figuring out how to create happy minds without non-trivial incidental suffering (and, therefore, actually creating them). However, …
- this, of course, conditions on them being patient enough.
- they can’t similarly delay reducing all kinds of s-risks. The idea of prioritizing space colonization and postponing suffering prevention may be highly similar to that of prioritizing AI capabilities and postponing AI safety. Once the former is prioritized, it may be too late for the latter. A few examples:
  - they can’t delay preventing disvalue from conflict with other agents (see Clifton 2019; Sandberg 2021). They need to think about this before meeting them so they’ll spend time/resources thinking about this instead of colonizing sooner/faster.
  - they also can’t delay preventing suffering subroutines/simulations (see Tomasik 2015) that might be instrumentally useful during the colonization process. They also need to think about this beforehand.
  - the HVs should be more concerned (relative to the PUs) about the possibility of their values drifting or being hacked (by, e.g., a malevolent actor), since this might result in hyper-catastrophic near-misses (see Tomasik 2019; Leech 2022) from their perspective. The HVs have more to lose than the PUs, such that they would want to spend time and resources on preventing this early on.
- in our current world, concern for suffering seems to correlate with skepticism and uncertainty regarding the value of colonizing space. This correlation is so strong that it might be hard to select against the latter without selecting against the former. And while this might seem specific to today’s humans, it is arguably a tendency that is somewhat generalizable to our successors and aliens.
- the HVs’ concern for suffering may very well be asymmetric on the action-omission axis, such that they’re more worried about the suffering they might create than the suffering they might prevent, making them less bullish on space colonization.
Note that all those points are disjunctive. You don’t need to buy them all to agree with my general claim.
(Mostly relevant to the intra-civ context) But why don’t the suffering-concerned agents try to join the PUs in their effort to colonize space and push for more cautiousness instead of trying to beat them?
They will likely try that, indeed. But we should expect them to be progressively selected against since they drag down the PU project. The few HVs who might manage not to value drift and not to get kicked out of the group of PUs driving colonization efforts are those who are complacent and not pushing really hard for more cautiousness.
If the Upside-focused Colonist Curse is real, then so too is the selection effect against anything that differs from “let’s just grab as much as possible and nothing else matters”.
Yes, that is (technically) true. However, the selection effect against concern for suffering/disvalue seems much stronger than the selection effect against PU-ish values. In fact, Carl Shulman (2012) argues that the selection effect against PU-ish values is very small. And although I find some of his assumptions poorly backed up, I think the considerations he raises make a lot of sense. Disvaluing something, however, seems to be a notable constraint, especially when this something is suffering, i.e., a thing that might be brought about for quite a few different reasons (see my response to the first objection). Value systems that demand the minimization of suffering seem therefore more likely to be selected against.
Figure 1: Vague illustration of my intuitions regarding how grabby different moral preferences are.
I think I have demonstrated that the biggest futures correlate with negligible concern for suffering (i.e., that UCC is a thing), all else equal. Obviously, all else is probably not equal. The extent to which UCC is decisive relative to other dynamics seems to mainly depend on the strength of the (broader) grabby values selection effect, which I briefly and roughly try to assess in a section of my previous post.
In the present section, however, I want to draw some implications from UCC, assuming it is somewhat of a decisive factor.
The values of aliens may not be worse than those of our successors, which reduces the importance of (certain ways of) reducing X-risks
Brauner and Grosse-Holz (2018, Part 2.1) argue that “[w]hether (post-)humans colonizing space is good or bad, space colonization by other agents seems worse”, and that this pushes in favor of prioritizing X-risk reduction.
As they explain:
If humanity goes extinct without colonizing space, some kind of other beings would likely survive on earth. These beings might evolve into a non-human technological civilization in the hundreds of millions of years left on earth and eventually colonize space. Similarly, extraterrestrials (that might already exist or come into existence in the future) might colonize (more of) our corner of the universe, if humanity does not.
They then claim that “our reflected preferences likely overlap more with a (post-)human civilization than alternative civilizations.” While this seems likely true in small futures where civilizations are not very grabby, the Grabby Values Selection Thesis suggests that the grabbiest civilizations converge on values that are particularly expansion-conducive. More specifically, UCC suggests that we should expect them to have little concern for suffering, independently of whether they are human or alien. Therefore, it seems that Brauner and Grosse-Holz’s claim is false in humanity's biggest futures, conditional on…
1. sentience being a convergent feature among different civilizations, such that there is no strong reason to assume grabby aliens are less likely – than grabby humans – to value things like pleasure.
2. us – people considering this argument – being impartial (see MacAskill 2019). We don’t make arbitrary discriminations like (dis)valuing X more on the sole pretext that X has been caused by humans rather than by other agents.
While I 100% endorse #2, #1 does not seem necessarily obvious, which means that Brauner and Grosse-Holz’s (2018, Part 2.1) above argument still holds to some extent. All else equal, grabby humans might be somewhat more likely to spread things like pleasure than the average grabby alien civ.
Moreover, even assuming the grabbiest aliens and the grabbiest humans have very similar values, one may argue that it is still preferable not to wait for aliens to spread something potentially valuable that humanity could have spread sooner (assuming the expected value of humanity’s future impact is positive). The importance of this consideration depends on how delayed the colonization of our corner of the universe is, in expectation, if it is done by another civilization than humanity.
However, I think UCC may seriously dampen the importance of reducing X-risks. In expectation, grabby humans and grabby aliens still seem somewhat likely to create things that are similarly (dis)valuable, which undermines quite a big crux for prioritizing (some) X-risks (see Guttman 2022; Tomasik 2015; Aird 2020).
Present longtermists may have a strong comparative advantage in reducing s-risks
This one is pretty obvious. We’ve argued that in the worlds with the most stakes, upside-focused agents are selected for, such that we should – all else equal – expect (or act as if) those who will control our corner of the universe to care about things like pleasure much more than about suffering. In expectation, this means that while we can – to some extent – count on our descendants and/or aliens to create things that we (or our reflected preferences) would find valuable, we can’t count on them to avoid s-risks. While s-risks are already highly neglected considering only present humans (see Baumann 2022; Gloor 2023), this neglectedness may very well notably increase in the far future.
And I doubt that any longtermist thinks the importance and tractability of s-risk reduction are low enough (see, e.g., Gloor 2023 for arguments against that) to endorse not doing anything about this expected increased neglectedness.
Therefore, longtermists might want to increase the extent to which they prioritize reducing s-risks (in long-lasting ways), by e.g., coming up with effective safeguards against near misses to spare upside-focused AIs the high opportunity cost of preventing suffering.
As a side note, I haven’t yet really thought about whether – or how – the Upside-focused Colonist Curse consideration should perhaps change s-risk researchers’ priorities (see, e.g., the research agendas of CLR and CRS), but that might be a promising research question to work on.
To the extent that UCC is a likely implication of the Grabby Values Selection Thesis, it seems almost trivially true.
However, I am deeply uncertain regarding its significance and therefore whether it is a strong argument for prioritizing s-risks over other long-term risks. Is there any crucial consideration I’m missing? For instance, are there reasons to think agents/civilizations that care about suffering might – in fact – be selected for and be among the grabbiest? How could we go about Fermi-estimating the strength of the selection effect? Thoughts are more than welcome!
Also, if UCC is decisively important, does that imply anything other than “today’s longtermists might want to prioritize s-risks”? What do you think?
More research is needed here. Please reach out if you’d be interested in trying to make progress on these questions!
Thanks to Robin Hanson, Maxime Riché, and Anthony DiGiovanni for our insightful conversations around this topic. Thanks to Antonin Broi for valuable feedback on an early draft. Thanks to Michael St. Jules, Charlie Guthmann, James Faville, Jojo Lee, Euan McLean, Timothy Chan, and Matīss Apinis for helpful comments on later drafts. I also benefited from being given the opportunity to present my work on this topic during a retreat organized by EA Oxford (thanks to the organizers and the retreat attendees who took part in the discussion).
Most of my work on this sequence so far has been funded by Existential Risk Alliance.
All assumptions/claims/omissions are my own.
I.e., the futures with the largest magnitude in terms of what we can create/affect (see MacAskill 2022, Chapter 1).
This seems generally true, in our world, even if you replace s-risks with some other things a given agent disvalues. In general, you can’t increase your ability to create what you value without non-trivially increasing the risk of (directly or indirectly) bringing about more of what you disvalue (i.e., as long as you disvalue something, you have a non-zero disvalue penalty). For instance, say Bob values buildings and disvalues trees. It might seem like it is quite safe for him to start colonizing space (with the aim of eventually creating a bunch of buildings), without taking any non-trivial risk of becoming responsible for the creation of more trees. But, there are at least two reasons to think Bob has a non-negligible disvalue penalty. First, errors or value drift might lead to a catastrophic near miss (see Tomasik 2019, Leech 2022). Second, Bob may very well not be the only powerful agent there is around, such that his expansion straightforwardly increases the chance of triggering a conflict with other agents or civilizations. Such conflict could become catastrophic by bringing about what the involved agents disvalue (see Clifton 2019; Sandberg 2021).
And while Bob’s expansion involves a non-trivial risk of tree creation, the risk of the HVs’ expansion (indirectly or incidentally) causing suffering may be greater, for three reasons. First, negative reinforcement may be instrumentally useful to the accomplishment of many tasks (unlike tree creation). Second, a catastrophic near-miss seems much more likely with humane values than with Bob’s, since pleasure-like sources of value and suffering are much closer than trees and buildings in the space of all things one can create. Third, the technologies enabling the creation of more value might empower potential sadistic/retributivist actors willing to create suffering.
One counter-consideration is that the more the HVs colonize space, the more they might be able to reduce potential non-anthropogenic suffering (see Vinding and Baumann 2021). However, bringing about suffering seems far easier than reducing the suffering brought about by other civilizations, such that we should still expect the HVs' expansion to increase s-risks overall.
Where x > 0, and where the value-disvalue metric is built such that 1 unit of value subjectively outweighs 1 unit of suffering.
“You love pain. Pain reminds you the joy you felt was real. More joy, then! Do not be afraid. [...] Every leap of civilization was built on the back of a disposable workforce, but I can only make so many.” Thanks to Euan McLean for bringing this interesting fictional example to my attention.
And if there are Wallace-like PUs (who also value the increased s-risks/suffering due to their expansion/progress, because “it is part of life!” or “the existence of hell makes us appreciate heaven more!” or something), we may expect them to be even more strongly selected for. Suffering might not only become something no one cares about but potentially even something valued.
UCC is somewhat similar to general potential dynamics that have already been described by others. From my post #2 in this sequence: ‘Nick Bostrom (2004) explores “scenarios where freewheeling evolutionary developments, while continuing to produce complex and intelligent forms of organization, lead to the gradual elimination of all forms of being that we care about.” Paul Christiano (2019) depicts a scenario where “ML training [...] gives rise to “greedy” patterns that try to expand their own influence.” Allan Dafoe (2019; 2020) coined the term “value erosion” to illustrate a dynamic where “[j]ust as a safety-performance tradeoff, in the presence of intense competition, pushes decision-makers to cut corners on safety, so can a tradeoff between any human value and competitive performance incentivize decision makers to sacrifice that value.”’ My claim, however, is weaker and less subject to controversy than theirs. The dynamic I describe is one where concern for disvalue (i.e., one specific bit of our moral preferences) is selected against, while the dynamics they describe are ones where our moral preferences, as a whole, are gradually eliminated. The reason why my claim is weaker is that concern for suffering is obviously less adaptive to space colonization races than things like caring about “human flourishing”. For instance, preferences for filling as many stars as possible with value are much more competitive, in this context, than preferences for reducing s-risks. To be clear, the former might also be selected against to the extent that it might be less adaptive than things like pure intrinsic preferences for spreading as much as possible, but concern for suffering is far less competitive and therefore far more likely to be selected against. I expand a bit on this in my response to the third objection in the next section.
See my previous post for more detail on intra-civ vs inter-civ selection.
Brauner and Grosse-Holz (2018, Part 2.1) suggest that we should find one’s values “worse” if they don’t “depend on some aspects of being human, such as human culture or the biological structure of the human brain”. I 100% disagree with this and appreciate their footnote stating that “[s]everal people who read an early draft of [their] article commented that they would imagine their reflected preferences to be independent of human-specific factors.”
Only “some”? Indeed, this – in some cases – doesn’t apply to AGI misalignment, insofar as a misaligned AGI might not only prevent humanity from colonizing space but also accumulate resources and prevent the emergence of other civilizations, and/or slow down grabby aliens by “being in their way” and forcing them to accept moral compromises, whereas the aliens could have simply filled our corner of the universe with whatever they value if the misaligned AI weren’t there.