By Jan M. Brauner and Friederike M. Grosse-Holz
Work on this article has been funded by the Centre for Effective Altruism, but the article represents the personal views of the authors.
There are good reasons to care about sentient beings living in the millions of years to come. Caring about the future of sentience is sometimes taken to imply reducing the risk of human extinction as a moral priority. However, this implication is not obvious so long as one is uncertain whether a future with humanity would be better or worse than one without it.
In this article, we try to give an all-things-considered answer to the question: “Is the expected value of efforts to reduce the risk of human extinction positive or negative?” Among other things, we cover the following points:
- What happens if we simply tally up the welfare of current sentient beings on earth and extrapolate into the future; and why that isn’t a good idea
- Thinking about the possible values and preferences of future generations, how these might align with ours, and what that implies
- Why the “option value argument” for reducing extinction risk is weak
- How the potential of a non-human animal civilisation or an extra-terrestrial civilisation taking over after human extinction increases the expected value of extinction risk reduction
- Why, if we had more empirical insight or moral reflection, we might have moral concern for things outside of earth, and how that increases the value of extinction risk reduction
- How avoiding a global catastrophe that would not lead to extinction can have very long-term effects
If most expected value or disvalue lies in the billions of years to come, altruists should plausibly focus their efforts on improving the long-term future. It is not clear whether reducing the risk of human extinction would, in expectation, improve the long-term future, because a future with humanity may be better or worse than one without it.
From a consequentialist, welfarist view, most expected value (EV) or disvalue of the future comes from scenarios in which (post-)humanity colonizes space, because these scenarios contain most expected beings. Simply extrapolating the current welfare (part 1.1) of humans and farmed and wild animals, it is unclear whether we should support spreading sentient beings to other planets.
From a more general perspective (part 1.2), future agents will likely care morally about the same things we find valuable or about any of the things we are neutral towards. It seems very unlikely that they would see value exactly where we see disvalue. If future agents are powerful enough to shape the world according to their preferences, this asymmetry implies the EV of future agents colonizing space is positive from many welfarist perspectives.
If we can defer the decision about whether to colonize space to future agents with more moral and empirical insight, doing so creates option value (part 1.3). However, most expected future disvalue plausibly comes from futures controlled by indifferent or malicious agents. Such “bad” agents will make worse decisions than we, currently, could. Thus, the option value in reducing the risk of human extinction is small.
The universe may not stay empty, even if humanity goes extinct (part 2.1). A non-human animal civilization, extraterrestrials or uncontrolled artificial intelligence that was created by humanity might colonize space. These scenarios may be worse than (post-)human space colonization in expectation. Additionally, with more moral or empirical insight, we might realize that the universe is already filled with beings or things we care about (part 2.2). If the universe is already filled with disvalue that future agents could alleviate, this gives further reason to reduce extinction risk.
In practice, many efforts to reduce the risk of human extinction also have other effects of long-term significance. Such efforts might often reduce the risk of global catastrophes (part 3.1) from which humanity would recover, but which might set technological and social progress on a worse track than they are on now. Furthermore, such efforts often promote global coordination, peace and stability (part 3.2), which is crucial for safe development of pivotal technologies and to avoid negative trajectory changes in general.
Aggregating these considerations, efforts to reduce extinction risk seem positive in expectation from most consequentialist views, ranging from neutral on some views to extremely positive on others. As efforts to reduce extinction risk also seem highly leveraged and time-sensitive, they should probably hold a prominent place in the long-termist EA portfolio.
Introduction and background
The future of Earth-originating life might be vast, lasting millions of years and containing many times more beings than currently alive (Bostrom, 2003). If future beings matter morally, it should plausibly be a major moral concern that the future plays out well. So how should we, today, prioritise our efforts aimed at improving the future?
We could try to reduce the risk of human extinction. A future with humanity would be drastically different from one without it. Few other factors seem as pivotal for what the world will look like in the millions of years to come as whether or not humanity survives the next few centuries and millennia. Effective efforts to reduce the risk of human extinction could thus have immense long-term impact. If we were sure that this impact was positive, extinction risk reduction would plausibly be one of the most effective ways to improve the future.
However, it is not at first glance clear that reducing extinction risk is positive from an impartial altruistic perspective. For example, future humans might have terrible lives that they can’t escape from, or humane values might exert little control over the future, resulting in future agents causing great harm to other beings. If indeed it turned out that we weren’t sure if extinction risk reduction was positive, we would prioritize other ways to improve the future without making extinction risk reduction a primary goal.
To inform this prioritisation, in this article we estimate the expected value of efforts to reduce the risk of human extinction.
Throughout this article, we base our considerations on two assumptions:
- That it morally matters what happens in the billions of years to come. From this very long-term view, making sure the future plays out well is a primary moral concern.
- That we should aim to satisfy our reflected moral preferences. Most people would want to act according to the preferences they would have upon idealized reflection, rather than according to their current preferences. The process of idealized reflection will differ between people. Some people might want to revise their preferences after they became much smarter, more rational and had spent millions of years in philosophical discussion. Others might want to largely keep their current moral intuitions, but learn empirical facts about the world (e.g. about the nature of consciousness).
Most arguments further assume that the state the world is brought into by one’s actions is what matters morally (as opposed to e.g. the actions following a specific rule). We thus take a consequentialist view, judging potential actions by their consequences.
Parts 1.1 and 1.2 further take a welfarist perspective, assuming that what matters morally in states of the world is the welfare of sentient beings. In a way, that means assuming our reflected preferences are welfarist. Welfare will be broadly defined as including pleasure and pain, but also complex values or the satisfaction of preferences. From this perspective, a state of the world is good if it is good for the individuals in this world. Across several beings, welfare will be aggregated additively, no matter how far in the future an expected being lives. Additional beings with positive (negative) welfare coming into existence will count as morally good (bad). In short, parts 1.1 and 1.2 take the view of welfarist consequentialism with a total view on population ethics (see e.g. (Greaves, 2017)), but the arguments also hold for other similar views.
If we make the assumptions outlined above, nearly all expected value or disvalue in a future with humanity arises from scenarios in which (post-)humans colonize space. The colonizable universe seems very large, so scenarios with space colonization likely contain a lot more beings than scenarios with earthbound life only (Bostrom, 2003). Conditional on human survival, space colonization also does not seem too unlikely, thus nearly all expected future beings live in scenarios with space colonization. We thus take “a future with humanity” to mean “(post-)human space colonization” for the main text and briefly discuss what a future with only earthbound humanity might look like in Appendix 1.
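The claim that nearly all expected future beings live in space-colonization scenarios can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name, the probabilities and the size ratio of 1e10 are hypothetical placeholders, not estimates from this article or from Bostrom (2003).

```python
def share_of_expected_beings(p_colonization: float, size_ratio: float) -> float:
    """Fraction of expected future beings living in space-colonization scenarios.

    p_colonization: probability of space colonization, conditional on survival.
    size_ratio: number of beings in a colonization scenario per being in an
                earthbound-only scenario (hypothetical input).
    """
    expected_space = p_colonization * size_ratio
    expected_earth = (1.0 - p_colonization) * 1.0  # earthbound scenario normalized to 1
    return expected_space / (expected_space + expected_earth)


# Hypothetical inputs, for illustration only.
for p in (0.5, 0.1, 0.01):
    print(p, share_of_expected_beings(p, size_ratio=1e10))
```

Under these assumed inputs, the share stays close to 1 even when colonization is given a low probability, because the size ratio dominates the calculation.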
Outline of the article
Ultimately, we want to know “What is the expected value (EV) of efforts to reduce the risk of human extinction?”. We will address this question in three parts:
In part 1, we ask “What is the EV of (post-)human space colonization?”. We first attempt to extrapolate the EV from the amounts of value and disvalue in today’s world and how they would likely develop with space colonization. We then turn toward a more general examination of what future agents’ tools and preferences might look like and how they will, in expectation, shape the future. Finally, we consider if future agents could make a better decision on whether to colonize space (or not) than we can, so that it seems valuable to let them decide (option value).
In part 1 we tacitly assumed the universe without humanity is and stays empty. In part 2, we drop that assumption. We evaluate how the possibility of space colonization by alternative agents and the possibility of existing but tractable disvalue in the universe change the EV of keeping humans around.
In part 3, we ask “Besides reducing extinction risk, what will be the consequences of our efforts?”. We look at how different efforts to reduce extinction risk might influence the long-term future by reducing global catastrophic risk and by promoting global coordination and stability.
We stress that the conclusions of the different parts should not be separated from the context. Since we are reasoning about a topic as complex and uncertain as the long-term future, we take several views, aiming to ultimately reach a verdict by aggregating across them.
A note on disvalue-focus
The moral view on which this article is based is very broad and can include enormously different value systems, in particular different degrees of ‘disvalue-focus’. We consider a moral view disvalue-focused if it holds that the prevention or reduction of disvalue is (vastly) more important than the creation of value. One example is the family of views that treat the prevention or reduction of suffering as an especially high moral priority.
The degree of disvalue focus one takes chiefly influences the EV of reducing extinction risk.
From very disvalue-focused views, (post-)human space colonization may not seem desirable even if the future contains a much better ratio of value to disvalue than today. There is little to gain from space colonization if the creation of value (e.g. happy beings) morally matters little. On the other hand, space colonization would multiply the number of sentient beings and thereby multiply the absolute amount of disvalue.
At first glance it thus seems that reducing the risk of human extinction is not a good idea from a strongly disvalue-focused perspective. However, the value of extinction risk reduction for disvalue-focused views gets shifted upwards considerably by the arguments in parts 2 and 3 of this article.
Part 1: What is the EV of (post-)human space colonization?
1.1: Extrapolating from today’s world
Space colonization is hard. By the time our technology is advanced enough, human civilization will possibly have changed considerably in many ways. However, to get a first grasp of the expected value of the long-term future, we can model it as a rough extrapolation of the present. What if humanity as we know it colonized space? There would be vastly more sentient beings, including humans, farmed animals and wild animals. To estimate the expected value of this future, we will consider three questions:
- How many humans, farmed animals and wild animals will exist?
- How should we weigh the welfare of different beings?
- For each of humans, farmed animals and wild animals:
  - Is the current average welfare net positive/average life worth living?
  - How will welfare develop in the future?
We will then attempt to draw a conclusion. Note that throughout this consideration, we take an individualistic welfarist perspective on wild animals. This perspective stands in contrast to e.g. valuing functional ecosystems and might seem unusual, but is increasingly popular.
There will likely be more farmed and wild animals than humans, but the ratio will decrease compared to the present
In today’s world, both farmed and wild animals outnumber humans by far. There are about 3-4 times more farmed land animals and about 13 times more farmed fish than humans alive. Wild animals outnumber farmed animals, with about 10 times more wild birds than farmed birds and 100 times more wild mammals than farmed mammals alive at any point. Moving on to smaller wild animals, the numbers increase again, with 10 000 times more vertebrates than humans, and between 100 000 000 and 10 000 000 000 times more insects and spiders than humans.
In the future, the relative number of animals compared to humans will likely decrease considerably.
Farmed animals will not be alive if animal farming substantially decreases or stops, which seems more likely than not for both moral and economic reasons. Humanity’s moral circle seems to have been expanding throughout history (Singer, 2011) and further expansion to animals may well lead us to stop farming animals. Economically, plant-based meat alternatives or lab-grown meat will likely become more efficient than raising animals (Tuomisto and Teixeira de Mattos, 2011). However, none of these developments seems unequivocally destined to end factory farming, and the historical track record shows that meat consumption per head has been growing for more than 50 years. Overall, it seems likely but not absolutely clear that the number of farmed animals relative to humans will be smaller in the future. For wild animals, we can extrapolate from a historical trend of decreasing wild animal populations. Even if wild animals were spread to other planets for terraforming, the animal / human ratio would likely be lower than today.
Welfare of different beings can be weighted by (expected) consciousness
To determine the EV of the future, we need to aggregate welfare across different beings. It seems like we should weigh the experience of a human, a cow and a beetle differently when adding up, but by how much? This is a hard question with no clear answer, but we outline some approaches here. The degree to which an animal is conscious (“the lights are on”, the being is aware of its experiences, emotions and thoughts), or the confidence we have in an animal being conscious, can serve as a parameter by which to weight welfare. To arrive at a number for this parameter, we can use proxies such as brain mass, neuron count and mental abilities directly. Alternatively, we may aggregate these proxies with other considerations into an estimate of confidence that a being is conscious. For instance, the Open Philanthropy Project estimates the probability that cows are conscious at 80%.
The EV of (post-)human lives is likely positive
Currently, the average human life seems to be perceived as being worth living. Survey data and experience sampling suggest that most humans are quite content with their lives and experience more positive than negative emotions on a day-to-day basis. If they find it not worth living, humans can take their own lives, but relatively few people commit suicide (suicide accounts for 1.7% of all deaths in the US). We could conclude that human welfare is positive.
We should, however, note two caveats to this conclusion. First, a life can be perceived as worth living even if it is negative from a welfarist perspective. Second, the average life might not be worth living if the suffering of the worst off was sufficiently more intense than the happiness of the majority of people.
Overall, it seems that from a large majority of consequentialist views, the current aggregated human welfare is positive.
In the future, we will probably make progress that will improve the average human life. Historical trends have been positive across many indicators of human well-being, knowledge, intelligence and capability. On a global scale, violence is declining and cooperation is increasing (Pinker, 2011). Yet, the trend does not include all indicators: subjective welfare has (in recent times) remained stable or improved very little, and mental health problems are more prevalent. These developments have sparked research into positive psychology and mental health treatment, which is slowly bearing fruit. As more fundamental issues are gradually improved, humanity will likely shift more resources towards actively improving welfare and mental health. Powerful tools like genetic design and virtual reality could be used to further improve the lives of the broad majority as well as the worst-off. While there are good reasons to assume that human welfare in the future will be more positive than now, we still face uncertainties (e.g. from low-probability events like malicious but very powerful autocratic regimes, and from unknown unknowns).
EV of farmed animals’ lives is probably negative
Currently, 93% of farmed animals live on factory farms in conditions that likely make their lives not worth living. Although there are positive sides to animal life on farms compared to life in the wild, these are likely outweighed by negative experiences. Most farmed animals also lack opportunities to exhibit naturally desired behaviours like grooming. While there is clearly room for improvement in factory farming conditions, the question “is the average life worth living?” must be answered separately for each situation and remains controversial. On average, a factory farm animal life today probably has negative welfare.
In the future, factory farming is likely to be abolished or modified to improve animal welfare as our moral circle expands to animals (see above). We can thus be moderately optimistic that farmed animal welfare will improve and/or fewer farmed animals will be alive.
The EV of wild animals’ lives is very unclear, but potentially negative
Currently, we know too little about the lives and perception of wild animals to judge whether their average welfare is positive or negative. We see evidence of both positive and negative experiences. Meanwhile, our perspective on wild animals might be skewed towards charismatic big mammals living relatively good lives. We thus overlook the vast majority of wild animals, based both on biomass and neuron count. Most smaller wild animal species (invertebrates, insects etc.) are r-selected, with most individuals living very short lives before dying painfully. While vast numbers of those lives seem negative from a welfarist perspective, we may choose to weight them less based on the considerations outlined above. In summary, most welfarist views would probably judge the aggregated welfare of wild animals as negative. The more one thinks that smaller, r-selected animals matter morally, the more negative average wild animal welfare becomes.
In the future, we may reduce the suffering of wild animals, but it is unclear whether their welfare would be positive. Future humans may be driven by the expansion of the moral circle and empowered by technological progress (e.g. biotechnology) to improve wild animal lives. However, if average wild animal welfare remains negative, it would still be bad to increase wild animal numbers by space colonization.
It remains unclear whether the EV of a future in which a human civilization similar to the one we know colonized space is positive or negative.
To quantify the above considerations from a welfarist perspective, we created a mathematical model. This model yields a positive EV for a future with space colonization if different beings are weighted by neuron count and a negative EV if they are weighted by sqrt(neuron count). In the first case, average welfare is positive, driven by the spreading of happy (post-)humans. In the second case, average welfare is negative as suffering wild animals are spread. The model is also based on a series of low-confidence assumptions, alteration of which could flip the sign of the outcome again.
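The sensitivity to the weighting scheme can be illustrated with a toy calculation. This is a minimal sketch and not the authors' model: the two groups, the future population ratio, the neuron count for small wild animals and the welfare values are all hypothetical inputs chosen only to show how a change of exponent can flip the sign. The human neuron count of roughly 86 billion is the one commonly cited figure used here.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    count_per_human: float  # population relative to humans (hypothetical)
    neurons: float          # neurons per individual (order-of-magnitude guess)
    welfare: float          # average welfare per unit of moral weight (hypothetical)


# Illustrative inputs only; these are NOT the parameters of the authors' model.
GROUPS = [
    Group("(post-)humans", 1.0, 8.6e10, +1.0),
    Group("small wild animals", 1e4, 1e5, -0.5),
]


def total_welfare(groups, exponent):
    """Sum welfare across groups, weighting each individual by neurons**exponent.

    exponent=1.0 weights by neuron count, exponent=0.5 by sqrt(neuron count).
    """
    return sum(g.count_per_human * g.neurons ** exponent * g.welfare for g in groups)


for exponent in (1.0, 0.5):
    print(exponent, total_welfare(GROUPS, exponent))
```

With these assumed inputs, linear weighting lets the few high-neuron beings dominate the sum, while square-root weighting compresses the gap between species so that the more numerous small animals dominate instead.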
More qualitatively, the EV of an extrapolated future heavily depends on one’s moral views. The degree to which one is focused on avoiding disvalue seems especially important. Consider that every day, humans and animals are being tortured, murdered, or in psychological despair. Those who would walk away from Omelas might also walk away from current and extrapolated future worlds.
Finally, we should note how little we know about the world and how this impacts our confidence in considerations about an extrapolated future. To illustrate the extent of our empirical uncertainty, consider that we are extrapolating from 100 000 years of human existence, 10 000 years of civilizational history and 200 years of industrial history to potentially 500 million years on earth (and much longer in the rest of the universe). If people in the past had guessed about the EV of the future in a similar manner, they would most likely have gotten it wrong (e.g. they might not have considered moral relevance of animals, or not have known that there is a universe to potentially colonize). We might be missing crucial considerations now in analogous ways.
1.2: Future agents’ tools and preferences
While part 1.1 extrapolates directly from today’s world, part 1.2 takes a more abstract approach. To estimate the EV of (post-)human space-colonization in more broadly applicable terms, we consider three questions:
- Will future agents have the tools to shape the world according to their preferences?
- Will future agents’ preferences resemble our 'reflected preferences' (see 'Moral assumptions' section)?
- Can we expect the net welfare of future agents and powerless beings to be positive or negative?
We then attempt to estimate the EV of future agents colonizing space from a welfarist consequentialist view.
Future agents will have powerful tools to shape the world according to their preferences
Since climbing down from the trees, humanity has changed the world a great deal. We have done this by developing increasingly powerful tools to satisfy our preferences (e.g. preferences to eat, stay healthy and warm, and communicate with friends, even if they are far away). Insofar as humans have altruistic preferences, powerful tools have made acting on them less costly. For instance, if you see someone who is badly hurt and want to help, you no longer have to carry them home and care for them yourself; you can just call an ambulance. However, powerful tools have also made it easier to cause harm, either by satisfying harmful preferences (e.g. weapons of mass destruction) or as a side-effect of our actions that we are indifferent to. Technologies that enable factory farming do enormous harm to animals, although they were developed to satisfy a preference for eating meat, not for harming animals.
It seems likely that future agents will have much more powerful tools than we do today. These tools could be used to make the future better or worse. For instance, biotechnology and genetic engineering could help us cure diseases and live longer, but they could also enforce inequality if treatments are too expensive for most people. Advanced AI could make all kinds of services much cheaper but could also be misused. For more potent and complex tools, the stakes are even higher. Consider the example of technologies that facilitate space colonization. These tools could be used to cause the existence of many times more happy lives than would be possible on Earth, but also to spread suffering.
In summary, future agents will have the tools to create enormous value or disvalue. It is thus important to consider the values/preferences that future agents might have.
We can expect future agents to have other-regarding preferences that we would, after reflection, find somewhat positive
When referring to future agents’ preferences, we distinguish between ‘self-regarding preferences’, i.e. preferences about states of affairs that directly affect an agent, and ‘other-regarding preferences’, i.e. preferences about the world that remain even if an agent is not directly affected (see footnote for a precise definition). Future agents’ other-regarding preferences will be crucial for the value of the future. For example, if the future contains powerless beings in addition to powerful agents, the welfare of the former will depend to a large degree on the other-regarding preferences of the latter (much more about that later).
We can expect a considerable fraction of future agents’ preferences to be other-regarding
Most people alive today clearly have (positive and negative) other-regarding preferences, but will this be the case for future agents? It has been argued that over time, other-regarding preferences could be stripped away by Darwinian selection. We explore this argument and several counterarguments in appendix 2. We conclude that future agents will, in expectation, have a considerable fraction of other-regarding preferences.
Future agents’ preferences will in expectation be parallel rather than anti-parallel to our reflected preferences
We want to estimate the EV of a future shaped by powerful tools according to future agents’ other-regarding preferences. In this article we assume that we should ultimately aim to satisfy our reflected moral preferences, the preferences we would have after an idealized reflection process (as discussed in the "Moral assumptions" section above). Thus, we must establish how future agents’ other-regarding preferences (FAP) compare to our reflected other-regarding preferences (RP). Briefly put, we need to ask: “would we want the same things as these future agents who will shape the world?”
FAP can be somewhere on a spectrum from parallel to orthogonal to anti-parallel to RP. If FAP and RP are parallel, future agents agree exactly with our reflected preferences. If they are anti-parallel, future agents see value exactly where we see disvalue. And if they are orthogonal, future agents value what we regard as neutral, and vice versa. We now examine how FAP will be distributed on this spectrum.
Assume that future agents care about moral reflection. They will then have better conditions for an idealized reflection process than we have, for several reasons:
- Future agents will probably be more intelligent and rational
- Empirical advances will help inform moral intuitions (e.g. experience machines might allow agents to get a better idea of other beings’ experiences)
- Philosophy will advance further
- Future agents will have more time and resources to deliberate
Given these prerequisites, it seems that future agents’ moral reflection would in expectation lead to FAP that are parallel rather than anti-parallel to RP. How much overlap between FAP and RP to expect remains difficult to estimate.
However, scenarios in which future agents do not care about moral reflection might substantially influence the EV of the future. For example, it might be likely that humanity loses control and the agents shaping the future bear no resemblance to humans. This could be the case if developing controlled artificial general intelligence (AGI) is very hard, and the probability that misaligned AGI will be developed is high (in this case, the future agent is a misaligned AI).
Even if (post-)humans remain in control, human moral intuitions might turn out to be contingent on the starting conditions of the reflection process and not very convergent across the species. Thus, FAP may not develop in any clear direction, but rather drift randomly. Very strong and fast goal drift might be possible if future agents include digital (human) minds, because such minds would not be restrained by the cultural universals rooted in the physical brain architecture.
If it turns out that FAP develop differently from RP, FAP will in expectation be orthogonal to RP rather than anti-parallel. The space of possible preferences is vast, so it seems much more likely that FAP will be completely different from RP, rather than exactly opposite (See footnote for an example). In summary, FAP parallel or orthogonal to RP both seem likely, but a large fraction of FAP being anti-parallel to RP seems fairly unlikely. This main claim seems true for most “idealized reflection processes” that people would choose.
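The geometric intuition behind “orthogonal rather than anti-parallel” can be sketched numerically. The code below assumes, purely for illustration, that preferences can be represented as random vectors and compared by cosine similarity; this representation, the Gaussian sampling and the -0.9 threshold for “nearly anti-parallel” are assumptions of the sketch, not claims made in this article.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity: +1 parallel, 0 orthogonal, -1 anti-parallel."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def summarize(dim, n_pairs=200, seed=0):
    """Return (mean |cosine|, fraction of pairs with cosine < -0.9)
    for randomly drawn pairs of Gaussian vectors of dimension `dim`."""
    rng = random.Random(seed)
    cosines = []
    for _ in range(n_pairs):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        cosines.append(cosine(u, v))
    mean_abs = sum(abs(c) for c in cosines) / len(cosines)
    frac_anti = sum(c < -0.9 for c in cosines) / len(cosines)
    return mean_abs, frac_anti


for dim in (2, 10, 1000):
    print(dim, summarize(dim))
```

In this toy setting, nearly anti-parallel pairs occur regularly in two dimensions but become vanishingly rare as the number of dimensions grows, while typical pairs cluster around orthogonality. Whether real preferences behave like independent random directions is of course a strong simplification.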
However, FAP being between parallel and orthogonal to RP in expectation does not necessarily imply the future will be good. Actions driven by (orthogonal) FAP could have very harmful side-effects, as judged by our reflected preferences. Harmful side-effects could be devastating especially if future agents are indifferent towards beings we (would on reflection) care about morally. Such negative side-effects might outweigh positive intended effects, as has happened in the past. Indeed, some of the most discussed “risks of astronomical future suffering” are examples of negative side-effects.
Future agents’ tools and preferences will in expectation shape a world with probably net positive welfare
Above we argued that we can expect some overlap between future agents’ other-regarding preferences (FAP) and our reflected other-regarding preferences (RP). We can thus be somewhat optimistic about the future in a very general way, independent of our first-order moral views, if we ultimately aim to satisfy our reflected preferences. In the following section, we will drop some of that generality. We will examine what future agents’ preferences will imply for the welfare of future beings. In doing so, we assume that we would on reflection hold an aggregative, welfarist altruistic view (as explained in the background-section).
If we assume these specific RP, can we still expect FAP to overlap with them? After all, other-regarding preferences anti-parallel to welfarist altruism, such as sadistic, hateful or vengeful preferences, clearly exist within present-day humanity. If current human values transferred broadly into the future, should we then expect a large fraction of FAP to be anti-parallel to welfarist altruism? Probably not. We argue in appendix 3 that although this is hard to quantify, the large majority of human other-regarding preferences seem positive.
Assuming somewhat welfarist FAP, we explore what the future might be like for two types of beings: future agents (post-humans) who have powerful tools to shape the world, and powerless future beings. To aggregate welfare for moral evaluation, we need to estimate how many beings of each type will exist. Powerful agents will likely be able to create powerless beings as “tools” if this seems useful for them. Sentient “tools” could include animals, farmed for meat production or spread to other planets for terraforming (e.g. insects), but also digital sentient minds, like sentient robots for task performance or simulated minds created for scientific experimentation or entertainment. The last example seems especially relevant, as digital minds could be created in vast numbers if digital sentience is possible at all, which does not seem unlikely. If we find we morally care about these “tools” upon reflection, the future would contain many times more powerless beings than powerful agents.
The EV of the future thus depends on the welfare of both powerful agents and powerless beings, with the latter potentially much more relevant than the former. We now consider each in turn, asking:
- How will their expected welfare be affected by intended effects and side-effects of future agents’ actions?
- How to evaluate this morally?
The aggregated welfare of powerful future agents is in expectation positive
Future agents will have powerful tools to satisfy their self-regarding preferences and be somewhat benevolent towards each other. Thus, we can expect future agents’ welfare to be increased through intended effects of their actions.
Side-effects of future agents’ actions negative for other agents’ welfare would mainly arise if their civilization is not coordinated well. However, compromise and cooperation seem to usually benefit all involved parties, indicating that we can expect future agents to develop good tools for coordination and use them a lot. Coordination also seems essential to avert many extinction risks. Thus, a civilization that avoided extinction so successfully that it colonizes space is expected to be quite coordinated.
Taken together, vastly more resources will likely be used in ways that improve the welfare of powerful agents than in ways that diminish their welfare. From the large majority of welfarist views, future agents’ aggregated welfare is thus expected to be positive. This conclusion is also supported by human history, as improved tools, cooperation and altruism have increased the welfare of most humans, and average human lives are seen as worth living by many (see part 1.1).
The aggregated welfare of powerless future beings may in expectation be positive
Assuming that future agents are mostly indifferent towards the welfare of their “tools”, their actions would affect powerless beings only via (in expectation random) side-effects. It is thus relevant to know the “default” level of welfare of powerless beings. If the affected powerless beings were animals shaped by evolution, their default welfare might be net negative. This is because evolutionary pressure might result in a pain-pleasure asymmetry with suffering being much more intense than pleasure (see footnote for further explanation). Such evolutionary pressure would not apply for designed digital sentience. Given that our experience with welfare is restricted to animals (incl. humans) shaped by evolution, it is unclear what the default welfare of digital sentients would be. If there is at least some moral concern for digital sentience, it seems fairly likely that the creating agents would prefer to give their sentient tools net positive welfare.
If future agents intend to affect the welfare of powerless beings, they might - besides treating their sentient “tools” accordingly - create (dis-)value optimized sentience: minds that are optimized for extreme positive or negative welfare. For example, future agents could simulate many minds in bliss, or many minds in agony. The motivation for creating (dis-)value optimized sentience could be altruism, sadism or strategic reasons. Creating (dis-)value optimized sentience would likely produce much more (negative) welfare per unit of invested resources than the side-effects on sentient tools mentioned above, as sentient tools are optimized for task performance, not production of (dis-)value. (Dis-)value optimized sentience would then be the main determinant of the expected value of post-human space colonization, and not side-effects on sentient tools.
FAP may be orthogonal to welfarist altruism, in which case little (dis-)value optimized sentience will be produced. However, we expect a much larger fraction of FAP to be parallel to welfarist altruism than anti-parallel to it, and thus expect that future agents will use many more resources to create value-optimized sentience than disvalue-optimized sentience. The possibility of (dis-)value optimized sentience should increase the net expected welfare of powerless future beings. However, there is considerable uncertainty about the moral implications of one resource-unit spent optimized for value or disvalue (see e.g. here and here). On the one hand, (dis)value optimized sentience created without evolutionary pressure might be equally efficient in producing moral (dis)value, but used a lot more to produce value. On the other hand, disvalue optimized sentience might lead to especially intense suffering. Many people intuitively give more moral importance to the prevention of suffering the worse it gets (e.g. prioritarianism).
In summary, it seems plausible that a little concern for the welfare of sentient tools could go a long way. Even if most future agents were completely indifferent towards sentient tools (=majority of FAP orthogonal to RP), positive intended effects – creation of value-optimized sentience – could plausibly weigh heavier than side-effects.
Morally evaluating the future scenarios sketched in part 1.2 is hard because we are uncertain: both empirically uncertain what the future will be like and morally uncertain what our intuitions will be like. The key unanswered questions are:
- How much can we expect the preferences that shape the future to overlap with our reflected preferences?
- In absence of concern for the welfare of sentient tools, how good or bad is their default welfare?
- How will the scales of intended effects and side-effects compare?
Taken together, we believe that the arguments in this section indicate that the EV of (post-)human space colonization would only be negative from relatively strongly disvalue-focused views. From the majority, but not overwhelming majority, of welfarist views the EV of (post-)human space colonization seems positive.
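To make the role of disvalue-focus in this conclusion concrete, the following is a minimal sketch rather than a model used in this article. It assumes that value-optimized and disvalue-optimized sentience are equally efficient per unit of resources, and it introduces a hypothetical `suffering_weight` parameter for how much more one unit of optimized suffering counts than one unit of optimized happiness. The resource shares in the usage note are illustrative placeholders, not estimates.

```python
import math


def net_optimized_welfare(share_parallel, share_antiparallel, suffering_weight=1.0):
    """Net moral value per unit of resources under a toy model.

    share_parallel:     fraction of resources spent on value-optimized minds
    share_antiparallel: fraction of resources spent on disvalue-optimized minds
    suffering_weight:   hypothetical moral weight of optimized suffering
                        relative to optimized happiness (1.0 = symmetric)
    """
    if share_parallel < 0 or share_antiparallel < 0:
        raise ValueError("shares must be non-negative")
    if share_parallel + share_antiparallel > 1:
        raise ValueError("shares cannot sum to more than 1")
    return share_parallel - suffering_weight * share_antiparallel


def break_even_weight(share_parallel, share_antiparallel):
    """Suffering weight above which the toy model's net value turns negative."""
    if share_antiparallel == 0:
        return math.inf
    return share_parallel / share_antiparallel
```

With illustrative shares of 0.10 (parallel) and 0.01 (anti-parallel), the net value is positive under a symmetric weighting and only turns negative once `suffering_weight` exceeds `break_even_weight(0.10, 0.01)`. In this sketch, the more lopsided the split of resources in favor of value-optimized sentience, the more strongly disvalue-focused a view has to be for the sign to flip.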
In parts 1.1 and 1.2, we directly estimated the EV of (post-)human space colonization and found it to be very uncertain. In the remaining parts, we will improve our estimate via other approaches that are less dependent on specific predictions about how (post-)humans will shape the future.
1.3: Future agents could later decide not to colonize space (option value)
We are often uncertain about what the right thing to do is. If we can defer the decision to someone wiser than ourselves, this is generally a good call. We can also defer across time: we can keep our options open for now, and hope our descendants will be able to make better decisions. This option value may give us a reason to prefer to keep our options open.
For instance, our descendants may be in a better position to judge whether space colonization would be good or bad. If they can see that space colonization would be negative, they can refrain from (further) colonizing space: they have the option to limit the harm. In contrast, if humanity goes extinct, the option of (post-)human space colonization is forever lost. So avoiding extinction creates ‘option value’ (e.g. MacAskill). This specific type of ‘option value’ - from future agents choosing not to colonize space - and not the more general value of keeping options open, is what we will be referring to throughout this section. This type of option value exists for nearly all moral views, and is very unlikely to be negative. However, as we will discuss in this section, this value is rather small compared to other considerations.
A considerable fraction of futures contains option value
Reducing the risk of human extinction only creates option value if future agents will make a better decision, by our (reflected) lights, about whether to colonize space than we could. If they will make worse decisions than us, we would rather decide ourselves.
In order for future agents to make better decisions than us and actually act on them, they need to surpass us in at least one of the following aspects:
- Better values
- Better judgement about what space colonization will be like (based on increased empirical understanding and rationality)
- Greater willingness and ability to make decisions based on moral values (non-selfishness and coordination)
Human values change. We are disgusted by many of our ancestors’ moral views, and they would find ours equally repugnant. We can even look back on our own moral views and disagree. There is no reason for these trends to stop exactly now: human morality will likely continue to change.
Yet at each stage in the change, we are likely to view our values as obviously correct. This encourages a greater degree of moral uncertainty than feels natural. We should expect that our moral views would change after idealized reflection (although this also depends on which meta-ethical theory is correct and how idealized reflection works).
We argued in part 1.2 that future agents’ preferences will in expectation have some overlap with our reflected preferences. Even if that overlap is not very high, a high degree of moral uncertainty would indicate that we would often prefer future agents’ preferences over our current, unreflected preferences. In a sizeable fraction of future scenarios, future agents with more time and better tools to reflect can be expected to make better decisions than one could today.
Empirical understanding and rationality
We now understand the world better than our ancestors, and are able to think more clearly. If those trends continue, future agents may understand better what space colonization will be like, and so better understand how good it will be on a given set of values.
For example, future agents’ estimate of the EV of space colonization will benefit from
- Better empirical understanding of the universe (for instance about questions discussed in part 2.2) and better predictions, fuelled by more scientific knowledge and better forecasting techniques
- Increased intelligence and rationality, allowing them to more accurately determine what the best action is based on their values.
As long as there is some overlap between their preferences and one’s reflected preferences, this gives an additional reason to defer to future agents’ decisions (example see footnote).
Non-selfishness and coordination
We often know what’s right, but don’t follow through on it anyway. What is true for diets also applies here:
- Future agents would need to actually make the decision about space colonization based on moral reasoning. This might imply acting against strong economic incentives pushing towards space colonization.
- Future agents need to be coordinated well enough to avoid space colonization. That might be a challenge in non-singleton futures since future civilization would need ways to ensure that not a single agent starts space colonization.
Future agents would probably surpass our current level of empirical understanding, rationality, and coordination, and in a considerable fraction of possible futures they might also do better on values and non-selfishness. However, we should note that to actually not colonize space, they would have to surpass a certain threshold in all of these fields, which may be quite high. Thus, a little bit of progress doesn’t help: option value is only created in deferring the decision to future agents if they surpass this threshold.
Only the relatively good futures contain option value
For any future scenario to contain option value, the agents in that future need to surpass us in various ways, as outlined above. This has an implication that further diminishes the relevance of the option value argument. Future agents need to have relatively good values and be relatively non-selfish to decide not to colonize space for moral reasons. But even if these agents colonized space, they would probably do it in a relatively good manner. Most expected future disvalue plausibly comes from futures controlled by indifferent or malicious agents (like misaligned AI). Such “bad” agents will make worse decisions about whether or not to colonize space than we, currently, could, because their preferences are very different from our (reflected) preferences. Potential space colonization by indifferent or malicious agents thus generates large amounts of expected future disvalue, which cannot be alleviated by option value. Option value doesn’t help in the cases where it is most needed (see footnote for an explanatory example).
If future agents are good enough, there is option value in deferring the decision whether to colonize space to them. In some not-too-small fraction of possible futures, agents will fulfill the criteria and thus option value adds positively to the EV of reducing extinction risk. However, the futures accounting for most expected future disvalue are likely controlled by indifferent or malicious agents. Such “bad” agents would likely make worse decisions than we could. A large amount of expected future disvalue is thus not amenable to alleviation through option value. Overall, we think the option value in reducing the risk of human extinction is probably fairly moderate, but there is a lot of uncertainty and contingency on one’s specific moral and empirical views. Modelling the considerations of this section showed that if the 90% confidence interval of the value of the future was from -0.9 to 0.9 (arbitrary value units), option value was 0.07.
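The structure of this argument can be illustrated with a small closed-form sketch. It is not the model behind the 0.07 figure and is not meant to reproduce it. It assumes that the value V of a colonized future is normally distributed, that only "good" futures (hypothetical probability `p_good`) contain agents who abstain exactly when V is negative, and that "bad" futures contribute no option value. All function and parameter names are illustrative.

```python
import math

# z-score that bounds a central 90% interval of a normal distribution
Z_90 = 1.6448536269514722


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def expected_avoided_loss(mean, sd):
    """E[max(-V, 0)] for V ~ Normal(mean, sd).

    This is the expected loss avoided by agents who refrain from
    colonizing space exactly in the cases where V would be negative.
    """
    z = mean / sd
    return sd * normal_pdf(z) - mean * normal_cdf(-z)


def option_value(p_good, mean_good, sd):
    """Option value when only 'good' futures can exercise the option."""
    return p_good * expected_avoided_loss(mean_good, sd)


# A 90% interval of -0.9 to 0.9 corresponds to this standard deviation
# if V is assumed to be normal with mean zero.
SD_FROM_INTERVAL = 0.9 / Z_90
```

Comparing `expected_avoided_loss(0.0, SD_FROM_INTERVAL)` with `expected_avoided_loss(0.5, SD_FROM_INTERVAL)` shows the point of the preceding paragraphs: if the futures that can exercise the option are also the ones with a higher expected value (here a hypothetical mean of 0.5), there is less expected loss left for them to avoid, so option value shrinks.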
Part 2: Absence of (post-)human space colonization does not imply a universe devoid of value or disvalue
Up to now, we have tacitly assumed that the sign of the EV of (post-)human space colonization determines whether extinction risk reduction is worthwhile. This only holds if, without humanity, the EV of the future is roughly zero, because the (colonizable) universe is and will stay devoid of value or disvalue. We now consider two classes of scenarios in which this is not the case, with important implications especially for people who think that the EV of (post-)human space colonization is likely negative.
2.1 Whether (post-)humans colonizing space is good or bad, space colonization by other agents seems worse
If humanity goes extinct without colonizing space, some kind of other beings would likely survive on earth. These beings might evolve into a non-human technological civilization in the hundreds of millions of years left on earth and eventually colonize space. Similarly, extraterrestrials (that might already exist or come into existence in the future) might colonize (more of) our corner of the universe, if humanity does not.
In these cases, we must ask whether we prefer (post-)human space colonization over the alternatives. Whether alternative civilizations would be more or less compassionate or cooperative than humans, we can only guess. We may however assume that our reflected preferences depend on some aspects of being human, such as human culture or the biological structure of the human brain. Thus, our reflected preferences likely overlap more with a (post-)human civilization than alternative civilizations. As future agents will have powerful tools to shape the world according to their preferences, we should prefer (post-)human space colonization over space colonization by an alternative civilization.
To understand how we can factor this consideration into the overall EV of a future with (post-)human space colonization, consider the following example of Ana and Chris. Ana thinks the EV of (post-)human space colonization is negative. For her, the EV of potential alternative space colonization is thus even more negative. This should cause people who, like Ana, are pessimistic about the EV of (post-)human space colonization (and thus the value of reducing the risk of human extinction) to update towards reducing the risk of human extinction because the alternative is even worse (technical caveat in footnote).
Chris thinks that the EV of (post-)human space colonization is positive. For him, the EV of potential alternative space colonization could be positive or negative. For people like Chris, who are optimistic about the EV of (post-)human space colonization (and thus the value of reducing the risk of human extinction), the direction of update is thus less clear. They should update towards reducing the risk of human extinction if the potential alternative civilization is bad, or away from it if the potential alternative civilization is merely less good. Taken together, this consideration implies a stronger update for future pessimists like Ana than for future optimists like Chris. This becomes clearer in the mathematical derivation or when considering an example.
It remains to estimate how big the update should be. Based on our best guesses about the relevant parameters (Fermi-estimate see here), it seems like future pessimists should considerably shift their judgement of the EV of human extinction risk reduction into the less negative direction. Future optimists should moderately shift their judgement downwards. Therefore, if one was previously uncertain with roughly equal credence in future pessimism and future optimism, one’s estimate for the EV of human extinction risk reduction should increase.
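The direction of these updates can be sketched with a one-line comparison. This is an illustrative simplification, not the Fermi estimate referred to above: it assumes that the EV of a future without humanity is just the probability that an alternative civilization colonizes space times the EV of that colonization, and that a universe without any colonization has value zero. All numbers in the usage note are hypothetical.

```python
def ev_of_preventing_extinction(ev_human, ev_alternative, p_alternative):
    """EV of a future with humanity minus EV of a future without it.

    ev_human:       EV of (post-)human space colonization
    ev_alternative: EV of space colonization by an alternative civilization
    p_alternative:  probability that such a civilization colonizes space
                    if humanity goes extinct (hypothetical input)
    """
    if not 0 <= p_alternative <= 1:
        raise ValueError("p_alternative must be a probability")
    ev_without_humanity = p_alternative * ev_alternative
    return ev_human - ev_without_humanity
```

For a pessimist like Ana, `ev_of_preventing_extinction(-1.0, -2.0, 0.3)` is less negative than her naive estimate of -1.0. For an optimist like Chris, `ev_of_preventing_extinction(1.0, -1.0, 0.3)` lies above his naive estimate of 1.0, while `ev_of_preventing_extinction(1.0, 0.5, 0.3)` lies below it, which is why the direction of his update depends on whether the alternative is bad or merely less good.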
We should note that this is a very broad consideration, with details contingent on the actual moral views people hold and specific empirical considerations.
A specific case of alternative space colonization could arise if humanity gets extinguished by misaligned AGI. It seems likely that misaligned AI would colonize space. Space colonization by an AI might include (among other things of value/disvalue to us) the creation of many digital minds for instrumental purposes. If the AI is only driven by values orthogonal to ours, it would likely not care about the welfare of those digital minds. Whether we should expect space colonization by a human-made, misaligned AI to be morally worse than space colonization by future agents with (post-)human values has been discussed extensively elsewhere. Briefly, nearly all moral views would most likely rather have human value-inspired space colonization than space colonization by AI with arbitrary values, giving extra reason to work on AI alignment especially for future pessimists.
2.2 Existing disvalue could be alleviated by colonizing space
With more empirical knowledge and philosophical reflection, we may find that the universe is already filled with beings/things that we morally care about. Instead of just increasing the number of morally relevant things (i.e. earth originating sentient beings), future agents might then influence the states of morally relevant beings/things already existing in the universe. This topic is highly speculative and we should stress that most of the EV probably comes from “unknown unknowns”, which humanity might discover during idealized reflection. Simply put, we might find some way in which future agents can make the existing world (a lot) better if they stick around. To illustrate this general concept, consider the following ideas.
We might find that we morally care about things other than sentient beings, which could be vastly abundant in the universe. For example, we may develop moral concern for fundamental physics, e.g. in the form of panpsychism. Another possibility could arise if the solution to the simulation argument (Bostrom, 2003) is indeed that we live in a simulation, with most things of moral relevance positioned outside of our simulation but modifiable by us in yet unknown ways. It might also turn out that we can interact with other agents in the (potentially infinite) universe or multiverse by acausal trade or multiverse-wide cooperation, thereby influencing existing things of moral relevance (to us) in their part of the universe/multiverse. These specific ideas may look weird. However, given humanity’s history of realizing that we care about more/other things than previously thought, it should in principle seem likely that our reflected preferences include some yet unknown unknowns.
We argued in part 1.2 that future agents’ preferences will in expectation be parallel rather than anti-parallel to our reflected preferences. If the universe is already filled with things/beings of moral concern, we can thus assume that future agents will in expectation improve the state of these things. This creates additional reason to reduce the risk of human extinction: There might be a moral responsibility for humanity to stick around and “improve the universe”. This perspective is especially relevant for disvalue-focused views. From a (strongly) disvalue-focused view, increasing the numbers of conscious beings by space colonization is negative because it generates suffering and disvalue. It might seem that there is little to gain if space colonization goes well, but much to lose if it goes wrong. If, however, future agents could alleviate existing disvalue, then humanity’s survival (potentially including space colonization) has upsides that may well be larger than the expected downsides (Fermi-estimate see footnote).
Part 3: Efforts to reduce extinction risk may also improve the future
If we had a button that reduces human extinction risk, and has no other effect, we would only need the considerations in parts 1 and 2 to know whether we should press it. In practice, efforts to reduce extinction risk often have other morally relevant consequences, which we examine below.
3.1: Efforts to reduce non-AI extinction risk reduce global catastrophic risk
Global catastrophe here refers to a scenario of hundreds of millions of human deaths and resulting societal collapse. Many potential causes of human extinction, like a large-scale epidemic, nuclear war, or runaway climate change, are far more likely to lead to a global catastrophe than to complete extinction. Thus, many efforts to reduce the risk of human extinction also reduce global catastrophic risk. In the following, we argue that this effect adds substantially to the EV of efforts to reduce extinction risk, even from the very long-term perspective of this article. This doesn’t hold for efforts to reduce risks that, like risks from misaligned AGI, are more likely to lead to complete extinction than to a global catastrophe.
Apart from being a dramatic event of immense magnitude for current generations, a global catastrophe could severely curb humanity’s long-term potential by destabilizing technological progress and derailing social progress.
Technological progress might be uncoordinated and incautious in a world that is politically destabilized by global catastrophe. For pivotal technologies such as AGI, development in an arms race scenario (e.g. driven by post-catastrophe resource scarcity or war) could lead to adverse outcomes we cannot correct afterwards.
Social progress might likewise divert towards opposing open society and general utilitarian-type values. Can we expect the “new” value system emerging after a global catastrophe to be robustly worse than our current value system? While this issue is debated, Nick Beckstead gives a strand of arguments suggesting the “new” values would in expectation be worse. Compared to the rest of human history, we currently seem to be on an unusually promising trajectory of social progress. What exactly would happen if this period was interrupted by a global catastrophe is a difficult question, and any answer will involve many judgement calls about the contingency and convergence of human values. However, as we hardly understand the driving factors behind the current period of social progress, we cannot be confident it would recommence if interrupted by a global catastrophe. Thus, if one sees the current trajectory as broadly positive, one should expect this value to be partially lost if a global catastrophe occurs.
Taken together, reducing global catastrophic risk seems to be a valuable effect of efforts to reduce extinction risk. This aspect is fairly relevant even from a very long-term perspective because catastrophes are much more likely than extinction. A Fermi estimate suggests the long-term impact from the prevention of global catastrophes is about 50% of the impact from avoiding extinction events. The potential long-term consequences from a global catastrophe include worse values and an increase in the likelihood of misaligned AI scenarios. These consequences seem bad from most moral perspectives, including strongly disvalue-focused ones. Considering the effects on global catastrophic risk should suggest a significant update in the evaluation of the EV of efforts to reduce extinction risk towards more positive (or less negative) values.
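The shape of such a Fermi estimate can be sketched as a product of two factors. The inputs below are hypothetical placeholders chosen only to show how a figure of roughly 50% could arise; they are not the inputs of the Fermi estimate mentioned above. The sketch assumes a future whose long-term value is positive and normalizes the loss from extinction to 1.

```python
def relative_longterm_impact(catastrophe_to_extinction_ratio, value_lost_per_catastrophe):
    """Long-term impact of preventing catastrophes relative to preventing extinction.

    catastrophe_to_extinction_ratio: how many times more likely a prevented
        event is to be a non-extinction global catastrophe than an extinction
        event (hypothetical input)
    value_lost_per_catastrophe: expected fraction of long-term value lost per
        global catastrophe, where extinction counts as a loss of 1
        (hypothetical input)
    """
    if catastrophe_to_extinction_ratio < 0:
        raise ValueError("ratio must be non-negative")
    if not 0 <= value_lost_per_catastrophe <= 1:
        raise ValueError("value lost must be a fraction between 0 and 1")
    return catastrophe_to_extinction_ratio * value_lost_per_catastrophe
```

For instance, if catastrophes were ten times as likely as extinction events and each catastrophe cost 5% of long-term value in expectation, `relative_longterm_impact(10, 0.05)` would come out at about one half. The sketch makes visible that the conclusion rests on both factors: a high relative frequency of catastrophes only matters if the expected long-term loss per catastrophe is not negligible.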
3.2: Efforts to reduce extinction risk often promote coordination, peace and stability, which is broadly good
The shared future of humanity is a (transgenerational) global public good (Bostrom, 2013), thus society needs to coordinate to preserve it, e.g. by providing funding and other incentives. Most extinction risk also arises from technologies that allow for one agent (intentionally or by mistake) to start a potential extinction event (e.g. release a harmful virus or start a nuclear war). Coordinated action and careful decisions are thus needed and indeed, the broadest efforts to reduce extinction risk directly promote global coordination, peace and stability. More focused efforts often promote “narrow cooperation” within a specific field (e.g. nuclear non-proliferation) or set up processes (e.g. pathogenic surveillance) that increase global stability by reducing perceived levels of threat from non-extinction events (e.g. bioterrorist attacks).
Taken together, efforts to reduce extinction risk also promote a more coordinated, peaceful and stable global society. Future agents in such a society will probably make wiser and more careful decisions, reducing the risk of unexpected negative trajectory changes in general. Safe development of AI will specifically depend on these factors. Therefore, efforts to reduce extinction risk may also steer the world away from some of the worst non-extinction outcomes, which likely involve war, violence and arms races.
Note that there may be a trade-off as most targeted efforts seem more neglected and therefore promising levers for extinction risk reduction. However, their effects on global coordination, peace and stability are less certain and likely smaller than the effects of broad efforts aimed directly at increasing these factors. Broad efforts to promote global coordination, peace and stability might be among the most promising approaches to robustly improve the future and reduce the risk of dystopian outcomes conditional on human survival.
The expected value of efforts to reduce the risk of human extinction (from non-AI causes) seems robustly positive
So all things considered, what is the expected value of efforts to reduce the risk of human extinction? In the first part, we considered what might happen if human extinction is prevented for long enough that future agents, maybe our biological descendants, digital humans, or (misaligned) AGI created by humans, colonize space. The EV of (post-)human space colonization is probably positive from many welfarist perspectives, but very uncertain. We also examined the ‘option value argument’, according to which we should try to avoid extinction and defer the decision to colonize space (or not) to wiser future agents. We concluded that option value, while mostly positive, is small and the option value argument hardly conclusive.
In part 2, we explored what the future universe might look like if humans do go extinct. Vast amounts of value or disvalue might (come to) exist in those scenarios as well. Some of this (dis-)value could be influenced by future agents if they survive. This insight has little impact for people who were optimistic about the future anyway, but shifts the EV of reducing extinction risk upwards for people who were previously pessimistic about the future. In part 3, we extended our considerations to additional effects of many efforts to reduce extinction risk, namely reducing the risk of “mere” global catastrophes and increasing global cooperation and stability. These effects generate considerable additional positive long-term impact. This is because global catastrophes would likely change the direction of technological and social progress in a bad way, while global cooperation and stability are prerequisites for a positive long-term trajectory.
Some aspects of moral views make the EV of reducing extinction risk look less positive than suggested above. We will consider three such aspects:
- From a strongly disvalue-focused view, increasing the total number of sentient beings seems negative regardless of the empirical circumstances. The EV of (post-)human space colonization (part 1.1 and 1.2) is thus negative, at least if the universe is currently devoid of value.
- From a very stable moral view (with low moral uncertainty, thus very little expected change in preferences upon idealized reflection), there are no moral insights for future agents to discover and act upon. Future agents could then only make better decisions than us about whether to colonize space through empirical insights. Likewise, future agents could only discover opportunities to alleviate astronomical disvalue that we currently do not see through empirical insights. Option value (part 1.3) and the effects from potentially existing disvalue (part 2.2) are reduced.
- From a very unusual moral view (with some of one’s reflected other-regarding preferences expected to be anti-parallel to most of humanity’s reflected other-regarding preferences), future agents will sometimes do the opposite of what one would have wanted. This would be true even if future agents are reflected and act altruistically (according to a different conception of ‘altruism’). From that view the future looks generally worse. There is less option value (part 1.3), and if the universe is already filled with beings/things that we morally care about (part 2.2), sometimes future agents might do the wrong thing upon this discovery.
To generate the (hypothetical) moral view that is most sceptical about reducing extinction risk, we unite all of the three aspects above. We assume a strongly disvalue-focused, very stable and unusual moral view. Even from this perspective (in rough order of descending relevance):
- Efforts to reduce extinction risk may improve the long-term future by reducing the risk of global catastrophes and increasing global cooperation and stability (part 3).
- There may be some opportunity for future agents to alleviate existing disvalue (as long as the moral view in question isn’t completely ‘unusual’ in all aspects) (part 2.2).
- (Post-)human space colonization might be preferable to space colonization by non-human animals or extraterrestrials (part 2.1).
- Small amounts of option value might arise from empirical insights improving decisions (part 1.3).
From this maximally sceptical view, targeted approaches to reduce the risk of human extinction likely seem somewhat unexciting or neutral, with high uncertainty (see footnote for how advocates of strongly disvalue-focused views see the EV of efforts to reduce extinction risk). Reducing the risk of extinction by misaligned AI probably seems positive because misaligned AI would also colonize space (see part 2.1).
From views that value the creation of happy beings or creation of value more broadly, have considerable moral uncertainty, and believe future reflected and altruistic agents could make good decisions, the EV of efforts to reduce extinction risk is likely positive and extremely high.
In aggregation, efforts to reduce the risk of human extinction seem in expectation robustly positive from many consequentialist perspectives.
Efforts to reduce extinction risk should be a key part of the EA long-termist portfolio
Effective altruists whose primary moral concern is making sure the future plays out well will, in practice, need to allocate their resources between different possible efforts. Some of these efforts are optimized to reduce extinction risk (e.g. promoting biosecurity), others are optimized to improve the future conditional on human survival while also reducing extinction risk (e.g. promoting global coordination or otherwise preventing negative trajectory changes) and some are optimized to improve the future without making extinction risk reduction a primary goal (e.g. promoting moral circle expansion or "worst-case" AI safety research).
We have argued above that the EV of efforts to reduce extinction risk is positive, but is it large enough to warrant investment of marginal resources? A thorough answer to this question requires detailed examination of the specific efforts in question and goes beyond the scope of this article. We are thus in no position to provide a definitive answer for the community. We will, however, present two arguments that favor including efforts to reduce extinction risk as a key part in the long-termist EA portfolio. Efforts to reduce the risks of human extinction are time-sensitive and seem very leveraged. We know of specific risks this century, we have reasonably good ideas for ways to reduce them, and if we actually avert an extinction event, this has robust impact for millions of years (at least in expectation) to come. As a very broad generalization, many efforts optimized to otherwise improve the future - such as improving today’s values in the hope that they will propagate to future generations - are less time-sensitive or leveraged. In short, it seems easier to prevent an event from happening in this century than to otherwise robustly influence the future millions of years down the line.
Key caveats to this argument include that it is not clear how big the differences in time-sensitivity and leverage are, and that we may still discover highly leveraged ways to “otherwise improve the future”. Therefore, it seems that the EA long-termist portfolio should contain all of the efforts described above, allowing each member of the community to contribute according to their comparative advantage. For those holding very disvalue-focused moral views, the more attractive efforts would plausibly be those optimized to improve the future without making extinction risk reduction a primary goal.
We are grateful to Brian Tomasik, Max Dalton, Lukas Gloor, Gregory Lewis, Tyler John, Thomas Sittler, Alex Norman, William MacAskill and Fabienne Sandkühler for helpful comments on the manuscript. Additionally, we thank Max Daniel, Sören Mindermann, Carl Shulman and Sebastian Sudergaard Schmidt for discussions that helped inform our views on the matter.
Jan conceived the article and the arguments presented in it. Friederike and Jan contributed to structuring the content and writing.
Appendix 1: What if humanity stayed earthbound?
In this appendix, we use the approach of part 1.1 and apply it to a situation in which humanity stays Earth-bound. It is recommended to first read part 1.1 before reading this appendix.
We think that scenarios in which humanity stays Earth-bound are of very limited relevance for the EV of the future for two reasons:
- Even if humanity staying Earth-bound was the most likely outcome, probably only a small fraction of expected beings live in these scenarios, so they only constitute a small fraction of expected value or disvalue (as argued in the introduction).
- Humanity staying Earth-bound may not actually be a very likely scenario because reaching post-humanity and realizing astronomical value might be a default path, conditional on humanity not going extinct (Bostrom, 2009)
If we assume humanity will stay Earth-bound, it seems that most welfarist views would probably favour reducing extinction risk. If one thinks humans are much more important than animals, it is obvious (unless one combined that view with suffering-focused ethics, such as antinatalism). If one also cares about animals, then very plausibly humanity's impact on wild animals is more relevant than humanity’s impact on farmed animals, because of the enormous numbers of the former (and especially since it seems plausible that factory farming will not continue indefinitely). So far, humanity’s main effect on wild animals has been a permanent decrease of population size (through habitat destruction), which is expected to continue as human population size grows. Compared to that, direct influence on wild animal well-being currently is unclear and probably small (though it is less clear for aquatic life):
- We kill significant numbers of wild animals, but we don’t know how painful human-caused death is compared to non-human-caused death
- Wild animal generation times are very short, so the number of animals affected by “never coming into existence” is probably much larger
If one thinks that wild animals are on net suffering, future population size reduction seems beneficial. If one thinks that wild animal welfare is net positive, then habitat reduction would be bad. However, there is still unarguably a lot of suffering in nature. Humanity might eventually - if we have much more knowledge and better tools, that allow us to do so at limited costs to ourselves - improve wild animals’ lives (like we already do with e.g. vaccinations), so the prospect of that might offset some of the negative value of current habitat reduction. Obviously, habitat destruction is negative from a conservationist/environmentalist perspective.
Appendix 2: Future agents will in expectation have a considerable fraction of other-regarding preferences
Altruism in humans likely evolved as a “shortcut” solution to coordination problems. It was often impossible to forecast how much an altruistic act would help spread your own genes, but it often would (especially in small tribes, where all members were closely related). Thus, humans for whom altruism just felt good had a selective advantage.
As agents become more rational and plan for the longer term, a tendency to help for purely selfless reasons seems less adaptive. Agents can deliberately cooperate for strategic reasons whenever necessary, and to exactly the degree that optimizes their own reproductive fitness. One might fear that in the long run, only preferences for increasing one’s own power and influence (and that of one’s descendants) might remain under Darwinian selection.
But this is not necessarily the case, for two reasons:
Darwinian processes will select for patience, not “selfishness” (Paul Christiano)
Long-term reasoning, together with better tools to preserve values and influence into the future, may reduce the need for altruistic preferences, but may also strongly reduce selection pressure for selfishness. In contrast to short-term planning (overly) altruistic agents, long-term planning agents that want to create value would realize that amassing power is an instrumental goal for that, and will try to survive, get resources for instrumental reasons, and coordinate with others against unchecked expansion of selfish agents. Thus, future evolution might select not for selfishness, but for patience, that is, for how strongly an agent cares about the long term. Such long-term preferences should be expected to be more altruistic.
Carl Shulman additionally makes the point that in a space colonization scenario, agents that want to create value would only be very slightly disadvantaged in direct competition with agents that only care about expanding.
Brian Tomasik thinks Christiano’s argument is valid and altruism might not be driven to zero in the future, but is doubtful that very long-term altruists will have strategic advantages over medium-term corporations and governments, and cautions against putting too much weight on theoretical arguments: “Human(e) values have only a mild degree of control in the present. So it would be surprising if such values had significantly more control in the far future.”
Preferences might not even be subject to Darwinian processes indefinitely
If the losses from evolutionary pressure indeed loom large, it seems quite likely that future generations would coordinate against it, e.g. by forming a singleton (Bostrom, 2006) (which broadly encompasses many forms of global coordination or value/goal-preservation). (Of course, there are also future scenarios that would strip away all other-regarding preferences, e.g. in Malthusian scenarios.)
In conclusion, we will end up somewhere between no other-regarding preferences and even more than today, with a considerable probability of future agents having a considerable fraction of other-regarding preferences.
Appendix 3: What if current human values transferred broadly into the future?
Most humans (past and present) intend to do what we now consider good (be loving, friendly, altruistic) more than they intend to harm (be sadistic, hateful, seek revenge). Positive other-regarding preferences might be more universal: most people would, all else equal, prefer all humans or animals to be happy, while fewer people would have such a general preference for suffering. This relative overhang of positive preferences in human society is evident from rules that ban hurting (some) others, but not helping others. These rules will (if they persist) also shape the future, as they increase the costs of doing harm.
Throughout human history, there has been a trend away from cruelty and violence. Although humans cause a lot of suffering in the world today, this is mostly because people are indifferent or “lazy”, rather than evil. All in all, it seems fair to say that the significant majority of human other-regarding preferences is positive, and that most people would, all else equal, prefer more happiness and less suffering. However, we admit this is hard to quantify.
References (only those published in peer-reviewed journals, and books):
Bjørnskov, C., Boettke, P.J., Booth, P., Coyne, C.J., De Vos, M., Ormerod, P., Sacks, D.W., Schwartz, P., Shackleton, J.R., Snowdon, C., 2012. ... and the Pursuit of Happiness-Wellbeing and the Role of Government.
Bostrom, N., 2013. Existential risk prevention as global priority. Global Policy 4, 15–31.
Bostrom, N., 2011. Infinite ethics. Analysis and Metaphysics 9–59.
Bostrom, N., 2009. The Future of Humanity, in: New Waves in Philosophy of Technology, New Waves in Philosophy. Palgrave Macmillan, London, pp. 186–215. https://doi.org/10.1057/9780230227279_10
Bostrom, N., 2006. What is a singleton. Linguistic and Philosophical Investigations 5, 48–54.
Bostrom, N., 2004. The future of human evolution. Death and anti-death: Two hundred years after Kant, fifty years after Turing 339–371.
Bostrom, N., 2003a. Astronomical waste: The opportunity cost of delayed technological development. Utilitas 15, 308–314.
Bostrom, N., 2003b. Are We Living in a Computer Simulation? The Philosophical Quarterly 53, 243–255. https://doi.org/10.1111/1467-9213.00309
Greaves, H., 2017. Population axiology. Philosophy Compass 12, e12442.
Killingsworth, M.A., Gilbert, D.T., 2010. A wandering mind is an unhappy mind. Science 330, 932. https://doi.org/10.1126/science.1192439
Pinker, S., 2011. The Better Angels of our Nature. New York, NY: Viking.
Sagoff, M., 1984. Animal Liberation and Environmental Ethics: Bad Marriage, Quick Divorce. Philosophy & Public Policy Quarterly 4, 6. https://doi.org/10.13021/G8PPPQ.41984.1177
Singer, P., 2011. The expanding circle: Ethics, evolution, and moral progress. Princeton University Press.
Tuomisto, H.L., Teixeira de Mattos, M.J., 2011. Environmental Impacts of Cultured Meat Production. Environ. Sci. Technol. 45, 6117–6123. https://doi.org/10.1021/es200130u
Simply put: two beings experiencing positive (or negative) welfare are morally twice as good (or bad) as one being experiencing the same welfare ↩︎
Some considerations that might reduce our certainty that, even given the moral perspective of this article, most expected value or disvalue comes from space colonization:
- The doomsday argument
- Some explanations of the Fermi-Paradox
- Potential implications of the simulation argument (Bostrom, 2003b)
In this article, the term ‘(post-)human space colonization’ is meant to include any form of space colonization that originates from a human civilization, including cases in which (biological) humans or human values don’t play a role (e.g. because humanity lost control over artificial superintelligence, which then colonizes space). ↩︎
… assuming that without (post-)human space colonization, the universe is and stays devoid of value or disvalue, as explained in “Outline of the article” ↩︎
We here assume that humanity does not change substantially, excluding e.g. digital sentience from our considerations. This may be overly simplistic, as interstellar travel seems so difficult that a space-faring civilization will likely be extremely different from us today. ↩︎
Around 80 billion farmed fish, which live around one year, are raised and killed per year. ↩︎
All estimates from Brian Tomasik ↩︎
There are convincing anecdotes and examples of an expanding moral circle from family to nation to all humans: the abolition of slavery; human rights; reduction in discrimination based on gender, sexual orientation, race. However, there doesn’t seem to be a lot of hard evidence. Gwern lists a few examples of a narrowing moral circle (such as infanticide, torture, other examples being less convincing). ↩︎
- lab-grown meat is very challenging with few people working on it, little funding, …
- Consumer adoption is far from inevitable
- Some people will certainly not want to eat in-vitro meat, so it is unlikely that factory farming will be abolished completely in the medium term if the circle of empathy doesn’t expand or governments don’t regulate.
There are also contrary trends. E.g. in Germany, meat consumption per head has been decreasing since 2011, from 62.8 kg in 2011 to 59.2 kg in 2015. In the US, it has been stagnant for 10 years. ↩︎
- Many more people remember feeling enjoyment or love than pain or depression across many countries (Figure 13, here)
- In nearly every country, (much) more than 50% of people report feeling very happy or rather happy (section “Economic growth and happiness”, here)
- Average happiness in experience sampling in US: 65/100 (Killingsworth and Gilbert, 2010)
One could claim that this just shows that people are afraid of dying or don’t commit suicide for other reasons, but people who suffer from depression have lifetime suicide rates of 2-15%, 10-25 times higher than the general population. This at least indicates that suicide rates increase if quality of life decreases. ↩︎
Reported well-being: People on average seem to report being content with their lives. This is only moderate evidence for their lives being positive from a welfarist view because people don’t generally think in welfarist terms when evaluating their lives and there might be optimism bias in reporting. Suicide rates: There are many reasons why people with lives not worth living might refrain from suicide, for example:
- possibility of failing and then being institutionalized and/or living with serious disability
- obligations to parents, children, friends
- fear of hell
- always enough food and water (with some exceptions)
- Domesticated animals have been bred for a long time and now in general have lower basal stress levels and stress reactions than wild animals (because they don’t need them)
- harmful breeding (e.g. broiler chickens are potentially in pain during the last 2 weeks of their life, because their joints cannot sustain their weight)
- There is no incentive to satisfy the emotional and social needs of farmed animals. It is quite likely that e.g. pigs can’t exhibit their natural behavior (e.g. gestation crates). Pigs, hens, veal cattle are often kept in ways that they can’t move (or only very little) for weeks.
- stress (intense confinement; chickens and pigs show self-mutilating behavior)
- extreme suffering (some percentage of farmed animals suffering to death or experiencing intense pain during slaughter)
The book Compassion by the pound, for example, rates the welfare of caged laying hens and pigs as negative, but beef cattle, dairy cows, free range laying hens and broiler chickens (market animals) as positive. Other experts disagree, especially on broiler chickens having lives worth living. ↩︎
Ability to express natural behaviour, such as sex, eating, social behavior, etc. ↩︎
Often painful deaths, disease, parasitism, predation, starvation, etc. In general, there is danger of anthropomorphism. Of course I would be cold in the Arctic, but a polar bear wouldn’t. ↩︎
Specifically: moral weight for insects, probability that humanity will eventually improve wild animal welfare, future population size multiplier (insect relative to humans) and human and insect welfare. ↩︎
If anything, attitudes towards animals have arguably become more empathetic. The majority of people around the globe express concern for farm animal well-being. (However, there is limited data, several confounders, and results from indirect questioning indicate that the actual concern for farmed animals might be much lower). See e.g.: http://ec.europa.eu/commfrontoffice/publicopinion/archives/ebs/ebs_270_en.pdf https://www.horizonpoll.co.nz/attachments/docs/horizon-research-factory-farming-survey-report.pdf http://www.tandfonline.com/doi/abs/10.2752/175303713X13636846944367 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4196765/ But also: https://link.springer.com/article/10.1007/s11205-009-9492-z ↩︎
Future technology, in combination with unchecked evolutionary pressure, might also lead to futures that contain very little of what we would value upon reflection (Bostrom, 2004). ↩︎
Self-regarding preferences are preferences that depend on the expected effect of the preferred state of affairs on the agent. These are not synonymous with purely “selfish preferences”. Acting according to self-regarding preferences can lead to acts that benefit others, such as in trade.
Other-regarding preferences are preferences that don’t depend on the expected effect of the preferred state of affairs on the agent. Other-regarding preferences can lead to acts that also benefit the actor. E.g. parents are happy if they know their children are happy. However, the parents would also want their children to be happy if they wouldn’t come to know about it. As defined here, other-regarding preferences are not necessarily positive for others. They can be negative (e.g. sadistic/hateful preferences) or neutral (e.g. aesthetic preferences).
Example of two parties at war:
- Self-regarding preference: Members of the one party want members of the other party to die, so they can win the war and conquer the other party’s resources.
- Other-regarding preference: Members of the one party want members of the other party to die, because they developed intense hate against them. Even if they don’t get any advantage from it, they would still want the enemy to suffer.
Individual humans as well as human society have become more intelligent over time. See: history of education, scientific revolution, Flynn effect, information technology. Genetic engineering or artificial intelligence may further increase our individual and collective cognition. ↩︎
Even if FAP and RP don’t have a lot of overlap, there might be additional reasons to defer to the values of future generations. Paul Christiano advocates one should sympathize with future agents’ values, if they are reflected, for strategic cooperative reasons, and for a willingness to discard idiosyncratic judgements. ↩︎
Even if earth-originating AI is initially controlled, this might not guarantee control over the future: Goal preservation might be costly, if there are trade-offs between learning and goal preservation during self-improvement, especially in multipolar scenarios. ↩︎
How meaningful moral reflection is, and whether we should expect human values to converge upon reflection, also depends on unsolved questions in meta-ethics. ↩︎
Of course, orthogonal other-regarding preferences can sometimes still lead to anti-parallel actions. Take as an example the debate of conservationism vs. wild animal suffering. Both parties have other-regarding preferences over wild animals. Conservationists don’t have a preference for wild animal suffering, just for conserving eco-systems. Wild animal suffering advocates don’t have a preference against conserving eco-systems (per se), just against wild animal suffering. In practice, these orthogonal views likely recommend different actions regarding habitat destruction. However, if there will be future agents with preferences on both sides, then there is much more room for gains through trade and compromise (such as the implementation of David Pearce’s Hedonistic Imperative) in cases like this than if other-regarding preferences were actually anti-parallel. Still, as I also remark in the conclusion, people who think their reflected preferences will be sufficiently unusual to have only a small overlap with other-regarding preferences of other humans, even if they are reflected, will find the whole part 1.2 less compelling for that reason. ↩︎
Maybe we would, after idealized reflection, include a certain class of beings into our other-regarding preferences, and we would want them to be able to experience, say, freedom. It seems quite likely that future agents won’t care about these beings at all. However, it seems very unlikely that they would have a particular other-regarding preference for such beings to be un-free.
Or consider the paperclip-maximizer, a canonical example of misaligned AI and thus an example of FAP certainly not being parallel to RP. Still, a paperclip-maximizer does not have a particular aversion against flourishing life, just as we don’t have a particular aversion against paperclips. ↩︎
Examples of negative “side-effects” as defined here:
- The negative “side-effects” of warfare on the losing party are bigger than the positive effects for the winning party (assuming that the motivation for the war was not “harming the enemy”, but e.g. acquiring the enemy’s resources)
- This is an example of side effects of powerful agents’ self-regarding preferences on other powerful agents.
- The negative “side-effects” of factory farming (animal suffering) are bigger than the positive effects for humanity (ability to eat meat). Many people do care about animals, so this is also an example of self-regarding preferences conflicting with other-regarding preferences.
- The negative “side-effects” of slave-labor on the slave are bigger than the positive effects for the slave owner (gain in wealth)
- These are both examples of side effects of powerful agents’ self-regarding preferences on powerless beings.
Of course there are also positive side-effects, cooperative and accidental: E.g.
- positive “side-effects” of powerful agents acting according to their preferences on other powerful agents: All gains from trade and cooperation
- positive “side-effects” of powerful agents acting according to their preferences on powerless beings: Rabies vaccination for wild animals. Arguably, wild animal population size reduction.
Additionally, one might object that FAP may not be the driving force shaping the future. Today, it seems that major decisions are mediated by a complex system of economic and political structures that often leads to outcomes that don’t align with the preferences of individual humans and that overweights the interests of the economically and politically powerful. On that view, we might expect the influence of human(e) values over the world to remain small. We think that future agents will probably have better tools to actually shape the world according to their preferences, which includes better tools for mediating disagreement and reaching meaningful compromise. But insofar as the argument in this footnote applies, it gives an additional reason to expect orthogonal actions, even if FAP aren’t orthogonal. ↩︎
Note that cooperation does not require caring about the partner one cooperates with. Even two agents that don’t care about each other at all may cooperate instead of waging war for the resources the other party holds, if they have good tools/institutions to arrange compromise, because the cost of warfare is high. ↩︎
Evolutionary reasons for the asymmetry between biological pain and pleasure that would not necessarily remain in designed digital sentience (ideas owed to Carl Shulman):
- Animals try to minimize the duration of pain (e.g. by moving away from the source of pain), and try to maximize the duration of pleasurable events (e.g. by continuing to eat). Thus, painful events are on average shorter than pleasurable events, and so need to be more intense to induce the same learning experience.
- Losses in reproductive fitness from one single negative event (e.g. a deadly injury) can be much greater than the gains of reproductive fitness from any single positive event, so animals evolved to want to avoid these events at all cost.
- Boredom/satiation can be seen as evolved protection against reward channel hacking. Animals for which one pleasant stimulus stayed pleasant indefinitely (e.g. an animal that just continued eating) had less reproductive success. Pain channels need less protection against hacking, because pain channel hacking...:
- only works if there is sustained pain in the first place, and
- is much harder to learn than pleasure channel hacking (the former: after getting hurt, an animal would need to find and eat a pain-relieving plant; the latter: an animal just needs to continue eating despite not having any use for additional calories)
This might be part of the reason why pain seems much easier to instantiate on demand than happiness. ↩︎
Even if future powerful agents have some concern for the welfare of sentient tools, sentient tools’ welfare might still be net negative, if there are reasons that make positive-welfare tools much more expensive than negative welfare tools (e.g. if suffering is very important for task performance). But even if maximal efficiency and welfare of tools are not completely correlated, we think that most suffering can be avoided while still keeping most productivity, so that a little concern for sentient tools could thus go a long way. ↩︎
Strategic acts in scenarios with little cooperation could motivate the creation of disvalue-optimized sentience, especially in multipolar scenarios that contain both altruistic and indifferent agents (blackmailing). However, because uncooperative acts are bad for everyone, these scenarios in expectation seem to involve few resources. On the positive side, there can also be gains from trade between altruistic and indifferent agents. ↩︎
Sentient tools are optimized for performance in the task they are created for. Per resource-unit, future agents would create: a number of minds as is most efficient, with hedonic experience as is most efficient, optimized for task.
(Dis)value-optimized sentience might be directly optimized for extent of consciousness or intensity of experience (if that is actually what future generations value altruistically). Per resource-unit, future agents would create: as many minds as is optimal for (dis)value, with as positive/negative as possible hedonic experience, optimized for conscious states.
Such sentience might be orders of magnitude more efficient in creating conscious experience than sentience not optimized for it. E.g. in humans, only a tiny fraction of energy is used for peak conscious experience: about 20% of energy is used for the brain, only a fraction of that is used for conscious experience, only a fraction of which are “peak” experiences. ↩︎
The driving force behind this judgement is not necessarily the belief that most futures will be good. Rather, it is the belief that the ‘rather good’ futures will contain more net value than the ‘rather bad’ futures will contain net disvalue.
- The ‘rather good’ futures contain agents with other-regarding preferences highly parallel to our reflected preferences. Many resources will be spent in a way that optimizes for value (by our lights).
- In the ‘rather bad’ futures, agents are largely selfish, or have other-regarding preferences completely orthogonal to our reflected other-regarding preferences. In these futures, most resources will be spent for goals that we do not care about, but very few resources will be spent to produce things we would disvalue in an optimized way. On whichever side of “zero” these scenarios fall, they seem much closer to parity than the ‘rather good’ futures (from most moral views).
As also noted in the discussion at the end of the article, part 1 is less relevant for people who have other-regarding preferences very different from other people, and who believe their RP to be very different from the RP of the rest of humanity. ↩︎
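The reasoning in this footnote can be sketched as a simple break-even calculation. The snippet below is only an illustration under stated assumptions: the function names and the payoff numbers (100 for a ‘rather good’ future, -5 for a ‘rather bad’ one) are hypothetical and chosen purely to show the asymmetry; they are not estimates from the article.

```python
def expected_value(p_good, v_good, v_bad):
    """EV of the future given the probability of a 'rather good' future.

    p_good: probability of a 'rather good' future (hypothetical input)
    v_good: net value of a 'rather good' future
    v_bad:  net value of a 'rather bad' future (may be slightly negative)
    """
    return p_good * v_good + (1.0 - p_good) * v_bad


def break_even_probability(v_good, v_bad):
    """Smallest p_good for which the expected value is non-negative."""
    if v_good <= 0:
        raise ValueError("v_good must be positive for a break-even to exist")
    if v_bad >= 0:
        return 0.0  # EV is non-negative for any p_good
    # Solve p * v_good + (1 - p) * v_bad = 0 for p.
    return -v_bad / (v_good - v_bad)


# Hypothetical, made-up magnitudes: 'rather bad' futures sit close to zero.
V_GOOD, V_BAD = 100.0, -5.0
p_star = break_even_probability(V_GOOD, V_BAD)
print(f"break-even probability of a good future: {p_star:.3f}")
print(f"EV if only 20% of futures are good: {expected_value(0.2, V_GOOD, V_BAD):.2f}")
```

With magnitudes like these, the expected value can be positive even if ‘rather good’ futures are a clear minority, which is the sense in which the judgement does not rest on most futures being good.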
Option value is not a separate kind of value, and it would be already integrated in the perfect EV calculation. However, it is quite easy to overlook, and somewhat important in this context, so it is discussed separately here. ↩︎
In a general sense, ‘option value’ includes the value of any change of strategy, for the better or worse, that future agents might take upon learning more. However, the general fact that agents can learn more and adapt their strategy is not surprising and was already factored into considerations 1, 2 and 4. ↩︎
In the more general definition, option value is not always positive. In general, giving future agents the option to choose between different strategies can be bad, if the values of future agents are bad or their epistemics are worse. In this section, ‘option value’ only refers to the option of future agents not to colonize space, if they find colonizing space would be bad from an altruistic perspective. It seems very unlikely that, if future agents refrain from space colonization for altruistic reasons at all, they would do so exactly in those cases in which we (current generation) would have judged space colonization as positive (according to our reflected preferences). So this kind of option value is very unlikely to be negative. ↩︎
Although empirical insights about the universe play a role in both option value and part 2.2, these two considerations are different:
- Part 2.2: Further insight about the universe might show that there already is a lot of disvalue out there. A benevolent civilization might reduce this disvalue.
- Option value: Further insight about the universe might show that there already is a lot of value or disvalue out there. That means that we should be uncertain about the EV of (post-)human space colonization. Our descendants will be less uncertain, and can then, if they know there is NOT already a lot of disvalue out there, still decide to not spread to the stars.
Individual humans as well as human society have become more intelligent over time. See: history of education, scientific revolution, Flynn effect, information technology. Genetic engineering or artificial intelligence may further increase our individual and collective cognition. ↩︎
For example, if we care only about maximizing X, but future agents will care about maximizing X, Y and Z to equal parts, letting them decide whether or not to colonize space might still lead to more X than if we decided, because they have vastly more knowledge about the universe and are generally much more capable of making rational decisions. ↩︎
Even if future agents can make better decisions regarding our other-regarding preferences than we (currently) could, future agents also need to be non-selfish enough to act accordingly - their other-regarding preferences need to constitute a sufficiently large fraction of their overall preferences. ↩︎
Say we are uncertain about the value in the future in two ways:
- 50% credence that disvalue-focused view would be my preferred moral view after idealized reflection, 50% credence in a ‘balanced view’ that also values the creation of value.
- 50% credence that the future will be controlled by indifferent actors, with preferences completely orthogonal to our reflected preferences, 50% credence that it will be controlled by good actors who have exactly the preferences we would have after idealized reflection.
The following table shows expected net value of space colonization without considering option value (again: made-up numbers):
| | Indifferent actors | Good actors |
|---|---|---|
| Disvalue-focused view | -100 | -10 |
| ‘Balanced view’ | -5 | 100 |
Now with option value, only the good actors would limit the harm if the disvalue-focused view was indeed our (and thus, their) preferred moral view after idealized reflection:
| | Indifferent actors | Good actors |
|---|---|---|
| Disvalue-focused view | -100 | 0 |
| ‘Balanced view’ | -5 | 100 |
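The effect of option value in this toy example can be checked by weighting each cell of the two tables above with the stated credences. The following is a minimal sketch, not part of the original analysis: it assumes the two uncertainties (moral view and actor type) are independent, so each cell gets weight 0.5 * 0.5, and the dictionary and function names are illustrative only.

```python
# Made-up payoffs from the two tables, indexed as payoffs[moral_view][actor_type].
WITHOUT_OPTION = {
    "disvalue_focused": {"indifferent": -100, "good": -10},
    "balanced": {"indifferent": -5, "good": 100},
}
WITH_OPTION = {
    "disvalue_focused": {"indifferent": -100, "good": 0},
    "balanced": {"indifferent": -5, "good": 100},
}

# Credences from the footnote: 50% per moral view, 50% per actor type.
# Independence of the two uncertainties is an assumption of this sketch.
P_VIEW = {"disvalue_focused": 0.5, "balanced": 0.5}
P_ACTOR = {"indifferent": 0.5, "good": 0.5}


def expected_value(payoffs):
    """Credence-weighted sum over all four (moral view, actor type) cells."""
    return sum(
        P_VIEW[view] * P_ACTOR[actor] * value
        for view, row in payoffs.items()
        for actor, value in row.items()
    )


ev_without = expected_value(WITHOUT_OPTION)
ev_with = expected_value(WITH_OPTION)
option_value = ev_with - ev_without

print(ev_without, ev_with, option_value)
```

Under these made-up numbers, only the (disvalue-focused view, good actors) cell changes, so the option value equals a quarter of the harm avoided in that cell. The overall expected value remains negative in this example; option value only shifts it upwards.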
There is more option value, if:
- One currently has high moral uncertainty (one expects one’s views to change considerably upon idealized reflection). With high moral uncertainty, it is more likely that future agents will have significantly more accurate moral values.
- One expects future agents to have a significantly better empirical understanding.
- One’s uncertainty about the EV of the future comes mainly from moral, and not empirical, uncertainty. For example, say you are uncertain about the expected value of the future because you are unsure whether you would, in your reflected preferences, endorse a strongly disvalue-focused view. If you are generally optimistic about future agents, you can assume future generations to be better informed about which moral view to take. Thus, there is a lot of option value in reducing the risk of human extinction. If, on the other hand, you are uncertain about the EV of the future because you think there is a high chance that future agents just won’t be altruistic, there is no option value in deferring the decision about space colonization to them.
It seems likely that some life-forms would survive, except if human extinction is caused by some cosmic catastrophes (not a focus area for effective altruists, because unlikely and intractable) or by specific forms of nano-technology or by misaligned AI. ↩︎
The extent to which it is true depends on the reflection process one chooses. Several people who read an early draft of this article commented that they would imagine their reflected preferences to be independent of human-specific factors. ↩︎
The argument in the main text assumed that the alternative space colonization contains a comparable amount of things that we find morally relevant as the (post-)human colonization. But in many cases, the EV of an alternative space colonization would actually be (near) neutral, because the alternative civilization’s preferences would be orthogonal to ours. Our values would just be so different from the AI’s or extraterrestrial values that space colonization by these agents might often look neutral to us. The argument in the main text still applies, but only for those alternative space colonizations that contain comparable absolute amounts of value and disvalue.
However, a very similar argument applies even for alternative colonizations that contain a smaller absolute amount of things we morally care about. The value of alternative space colonization would be shifted more towards zero, but future pessimists would in expectation always find alternative space colonization a worse outcome than no space colonization. From the future pessimistic perspective, human extinction leads to a bad outcome (alternative colonization), and not a neutral one (no space colonization). Future pessimists should thus update towards extinction risk reduction being less negative. Future optimists might find the alternative space colonization better or worse than no colonization.
The mathematical derivation in the next footnote takes this caveat into account. ↩︎
Assumption: This derivation makes the assumption that people who think the EV of human space colonization is negative and those who think it is positive would still rank a set of potential future scenarios in the same order when evaluating them normatively. This seems plausible, but may not be the case. Let’s simplify the value of human extinction risk reduction to:
EV(reduction of human extinction risk) = EV(human space colonization) - EV(human extinction)
(This simplification is very uncharitable towards extinction risk reduction, even if only considering the long-term effects, see parts 2 and 3 of this article). Assuming that no non-human animal or extraterrestrial civilization would emerge in case of human extinction, then EV(human extinction)=0, and so future pessimists judge:
EV(reduction of human extinction risk) = EV(human space colonization) - EV(human extinction)= EV(human space colonization) < 0
And future optimists believe:
EV(reduction of human extinction risk) = EV(human space colonization) - EV(human extinction) = EV(human space colonization) > 0
Let’s say, if humanity goes extinct, there will be non-human space colonization eventually with the probability p. (p can be down-weighted in a way to account for the fact that later space colonization probably means less final area colonized). This means that:
EV(human extinction) = p * EV(non-human space colonization)
Let’s define the amount of value and disvalue created by human space colonization as Vₕ and Dₕ, and the amount of value and disvalue created by the non-human civilization as Vₙₕ and Dₙₕ.
We can expect two relations:
- On average, a non-human civilization will care less about creating value and care less about reducing disvalue than a human civilization. We can expect the ratio of value to disvalue to be worse in the case of a non-human civilization:
(i) Vₙₕ/Dₙₕ = (Vₕ/Dₕ) * r, with 0 <= r <= 1
- On average, non-human animal and extraterrestrial values will be alien to us; their preferences will be orthogonal to ours. It seems likely that on average these futures will contain less value or disvalue than a future with human space colonization.
(ii) (Vₙₕ + Dₙₕ) = (Vₕ + Dₕ) * a, with 0 <= a <= 1
Finally, the expected value of non-human space colonization can be expressed as (by definition):
(iii) EV(non-human space colonization) = Vₙₕ - Dₙₕ
Using (i), (ii), and (iii) we get:
EV(human extinction) = EV(non-human space colonization) * Probability(non-human space colonization) = (Vₙₕ - Dₙₕ) * p = [a * (Vₕ + Dₕ) / ((Vₕ/ Dₕ) * r + 1)] * (r * Vₕ/ Dₕ - 1) * p
The first term [in square brackets] is always positive. The sign of the second term, (r * Vₕ/ Dₕ - 1), can change depending on whether we were previously optimistic or pessimistic about the future.
If we were previously pessimistic about the future, we thought:
Vₕ - Dₕ < 0 -> Vₕ/ Dₕ < 1
Since r <= 1, the second term is then negative, so the EV of human extinction is negative. Compared to the “naive” pessimistic view (assuming EV(human extinction) = 0), pessimists should update their view into the direction of EV(reducing human extinction risk) being less negative.
If we were previously optimistic about the future, we thought:
Vₕ - Dₕ > 0 -> Vₕ/ Dₕ > 1
Now the second term can be negative, neutral, or positive. Compared to the naive view, future optimists should sometimes be more enthusiastic (if Vₙₕ/ Dₙₕ= r * Vₕ/ Dₕ < 1) and sometimes be less enthusiastic (if Vₙₕ/ Dₙₕ= r * Vₕ/ Dₕ > 1) about extinction risk reduction than they previously were. ↩︎
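The derivation above can be checked numerically. The sketch below is illustrative only: it implements the closed-form expression from this footnote and compares it with a direct solution of relations (i) and (ii) for Vₙₕ and Dₙₕ. The function names and all input numbers are made up for illustration, and the code assumes Dₕ > 0.

```python
def ev_human_extinction(v_h, d_h, r, a, p):
    """Closed form from the footnote: p * (V_nh - D_nh). Assumes d_h > 0."""
    ratio = r * v_h / d_h                        # V_nh / D_nh, relation (i)
    first_term = a * (v_h + d_h) / (ratio + 1)   # equals D_nh, never negative
    second_term = ratio - 1                      # its sign decides the result
    return p * first_term * second_term


def ev_human_extinction_direct(v_h, d_h, r, a, p):
    """Same quantity, obtained by solving (i) and (ii) for V_nh and D_nh."""
    ratio = r * v_h / d_h
    total = a * (v_h + d_h)      # V_nh + D_nh, relation (ii)
    d_nh = total / (1 + ratio)
    v_nh = total - d_nh
    return p * (v_nh - d_nh)     # relation (iii), weighted by p


def ev_risk_reduction(v_h, d_h, r, a, p):
    """EV(human space colonization) - EV(human extinction)."""
    return (v_h - d_h) - ev_human_extinction(v_h, d_h, r, a, p)


# Pessimist (V_h < D_h): extinction has negative EV, so risk reduction
# looks less negative than the naive value V_h - D_h.
pessimist_ext = ev_human_extinction(1.0, 2.0, r=0.5, a=0.5, p=0.1)
pessimist_red = ev_risk_reduction(1.0, 2.0, r=0.5, a=0.5, p=0.1)

# Optimist (V_h > D_h): the sign depends on whether r * V_h / D_h is below or above 1.
optimist_low_r = ev_human_extinction(3.0, 1.0, r=0.2, a=0.5, p=0.1)   # ratio 0.6
optimist_high_r = ev_human_extinction(3.0, 1.0, r=0.8, a=0.5, p=0.1)  # ratio 2.4

print(pessimist_ext, pessimist_red, optimist_low_r, optimist_high_r)
```

With these made-up inputs, the pessimist's estimate moves from -1 to a slightly less negative value, while the optimist's update goes in either direction depending on r, matching the case distinction in the footnote.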
Let’s define future pessimists as people who judge the expected value of (post-)human space colonization as negative; future optimists analogously. Now consider the example of a non-human civilization significantly worse than human civilization (by our lights), such that future optimists would find it normatively neutral, and future pessimists find it significantly more negative than human civilization. Then future optimists would not update their judgement (compared to before considering the possibility of a non-human animal spacefaring civilization), but pessimists would update significantly into the direction of human extinction risk reduction being positive. ↩︎
E.g. one might think that humanity might be comparatively bad at coordination (compared to e.g. intelligent ants), and so relatively likely to create uncontrolled AI, which might be an exceptionally bad outcome, maybe even worse than an intelligent ant civilization. However, considerations like this seem to require highly specific judgements and are likely not very robust. ↩︎
Section 4.2 is not dependent on a welfarist or even consequentialist view. More generally, it applies to any kind of empirical or moral insight that we might have, which would make us realize that other things than we previously thought are of great moral value or disvalue. ↩︎
- The history of an “expanding moral circle” (Singer, 2011), from tribes to nations to all humans…
- The relatively new notion of environmentalism
- The new notion of wild animal suffering
- The new notion of future beings being (astronomically) important (Bostrom, 2003)
Assuming that the side-effects of resources spent for self-regarding preferences of future agents are neutral/symmetric with regards to the beings/things out there (which seems to be a reasonable assumption). ↩︎
Fermi-estimate (wild guesses, again):
- Assume a 20% probability that, with more moral and empirical insight, we would conclude that the universe is already filled with beings/things that we morally care about
- Assume that the altruistic impact future agents could have is always proportional to the amount of resources spent for altruistic purposes. If the universe is devoid of value or disvalue, then altruistic resources will be spent on creating new value (e.g. happy beings). If the universe is already filled with beings/things that we morally care about, it will likely contain some disvalue. Assume that in these cases, 25% of altruistic resources will be used to reduce this disvalue (and only 75% to create new value). Also assume that resources can be used at the same efficiency e to create new disvalue, or to reduce existing disvalue.
- Assume that resources spent for self-regarding preferences of future agents would on average not improve or worsen the situation for the things of (dis)value already out there.
- Assume that in expectation, future agents will spend 40 times as many resources pursuing other-regarding preferences parallel to our reflected preferences (“altruistic”) than on pursuing other-regarding preferences anti-parallel to our reflected preferences (“anti-altruistic”). Note that this is compatible with future agents, in expectation, spending most of their resources on other-regarding preferences completely orthogonal to our reflected preferences.
- From a disvalue-focused perspective, creation of new value does not matter, only creation of new disvalue, or reduction of already existing disvalue. From such a perspective: (R: total amount of resources spent on parallel or anti-parallel other-regarding preferences).
- Expected creation of new disvalue = (1/40) * R * e = 2.5% * R * e
- Expected reduction of already existing disvalue = 20% * 25% * (1-(1/40)) * R * e ≈ 5% * R * e
Thus, the expected reduction of disvalue through (post-)humanity is about twice as large as the expected creation of disvalue. This is, however, an upper bound. The calculation assumed that the universe contains enough disvalue that future agents could actually spend 25% of altruistic resources on alleviating it, before having alleviated it all. In some cases, the universe might not contain that much disvalue, so some resources would go into the creation of value again. ↩︎
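The arithmetic of this Fermi estimate can be reproduced in a few lines. This is a rough sketch using the footnote's wild-guess inputs; the function and parameter names are illustrative. The footnote uses 1/40 as the anti-altruistic share of R; a strict 40:1 split would give 1/41, so the sketch evaluates both.

```python
def disvalue_balance(p_filled, share_reduce, anti_share, R=1.0, e=1.0):
    """Return (created, reduced, reduced / created) from a disvalue-focused view.

    p_filled:     probability that the universe already contains things we care about
    share_reduce: share of altruistic resources spent on reducing existing disvalue
    anti_share:   share of R spent on anti-altruistic preferences
    R, e:         total resources and efficiency (they cancel in the ratio)
    """
    created = anti_share * R * e
    reduced = p_filled * share_reduce * (1 - anti_share) * R * e
    return created, reduced, reduced / created


# Inputs as used in the footnote (anti-altruistic share approximated as 1/40).
created, reduced, ratio = disvalue_balance(0.20, 0.25, 1 / 40)

# Variant with a strict 40:1 split between altruistic and anti-altruistic resources.
_, _, ratio_strict = disvalue_balance(0.20, 0.25, 1 / 41)

print(created, reduced, ratio, ratio_strict)
```

With the footnote's 1/40 the ratio comes out slightly below 2; with the strict 1/41 share it is 2 up to floating-point rounding. Either way the result is only as reliable as the guessed inputs.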
Analogous to part 1.2, this part 2.2 is less relevant for people who believe that some of their reflected other-regarding preferences will be so unusual that they will be anti-parallel to most of humanity’s reflected other-regarding preferences. Such a view is e.g. defended by Brian Tomasik in the context of suffering in fundamental physics. Tomasik argues that, even if he (after idealized reflection) and future generation both came around to care for sentience in fundamental physics, and even if future generations were to influence fundamental physics for altruistic reasons, they would still be more likely to do it in a way that increases the vivacity of physics, which Tomasik (after idealized reflection) would oppose. ↩︎
This section draws heavily on Nick Beckstead’s thoughts. ↩︎
Global catastrophes that do not directly cause human extinction may initiate developments that lead to extinction later on. For the purposes of this article, these cases are not different from direct extinction, and are omitted here. ↩︎
E.g. Paul Christiano: “So if modern civilization is destroyed and eventually successfully rebuilt, I think we should treat that as recovering most of Earth’s altruistic potential (though I would certainly hate for it to happen).” In his article, Christiano outlines several empirical and moral judgement calls that lead him to his conclusion, such as:
- As long as a moral reflection and sophistication process is ongoing, which seems likely, civilizations will reach very good values (by his lights).
- He is willing to discard his idiosyncratic judgements.
- He directly cares about others’ (reflected) values.
It is of course a question whether one should stick with one’s own preferences, if the majority of reflected and altruistic agents have opposite preferences. According to some empirical and meta-ethical assumptions, one should. ↩︎
Different advocates of strong suffering-focused views come to different judgements on the topic. They all seem to agree that, from a purely suffering-focused perspective, it is not clear whether efforts to reduce the risk of human extinction are positive or negative:
Lukas Gloor: “it tentatively seems to me that the effect of making cosmic stakes (and therefore downside risks) more likely is not sufficiently balanced by positive effects on stability, arms race prevention and civilizational values (factors which would make downside risks less likely). However, this is hard to assess and may change depending on novel insights.” … “We have seen that efforts to reduce extinction risk (exception: AI alignment) are unpromising interventions for downside-focused value systems, and some of the interventions available in that space (especially if they do not simultaneously also improve the quality of the future) may even be negative when evaluated purely from this perspective.”
David Pearce: “Should existential risk reduction be the primary goal of: a) negative utilitarians? b) classical hedonistic utilitarians? c) preference utilitarians? All, or none, of the above? The answer is far from obvious. For example, one might naively suppose that a negative utilitarian would welcome human extinction. But only (trans)humans - or our potential superintelligent successors - are technically capable of phasing out the cruelties of the rest of the living world on Earth. And only (trans)humans - or rather our potential superintelligent successors - are technically capable of assuming stewardship of our entire Hubble volume.” … “In practice, I don't think it's ethically fruitful to contemplate destroying human civilisation, whether by thermonuclear Doomsday devices or utilitronium shockwaves. Until we understand the upper bounds of intelligent agency, the ultimate sphere of responsibility of posthuman superintelligence is unknown. Quite possibly, this ultimate sphere of responsibility will entail stewardship of our entire Hubble volume across multiple quasi-classical Everett branches, maybe extending even into what we naively call the past [...]. In short, we need to create full-spectrum superintelligence.”
Brian Tomasik: “I'm now less hopeful that catastrophic-risk reduction is plausibly good for pure negative utilitarians. The main reason is that some catastrophic risk, such as from malicious biotech, do seem to pose nontrivial risk of causing complete extinction relative to their probability of merely causing mayhem and conflict. So I now don't support efforts to reduce non-AGI "existential risks". [...] Regardless, negative utilitarians should just focus their sights on more clearly beneficial suffering-reduction projects” ↩︎
For example, interventions that aim at improving humanity’s values/increasing the circle of empathy might be highly leveraged and time-sensitive, if humanity achieves goal conservation soon, or values are otherwise sticky. ↩︎
“Positive”/“negative” as defined from a welfarist perspective. ↩︎
Societies may increase the costs, and thereby reduce the frequency, of acts following from negative other-regarding preferences, as long as negative other-regarding preferences are a minority. E.g. if 5% of a society have an other-regarding preference for inflicting suffering on a certain group (of powerless beings), but 95% have a preference against it, in many societal forms less than 5% of people will actually inflict suffering on this group of powerless beings, because there will be laws against it, ... ↩︎
This fact could be interpreted either as human nature that we will revert to, or as a trend of moral progress. The latter seems more likely to us. ↩︎
Another possible operationalization of the ratio between positive and negative other-regarding preferences: How much money is spent on pursuing positive and negative other-regarding preferences?
- Some state budgets are clearly pursuant to positive other-regarding preferences
- It is less clear whether there are budgets that are clearly pursuant to negative other-regarding preferences, although at least a part of military spending is.
Thanks for posting on this important topic. You might be interested in this EA Forum post where I outlined many arguments against your conclusion, the expected value of extinction risk reduction being (highly) positive.
I do think your "very unlikely that [human descendants] would see value exactly where we see disvalue" argument is a viable one, but I think it's just one of many considerations, and my current impression of the evidence is that it's outweighed.
Also FYI the link in your article to "moral circle expansion" is dead. We work on that approach at Sentience Institute if you're interested.
I have seen and read your post. It was published after my internal "Oh my god, I really, really need to stop reading and integrating even more sources, the article is already way too long"-deadline, so I don't refer to it in the article.
In general, I am more confident about the expected value of extinction risk reduction being positive, than about extinction risk reduction actually being the best thing to work on. It might well be that e.g. moral circle expansion is more promising, even if we have good reasons to believe that extinction risk reduction is positive.
I personally don't think that this argument is very strong on its own. But I think there are additional strong arguments (in descending order of relevance):
Thank you for the reply, Jan, especially noting those additional arguments. I worry that your article neglects them in favor of less important/controversial questions on this topic. I see many EAs taking the "very unlikely that [human descendants] would see value exactly where we see disvalue" argument (I'd call this the 'will argument,' that the future might be dominated by human-descendant will and there is much more will to create happiness than suffering, especially in terms of the likelihood of hedonium over dolorium) and using that to justify a very heavy focus on reducing extinction risk, without exploration of those many other arguments. I worry that much of the Oxford/SF-based EA community has committed hard to reducing extinction risk without exploring those other arguments.
It'd be great if at some point you could write up discussion of those other arguments, since I think that's where the thrust of the disagreement is between people who think the far future is highly positive, close to zero, and highly negative. Though unfortunately, it always ends up coming down to highly intuitive judgment calls on these macro-socio-technological questions. As I mentioned in that post, my guess is that long-term empirical study like the research in The Age of Em or done at Sentience Institute is our best way of improving those highly intuitive judgment calls and finally reaching agreement on the topic.
I have written up my thoughts on all these points in the article. Here are the links.
The final paragraphs of each sections usually contain discussion of how relevant I think each argument is. All these sections also have some quantitative EV-estimates (linked or in the footnotes).
But you probably saw that, since it is also explained in the abstract. So I am not sure what you mean when you say:
Are we talking about the same arguments?
Oh, sorry, I was thinking of the arguments in my post, not (only) those in your post. I should have been more precise in my wording.
What are your thoughts on A longtermist critique of “The expected value of extinction risk reduction is positive”?
Great work. A few notes in descending order of importance which I'd love to see addressed at least in brief:
1) This seems not to engage with the questions about short-term versus long-term prioritization and discount rates. I'd think that the implicit assumptions should be made clearer.
2) It doesn't seem obvious to me that, given the universalist assumptions about the value of animal or other non-human species, the long term future is affected nearly as much by the presence or absence of humans. Depending on uncertainties about the Fermi hypothesis and the viability of non-human animals developing sentience over long time frames, this might greatly matter.
3) Reducing the probability of technological existential risks may require increasing the probability of human stagnation.
4) S-risks are plausibly more likely if moral development is outstripped by growth in technological power over relatively short time frames, and existential catastrophe has a comparatively limited downside.
Hi David, thanks for your comments.
Yes, the article does not deal with considerations for and against caring about the long-term. This is discussed elsewhere. Instead, the article assumes that we care about the long-term (e.g. that we don't discount the value of future lives strongly), and analyses what implications follow from that view.
We tried to make that explicit. E.g., the first point under "Moral assumptions" reads:
I think this point matters. Part 2.1 of the article deals with the implications of potential future non-human animal civilizations and extraterrestrials. I think the implications are somewhat complicated and depend quite a bit on your values, so I won't try to summarize them here.
We don't try to argue for increasing the speed of technological progress.
Apart from that, it is not clear to me that extinction has "comparatively little downside" (compared to S-risks, you probably mean). It, of course, depends on your moral values. But even from a suffering-focused perspective, it may well be that we would - with more moral and empirical insight - come to realize that the universe is already filled with suffering. I personally would not be surprised if "S-risks by omission" (*) weighed pretty heavily in the overall calculus. This topic is discussed in part 2.2.
I don't have anything useful to say regarding your point 3).
(*) term coined by Lukas Gloor, I think.
Thanks for replying.
I'd agree with your points regarding limited scope for the first and second points, but I don't understand how anyone can make prioritization decisions when we have no discounting - it's nearly always better to conserve resources. If we have discounting for costs but not benefits, however, I worry the framework is incoherent. This is a much more general confusion I have, and the fact that you didn't address or resolve it is unsurprising.
Re: S-Risks, I'm wondering whether we need to be concerned about value misalignment leading to arbitrarily large negative utility, given some perspectives. I'm concerned that human values are incoherent, and any given maximization is likely to cause arbitrarily large "suffering" for some values - and if there are multiple groups with different values, this might mean any maximization imposes maximal suffering on the large majority of people's values.
For example, if 1/3 of humanity feels that human liberty is a crucial value, without which human pleasure is worse than meaningless, another 1/3 views earning reward as critical, and the last 1/3 views bliss/pure hedonium as optimal, we would view tiling the universe with human brains maxed out for any one of these as a hugely negative outcome for 2/3 of humanity, much worse than extinction.
Regarding your second point, just a few thoughts:
First of all, an important point is how you think values and morality work. If two-thirds of humanity, after thorough reflection, disagree with your values, does this give you a reason to become less certain about your values as well? Maybe adopt their values to a degree? ...
Secondly, I am also uncertain how coherent/convergent human values will be. There seem to be good arguments for both sides, see e.g. this blog post by Paul Christiano (and the discussion with Brian Tomasik in the comments of that post): https://rationalaltruist.com/2013/06/13/against-moral-advocacy/
Third: In a situation like the one you described above, at least there would be huge room for compromise/gains from trade/... So if future humanity would be split into the three factions you suggested, they would not necessarily fight a war until only one faction remains that can then tile the universe with their preferred version. Indeed, they probably would not, as cooperation seems better for everyone in expectation.
1) I agree that there is some confusion on my part, and on the part of most others I have spoken to, about how terminal values and morality do or do not get updated.
3) I will point to a maybe forthcoming paper / idea of Eric Drexler at FHI that makes this point, which he called "pareto-topia". Despite the wonderful virtues of the idea, I'm unclear if there is a stable game-theoretic mechanism that prevents a race to the bottom outcome when fundamentally different values are being traded off. Specifically in this case, it's possible that different values lead to an inability to truthfully/reliably cooperate - a paved road to pareto-topia seems not to exist, and there might be no path at all.
By "in expectation random", do you mean 0 in expectation? I think there are reasons to expect the effect to be negative (individually), based on our treatment of nonhuman animals. Our indifference to chicken welfare has led to severe deprivation in confinement, more cannibalism in open but densely packed systems, the spread of diseases, artificial selection causing chronic pain and other health issues, and live boiling. I expect chickens' wild counterparts (red jungle fowls) to have greater expected utility, individually, and plausibly positive EU (from a classical hedonistic perspective, although I'm not sure either way). Optimization for productivity seems usually to come at the cost of individual welfare.
Even for digital sentience, if designed with the capacity to suffer -- regardless of our intentions and their "default" level of welfare, and especially if we mistakenly believe them not to be sentient -- we might expect their levels of welfare to decrease as we demand more from them, since there's not enough instrumental value for us to recalibrate their affective responses or redesign them with higher welfare. The conditions in which they are used may become significantly harsher than the conditions for which they were initially designed.
It's also very plausible that many of our digital sentiences will be designed through evolutionary/genetic algorithms or other search algorithms that optimize for some performance ("fitness") metric, and because of how expensive these approaches are computationally, we may be likely to reuse the digital sentiences with only minor adjustments outside of the environments for which they were optimized. This is already being done for deep neural networks now.
Similarly, we might expect more human suffering (individually) from AGI with goals orthogonal to our welfare, an argument against positive expected human welfare.
Yes, that's what we meant.
I am not sure I understand your argument. You seem to say the following:
My answer: The complete "side-effects" (in the meaning of the article) on sentient tools comprises bringing them into existence and using them. The relevant question seems to be if this package is positive or negative, compared to the counterfactual (no sentient tools). Humanity might bring sentient tools into conditions that are worse for the tools than the conditions they were optimized for. Even these conditions might still be overall positive.
Apart from that, I am not sure if the two assumptions listed as bullet points above will actually hold for the majority of "sentient tools". I think that we know very little about the way tools will be created and used in the far future, which was one reason for assuming "zero in expectation" side-effects.
Isn't it equally justified to assume that their welfare in the conditions they were originally optimized/designed for is 0 in expectation? If anything, it makes more sense to me to make assumptions about this setting first, since it's easier to understand their motivations and experiences in this setting based on their value for the optimization process.
We can ignore any set of tools that has zero total wellbeing in expectation; what's left could still dominate the expected value of the future. We can look at sets of sentient tools that we might think could be biased towards positive or negative average welfare:
1. the set of sentient tools used in harsher conditions,
2. the set used in better conditions,
3. the set optimized for pleasure, and
4. the set optimized for pain.
Of course, there are many other sets of interest, and they aren't all mutually exclusive.
The expected value of the future could be extremely sensitive to beliefs about these sets (their sizes and average welfares). (And this could be a reason to prioritize moral circle expansion instead.)
These are all very good points. I agree that this part of the article is speculative, and you could easily come to a different conclusion.
Overall, I still think that this argument alone (part 1.2 of the article) points into the direction of extinction risk reduction being positive. Although the conclusion does depend on the "default level of welfare of sentient tools" that we are discussing in this thread, it more critically depends on whether future agents' preferences will be aligned with ours.
But I never gave this argument (part 1.2) that much weight anyway. I think that the arguments later in that article (part 2 onwards, I listed them in my answer to Jacy's comment) are more robust and thus more relevant. So maybe I somewhat disagree with your statement:
To some degree this statement is, of course, true. The uncertainty gives some reason to deprioritize extinction risk reduction. But: The expected value of the future (with (post-) humanity) might be quite sensitive to these beliefs, but the expected value of extinction risk reduction efforts is not the same as the expected value of the future. You also need to consider what would happen if humanity goes extinct (non-human animals, S-risks by omission), non-extinction long-term effects of global catastrophes, option value,... (see my comments to Jacy). So the question of whether to prioritize moral circle expansion is maybe not extremely sensitive to "beliefs about these sets [of sentient tools]".
[I'm doing a bunch of low-effort reviews of posts I read a while ago and think are important. Unfortunately, I don't have time to re-read them or say very nuanced things about them.]
[COI I helped fund this work and gave feedback on it.]
I think this is one of the best public analyses of an important question.
Curious how you're thinking about efforts that are intended to reduce x-risk but instead end up increasing it.
e.g. public-facing aerosol injection research:
Uhm... Seems bad? :-)
How sensitive are these conclusions to the ethical views of future people? E.g. what if people in the future are mostly deontologists or have asymmetric population ethics (so may not be motivated to create lots of high welfare beings), and we still evaluate in total utilitarian terms?
Hi Michael, I wrote this 2 years ago and have not worked in this area afterwards. To give a really good answer, I'd probably have to spend several hours reading the text again. But from memory, I think that most arguments don't rest on the assumption of future agents being total utilitarians. In particular, none of the arguments requires the assumption that future agents will create lots of high welfare beings. So I guess the same conclusions follow if you assume deontologist future agents, or ones with asymmetric population ethics. This is particularly true if you think that your idealised, reflected preferences would be close to that of the future agents.
Since the post is very long, and since a lot of readers are likely to be familiar with some arguments already, I think a table of contents in the beginning would be very valuable. I sure would like one.
I see that it's already possible to link to individual sections (like https://www.effectivealtruism.org/articles/the-expected-value-of-extinction-risk-reduction-is-positive/#a-note-on-disvalue-focus) so I don't think this would be too hard to add?
Thanks for the comment. We added a navigable table of contents.