Charting the precipice: The time of perils and prioritizing x-risk

David Bernard

This post is a part of Rethink Priorities’ Worldview Investigations Team’s CURVE Sequence: “Causes and Uncertainty: Rethinking Value in Expectation.” The aim of this sequence is twofold: first, to consider alternatives to expected value maximization for cause prioritization; second, to evaluate the claim that a commitment to expected value maximization robustly supports the conclusion that we ought to prioritize existential risk mitigation over all else. This report considers a time of perils-based case for prioritizing existential risk mitigation.

Executive summary

The time of perils is an important hypothesis among those concerned with existential risk (x-risk). It posits that although x-risk is high now, if we successfully navigate this perilous period, risk will fall to very low levels and we will have a very long and valuable future ahead of us in expectation.
The plausibility of this hypothesis is often used to support the case for x-risk mitigation work having a large positive expected value, sometimes to the extent that x-risk work is overwhelmingly valuable and more cost-effective than all other potential global priorities.
The time of perils is a specific view of how the future is likely to play out which potentially requires a number of assumptions.
We collect and discuss a number of premises potentially required for a version of the time of perils hypothesis that is particularly strong and seems influential in AI x-risk circles. According to this hypothesis, x-risk is currently high due to a number of sources, but especially the possibility of unaligned transformative AI. The primary way of transitioning out of this risky period is by developing and properly deploying aligned transformative AI, which will lead to x-risk levels being drastically reduced and kept very low for a long time. This results in a future that is very long and large in expectation, and a future of that expected size is overwhelmingly morally valuable.
These premises fall into a few categories:
- (1) high existential risk now (e.g., a significant chance that some misaligned AI will be deployed and seek power)
- (2) transformative AI can significantly reduce existential risk (e.g., the aligned AI we develop will be sufficiently capable to solve non-AI x-risks)
- (3) a valuable future (e.g., future animal lives will not be too bad)
- (4) moral framework (e.g., totalism)
All these premises are controversial to varying degrees. Given their controversial and uncertain nature, it seems reasonable to assign a low credence to this version of the time of perils. If one wanted to justify x-risk mitigation work solely based on this version, they would at least have to be comfortable letting their decision-making be driven by small probabilities (<1%) of large value, and potentially they might need to adopt a fanatical decision theory—i.e. one that recommends taking lotteries with tiny probabilities of very large values.

Introduction

Here’s a view that appears to be fairly common among those focused on existential risk (x-risk) due to AI:

We are currently in a time of perils—a period where x-risk is particularly high—mostly due to the threat of misaligned AI. However, if we develop and deploy aligned transformative AI, then x-risk will, with a sufficiently high probability, fall to the point that we’ll no longer be in peril. By definition, aligned AI will not itself be a threat to humanity and, at the same time, it will most likely address all other existential threats. Moreover, it will allow civilization to expand beyond the Earth and make human-originating civilization robust to future existential threats. As a result, humanity will be on a trajectory toward a very long and large future that is overwhelmingly morally valuable. Conversely, if we don’t get aligned AI, then an x-risk such as unaligned AI will likely kill us all or permanently damage our potential to expand outside of the solar system. In either case, the value of human civilization is limited or strongly negative and thus our potential has been wasted—an existential catastrophe. Therefore, the expected value of work to mitigate x-risk, particularly the risk from AI, is much greater than the expected value of work on all other causes.

The general idea of the time of perils comes from Carl Sagan and Derek Parfit, although the idea has received new supporters with the development of AI and x-risk studies. In essence, the current era is pivotal for humanity's long-term future and our actions, especially regarding AI alignment, will determine whether we unlock a prosperous future or face existential catastrophe. Call this the “time of perils-based case for x-risk mitigation’s overwhelming value”—or, for short, “the TOP case”.

Several prominent x-risk scholars have explicitly endorsed elements of this view (though we don’t mean to suggest that they endorse exactly the position just outlined). For instance:

It's quite likely the extinction/existential catastrophe rate approaches zero within a few centuries if civilization survives, [partly because…] Aligned AI blocks threats from misaligned AI (and many other things)... If we're more than 50% likely to get to that kind of robust state, which I think is true… then the life expectancy of civilization is very long, almost as long on a log scale as with 100%.

—Carl Shulman

… if we never get AI, I expect the future to be short and grim. Most likely we kill ourselves with synthetic biology. If not, some combination of technological and economic stagnation, rising totalitarianism + illiberalism + mobocracy, fertility collapse and dysgenics will impoverish the world and accelerate its decaying institutional quality.

—Scott Alexander

If we don't get whacked by the existential risks, the future is probably going to be wonderful.

—Stuart Armstrong

Given a long enough time with our potential intact, I believe we have a very high chance of fulfilling it: that setbacks won’t be permanent unless they destroy our ability to recover. If so, then most of the probability that humanity fails to achieve a great future comes precisely from the destruction of its potential—from existential risk.

—Toby Ord

One might consequently argue that even the tiniest reduction of existential risk has an expected value greater than that of the definite provision of any ‘ordinary’ good, such as the direct benefit of saving 1 billion lives… These considerations suggest that the loss in expected value resulting from an existential catastrophe is so enormous that the objective of reducing existential risks should be a dominant consideration whenever we act out of an impersonal concern for humankind as a whole.

—Nick Bostrom

However, it is difficult to know whether any given element of the TOP case is true. There are, after all, many possible ways the future could play out. For instance, some possibilities include:

- We do not develop transformative AI because of technical or political difficulties.
- We develop aligned AI but it is not able to address the continuing political or technical barriers associated with farmed and wild animal welfare; as a result, there are ongoing, animal-related moral catastrophes that make the future net negative morally.
- AI helps us grow economically and temporarily increases the rate of GDP growth but it doesn't fundamentally revolutionize the world. As such, we're richer but not necessarily safer.
- Resource constraints and the intrinsic complexities of existential risks prevent aligned AI from radically reducing x-risk levels; so, humanity goes extinct before it can reach a point of existential security.
- Technological advancement amplifies rather than alleviates x-risks as the pace of technological growth surpasses our capacity for risk management. Even with the introduction of aligned transformative AI, x-risk may increase if AI spurs the development of other high-risk technologies such as bioengineering.
- We make it through the first time of perils with the help of aligned AI but we face another "great filter" in the future which we fail to make it through.
- We decide against pursuing space colonization because it is untenable for physical or political reasons, and that makes the future not overwhelmingly valuable.
- Humans meet hostile aliens who wipe out our species before we’ve colonized a sufficient amount of space to create civilisational robustness, and the intervening time is not enough for the future to be overwhelmingly valuable.
- We go extinct but aliens with similar values colonize the same space we would have colonized, resulting in similar levels of value.

However, proponents of the TOP case do not necessarily need to assert that the TOP version of the future is more likely than any of these alternatives. All they need is for the probabilities of the premises of the TOP case to be high enough to generate the conclusion that the value of x-risk mitigation is overwhelming in expectation. Even if you thought there was only a 10% chance of the conclusion of the TOP case being true, once you account for the vast number of lives at stake if it is true, you could still generate an expected value of x-risk mitigation that is significantly higher than that of other causes.
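
To make the arithmetic behind this point concrete, here is a minimal sketch in Python. Every number in it is a hypothetical placeholder chosen for illustration, not an estimate from this report, and the function and variable names are our own. It also assumes, as a simplification, that x-risk work has zero value if the TOP case is false.

```python
def expected_value(credence: float, value_if_true: float, value_if_false: float = 0.0) -> float:
    """Expected value of an intervention, given a credence that the TOP case holds."""
    if not 0.0 <= credence <= 1.0:
        raise ValueError("credence must be between 0 and 1")
    return credence * value_if_true + (1.0 - credence) * value_if_false


def break_even_credence(value_if_true: float, benchmark_value: float) -> float:
    """Credence at which the intervention matches the benchmark (assumes value_if_false = 0)."""
    return benchmark_value / value_if_true


# Hypothetical values, in arbitrary units of moral value per dollar.
value_if_top_true = 1e6  # assumed payoff of x-risk work if the TOP case is true
benchmark_value = 1e3    # assumed payoff of the best alternative cause

ev_at_10_percent = expected_value(0.10, value_if_top_true)
threshold = break_even_credence(value_if_top_true, benchmark_value)

print(ev_at_10_percent > benchmark_value)  # whether a 10% credence is enough here
print(threshold)                           # credence below which the benchmark wins
```

The comparison is driven almost entirely by the assumed ratio between the two payoffs: the larger the value at stake if the TOP case is true, the lower the credence needed for x-risk work to come out ahead.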

So, what do you need to generate this sort of expected value-based conclusion from the TOP case? At a broad level, we think you need four main claims.

- (1) X-risk is currently high and it is tractable to reduce it.
- (2) Deploying aligned transformative AI will reduce x-risk to persistently very low levels.
- (3) Reducing x-risk will result in a very long and large future of positive value.
- (4) Morality is such that acting to realize this future is overwhelmingly morally choiceworthy.

We think that if these four claims were true, then the case for x-risk mitigation being overwhelmingly morally valuable would hold. However, these claims are broad and difficult to evaluate. To make progress, we take the approach of breaking them down into 18 different subpremises to make assessing them more tractable. We discuss each of the premises in turn, why it is important for the TOP case and key uncertainties around the premise.

This work is only an initial step in the direction of evaluating the validity of the TOP case and is incomplete. We discuss how each of these 18 premises feeds into the TOP case and the key uncertainties about each of them. We do not defend particular credences for each of these premises or the broader claims. We think more research is needed before we would be confident enough to assign such numbers.

Furthermore, we think the 18 premises we generate should be considered as placeholders and not set in stone. The premises do not form a logical argument and it would be reasonable to disagree with the necessity of some of them for a sufficiently strong version of the TOP case to hold. However, it’s plausible that many of these premises are necessary. We simply aim to highlight that there are a relatively large number of premises potentially required and there are deep uncertainties around many of them. Accordingly, it seems reasonable to be uncertain about what probability to assign them and the conclusion of the TOP case.

This is only one of several possible approaches to analyzing the TOP case, and surprisingly, given its potential importance, there’s been limited research on this so far. Ord’s The Precipice is the seminal work in this area, but as of writing there are only 6 posts on the EA Forum with the time of perils tag. Adamczewski, Thorstad and Muñoz Morán assess the implications of the TOP case if it is true, but spend little time assessing whether it is true or not. The aim of this report is to start that work. As such, it should be treated as a brainstorming exercise for possible components of a sufficiently strong TOP case and not as a knock-down argument for or against the TOP case.

All that said, we think it is currently epistemically reasonable to assign a low credence to the conclusion of this version of the TOP case and it’s unclear whether this credence is high enough to justify this particular conclusion of x-risk mitigation work being overwhelmingly valuable.^[1] Given that there are many premises potentially required and we are uncertain about what probability to assign them, the conjunction of them may be highly improbable.^[2]
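
A rough sketch of why a conjunction of many premises can be improbable even when each premise looks fairly likely on its own is given below. It assumes the premises are independent and all required, which is a strong simplification: positively correlated premises would yield a higher joint probability, and not all 18 premises may be necessary. The credences are hypothetical placeholders, not values we defend.

```python
import math


def conjunction_probability(premise_probabilities: list[float]) -> float:
    """Joint probability that all premises hold, ASSUMING they are independent."""
    return math.prod(premise_probabilities)


# Hypothetical: 18 premises, each assigned the same illustrative credence.
for credence in (0.95, 0.9, 0.8):
    joint = conjunction_probability([credence] * 18)
    print(f"each premise at {credence:.2f} -> joint probability {joint:.3f}")
```

With these placeholder inputs, the joint probability falls to roughly 0.15 when each premise is at 0.9 and to below 0.02 when each is at 0.8.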

Based on a commitment to maximizing expected value, you might still think that even low credences in the conclusion should drive your decision-making if enough value is at stake. However, this would require being comfortable going all in on small probabilities and in the extreme might require your decision-making to depend on tiny probabilities, which you may think is excessively fanatical. Alternative decision theories that place more weight on risk and penalize small probabilities more heavily will be much less driven by these probabilities.
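
As a hedged illustration of how penalizing small probabilities can change a ranking, the sketch below compares plain expected value with a simple rule that ignores any outcome whose probability falls below a threshold. This truncation rule is only one crude stand-in chosen for illustration; it is not claimed to be any of the specific decision theories discussed elsewhere in this sequence, and the lotteries and the threshold are hypothetical.

```python
Lottery = list[tuple[float, float]]  # (probability, value) pairs


def expected_value(lottery: Lottery) -> float:
    """Standard expected value: probability-weighted sum of outcomes."""
    return sum(p * v for p, v in lottery)


def truncated_expected_value(lottery: Lottery, threshold: float) -> float:
    """Expected value that ignores outcomes with probability below `threshold`."""
    return sum(p * v for p, v in lottery if p >= threshold)


# Hypothetical lotteries, with values in arbitrary units.
long_shot = [(0.001, 1e9), (0.999, 0.0)]  # tiny chance of a vast payoff
safe_bet = [(1.0, 1e3)]                   # certain, modest payoff

# Plain expected value favors the long shot under these numbers.
print(expected_value(long_shot) > expected_value(safe_bet))

# With a 1% threshold, the vast but improbable payoff is ignored.
print(truncated_expected_value(long_shot, 0.01) > truncated_expected_value(safe_bet, 0.01))
```

The rule drops low-probability outcomes without renormalizing the remaining probabilities, which keeps the example short but is itself a modeling choice.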

In the rest of the report, we collect and briefly discuss each of the potential assumptions of the TOP case, highlighting why each is potentially important to the TOP case as well as the uncertainties surrounding it.

High existential risk now

The first empirical component of the TOP case is that we face high existential risk now; we are in the time of perils. Typically this existential risk is seen as coming from a number of sources. The most prominent among these is unaligned artificial intelligence. For example, in The Precipice, Toby Ord assigns an approximately 10% probability of an existential catastrophe from unaligned AI in the next century, versus approximately 3% each to engineered pandemics and unforeseen anthropogenic risks. We focus on and draw from Joe Carlsmith’s report on Existential Risk from Power-Seeking AI in discussing the assumptions that go into AI being an existential threat, as it is one of the most well-known models of existential threat from AI. Carlsmith focuses on the probability of various claims being true by 2070, but the time of perils may last longer than that. The first 5 premises are drawn directly from Carlsmith with some discussion drawn from Good Judgement’s Superforecasting AI.

1. It will become possible and financially feasible to build advanced, strategically aware, agentic planning AI systems

Systems that are advanced, strategically aware, agentic planning AIs, referred to as APS systems, are potentially on the horizon. These characteristics seem to be important prerequisites for AI to pose an existential threat. Systems with these characteristics will be able to strategize and plan, making them distinctively formidable, and they could be capable of actively working against humanity. Importantly, these systems need not be superintelligent or effectively omnipotent and omniscient.

However, it is conceivable that such advanced AI systems, even ones short of superintelligence, will never come to fruition. Potential hindrances, such as the resource requirements for AI systems being unexpectedly large or insurmountable technical challenges, could derail the progress and realization of AI systems. A stronger version of this claim is that such systems will be able to improve their own intelligence and that this could result in an intelligence explosion, but the speed at which transformative AI systems will arrive and improve is much debated.

2. There will be strong incentives to deploy AI

AI systems likely need to be deployed to cause an existential threat, though it is possible that risk can occur during training as well. It seems that there will be compelling incentives to deploy APS systems. Strategic awareness and planning make them appealing for diverse applications. So, developers may be inclined to deliberately create and deploy them. Moreover, APS systems might emerge inadvertently as a byproduct during training processes that are not directly aimed at achieving APS characteristics.

However, it's worth noting that not all trajectories lead to the deployment of these systems. Potential deterrents to their widespread adoption may arise. Governments, recognizing the latent threats these systems might pose, could introduce stringent legislation against their deployment or attempt to monopolize control over such systems for security purposes. Furthermore, there could be other unidentified factors that might reduce the impetus to deploy APS systems. It's possible that economic considerations, societal pressures, or ethical debates could reduce the incentives to bring such potentially powerful AI systems into our world.

3. It will be much harder to build aligned than misaligned AI

For AI to pose a credible existential threat, the alignment challenge must be non-trivial. If constructing aligned AIs were straightforward, existential concerns from misaligned AI would be moot. Specifically, there must be an inherent risk of unintentionally producing a misaligned AI while attempting to produce an aligned one. One particular type of misalignment that is concerning is that of a misaligned power-seeking AI. Given that power-seeking is advantageous for a myriad of objectives, the principle of instrumental convergence predicts that power-seeking will be a common feature of powerful AI systems, even if the objectives of these systems do not perfectly align with human objectives.

On the flip side, it’s possible that achieving AI alignment might not be as difficult as it appears. John Wentworth, for instance, estimated in 2020 that there's roughly a 10% chance of alignment by default just by continuing on our current trajectory. The ability of large language models to sensibly interpret vague human instructions is evidence that alignment may be easier than expected.

4. Some misaligned AI will be deployed and seek power

The most plausible way for AI to become an existential threat is for certain misaligned AI systems, once deployed, to seek power. It seems like society would likely prefer not to deploy potentially misaligned power-seeking AI, but deployment decisions may not always align with broader societal interests. A private lab, driven by short-term gains or a disregard for potential long-term implications, might willingly risk introducing a power-seeking AI into the environment. Race dynamics and the extra benefits of being a first mover could further encourage these hasty deployments, even when safety isn't thoroughly vetted.

Again, there are clearly uncertainties around this premise. One might draw parallels between AI and other risky technologies, such as nuclear weapons, which have not proliferated widely, but it’s unclear how relevant these reference classes are. The knowledge and materials needed to build nuclear weapons are tightly restricted. It may be possible to impose similar information security and trade restrictions on AI in the future, but ever-decreasing costs of compute may make these restrictions ineffectual. By the time we develop APS systems, we may have already introduced government legislation or strong cultural norms against the deployment of risky systems.

Furthermore, despite instrumental convergence being an appealing conceptual argument for power-seeking behavior, there currently seems to be little empirical evidence of this behavior in existing public AI models. It’s possible that deployed AIs will only be slightly misaligned and therefore will not cause an existential catastrophe (but may still be a global catastrophic risk).

5. Some power-seeking systems will permanently disempower all of humanity

The final step in Carlsmith’s argument is to note that certain power-seeking systems might lead to the irreversible disempowerment of humanity and that this would be an existential catastrophe. We might notice power-seeking tendencies in an AI and seek to correct them, but given a powerful enough system, we might not be capable of doing so. The AI system, once it crosses a certain threshold of dominance, could render humanity perpetually powerless and unable to fulfill its potential.

The intensity of power-seeking behavior can be imagined on a spectrum, with certain thresholds separating benign from catastrophic outcomes. It remains a possibility that we could develop AI systems that, despite their power-seeking tendencies, remain below the critical threshold, ensuring that humanity retains control and avoids the direst outcomes. We might think of this as an AI system being analogous to an existing powerful state that has its own interests but does not outcompete other actors.

Furthermore, it seems likely that the probability of an existential catastrophe will be much lower than the probability of a ‘merely’ global catastrophe. For instance, the Existential Risk Persuasion Tournament found that superforecasters and domain experts thought AI extinction risk was one to two orders of magnitude less likely than catastrophic risk, depending on the forecast horizon and forecaster type. Delineating between a global catastrophe and an existential catastrophe is likely to remain a challenge, especially when it comes to AI. Specifically, the nature of the future risk from misaligned AI is highly variable: Will it lead to widespread suffering and disvalue, result in a universe devoid of value altogether, or produce something milder? The differentiation between these scenarios is significant and has direct implications for the value at stake if the TOP case is true.

6. Other risks are non-negligible but don’t guarantee our extinction before aligned AI is developed

While AI stands out as a significant existential risk, it is by no means the sole threat on the horizon. Other formidable risks, such as engineered pandemics or nuclear weapons, must also be acknowledged. These threats, too, have a role in the time of perils hypothesis, marking the era as one fraught with multiple dangers. Yet, for the hypothesis to remain tenable, these non-AI risks can't be too overwhelming. If they were, humanity might face extinction before the challenge of creating aligned AI is even broached.

Consider, for example, the prevention of biorisks. It seems possible that at some point before the development of transformative AI, the balance between offensive biorisk threats that could cause extinction and the defensive capabilities designed to stop them could heavily favor the offensive side. If that were to happen, human extinction might become very likely during such a period.

Ord's previously cited risk estimates seem compatible with the time of perils framework. However, these estimates carry a degree of uncertainty. Should the actual risks be significantly higher or lower than projected, or the downsides of the risk occurring not as bad as they might seem, then the time of perils case could be called into question. As in the previous section, for the TOP case to hold weight, it's not sufficient for these threats to pose merely a global catastrophic risk. The risks must be truly existential, threatening the continued existence or potential of humanity. Otherwise, the world may still end up containing enormous amounts of value after the occurrence of a non-existential risk.

Significantly reduced existential risk

The next step in the time of perils case is for aligned AI to significantly reduce the levels of existential risk from all sources. This includes the risk from misaligned AI, but also all other x-risks. Not only do these risks have to be solvable, but our AI systems must be powerful enough to solve them, and keep them solved indefinitely. Significantly reduced levels of future risk seem to be crucial for ensuring that the future is sufficiently long for it to be overwhelmingly valuable. AI is the most commonly discussed way of reducing future risks so we focus on this, although there are potentially non-AI routes to ending the time of perils as well, such as robustness from space colonization or advanced non-AI technologies. This section draws partly from Finnveden, Riedel, and Shulman’s AGI and Lock-In.^[3]
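
To illustrate why the level of ongoing risk matters so much for the length of the future, here is a minimal sketch under a deliberately simple assumption: a constant, independent probability of existential catastrophe in each century. This constant-hazard model, the function names, and the risk levels are all assumptions made for illustration; the report does not commit to this model.

```python
def survival_probability(risk_per_century: float, centuries: int) -> float:
    """Probability of surviving `centuries` consecutive centuries at constant risk."""
    return (1.0 - risk_per_century) ** centuries


def expected_centuries_until_catastrophe(risk_per_century: float) -> float:
    """Mean of a geometric distribution: on average the catastrophe arrives in century 1/r."""
    if not 0.0 < risk_per_century <= 1.0:
        raise ValueError("risk_per_century must be in (0, 1]")
    return 1.0 / risk_per_century


# Hypothetical risk levels: a perilous period versus two post-peril scenarios.
for risk in (0.1, 0.001, 0.00001):
    horizon = expected_centuries_until_catastrophe(risk)
    survive_100 = survival_probability(risk, 100)
    print(f"risk {risk}: expected horizon {horizon:.0f} centuries, "
          f"P(survive 100 centuries) = {survive_100:.4f}")
```

Under this assumption the expected horizon scales inversely with the per-century risk, so a very long future requires risk to fall by orders of magnitude and to stay there.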

7. Aligned AI is sufficiently capable to solve other x-risks

In the framework of the TOP case, the first objective of an aligned AI is to address and neutralize other x-risks, which is a critical prerequisite for humanity to achieve a long future. Therefore the AI systems we develop need to be capable and powerful enough to solve, or at least significantly mitigate, all other x-risks in order to secure our future. But a question emerges: are there x-risks that are beyond the power of potential AI systems?

As discussed in premise 6, in a pre-AI world there is the possibility of an imbalance in offensive and defensive capabilities in biorisk threats and prevention. In a post-AI world, a potential imbalance would need to be resolved by some combination of (i) the technological future remaining around neutral in offensive versus defensive capabilities or leaning towards prevention over extinction threats, and/or (ii) the balance leaning technologically towards extinction threats but AI stopping actors from using such weapons. The former situation seems to be a forecast about the future nature of biorisk that doesn't seem clearly resolvable given existing evidence, while the latter situation might be politically untenable, depending on how much monitoring and surveillance might be required.

Classic literature on superintelligent AIs often paints a picture of them as bordering on omnipotent and omniscient. This perception extends even to human-level AIs, essentially positing that their rapid coordination capabilities aggregate to a form of omnipotence and omniscience. However, it seems reasonable to approach these claims with skepticism. The exact capabilities of advanced AI systems remain a matter of uncertainty. Furthermore, it may be the case that we have to sacrifice capabilities in order to reduce the risk of misalignment, and this may reduce the ability of the AIs we do end up developing to address the x-risks we face.

Assessing the capability of such an AI becomes paramount, especially when considering the difference between merely reducing risks and virtually eliminating them. Achieving near-complete risk reduction might require an unreasonably advanced AI. Natural phenomena like supervolcanoes, gamma ray bursts, and solar flares present significant challenges. While AI can potentially predict or mitigate the effects of such events, their complete prevention may be outside the realm of AI capabilities, due to either technical or political challenges. Some anthropogenic risks like great power wars are likely to be even more difficult to solve than natural risks. Furthermore, introducing AI systems into society may make these anthropogenic risks even harder to solve if, for example, it speeds up arms races. This emphasizes the need to assess the difficulty of solving various x-risks, even for an agent more powerful than ourselves.

8. Aligned AI also eliminates the risk of misaligned AI

After successfully deploying an aligned AI, we face another challenge: the potential development and deployment of a misaligned AI. Even in the presence of an aligned AI, a misaligned counterpart could still introduce significant existential risks. Therefore, the goal isn't just to create an aligned AI but to ensure it has the capacity to counteract threats from any misaligned AI that might arise later.

AGI and Lock-In suggests that a dominant aligned AI would likely be resilient against challenges from misaligned AI systems. However, these claims are still speculative given our current understanding. There are critical unanswered questions in this area. For instance, how capable does an aligned AI need to be to effectively counter misaligned AI threats? Additionally, to maintain the security provided by an aligned AI, what freedoms might humans need to relinquish? And at what point does this trade-off become unacceptable for us?

9. Aligned AI remains aligned indefinitely

AI alignment is not just about achieving initial compatibility with human values. For us to fully actualize our potential, AI alignment needs to be persistent, with minimal risk of deviation over successive iterations or periods. This continuing alignment is essential for two primary reasons.

Firstly, AI systems must be robust against any external malevolent influences, ensuring that bad actors cannot corrupt or manipulate their operations. Secondly, we must mitigate "value drift", which refers to the divergence of AI's objectives from those of humanity at the time.

AGI and Lock-In touches upon the challenges and potential solutions for maintaining consistent AI alignment. However, given the importance of persistence, it seems like more in-depth research on this topic is required.

10. No bad actors ever control intent-aligned AI

A merely intent-aligned AI follows the specific objectives of its developers regardless of whether those objectives are good for achieving one of the best possible futures. Even if we manage to deploy intent-aligned AI that aligns with the developers' specific objectives, a new challenge emerges: the potential misuse by bad actors. Just as the deployment of an aligned AI doesn't guarantee the absence of existential threats from misaligned AI, the presence of intent-aligned AI doesn't mean it's immune to being controlled or misused by bad actors who could cause a lot of harm.

Bad actors could come in a number of varieties. First, there could be actors who use intent-aligned AI to maximize their share of the pie without increasing the size of the pie and creating benefits for everyone else. Second, there could be classically bad actors such as terrorist groups who use the AI to actively harm people and curtail humanity’s potential. Finally, under the TOP case, even actors not conventionally seen as bad could bring about an existential catastrophe. If the actors who control AI lack expansionist ambitions, they might fail to colonize the stars and create a grand future. Of course, uncertainties still remain about the possibility of AI being created by any of these bad actors, and just how much would be lost by this occurring.

11. No unknown unknowns

Another premise is that there are no unknown unknown risks. In particular, the TOP case seems to require that we (or at least the AI systems we develop) will be aware of all existential risks and have a strategy for dealing with them before they wipe us out. This includes there being no unknown great filters ahead of us, which may drastically shorten our expected lifespan. If new risks that no one has yet conceived of were to emerge, these could overshadow currently recognised x-risks and make our current mitigation efforts irrelevant, misdirected or possibly even harmful.

The very nature of unknown unknowns makes them uncertain. How can we foresee and prepare for a threat that is currently beyond our comprehension? We may think that technological progress is unbounded, in which case potential new threats will always be out there. Even if we think technological progress is bounded, there may still be technologies to be discovered that we haven’t yet thought of that pose existential threats themselves.

A big, positive future

A key part of the TOP case is that the future that is threatened is potentially immensely valuable. This potentially ‘astronomical’ value is what is important for making the claim that x-risk mitigation work radically dominates other global priorities. If we limit ourselves to the next few generations, then x-risk work may be of a similar order of magnitude in cost-effectiveness to global health and farmed animal interventions. If we look beyond that, there are a number of different ways to model future value, and they yield very different estimates of the value of the future.

12. Aligned AI will produce lots of value

One way for the future to end up being valuable is to have an aligned transformative AI aimed at making it so. There are a few different ways that this could be done that have been discussed so far. The most standard way is to develop our abilities to colonize and settle space, and then create many new worlds as valuable as Earth. Alternatively we could develop digital minds, create many of them more efficiently than biological humans and give them blissful lives.

It remains to be seen what capabilities future AI systems will have. Even if they will be sufficiently powerful to reduce existential risk to arbitrarily low levels, this need not imply that they will be able to solve problems such as space travel and colonization. Furthermore, if AI is democratically aligned with a broad coalition of humanity’s interests, it seems plausible that although humanity might broadly agree about the value of preventing existential threats, we might disagree about whether we should colonize space or create digital minds, and as such an aligned AI may not assist in achieving those ends.

13. Human or human-descended lives will be good

Human or human-descended lives may fill the future. For this to be morally good, the lives of these individuals need to be good at least on average, or to surpass a neutral or acceptable threshold of welfare.

One difficulty is knowing what the neutral welfare point is and the fraction of human lives that are currently above it. Further, we shouldn't be fully confident that welfare ought to be the primary way to evaluate the goodness or badness of a life. Regardless, this difficulty may not be so important if we can assume that the positive trends we’ve seen over the last couple of centuries in economic growth, health and quality of life continue. However, we may harbor some uncertainties about whether these trends will continue. We also have examples of negative trends over the past centuries such as growing carbon emissions that might disrupt the positive trajectory.

Furthermore, it’s also possible that there won’t be very many human lives in the future. If current declining fertility trends continue as forecasted, then the human population will shrink rather than expand, significantly reducing the potential value of the future. Of course, future population trends are not certain and there is the potential for governments to respond with policies aimed at increasing fertility.

14. Animal lives will not be too bad

The potential future is not restricted to human lives; the value of the future may be significantly influenced by the welfare of non-human animals. We might transfer farmed and wild animals beyond our planet, either intentionally or inadvertently. When considering the potential scale, and depending on one's stance regarding the moral patienthood of animals, the collective well-being of animals could surpass that of humans in moral importance.

However, history presents a mix of optimism and concern. While some advances in animal rights and welfare have been made, large-scale operations like factory farming continue to raise significant ethical issues. The state of wild animals, too, is contentious, with the possibility that many wild animals might lead net-negative lives due to natural suffering. Our negligence toward the well-being of animals may be a potential moral catastrophe.

The trajectory of animal welfare is very uncertain. Aligned AI might solve all problems involving animals, or AI could continue to perpetuate farmed and wild animal suffering. For example, if humans want cultivated meat, then it seems plausible that AI could significantly advance that aim within the next decade. However, humans also often want "natural" foods and processes, which may result in the continuation of farming and significant amounts of wild animal suffering. Furthermore, it may be the case that ecosystems are too complex for even advanced AIs to intervene upon with reliably positive consequences. The treatment of animals in the future will likely hinge on the interaction of societal values, technological advancements, and consumer demands, and much more research could be done to project these trends.

15. Other (digital) minds matter and will not be too bad

For the strongest versions of the TOP case with the largest amounts of potential future value at stake, we need to assume that digital minds will be created and that they have valuable lives. This is because digital minds will not have our biological limitations, so they will be able to convert energy to utility more efficiently and deal more easily with the difficulties of space travel necessary for exploiting the full resources of the accessible universe.

The science of the consciousness of digital substrates is an emerging field and seems to be even harder than assessing the consciousness of non-human animals. AI systems trained on data from humans may game behavioral tests of consciousness, producing outputs that are similar to sentient humans but with a radically different underlying process that may or may not be conscious. It seems like it will be difficult to evaluate whether digital systems are sentient and worthy of our moral concern.

Furthermore, it remains to be seen how we will treat such digital minds if they are conscious (or we are uncertain about whether they are conscious) and whether they will have net positive lives. If digital minds are not given human rights, then there will be economic pressure to exploit and control them, potentially creating dystopian futures. Even an intent-aligned AI could in theory produce unhappy digital slaves if that were the best way to produce what humans wanted. Alternatively, if digital minds have positive lives then our ability to duplicate and create many more of them could produce incredibly large amounts of value.

16. We will not encounter aliens

Another tacit assumption is that humanity's future will be solitary, uninfluenced by external extraterrestrial entities. However, encountering an alien intelligence could profoundly shift the course of human history and the set of existential risks we face. Alien civilizations could potentially have superior technologies and knowledge to us. Depending on whether first contact with them is harmonious or adversarial, they might either accelerate our progress and raise our future potential or be an existential threat themselves that reduces or eliminates our long-term potential.

One uncertainty is whether the universe contains extraterrestrial life at all, as highlighted by the Fermi Paradox. If it turns out that there is extraterrestrial life, then we would want to learn more about the technological prowess and motives of any alien civilization, although this seems like an elusive challenge. There is also the question of how bad it would be if we were replaced by aliens. If they shared similar values to us then we might think the loss of humanity’s potential would not be so bad from the perspective of the universe.

Moral framework

Finally, we conclude with a discussion of the moral framework commonly used to evaluate the value of the future. We focus on two of the most important components: namely, totalism and minimal discounting of future lives. We frame these in axiological terms, and discuss what these imply in terms of the potential value. To translate this into a claim about what we ought to or are permitted to do, we additionally need some decision-theoretic principle such as expected value maximization. Adding this makes the case more controversial and we discuss it elsewhere in the sequence, so we do not include discussion of it here.

17. Additive axiology

An additive axiology is one that assesses the value of a state of the universe by summing up (potentially weighted) individual values within that state. Additive axiologies can be contrasted with average axiologies: adding new sources of positive value always increases axiological value in an additive framework, but not necessarily in an average framework. An additive axiology allows the many individual sources of value in the potential big future to be aggregated into an overwhelmingly large total value.
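The contrast between additive and average frameworks can be illustrated with a toy calculation. This is only a sketch: the welfare numbers and the function names `total_value` and `average_value` are hypothetical, and welfare is assumed to be measurable on a single unweighted scale.

```python
from statistics import mean


def total_value(welfares):
    """Additive (unweighted) axiology: value is the sum of individual welfare."""
    return sum(welfares)


def average_value(welfares):
    """Average axiology: value is the mean individual welfare."""
    return mean(welfares)


# Hypothetical population of three lives, each with welfare 10.
population = [10.0, 10.0, 10.0]

# Add one extra life whose welfare is positive but below the current average.
expanded = population + [2.0]

print("total:  ", total_value(population), "->", total_value(expanded))
print("average:", average_value(population), "->", average_value(expanded))
```

Under these assumptions the extra life raises the total but lowers the average, which is the sense in which adding positive value always improves things on an additive view but not necessarily on an average view.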

Totalism is one popular axiological framework that measures the value of a state of the universe by simply adding up the well-being present in each individual life within that state across all time. In essence, the total value is just the sum of values from every life that has ever existed or will exist. The key distinction here is that all lives count towards value, including those in the future, not just those currently living as in a person-affecting view. Because of this, the total value can grow indefinitely, constrained only by the physical limits of the universe. Accordingly, it is possible for the value of overwhelmingly many future potential individuals to be considered as overwhelmingly axiologically valuable. Moreover, the framework implies that failing to realize the potential of these future lives, possibly by failing to cause them to exist, leads to a significant loss in the total value of the universe.

However, the domain of population ethics and axiology, which concerns the ethical implications of bringing individuals into existence, is wickedly complex and controversial. Various impossibility theorems and conflicting intuitions about axioms make clear conclusions elusive. Whatever the appeal of a totalist axiology, several other theories and variations of population ethics are potential contenders, such as person-affecting and critical-level views, and allowing for moral uncertainty should also be given some weight.

18. Minimal discounting of future lives

The positive outcome of the TOP case is a very long future populated by numerous lives. For this long future to hold substantive moral weight, future lives must not be significantly discounted. Under standard exponential discounting, any positive rate of pure time preference causes the moral value of future lives to diminish as we project further into the future, with future lives holding no value in the limit. To be clear, some discounting of future lives is compatible with the TOP case. If you held that all future lives were only worth 1% of present lives, with this discount being constant over time, you could still conclude that the future is immensely valuable because of the sheer number of possible future lives.
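The difference between exponential discounting and a constant discount factor can be sketched numerically. The setup below is hypothetical: it assumes one future life per year, an illustrative 1% annual rate of pure time preference, and an illustrative constant 1% weight on all future lives. The function name `weighted_lives` is ours, not a standard one.

```python
def weighted_lives(horizon_years, annual_rate=0.0, constant_factor=1.0):
    """Sum of moral weights over the horizon, assuming one life per year.

    annual_rate:     rate of pure time preference (exponential discounting).
    constant_factor: fixed weight applied to every future life.
    """
    weight = constant_factor
    total = 0.0
    for _ in range(horizon_years):
        total += weight
        weight /= 1.0 + annual_rate  # weight shrinks each year if rate > 0
    return total


for horizon in (1_000, 100_000, 1_000_000):
    exponential = weighted_lives(horizon, annual_rate=0.01)
    constant = weighted_lives(horizon, constant_factor=0.01)
    print(f"{horizon:>9} years: exponential={exponential:.3f}, constant 1%={constant:.3f}")
```

With these assumed numbers, the exponentially discounted total levels off near (1 + r)/r = 101 however long the future lasts, while the constant-factor total keeps growing in proportion to the number of future lives.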

Some philosophers, although not all, argue that the rate of pure time preference should be zero, implying that future well-being holds equal weight to present well-being. Conversely, many economists argue for a non-zero discount rate, often for reasons tied to evidence that humans do value the present more than the future, uncertainty surrounding the future, and opportunity costs.

Conclusion

The time of perils case presents a potentially compelling vision of our current existential situation, emphasizing the weightiness of our choices today, especially in the realm of AI alignment. Credence in TOP can have massive implications for what we ought to do today, but so far the hypothesis has received relatively little exploration and formalization. To that end, this report underscores the number of different assumptions that underpin a strong variety of this case. Each of these assumptions carries with it its own uncertainties, and many are still in the preliminary stages of research.

While the TOP case's narrative might appeal to our intuitive understanding of x-risk, it is not yet particularly well-researched and is potentially built upon a number of different premises. When these premises are taken together, the collective likelihood of the TOP case being the future we actually face seems quite uncertain. We think more work should be done to build more formal models of the time of perils and what is required for our exit from these times, which can guide future forecasting of these potentially crucial considerations.

Our aim was not to debunk the TOP case but to shine a light on its underlying components and the complexities involved. By doing so, we hope to encourage more nuanced discussions, research, and potentially recalibrations of priorities in the field of cause prioritization. If the most powerful and prominent argument for the overwhelming value of x-risk mitigation rests on this specific understanding of our era, then it seems important to engage deeply with each assumption. This might involve bolstering evidence for some, refuting others, or identifying overlooked aspects that might further shape our understanding.

A number of open questions remain regarding the time of perils and cause prioritization. Firstly, we’d be interested in seeing more formal models of the time of perils, assessing what the minimal set of credible assumptions is for the time of perils case to go through and assessing these assumptions in more depth than we were able to do here. Secondly, we focused on an AI version of the time of perils case since it seems like the most commonly discussed version. It seems plausible that there are other non-AI based ways to significantly reduce x-risk which should also be further explored. Finally, our definition of how valuable the future needs to be for it to qualify as "overwhelmingly valuable" remains somewhat ambiguous. Further studies determining the scale and duration of the future required for x-risk mitigation to be overwhelmingly valuable would be useful.

Acknowledgements

This report was written by David Bernard. Thanks to members of the Worldview Investigations Team (Hayley Clatterbuck, Laura Duffy, Bob Fischer, Arvo Muñoz Morán and Derek Shiller), as well as Arden Koehler, David Thorstad, Gustav Alexandrie, Michael Aird, Bill Anderson-Samways, Marcus A. Davis and Kieran Greig, for helpful discussions and feedback. The post is a project of Rethink Priorities, a global priority think-and-do tank, aiming to do good at scale. We research and implement pressing opportunities to make the world better. We act upon these opportunities by developing and implementing strategies, projects, and solutions to key issues. We do this work in close partnership with foundations and impact-focused non-profits or other entities. If you're interested in Rethink Priorities' work, please consider subscribing to our newsletter. You can explore our completed public work here.

^{^}
To be clear, we are only assessing one argument for x-risk work being overwhelmingly valuable and there are obviously other possible arguments as well, which readers might find more or less compelling. These include (1) non-AI-induced versions of the time of perils, which are disjunctive to the AI-induced version we discuss here and we would also be keen to see more fleshed out, and (2) the common-sense case for x-risk, which we explore in further detail elsewhere in this sequence.
^{^}
For example, suppose you thought that all 18 premises were necessary for the TOP case to get off the ground, you had the same credence in each premise, and the truth of each premise was independent of all the others. If you had a 50% credence in each premise, this would result in a 0.00038% credence in the conclusion. If your premise credences were 25% or 75%, your credences in the conclusion would be 0.0000000015% and 0.56% respectively. This calculation is subject to the multiple stage fallacy; however, we still think it is useful for illustrative purposes. To be clear, these numbers should mean little to you as you may think there are more or fewer components required for the most compelling version of TOP for you, your credences in each of them will vary, and the truth status of the different components will be correlated in different ways.
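The arithmetic in this footnote can be reproduced with a short sketch. It relies on the same simplifying assumptions stated above (18 necessary premises, equal credence in each, full independence); the function name `conjunction_credence` is hypothetical.

```python
def conjunction_credence(premise_credence, n_premises=18):
    """Credence in the conjunction of n independent, equally likely premises."""
    return premise_credence ** n_premises


for p in (0.25, 0.50, 0.75):
    overall = conjunction_credence(p)
    # Multiply by 100 to express the overall credence as a percentage.
    print(f"{p:.0%} per premise -> {overall * 100:.10f}% overall")
```

Relaxing independence or changing the number of premises changes the result substantially, which is why these figures are illustrative only.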
^{^}
It’s important to note that many of the premises in this section (and potentially the previous section) are likely correlated. For example, if we are in a world where an AI is able to solve non-AI x-risks, this also increases the likelihood that it is able to prevent the development of a misaligned AI.


Benevolent_Rain · Oct 25 2023 · 21

I love your CURVE sequence. It feels very EA to be this rigorous and open minded about cause prioritization, and double checking that we are currently on a course for having maximum impact.

MichaelPlant · Oct 25 2023 · 20

Thanks for this. I think this is very valuable and really appreciate this being set out. I expect to come back to it a few times. One query and one request for further work - from someone, not necessarily you, as this is already a sterling effort!

1. I've heard Thorstad's TOP talk a couple of times, but it's now a bit foggy and I can't remember where his ends and yours starts. Is it that Thorstad argues (some version of) longtermism relies on the TOP thesis, but doesn't investigate whether TOP is true, whereas you set about investigating if it is true?
2. The request for further work: 18 is a lot of premises for a philosophical argument, and your analysis is very hedged. I recognise you don't want to claim too much but, as a reader who has thought about this far less than you, I would really appreciate you telling me what you think. Specifically, it would be useful to know which of the premises are the most crucial, in the sense of being least plausible. Presumably, some of the 18 premises we don't need to worry about, and our attention can concentrate on a subset. Or, if you think all the premises are similarly plausible, that would be useful to know too!

David Bernard · Oct 25 2023 · 14

Hi Michael, thanks for this.

On 1: Thorstad argues that if you want to hold both claims (1) Existential Risk Pessimism - per-century existential risk is very high, and (2) Astronomical Value Thesis - efforts to mitigate existential risk have astronomically high expected value, then TOP is the most plausible way to jointly hold both claims. He does look at two arguments for TOP - space settlement and an existential risk Kuznets curve - but says these aren’t strong enough to ground TOP and we instead need a version of TOP that appeals to AI. It’s fair to think of this piece as starting from that point, although the motivation for appealing to AI here was more due to this seeming to be the most compelling version of TOP to x-risk scholars.

On 2: I don’t think I’m an expert on TOP and was mostly aiming to summarise premises that seem to be common, hence the hedging. Broadly, I think you do only need the 4 claims that formed the main headings: (1) high levels of x-risk now, (2) significantly reduced levels of x-risk in the future, (3) a long and valuable / positive EV future, and (4) a moral framework that places a lot of weight on this future. I think the slimmed down version of the argument focuses solely on AI as it’s relevant for (1), (2) and (3), but as I say in the piece, I think there are potentially other ways to ground TOP without appealing to AI and would be very keen to see those articulated and explored more.

(2) is the part where my credences feel most fragile, especially the parts about AI being sufficiently capable to drastically reduce other x-risks and misaligned AI, and AI remaining aligned near-indefinitely. It would be great to have a better sense of how difficult various x-risks are to solve and how powerful an AI system we might need to nearly eliminate them. No unknown unknowns seems like the least plausible premise of the group, but its very nature makes it hard to know how to cash this out.

JWS 🔸 · Oct 25 2023 · 13

This is another great post from this already-great sequence. However, I think this one may have the most profound implications for EA and I think it deserves more reflection from Forum and Community members. It certainly seems like it undercuts the overwhelming value thesis that has supported a lot of EA's movement towards longtermism, and my intuitive sense is that many longtermist or longtermist-adjacent EAs haven't really reckoned with these assumptions or used them in their explicit or implicit EV calculations.

Are there any more plans from the RP team not just to lay out the assumptions here, but perhaps to quantify them, even if it's at an OOM level (I'm thinking of the Sandberg/Drexler/Ord paper on the Fermi Paradox here), and to see under what scenarios longtermist interventions hold up?

Bob Fischer · Oct 26 2023 · 4

Thanks so much for the vote of confidence, JWS. While we'd certainly be interested in working more on these assumptions, we haven't yet committed to taking this particular project further. But if funding were to become available for that extension, we would be glad to keep going!

Michael St Jules 🔸 · Oct 24 2023 · 11

Whether or not aligning AI results in a big positive future, it could still have a huge positive impact between futures. Independently of this, if AI alignment work also includes enough s-risk-mitigating work and avoids enough s-risk-increasing work, it could also prevent big negative futures.

I'm not sure whether or not it does reduce s-risk on net, but I'm not that informed here. The priorities for s-risk work generally seem different from alignment and extinction risk reduction, at least according to CLR and CRS, but AI-related work is still a/the main priority.

EDIT: I decided to break up this comment, making the more general point here and discussing specific views in my reply.

Michael St Jules 🔸 · Oct 24 2023 · 14

On other specific axiologies:

“Whatever the appeal of a totalist axiology, several other theories and variations of population ethics are potential contenders, such as person-affecting and critical-level views, and allowing for moral uncertainty should also be given some weight.”

Both person-affecting views and critical level utilitarianism are compatible with the dominance of very low probability events of enormous value, and could imply it if we're just taking expected values over a sum.

Critical level utilitarianism is unbounded, when just taken to be the sum of utilities minus a uniform critical level for each moral patient or utility value. Wide person-affecting views, according to which future people do matter, but just in the sense of it being better for better off people to exist than worse off people, and asymmetric person-affecting views, where bad lives are worth preventing, can also generate unbounded differences between potential outcomes and make recommendations to reduce existential risk (assuming risk and ambiguity neutral expected "value" maximization), especially to avoid mediocre or bad futures. See Thomas, 2019, especially section 6 Extinction Risk Revisited. Others may have made similar points.

(The rest of this comment is mostly copied from the one I made here.)

For narrow person-affecting views, see:

Gustafsson, J. E., & Kosonen, P. (20??). Prudential Longtermism.
Carl Shulman. (2019). Person-affecting views may be dominated by possibilities of large future populations of necessary people.

Basically, existing people could have extremely long and therefore extremely valuable lives, through advances in medicine, anti-aging and/or mind uploading.

That being said, it could be the case that accelerating AI development is good and slowing it is bad on some narrow person-affecting views, because AI could help existing people have extremely long lives, through its contributions to medicine, anti-aging and/or mind uploading. See also:

Matthew Barnett. (2023). The possibility of an indefinite AI pause, section The opportunity cost of delayed technological progress.
Chad I. Jones. (2023). The A.I. Dilemma: Growth versus Existential Risk. (talk, slides).

I think most people discount their future welfare substantially, though (perhaps other than for meeting some important life goals, like getting married and raising children), so living so much longer may not be that valuable according to their current preferences. To dramatically increase the stakes, one of the following should hold:

We need to not use their own current preferences and say their stakes are higher than they would recognize them to be, which may seem paternalistic and will fail to respect their current preferences in other ways.
The vast majority of the benefit comes from the (possibly small and/or atypical) subset of people who don't discount their future welfare much, which gets into objections on the basis of utility monsters, inequity and elitism (maybe only the relatively wealthy/educated have very low discount rates). Or, maybe these interpersonal utility comparisons aren't valid in the first place. It's not clear what would ground them.

David Bernard · Oct 24 2023 · 6

I was somewhat surprised by the lack of distinction between the cases where we go extinct and the universe is barren (value 0) and big negative futures filled with suffering. The difference between these cases seems large to me and seems likely to substantially affect the value of x-risk and s-risk mitigation. This is even more the case if you don't subscribe to symmetric welfare ranges and think our capacity to suffer is vastly greater than our capacity to feel pleasure, which would make the worst possible futures way worse than the best possible futures are good. I suspect this is related to the popularity of the term 'existential catastrophe', which collapses any difference between these cases (as well as cases where we bumble along and produce some small positive value but far from our best possible future).

Michael St Jules 🔸 · Oct 24 2023 · 5

The aliens (including alien-descended AI) could also themselves be moral patients, and there are some other possibilities worth considering if this is true:

We could help them.
We could harm them.
They could limit our expansion and take the space we would have otherwise, with or without catastrophic conflict. In that case, the future could still be very morally valuable (or disvaluable), but our future could be much smaller in value. Or, we could be replaceable.
We could limit their expansion. This could help or harm them, depending on the value of their existences, and could help or harm other aliens that they would have otherwise encountered. It also could make us replaceable.

(By "we" and "our", I mean to include our descendants, our tech and tech descended from us, including autonomous AI or whatever results from it.)

It also seems worth mentioning grabby alien models, which, from my understanding, are consistent with a high probability of eventually encountering aliens if we survive.

David Bernard · Oct 24 2023 · 8

Thanks for highlighting this Michael and spelling out the different possibilities. In particular, it seems like if aliens are present and would expand into the same space we would have expanded into had we not gone extinct, then for the totalist, to the extent that aliens have similar values to us, the value of x-risk mitigation is reduced. If we are replaceable by aliens, then it seems like not much is lost if we do go extinct, since the aliens would still produce the large valuable future that we would have otherwise produced.

I have to admit though, it is personally uncomfortable for my valuation of x-risk mitigation efforts and cause prioritisation to depend partially on something as abstract and unknowable as the existence of aliens.

Michael St Jules 🔸 · Oct 24 2023 · 4

Doesn't the argument go through even if other (non-AI) risks are negligible, as long as AI risk is not negligible? I think you just want "Other risks don't guarantee our extinction before aligned AI is developed".

David Bernard · Oct 24 2023 · 3

Yep, I agree you can generate the time of perils conclusion if AI risk is the only x-risk we face. I was attempting to empirically describe a view that seems to be popular in the x-risk space here, that other x-risks besides AI are also cause for concern, but you're right that we don't necessarily need this full premise.

I just wanted to flag some other positive cases for exponential discounting of the interests of future moral patients just for their temporal locations, as in discounted utilitarianism:

Asheim, 2010 has representation theorems with forms of discounted utilitarianism (+ an asymptotic part) under certain fairly intuitive and plausible assumptions. These results are also discussed in West, 2015.
Russell, 2022 (45:40 to the end) shows that discounted utilitarianism seems to be much better behaved (satisfies more desirable principles) in many ways than other versions of utilitarianism, although obviously giving up impartiality. Some impossibility results also guarantee the inconsistency of impartiality with other highly intuitive principles in cases involving infinitely many possible moral patients. West, 2015 discusses some in cases with infinitely many actual moral patients, and there's of course the inconsistency of Strong Pareto and full (infinite) impartiality. There are also some impossibility theorems for when only finitely many will ever exist, as long as their number (or aggregate value) is unbounded and heavy-tailed in some prospect; see Goodsell, 2021 and Russell, 2023 (and I discuss these results in my Anti-utilitarian theorems section of a post). "If you held that all future lives were only worth 1% of present lives, with this discount being constant over time" also satisfies Compensation,^[1] which is jointly inconsistent with Separability and Stochastic Dominance, according to Theorem 4 of Russell, 2023.

Of course, giving up impartiality seems like a very significant cost to me.

^{^}
Compensation is roughly the principle “that we can always compensate somehow for making things worse nearby, by making things sufficiently better far away (and vice versa)” (Russell, 2023, where it's also stated formally). It is satisfied pretty generally by theories that are impartial in deterministic finite cases, including total utilitarianism, average utilitarianism, variable value theories, prioritarianism, critical-level utilitarianism, egalitarianism and even person-affecting versions of any of these views. In particular, theoretically “moving” everyone to nearby or “moving” everyone to far away without changing their welfare levels suffices.

Vasco Grilo🔸 · Oct 28 2023 · 1

Hi David,

“9. Aligned AI remains aligned indefinitely
AI alignment is not just about achieving initial compatibility with human values.”

I am not sure what "human values" refers to here, but, from my perspective, the goal is aligning transformative AI with impartial values, not human values. In particular, I would be happy for transformative AI to maximise expected total hedonistic wellbeing, even if that implies human extinction (e.g. maybe humans will do everything to maintain significant amounts of wildlife, and AI concludes wildlife is full of suffering). This relates to a point you make in the next section:

“Finally, under the TOP case it may even be the case that actors not conventionally seen as bad could result in existential catastrophe. If the actors who control AI lack expansionist ambitions, they might fail to colonize the stars and create a grand future.”
