State Space of X-Risk Trajectories

David_Kristoffersson; JustinShovelain

Justin Shovelain developed the core ideas in this article and assisted in writing; David Kristoffersson was the lead writer and editor.

Cross-posted on LessWrong

Abstract

Currently, people tend to use many key concepts informally when reasoning and forming strategies and policies for existential risks (x-risks). A well-defined formalization and graphical language for paths and choices would help us pin down more exactly what we think and let us see relations and contrasts more easily. We construct a common state space for futures, trajectories, and interventions, and show how these interact. The space gives us a possible beginning of a more precise language for reasoning and communicating about the trajectory of humanity and how different decisions may affect it.

Introduction

Understanding the possible trajectories of human civilization and the futures they imply is key to steering development towards safe and beneficial outcomes. The trajectory of human civilization will be highly impacted by the development of advanced technology, such as synthetic biology, nanotechnology, or artificial general intelligence (AGI). Reducing existential risks means intervening on the trajectory civilization takes. To identify effective interventions, we need to be able to answer questions like: how close are we to good and bad outcomes? How probable are the good and bad outcomes? What levers in the system could allow us to shape its trajectory?

Previous work has modeled some aspects of existential risk trajectories. For example, various surveys and extrapolations have been made to forecast AGI timelines [1; 2; 3; 4; 5], and scenario and risk modeling have provided frameworks for some aspects of risks and interventions [6; 7; 8]. The paper Long Term Trajectories of Human Civilization [9] offers one form of visualization of civilizational trajectories. However, so far none of these works have defined a unified graphical framework for futures, trajectories, and interventions. Without developing more and better intellectual tools to examine the possible trajectories of the world and how to shape them, we are likely to remain confused about many of the requirements for reaching a flourishing future, causing us to take less effective actions and leaving us ill-prepared for the events ahead of us.

We construct a state space model of existential risk, where our closeness to stably good and bad outcomes is represented as coordinates, ways the world could develop are paths in this space, our actions change the shapes of these paths, and the likelihood of good or bad outcomes is based on how many paths intercept those outcomes. We show how this framework provides new theoretical foundations to guide the search for interventions that can help steer the development of technologies such as AGI in a safe and beneficial direction.

This is part of Convergence’s broader efforts to construct new tools for generating, mapping out, testing, and exploring timelines, interventions, and outcomes, and showing how these all interact. The state space of x-risk trajectories could form a cornerstone in this larger framework.

State Space Model of X-Risk Trajectories

As humanity develops advanced technology, we want to move closer to beneficial outcomes, stay away from harmful ones, and better understand where we are. In particular, we want to understand where we are in terms of existential risk from advanced technology. Formalization and visual graphs can help us think about this more clearly and effectively. Graphs make relationships between variables clearer, letting us take in a lot of information at a single glance and discern patterns such as derivatives, oscillations, and trends. The state space model formalizes x-risk trajectories and gives us the power of visual graphs. To construct the state space, we will define x-risk trajectories geometrically. Thus, we need to formalize position, distance, and trajectories in terms of existential risk, and we need to incorporate uncertainty.

In the state space model of existential risk, current and future states of the world are points, possible progressions through time are trajectories through these points, and stably good or bad futures are absorbing states that are reached when any coordinate drops below zero.

Stable futures are futures of either existential catastrophe or existential safety. Stably good futures (compare to Bostrom’s ‘OK outcomes’, as in [10]) are those where society has achieved enough wisdom and coordination to guarantee the future against existential risks and other dystopian outcomes, perhaps with the aid of Friendly AI (FAI). Stably bad futures (‘bad outcomes’) are those where existential catastrophe has occurred.

Trajectories illustration

While the framework presented in this article can be used to analyse any specific existential risk, or existential risks in general, in this article we will illustrate with a scenario where humanity may develop FAI or “unfriendly” AI (UFAI). For the simplest visualization of the state space, one can draw a two-dimensional coordinate system, and let the x-coordinates below 0 be the “UFAI” part of the space, and the y-coordinates below 0 be the “FAI” part of the space. The world will then take positions in the upper right quadrant, with the x-coordinate being the world’s distance from a UFAI future, and the y-coordinate being the world’s distance from an FAI future. As time progresses, the world will probably move closer to one or both of these futures, tracing out a trajectory through this space. By understanding the movement of the world through this space, one can understand which future we are headed for (in this example, whether we look likely to end up in the FAI future or the UFAI future). ^[1]

Part of the advantage of this approach is as a cognitive aid to facilitate better communication of possible scenarios between existential risk researchers. For instance, there might be sensible (but implicit) variance in their estimates of current distance to UFAI and/or FAI (perhaps because one researcher thinks UFAI will be much easier to build than the other researcher thinks). In Fig 1, we show this as two possible start positions. They might also agree about how an intervention represented by the black line (perhaps funding a particular AI safety research agenda) would affect trajectories. But because they disagree on the world's current position in the space, they'll disagree on whether that intervention is enough. (Fig 1 indicates the intervention will not have had a chance to substantially bend the trajectory before UFAI is reached, if the world is in Start 1.) If the researchers can both see the AGI trajectory space like this, they can identify their precise point of disagreement, and thus have more chance of productively learning from and resolving their disagreement.
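The disagreement about start positions can be made concrete with a minimal sketch. It assumes, purely for illustration, that the world approaches each axis at a constant rate (a straight-line trajectory with no intervention); the names WorldState, time_to_axis, and headed_for, and all the numbers, are hypothetical and not part of the model as presented here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class WorldState:
    """A point in the two-axis example space (units are hypothetical)."""
    dist_ufai: float  # x-coordinate: remaining distance to a UFAI future
    dist_fai: float   # y-coordinate: remaining distance to an FAI future


def time_to_axis(distance: float, rate: float) -> float:
    """Time until a coordinate reaches zero if it shrinks at a constant rate."""
    return distance / rate if rate > 0 else math.inf


def headed_for(state: WorldState, ufai_rate: float, fai_rate: float) -> str:
    """Name the absorbing future a straight-line trajectory reaches first."""
    t_ufai = time_to_axis(state.dist_ufai, ufai_rate)
    t_fai = time_to_axis(state.dist_fai, fai_rate)
    if t_ufai == t_fai:
        return "undetermined"  # exact tie, or no movement towards either axis
    return "UFAI" if t_ufai < t_fai else "FAI"


# Two researchers agree on the rates of approach but not on the start position.
start_1 = WorldState(dist_ufai=4.0, dist_fai=10.0)
start_2 = WorldState(dist_ufai=8.0, dist_fai=10.0)
for name, start in (("Start 1", start_1), ("Start 2", start_2)):
    print(name, "->", headed_for(start, ufai_rate=1.0, fai_rate=2.0))
```

With identical assumed rates, the two start positions lead to different futures, which is the kind of disagreement the shared picture is meant to expose.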

Essentially, the state space is a way to describe where the world is, where we want to go, and what we want to steer well clear of. We proceed by outlining a number of key considerations for the state space.

Trajectories of the world

We want to understand what outcomes are possible and likely in the future. We do this by projecting from past trends into the future and by building an understanding of the system’s dynamics. A trajectory is a path in the state space between points in the past, present, or future that may move society closer or farther away from certain outcomes.

As time progresses, the state of the world changes, and thus the position of the world in the space changes. As technology becomes more advanced, more extreme outcomes become possible, and the world moves closer to both the possibilities of existential catastrophe and existential safety. Given the plausible trajectories of the world, one can work out probabilities for the eventual occurrence of each stably good or bad future.

In the example, by drawing trajectories of the movement of the world, one can study how the world is or could be moving in relation to the FAI and UFAI futures. As a simplified illustration of how a trajectory may be changed, say society would decide to stop developing generic AGI capability and would focus purely and effectively on FAI; this may change the trajectory to a straight line moving towards FAI, assuming it’s possible to decouple progress on the two axes.

Defining distance more exactly

We need a notion of distance that is conducive to measurement, prediction, and action. The definition of distance is central to the meaning of the space and determines much of the mechanics of the model. The way we’ve described the state space thus far leaves it compatible with many different types of distance. However, in order to fully specify the space, one needs to choose one distance. Possible interesting choices for defining distance include: work hours, bits of information, computational time, actual time, and probabilistic distance. Further, building an ensemble of different metrics would allow us to make stronger assessments.

Uncertainty over positions and trajectories

We need to take into account uncertainty to properly reflect what we know about the world and the future. There is much uncertainty about how likely we are to reach various societal outcomes, whether due to uncertainty about our current state, about our trajectory, or about the impact certain interventions would have. By effectively incorporating uncertainties into our model, we can more clearly see what we don’t know (and what we should investigate further), draw more accurate conclusions, and make better plans.

Taking uncertainty into account means having probability distributions over positions and over the shape and speed of trajectories. Technically speaking, trajectories are represented as probability distributions that vary with time and that are, roughly speaking, determined by taking the initial probability distribution and repeatedly applying a transition matrix to it. This is a stochastic process that would look something like a random walk (such as in Brownian motion) drifting in a particular direction. (Ideally, we’d also have a probability function over both the shape and speed of trajectories that doesn’t treat shape and speed as conditionally independent.) Using the example in the earlier diagram, we don’t know the timelines for FAI or UFAI with certainty. It may be 5 or 50 years, or more, before one of the stable futures is achieved. Perhaps society will adapt and self-correct towards developing the requisite safe and beneficial AGI technology, or perhaps safety and preparation will be neglected. These uncertainties and events can be modeled in the space, with trajectories passing through positions with various probabilities.
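As a rough sketch of what repeatedly applying a transition matrix to an initial distribution could look like, the following toy example discretizes the two-axis space into a grid. The grid, the per-step probabilities, and the function names are all assumptions made for illustration; the transition matrix is stored sparsely as a rule rather than as an explicit matrix.

```python
from collections import defaultdict
from typing import Dict, Tuple

State = Tuple[int, int]  # (steps left to UFAI, steps left to FAI) on a grid


def apply_transition(dist: Dict[State, float],
                     p_ufai: float, p_fai: float) -> Dict[State, float]:
    """Apply one step of the (sparse) transition matrix to a distribution."""
    if p_ufai < 0 or p_fai < 0 or p_ufai + p_fai > 1:
        raise ValueError("step probabilities must be >= 0 and sum to at most 1")
    new: Dict[State, float] = defaultdict(float)
    for (x, y), p in dist.items():
        if x == 0 or y == 0:  # absorbing state: probability mass stays put
            new[(x, y)] += p
            continue
        new[(x - 1, y)] += p * p_ufai            # drift towards UFAI
        new[(x, y - 1)] += p * p_fai             # drift towards FAI
        new[(x, y)] += p * (1 - p_ufai - p_fai)  # no progress this step
    return dict(new)


def outcome_probabilities(initial: Dict[State, float], p_ufai: float,
                          p_fai: float, steps: int) -> Dict[str, float]:
    """Evolve the distribution and sum the mass absorbed on each axis."""
    dist = dict(initial)
    for _ in range(steps):
        dist = apply_transition(dist, p_ufai, p_fai)
    ufai = sum(p for (x, _), p in dist.items() if x == 0)
    fai = sum(p for (x, y), p in dist.items() if y == 0 and x != 0)
    return {"UFAI": ufai, "FAI": fai, "undecided": 1.0 - ufai - fai}


# Uncertainty about the current position: two equally likely start states.
initial = {(2, 4): 0.5, (4, 4): 0.5}
print(outcome_probabilities(initial, p_ufai=0.3, p_fai=0.3, steps=200))
```

Start states are assumed to lie strictly inside the quadrant (both coordinates at least 1). Because only one coordinate moves per step, the world can never land on both axes at once in this toy version.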

Defining how to calculate speed over a trajectory

We need a method to calculate the speed of the trajectories. The speed of a trajectory is assumed to be determined primarily by the rate of technological development. Timeline models allow us to determine the speed of trajectories. In reality, the connection between state space coordinates and time is non-trivial and possibly jumpy (unless smoothed out by uncertainty). For example, an important cause of a trajectory might be a few discrete insights, such that the world has sudden, big lurches along that trajectory at the moments when those insights are reached, but moves slowly at other times.

Trajectories as defined in the coordinate space do not have time directly associated with them from their shapes; instead, that is an implicit quantity. That is, distance in the coordinate space does not correspond uniformly to distance in time. The same trajectory can take more or less time to go through.
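To illustrate how the same trajectory shape can be traversed on different schedules, here is a small sketch that separates the path (a polyline in the coordinate space) from the mapping between time and progress along it. The two schedules, steady progress and progress that lurches forward at a few discrete insights, and all names and numbers are hypothetical.

```python
import math
from bisect import bisect_right
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def position_at(path: Sequence[Point], progress: float) -> Point:
    """Point reached after covering a fraction `progress` of the path length."""
    progress = min(max(progress, 0.0), 1.0)
    segments = list(zip(path, path[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    target = progress * sum(lengths)
    for (a, b), length in zip(segments, lengths):
        if length > 0 and target <= length:
            f = target / length
            return (a[0] + f * (b[0] - a[0]), a[1] + f * (b[1] - a[1]))
        target -= length
    return path[-1]


def steady_progress(t: float, duration: float) -> float:
    """Progress grows linearly with time until the path is complete."""
    return min(max(t / duration, 0.0), 1.0)


def insight_progress(t: float, insight_times: List[float]) -> float:
    """Progress jumps by an equal share each time an insight is reached."""
    return bisect_right(sorted(insight_times), t) / len(insight_times)


path = [(10.0, 10.0), (5.0, 8.0), (0.0, 6.0)]  # same shape for both schedules
insights = [20.0, 22.0, 45.0]                  # hypothetical insight dates
for t in (0.0, 10.0, 25.0, 50.0):
    steady = position_at(path, steady_progress(t, duration=50.0))
    jumpy = position_at(path, insight_progress(t, insights))
    print(t, steady, jumpy)
```

Both schedules start and end at the same points and follow the same shape, but at intermediate times the world sits at different positions, which is why time has to be modeled separately from the coordinates.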

In general, there are several ways to calculate the expected time until we have a certain technology. Expert surveys, trend extrapolation, and simulation are three useful tools that can be used for this purpose.

Extensions

We see many ways to extend this research:

Shaping trajectories: Extending the model with systematization (starting by mapping interventions and constructing strategies) and visualization of interventions, to better understand how to shape trajectories and to generate new ideas for interventions.
Specialized versions: Making specialized versions of the space for each particular existential risk (such as AI risk, biorisk, nuclear war, etc.).
Trajectories and time: Further examining the relationships between the trajectories and time. Convergence has one mathematical model for AI timelines, and there is a family of approaches, each valid in certain circumstances. These can be characterized, simulated, and verified, and could help inspire interventions.
Measurability: Further increasing the measurability of our position and knowledge of the state space dynamics. We don’t know exactly where the world is in coordinates, how we are moving, or how fast we’re going, and we want to refine measurement to be less of an ad hoc process and more like engineering. For example, perhaps we can determine how far away we are from AGI by projecting the needed computer power or the rate at which we’re having AGI insights. Proper measurement here is going to be subtle, because we cannot sample, and don’t entirely know what the possible AGI designs are. But by measuring as best we can, we can verify dynamics and positions more accurately and so fine-tune our strategies.
Exploring geometries: Exploring variations on the geometry, such as with different basis spaces or parametrizations of the state space, could provide us with new perspectives. Maybe there are invariants, symmetries, boundaries, or non-trivial topologies that can be modelled.
Larger spaces: Immersing the space in larger ones, like the full geometry of Turing Machines, or state spaces that encode things like social dynamics, resource progress, or the laws of physics. This would allow us to track more dynamics or to see things more accurately.
Resource distribution: Using the state space model as part of a greater system that helps determine how to distribute resources. How does one build a system that handles the explore vs exploit tradeoff properly, allows delegation and specialization, evaluates teams and projects clearly, self-improves in a reweighting way, allows interventions with different completion dates to be compared, incorporates unknown unknowns and hard-to-reverse-engineer, data-rich intuitions cleanly, and doesn’t suffer from decay, Goodhart’s law, or the principal-agent problem? Each of these questions needs investigation.
Trajectories simulator: Implementing a software platform for the trajectories model to allow exploration, learning, and experimentation using different scenarios and ways of modeling.
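As one very rough illustration of the compute-projection idea mentioned under Measurability, the sketch below estimates time to a needed level of computing power under the assumption (ours, for illustration only) that available compute grows exponentially with a fixed doubling time. The function name and all numbers are hypothetical, and real estimates would need uncertainty over every input.

```python
import math


def years_until(current: float, needed: float,
                doubling_time_years: float) -> float:
    """Years until an exponentially growing quantity reaches `needed`."""
    if current <= 0 or needed <= 0 or doubling_time_years <= 0:
        raise ValueError("all inputs must be positive")
    if needed <= current:
        return 0.0  # the threshold has already been reached
    # Solve current * 2 ** (t / doubling_time_years) == needed for t.
    return doubling_time_years * math.log2(needed / current)


# Hypothetical inputs: a 1000x gap in compute and a two-year doubling time.
print(years_until(current=1e18, needed=1e21, doubling_time_years=2.0))
```

A distance defined this way is one candidate coordinate for the state space; it says nothing by itself about insights or designs, which is part of why measurement here is subtle.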

Conclusion

The state space model of x-risk trajectories can help us think about and visualise trajectories of the world, in relation to existential risk, in a more precise and structured manner. The state space model is intended to be a stepping stone to further formalizations and “mechanizations” of strategic matters on reducing existential risk. We think this kind of mindset is rarely applied to such “strategic” questions, despite potentially being very useful for them. There are of course drawbacks to this kind of approach as well; in particular, it won’t do much good if it isn’t calibrated or combined with more applied work. We see the potential to build a synergistic set of tools to generate, map out, test, and explore timelines, interventions, and outcomes, and to show how these all interact. We intend to follow this article up with other formalizations, insights, and project ideas that seem promising. This is a work in progress; thoughts and comments are most welcome.

We wish to thank Michael Aird, Andrew Stewart, Jesse Liptrap, Ozzie Gooen, Shri Samson, and Siebe Rozendal for their many helpful comments and suggestions on this document.

Bibliography

[1]. Grace, K., Salvatier, J., Dafoe, A., Zhang, B., & Evans, O. (2017). When Will AI Exceed Human Performance? Evidence from AI Experts, 1–21. https://arxiv.org/abs/1705.08807
[2]. https://nickbostrom.com/papers/survey.pdf
[3]. https://www.eff.org/ai/metrics
[4]. http://theuncertainfuture.com
[5]. OpenPhil: What Do We Know about AI Timelines? https://www.openphilanthropy.org/focus/global-catastrophic-risks/potential-risks-advanced-artificial-intelligence/ai-timelines
[6]. Barrett, A. M., & Baum, S. D. (2016). A model of pathways to artificial superintelligence catastrophe for risk and decision analysis. Journal of Experimental & Theoretical Artificial Intelligence, (789541031), 1–21. https://doi.org/10.1080/09528130701472416
[7]. http://aleph.se/andart2/math/adding-cooks-to-the-broth/
[8]. Cotton-Barratt, O., Daniel, M., & Sandberg, A. (2020). Defence in Depth Against Human Extinction: Prevention, Response, Resilience, and Why They All Matter. https://onlinelibrary.wiley.com/doi/full/10.1111/1758-5899.12786
[9]. Baum, S. D., et al. (2019). Long-Term Trajectories of Human Civilization. http://gcrinstitute.org/papers/trajectories.pdf
[10]. Bostrom, N. (2013). Existential risk prevention as global priority. Global Policy, 4(1), 15–31. https://doi.org/10.1111/1758-5899.12002

^[1] How does the state space of x-risk trajectories model compare to the trajectory visualizations in [9]? The axes are almost completely different. Their trajectory graphs have an axis for time; the state space doesn’t. Their graphs have an axis for population size; the state space doesn’t. In the state space, each axis represents a stably bad or a stably good future. However, in the visualizations in [9], hitting the x-axis represents extinction, which maps somewhat to hitting the axis of a stably bad future in the trajectories model. The visualizations in [9] illustrate valuable ideas, but they seem to be less about choices or interventions than the state space model is.

Siebe, Feb 8 2020

I think this article very nicely undercuts the following common-sense research ethic:

If your research advances the field more towards a positive outcome than it moves the field towards a negative outcome, then your research is net-positive

Whether research is net-positive depends on the field's current position relative to both outcomes (assuming that when either outcome is achieved, the other can no longer be achieved). The article replaces this with another heuristic:

To make a net-positive impact with research, move the field towards the positive outcome, relative to the negative outcome, at a ratio at least as large as distance-to-positive : distance-to-negative.

If we add uncertainty to the mix, we could calculate how risk averse we should be (where risk aversion should be larger when the research step is larger, as small projects probably carry much less risk of accidentally taking a big step towards FAI).

The ratio and risk-aversion could lead to some semi-concrete technology policy. For example, if the distance to FAI and UFAI is (100, 10), technology policy could prevent funding any projects that either have a distance-ratio (for lack of a better term) lower than 10 or that have a 1% or higher probability of taking a 10d step towards UFAI.
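A minimal sketch of the ratio heuristic and the example policy above might look as follows. The function names are hypothetical, the probability of a large step towards UFAI is simply taken as an input, and the two example projects are invented for illustration.

```python
def meets_ratio_heuristic(dist_fai: float, dist_ufai: float,
                          step_fai: float, step_ufai: float) -> bool:
    """True if the project's progress ratio is at least dist_fai : dist_ufai."""
    if dist_fai <= 0 or dist_ufai <= 0:
        raise ValueError("distances must be positive")
    if step_fai < 0 or step_ufai < 0:
        raise ValueError("steps must be non-negative")
    # Cross-multiplied form of step_fai / step_ufai >= dist_fai / dist_ufai,
    # which avoids dividing by zero when a project adds no UFAI progress.
    return step_fai * dist_ufai >= step_ufai * dist_fai


def fundable(dist_fai: float, dist_ufai: float, step_fai: float,
             step_ufai: float, p_large_ufai_step: float,
             max_p_large_step: float = 0.01) -> bool:
    """Apply both policy tests: the distance-ratio and the tail-risk cap."""
    ratio_ok = meets_ratio_heuristic(dist_fai, dist_ufai, step_fai, step_ufai)
    return ratio_ok and p_large_ufai_step < max_p_large_step


# World at (100, 10): the required distance-ratio is 100 / 10 = 10.
print(fundable(100, 10, step_fai=30, step_ufai=2, p_large_ufai_step=0.001))
print(fundable(100, 10, step_fai=5, step_ufai=1, p_large_ufai_step=0.001))
```

The first hypothetical project has a ratio of 15 and passes; the second has a ratio of 5 and fails, even though both move the field further towards FAI than towards UFAI in absolute terms.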

Of course, the real issue is whether such a policy can be plausibly and cost-effectively enforced or not, especially given that there is competition with other regulatory areas (China/US/EU).

Without policy, the concepts can still be used for self-assessment. And when a researcher/inventor/sponsor assesses the risk-benefit profile of a technology themselves, they should discount for their own bias as well, because they are likely to have an overly optimistic view of their own project.

MichaelA🔸, Feb 26 2020

Good points.

Also, this comment reminded me of somewhat similar arguments in this older post by Justin (and Ozzie Gooen).

adamShimi, Feb 10 2020

The geometric intuition underlying this post already proves useful for me!

Yesterday, while discussing with a friend why I want to change my research topic to AI Safety instead of what I currently do (distributed computing), my first intuition was that AI safety aims at shaping the future, while distributed computing is relatively agnostic about it. But a far better intuition comes when considering the vector along the current trajectory in state space, starting at the current position of the world, and whose direction and length capture the trajectory and the speed at which we follow it.

From this perspective, the difference between distributed computing/hardware/cloud computing research and AI safety research is obvious in terms of vector operations:

The former amounts to positive scaling of the vector, and thus makes us go along our current trajectory faster.
While the latter amounts to rotations (and maybe scaling, but it is a bit less relevant), which allows us to change our trajectory.
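The distinction between the two vector operations can be sketched in a few lines. The velocity vector, scaling factor, and rotation angle below are hypothetical; negative components mean that the distances to the UFAI axis (x) and the FAI axis (y) are shrinking.

```python
import math
from typing import Tuple

Vector = Tuple[float, float]


def scale(v: Vector, factor: float) -> Vector:
    """Same direction, different speed (e.g. general capability research)."""
    return (v[0] * factor, v[1] * factor)


def rotate(v: Vector, angle_rad: float) -> Vector:
    """Same speed, different direction (counter-clockwise rotation)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def speed(v: Vector) -> float:
    return math.hypot(v[0], v[1])


def heading(v: Vector) -> float:
    return math.atan2(v[1], v[0])


velocity = (-2.0, -1.0)  # approaching UFAI twice as fast as FAI
faster = scale(velocity, 1.5)
turned = rotate(velocity, math.radians(30))
print("scaled :", faster, speed(faster), heading(faster))
print("rotated:", turned, speed(turned), heading(turned))
```

Scaling leaves the heading untouched and only changes how quickly the world moves along its current trajectory, while this particular rotation keeps the speed but turns the vector so that the FAI axis is approached faster than the UFAI axis.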

And since I am not sure we are heading in the right direction, I prefer to be able to change the trajectory (at least potentially).

David_Kristoffersson, Feb 10 2020

Happy to see you found it useful, Adam! Yes, general technological development corresponding to scaling of the vector is exactly the kind of intuition it's meant to carry.

adamShimi, Feb 7 2020

As a tool for existential risk research, I feel like the graphical representation will indeed be useful in crystallizing the differences in hypotheses between researchers. It might even serve as a self-assessment tool, for quickly checking some of the consequences of one's own view.

But beyond the trajectories (and maybe specific distances), are you planning on representing the other elements you mention? Like the uncertainty or the speed along trajectories? I feel like the more details about an approach can be integrated into a simple graphical representation, the more this tool will serve to disentangle disagreement between researchers.

David_Kristoffersson, Feb 9 2020

But beyond the trajectories (and maybe specific distances), are you planning on representing the other elements you mention? Like the uncertainty or the speed along trajectories?

Thanks for your comment. Yes; the other elements, like uncertainty, would definitely be part of further work on the trajectories model.

nathanhb, Aug 25 2023

For what it's worth, my model of a path to safe AI looks like a narrow winding path along a ridge with deadly falls to either side:

Unfortunately, the deadly falls to either side have illusions projected onto them of shortcuts to power, wealth, and utility. I don't think there is any path that reaches safety without a long stretch with immediate danger nearby. In this model, deliberately and consistently optimizing for safety above all else during the dangerous stretch is the only way to make it through.

The danger zone is where the model is sufficiently powerful and agentic that a greedy, shortsighted person could say to it, "Here is access to the internet. Make me lots of money." and this would result in a large stream of money pouring into their account. I think we're only a few years away from that point, and that the actions that safety researchers take in the meantime aren't going to change that. So, we need both safety research and governance, and carefully selecting disproportionately safety-accelerating research would be entirely irrelevant to the strategic landscape.

This is just my view, and I may be wrong, but I think it's worth pointing out that there's a chance that the idea of trying to do disproportionately safety-accelerating research is a distraction from strategically relevant action.

Effective Altruism Forum