[I wrote this back in August, but it never felt finished enough to post. I don't have any plans to come back to this, so I'm posting it here so it doesn't just gather dust.]
This is the third post in a three-part series summarising, critiquing, and suggesting extensions of Leopold Aschenbrenner's paper Existential Risk and Growth. I summarised the model and results from the paper in the first post, and Ben critiqued the model in the second post. The series is written by me and Ben Snodin, with input from Phil Trammell. Any errors in this post are mine.
The global factors contributing to existential risk form a complex system. One way of understanding complex systems is by building a granular, gearsy understanding of the system: for example, forecasting computer hardware trends and their effects on AI capabilities, thinking about how risks from advanced biotechnology might shift from sophisticated to unsophisticated actors, or speculating on the safest order for specific technologies to be developed. Currently, most work on understanding existential risk makes granular and specific arguments about particular technologies or political situations.
Another approach to understanding complex systems is to describe them with simple, abstract models that capture the most important behavior without being attached to specific mechanisms. These models can give us a different flavor of insight about existential risk. An example of this approach besides Existential Risk and Growth is Modelling the Human Trajectory. For some systems this abstract approach works, and for some it doesn’t. We don’t know yet how well it works for existential risk. If it does work, we could get insights that are more robust and more general than those we currently have.
Whichever approach you prefer, a mixed strategy that uses both might give more robust conclusions, particularly where the approaches converge. Exploring both approaches in parallel also has high information value, given the high stakes and our uncertainty about which approach might be more fruitful. To help others build the new field, this post presents some promising directions for further research.
Section 2 posits some plausibly useful extensions and variations. Section 3 discusses some extensions and variations that didn't seem promising to us. These judgements are our current best guesses and we wouldn't be at all surprised if one of the 'less promising' extensions actually turned out to be very promising.
This section presents some extensions and variations to try. The general idea here is to work out what the most important factors contributing to existential risk might be, and then find a way to model them which is simple enough to work with, but preserves the most important relationships between variables.
An asterisk* indicates I'm particularly excited about the idea.
The model assumes exogenous exponential population growth. How do the results change if this is replaced by:
There are existing models of learning by doing where technology is discovered as a side effect of production. What happens if we replace the existing production function for safety technologies with a model where:
One might argue that AI risk is lower when growth is slower, because slower growth gives safety researchers more time to prepare for future technological developments. Note, however, that this assumes growth preferentially speeds up capabilities work over safety work. This would run counter to Kuznets curves in other domains, and also doesn’t seem right empirically. Funding for AI safety seems to come from richer people (e.g. Open Phil) than funding for AI capabilities (which comes indirectly from paying for, or advertisements on, consumption goods like Google search).
All that said, one might maintain (as done here) that there is a sense in which AI development parallelizes more than AI safety work, and that this overturns the conclusion above. An interpretation in the framework of the model would be that the lambda on consumption-sector research is perhaps larger than the lambda on safety-sector research. So it would be interesting to see whether plausible variation in lambda across sectors changes the qualitative conclusions above, rather than merely the rate at which society shifts labor between the sectors.
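As a rough illustration of the mechanical effect only, here is a minimal sketch. It assumes a Jones-style technology production function, dA/dt = S^lambda * A^phi, with a fixed number of scientists S in each sector. The function name, parameter values, and the choice to hold scientists constant are all illustrative assumptions. It does not solve the optimal allocation problem, so it cannot by itself say whether the qualitative conclusions change.

```python
def simulate_technology(scientists, lam, phi, a0=1.0, years=50.0, dt=0.01):
    """Euler-integrate dA/dt = scientists**lam * A**phi.

    Assumed (illustrative) functional form; `scientists` is held constant.
    """
    a = a0
    steps = int(round(years / dt))
    for _ in range(steps):
        a += dt * (scientists ** lam) * (a ** phi)
    return a


# Hypothetical parameter values, chosen only to show the direction of the effect.
# Same number of scientists and same phi; only lambda differs between sectors.
consumption_tech = simulate_technology(scientists=100.0, lam=0.8, phi=0.5)
safety_tech = simulate_technology(scientists=100.0, lam=0.4, phi=0.5)

print(f"consumption technology after 50 years: {consumption_tech:.1f}")
print(f"safety technology after 50 years:      {safety_tech:.1f}")
```

With more than one scientist per sector, the sector with the larger lambda gets more technology growth from the same research labor. The open question is how the optimal allocation responds to that asymmetry.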
There are probably lots of useful AI-specific extensions or variations to the paper that I don't have the expertise to think of. For example, considering some of the following could be useful:
I'm interested to see approaches that are more granular than this model, but still take a high-level approach. Bridging the gap between this abstract model and other, more granular arguments about AI could be fruitful. I've presented two ways you could model AI in the framework of Existential Risk and Growth below.
One interpretation of the exogenous exponential population growth in the model is that, though human population growth will stagnate, new robot labor will allow growth to continue. This would imply trying an extension where:
This could capture the tradeoffs between researching more advanced AI systems; deploying existing AI systems to work for us; and the risks associated with both of these. As well as catastrophic events, this model might also predict positive singularity-type events.
One way of modelling this quantitatively would be by introducing a population sector analogous to the consumption and safety sectors, so that the production functions for technologies are given by:
Here the two new variables are the level of population technology and the number of scientists researching population technology. The (endogenous) population growth then comes from investment in labor-reproduction:
With this variation, there would be six kinds of labor: (safety, consumption, population)×(research, production). The resource constraints would be:
At low levels of population technology, we can interpret population-producing labor as resources spent on standard biological human reproduction (including childcare and schooling). At high levels of population technology, we can interpret it as labor spent on creating artificial workers (digital or physical). One obvious problem with this approach is that, as the artificial population grows, the consumption per capita decreases, even though the consumption per human doesn't decrease. We might not want to model artificial labor as caring about its own consumption, but in the current model artificial labor would be included in the representative agent.
What we're really interested in is the consumption per amount of human labor, not the consumption per amount of total labor. To model this, we can define total population as the sum of human labor and machine labor. The quantity of interest is then total consumption divided by human labor, rather than total consumption divided by total labor.
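Written out as a sketch, with symbols that are assumptions introduced here for illustration ($N^H_t$ for human labor, $N^M_t$ for machine labor, $C_t$ for total consumption):

```latex
N_t = N^H_t + N^M_t,
\qquad
c^{\text{human}}_t = \frac{C_t}{N^H_t}
\quad \text{rather than} \quad
c_t = \frac{C_t}{N_t} = \frac{C_t}{N^H_t + N^M_t}.
```

Since $N^M_t \ge 0$, consumption per human is never lower than consumption per unit of total labor, and the gap widens as machine labor grows relative to human labor.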
Note that there are other ways to model this general idea.
Following Hanson (2001) we might want production functions that model computer hardware specifically:
Here, total computer hardware at each point in time can be divided between consumption production and safety production. The allocation of hardware could be broken down further between all four kinds of labor in (safety, consumption)×(research, production). There's a choice here for the production function for hardware: one option is to follow Hanson, but there might be a simpler approach that works better.
The model assumes the centralised optimal impatient allocation.
This would also shed light on the relative value of improving international coordination compared to the four kinds of labor or lowering the discount rate.
In the model, a representative agent must instantly allocate the endowment good, labor, between four types of work.
Agents are assumed to know the values of and .
Agents also are assumed to know the allocation of labor between the four sectors, and the resulting production of consumption and safety technologies and goods. In the centralized optimal allocation, this would probably be a fine assumption. But even then, complete knowledge of and might not be.
However, this particular question might be intractable. One obvious way to approach the question would be to get historical data on the size of the consumption sector, the size of the safety sector, and the chance of anthropogenic existential catastrophe. The second seems very hard, and the third close to impossible.
Only anthropogenic risks are considered. What happens if we include some natural hazard rate (either constant, or decreasing in population) and either extend the existing safety sector so that it mitigates both natural and anthropogenic risk, or add a natural-safety sector distinct from the anthropogenic-safety sector? This captures things like asteroid tracking and deflection, solar flare preparation, natural pandemic preparedness etc. Snyder-Beattie et al. (2019) find that natural risks are negligible compared to potential anthropogenic extinction risks, which makes this extension seem less promising.
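To make the comparison concrete, here is a minimal sketch of how a constant natural hazard rate would enter the survival probability alongside an anthropogenic hazard rate. The hump-shaped anthropogenic hazard and all numerical values are made up for illustration. They are not estimates from the paper or from Snyder-Beattie et al.

```python
import math


def survival_probability(anthropogenic_hazard, natural_hazard, years, dt=1.0):
    """Return exp(-integral of total hazard), using a left Riemann sum.

    `anthropogenic_hazard` is a function of time; `natural_hazard` is a constant.
    """
    steps = int(round(years / dt))
    cumulative_hazard = 0.0
    for i in range(steps):
        t = i * dt
        cumulative_hazard += (anthropogenic_hazard(t) + natural_hazard) * dt
    return math.exp(-cumulative_hazard)


def example_anthropogenic_hazard(t):
    # Purely illustrative hump-shaped hazard peaking at t = 100 (hypothetical numbers).
    return 0.001 * math.exp(-(((t - 100.0) / 50.0) ** 2))


without_natural = survival_probability(example_anthropogenic_hazard, 0.0, years=300.0)
with_natural = survival_probability(example_anthropogenic_hazard, 1e-6, years=300.0)

print(f"survival without natural risk: {without_natural:.6f}")
print(f"survival with natural risk:    {with_natural:.6f}")
```

If the natural hazard rate is several orders of magnitude below the anthropogenic one, as in these made-up numbers, adding it barely moves the survival probability. This is the intuition behind treating the extension as less promising.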
Existing literature models overlapping generations, which might seem at first glance to be a useful extension to the model: it seems intuitively plausible that there could be some effect from agents considering welfare over their own finite lifetimes. However, we can already interpret the model’s infinitely-lived selfish agent with exponentially discounted welfare as a sequence of finitely-lived partially altruistic agents who discount the welfare of their descendants, so this extension seems less insightful. Indeed, in reality, we can see how the preferences of overlapping generations are actually aggregated. The overlapping generations form governments which exponentially discount future welfare in their analyses, just as if there were a single infinitely-lived agent exponentially discounting their own welfare.
Recall that consumption goods and safety goods have analogous production functions:
Similarly, consumption technologies, , and safety technologies, , have analogous production functions:
It might seem like an unnecessarily strong assumption to give each sector the same parameter values. But the key results of the model don't depend on these values being the same between sectors, so the assumption doesn't seem too problematic, and relaxing the assumption seems less promising.
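For concreteness, here is a sketch of what relaxing the assumption would look like, assuming Jones-style production functions. The notation is an assumption introduced here for illustration, not quoted from the paper: $C_t$ and $H_t$ are consumption and safety goods, $A_t$ and $B_t$ the corresponding technologies, $L$ production labor, and $S$ scientists. The sector subscripts on the parameters are the relaxation.

```latex
C_t = A_t^{\alpha_c} L_{c,t},
\qquad
H_t = B_t^{\alpha_h} L_{h,t},
\qquad
\dot{A}_t = S_{a,t}^{\lambda_a} A_t^{\phi_a},
\qquad
\dot{B}_t = S_{b,t}^{\lambda_b} B_t^{\phi_b}.
```

The shared-parameter case corresponds to $\alpha_c = \alpha_h$, $\lambda_a = \lambda_b$, and $\phi_a = \phi_b$.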
One way to model AI that seems less promising would be as a general purpose technology that can affect production in both the consumption and the safety sector. This technology would have the same production function as the other two sectors:
And the new production functions for consumption and safety goods would depend on the level of general purpose technology:
It seems to me that this general purpose technology would just increase the effective scale of the economy, so the approach would probably need to be tweaked to give new insights.
It’s assumed that risk depends only on instantaneous aggregate production in the consumption and safety sectors. What would happen if risk depended on (some combination of):
The reason this seems potentially less promising is that the 'time of perils' that emerges in 'Existential Risk and Growth' already captures the useful part of hingeyness: a philanthropist can have an outsized effect by spending in this time.