How do AI timelines affect the urgency of working on AI safety?
It seems plausible that, if artificial general intelligence (AGI) will arrive soon, then we need to spend quickly on AI safety research. And if AGI is still a way off, we can spend more slowly. Are these positions justified? If we have a bunch of capital and we're deciding how quickly to spend it, do we care about AI timelines? Intuitively, it seems like the answer is yes. But is it possible to support this intuition with a mathematical model?
TLDR: Yes. Under plausible model assumptions, there is a direct relationship between AI timelines and how quickly we should spend on AI safety research.
Let's start with some simplifying assumptions. We want to distill reality into a simple framework while still capturing the key features of the world that we care about. We could do this in lots of different ways, but here's an attempt:
We start with some fixed amount of capital that we can choose how to spend. At some point in the future, an artificial general intelligence (AGI) will be developed. This AGI will either be friendly or unfriendly. If it's unfriendly, everyone dies. We don't know exactly when AGI will be developed, but we have at least an expected timeline.
To ensure the AGI is friendly, we will need to do some amount of AI safety research, but we don't know exactly how much. Once per decade, we decide how much to spend on safety research. Any money we don't spend can be invested in the market. Then, after AGI emerges, if it's friendly, we can spend any leftover capital on whatever amazing things will presumably exist at that point.
(That's just a high-level explanation; I'm skipping over the mathy bits. Appendix A contains the full details.)
If we do enough research by the time AGI is developed, everything works out okay. If we don't do enough research, we go extinct. The objective is to choose the spending schedule that maximizes the welfare we get out of our remaining capital after AGI emerges. (If we go extinct, we get no welfare. To meet our objective, we need to spend money on preventing unfriendly AI.)
Philanthropists face this basic tradeoff:
- If we spend more now, we're more likely to get enough research done in time if AGI arrives soon.
- If we spend more later, we earn more return on our investments. That way, (a) we can do a greater total amount of research, and (b) we will have more money left over at the end to spend on good things.
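To make the tradeoff concrete, here is a minimal sketch of the kind of objective described above. It is not the essay's actual source code: the function names, the default parameter values, the 20-decade horizon, and the timing convention (research spent in a decade counts toward AGI arriving in that same decade) are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def lognormal_cdf(x, median, sigma):
    """P(X <= x) for a log-normal X with the given median and log-scale sigma."""
    if x <= 0:
        return 0.0
    return NormalDist().cdf(math.log(x / median) / sigma)


def expected_utility(fractions, capital=1000.0, decade_growth=1.5,
                     agi_median=3.0, agi_sigma=1.0,
                     research_median=100.0, research_sigma=1.0):
    """Expected log-utility of a per-decade spending schedule (toy version).

    fractions[t] is the share of remaining capital spent on research in
    decade t. Time is measured in decades. Every default value here is an
    illustrative placeholder, not a figure from the essay.
    """
    research_done = 0.0
    total = 0.0
    for decade, fraction in enumerate(fractions):
        if not 0.0 <= fraction < 1.0:
            raise ValueError("each fraction must be in [0, 1)")
        spend = fraction * capital
        research_done += spend
        capital -= spend
        # Probability that AGI arrives during this decade.
        p_agi = (lognormal_cdf(decade + 1, agi_median, agi_sigma)
                 - lognormal_cdf(decade, agi_median, agi_sigma))
        # Probability that the research done so far is enough.
        p_safe = lognormal_cdf(research_done, research_median, research_sigma)
        # Extinction contributes zero; survival pays log of leftover capital.
        total += p_agi * p_safe * math.log(capital)
        # Unspent capital is invested until the next decision point.
        capital *= decade_growth
    return total


# Compare spending 30% immediately against waiting a century to spend 30%.
front_loaded = [0.3] + [0.0] * 19
back_loaded = [0.0] * 10 + [0.3] + [0.0] * 9
for median in (1.0, 3.0, 10.0):
    print(median,
          round(expected_utility(front_loaded, agi_median=median), 3),
          round(expected_utility(back_loaded, agi_median=median), 3))
```

A real version would pass `expected_utility` to a numerical optimizer to search over the whole schedule. This sketch only evaluates schedules that are supplied by hand.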
If we run the math on this model, this is what it says to do:
- If AGI is very unlikely to emerge this decade, don't spend any money on research yet. Invest all our capital.
- Once we get close to the median estimated date of AGI (to within a few decades), start spending around 30% of our capital per decade / 3% per year.
- In the decades after the median date of AGI (assuming AGI hasn't emerged yet), reduce the spending rate.
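The "30% per decade / 3% per year" equivalence is a round-number approximation. As a rough check, the sketch below converts between the two, assuming the rate applies to remaining capital each year and ignoring investment returns within the decade. The helper names are hypothetical.

```python
def annual_rate_from_decade(decade_rate, years=10):
    """Constant yearly spending rate that depletes the same share over `years`."""
    return 1.0 - (1.0 - decade_rate) ** (1.0 / years)


def decade_rate_from_annual(annual_rate, years=10):
    """Share of capital spent after `years` of spending at a constant yearly rate."""
    return 1.0 - (1.0 - annual_rate) ** years


print(f"30% per decade is about {annual_rate_from_decade(0.30):.2%} per year")
print(f"3% per year is about {decade_rate_from_annual(0.03):.2%} per decade")
```

Under these assumptions, 30% per decade works out to about 3.5% per year, and 3% per year to about 26% per decade, so the two figures agree only loosely.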
The model's optimal spending rate varies based on the median date of AGI:
- AI Impacts' review of AI timeline surveys found that survey respondents estimated a 50% chance of AGI by around 2050. Given that timeline, the model recommends a peak spending rate of 3% per year.
- For a much longer median timeline of 100 years, the model suggests spending nothing for the first 50 years, then spending around 1% per year after that.
- If we assume a very short timeline of only one decade, the model says to spend 5% per year for the first decade, and 1–2% per year after that if AGI still hasn't appeared.
Obviously this is a toy model that makes lots of unrealistic simplifications. For instance, you can't instantly cause more research to happen by throwing more money at it. But the model corroborates the intuitive notion that if AI timelines are shorter, then we should spend more quickly.
I have a hard time trusting this intuition on its own. The question of how much to spend now vs. later is really complicated: it's affected by the exponential growth of investments, the decay in expected value of future worlds where extinction is a possibility, and the complex relationship between research spending and productivity. Humans don't have good intuitions around that sort of thing. A lot of times, when you do the math, you realize that your seemingly-reasonable intuition was totally off base. So even though this model has many limitations, it confirms that the intuition is not a mistake arising from a failure to comprehend exponential growth. The intuition could still be wrong, but if so, it's not because of a basic math error.
It's also noteworthy that under this model, even with an aggressive AGI timeline, the optimal spending rate doesn't exceed 5% per year.
So, do short timelines mean we should spend more quickly? Yes. Maybe. If this model is correct. Which it's not. But even if it's wrong, it might still be correct in the ways that matter.
Python source code is available here. It's hard to properly describe the model's output in words, so you might find it more illustrative to download the source code and play with it.
Appendix A: Some properties of this model
This model embeds all of the assumptions listed below, any of which could easily be wrong. This list does not cover every assumption, just the explicit ones plus all the implicit ones I could think of in five minutes.
- We represent all donors supporting AI safety research. We can collectively decide on the optimal spending rate.
- We decide how much to spend once per decade. (This makes the problem tractable. If we could spend on a yearly basis, the model would have too many independent variables for Python's optimization library to handle.)
- We only care about spending decisions for the next two centuries. Ignore anything that happens after that. (Again, this is to make the problem computationally tractable.)
- Prior to the emergence of AGI, we don't want to spend money on anything other than AI safety research.
- After AGI is developed, we get an amount of utility equal to the logarithm of our remaining capital.
- It's possible to instantly convert money into research at any scale.
- The date of AGI follows a log-normal distribution. A log-normal distribution has some relevant properties:
  - It's fat-tailed, which means the longer we go without developing AGI, the more additional time we expect it to take.
  - Unlike, say, an exponential distribution, a log-normal distribution allows for a non-trivial probability that our median estimate is off by an order of magnitude. If our median timeline is 30 years, then we might still think it's plausible that AGI could take 300 years. (Exactly how plausible depends on what standard deviation we use.)
  - On the other hand, unlike, say, a Pareto distribution, our probability quickly diminishes as we move out by more orders of magnitude. For example, if we estimate there's a 50% chance of AGI within 30 years and a 95% chance within 300 years, that implies an extremely confident 99.995% chance of AGI by the year 5000.
- Research spending required to avert catastrophe follows a log-normal distribution, so it also has the properties listed above.
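The log-normal properties listed above can be checked numerically. The sketch below is not the essay's code; it assumes a median of 30 years, treats 2021 as "now", and uses hypothetical helper names. It computes the chance of AGI by the year 5000 under two readings of "95% within 300 years", then shows how the median remaining wait grows as time passes without AGI.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def lognormal_cdf(x, median, sigma):
    """P(T <= x) for a log-normal T with the given median and log-scale sigma."""
    return STD_NORMAL.cdf(math.log(x / median) / sigma)


def lognormal_quantile(p, median, sigma):
    """Value x such that P(T <= x) = p."""
    return median * math.exp(sigma * STD_NORMAL.inv_cdf(p))


def median_remaining_wait(elapsed, median, sigma):
    """Median additional wait, given that AGI has not arrived after `elapsed` years."""
    survived = 1.0 - lognormal_cdf(elapsed, median, sigma)
    return lognormal_quantile(1.0 - 0.5 * survived, median, sigma) - elapsed


median_years = 30.0
years_to_5000 = 5000 - 2021  # assumption: the estimate is made in 2021

# Reading 1: 300 years is the one-sided 95th percentile.
sigma_one_sided = math.log(300 / median_years) / STD_NORMAL.inv_cdf(0.95)
# Reading 2: 300 years is the top of a central 95% interval (97.5th percentile).
sigma_two_sided = math.log(300 / median_years) / STD_NORMAL.inv_cdf(0.975)

p_one_sided = lognormal_cdf(years_to_5000, median_years, sigma_one_sided)
p_two_sided = lognormal_cdf(years_to_5000, median_years, sigma_two_sided)
print(f"one-sided reading: {p_one_sided:.4%}")
print(f"two-sided reading: {p_two_sided:.4%}")

# Fat tail: the longer we wait without AGI, the longer the expected remaining wait.
for elapsed in (10, 30, 100):
    wait = median_remaining_wait(elapsed, median_years, sigma_one_sided)
    print(f"after {elapsed} years without AGI: about {wait:.0f} more years (median)")
```

Under these assumptions, the one-sided reading gives roughly 99.95%. The 99.995% figure quoted above is reproduced when 300 years is treated as the upper end of a central 95% interval. Either way, the probability in the far tail shrinks quickly.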
This model has six input variables:
- how much capital we start with
- the investment rate of return
- median research spending required to make AGI safe
- standard deviation of research spending required
- median date of AGI
- standard deviation of date of AGI
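One way to bundle these six inputs is a small immutable record. The sketch below is hypothetical: the field names are invented, and it assumes "standard deviation" means the standard deviation of the logarithm, which the essay does not state. It also shows the relationship between median and mean for a log-normal distribution.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInputs:
    starting_capital: float
    investment_return: float      # e.g. 0.05 for 5% per year
    research_median: float        # median spending required to make AGI safe
    research_sigma: float         # assumed: std. dev. of log(required spending)
    agi_median_years: float       # median years until AGI
    agi_sigma: float              # assumed: std. dev. of log(years until AGI)

    def agi_mu(self):
        """Log-scale location parameter: the log of the median."""
        return math.log(self.agi_median_years)

    def agi_mean_years(self):
        """Mean of the log-normal, which always exceeds its median."""
        return self.agi_median_years * math.exp(self.agi_sigma ** 2 / 2)


# Placeholder values for illustration only.
inputs = ModelInputs(
    starting_capital=1000.0,
    investment_return=0.05,
    research_median=100.0,
    research_sigma=1.0,
    agi_median_years=30.0,
    agi_sigma=1.0,
)
print(inputs.agi_mu(), inputs.agi_mean_years())
```

The gap between mean and median widens quickly as the log-scale standard deviation grows, which is one reason a median can be the easier number to reason about.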
Appendix B: Alternative models
Tom Sittler's "The expected value of the long-term future" presents some models that treat x-risk as an ongoing concern that can be reduced by deliberate effort (I will refer to these as "periodic models") rather than a single one-off event that occurs at an unknown time. I find his models more realistic, plus they're more similar to the Ramsey model that comes up a lot in economics literature.
About a year prior to writing this essay, I spent quite some time working with periodic models like the ones Sittler gives. The problem with them is that they're much harder to solve. I couldn't find optimal solutions for any of them, and I couldn't even find approximate solutions with a convex optimization program.
As a way around this, I tried restricting the decision to only two points in time: now or one century from now. This allowed me to preserve the basic structure of these models while making it possible to find an optimal solution. But this restriction is highly limiting, which means the models' optimal solutions tell us little about what we should do in reality.
The new model I presented above has some nice properties. To my knowledge, no previous model achieved all of these:
1. Civilization has a nonzero probability of going extinct in any particular year, but the probability of survival does not quickly approach zero.
2. We must decide how much of our remaining budget to spend in each period. We cannot reduce our decision to a binary "fund x-risk or don't".
3. The optimal spending schedule is feasible to find.
Of the four models that Sittler presented:
- "Trivial model" (section 3.1.2) has property 3, but not 1 or 2;
- "Constant risk, temporary effects" (section 3.1) has property 2, but not 1 or 3;
- "Variable risk, temporary effects" (section 3.2) and "Constant risk, lasting effects" (section 3.3) have properties 1 and 2, but not 3.
The models in my previous essay are similar to Sittler's. They gain solvability (property 3) at the expense of periodic decision-making (property 2).
However, the model in this essay does make some fairly specific assumptions, as discussed in Appendix A. Perhaps the most important assumptions are (a) there is only a single potential extinction event and (b) the long-term value of the future is bounded.
In an earlier draft of this essay, my model did not assign value to any capital left over after AGI emerges. It simply tried to minimize the probability of extinction. This older model came to the same basic conclusion—namely, shorter timelines mean we should spend faster. (The difference was that it spent a much larger percentage of the budget each decade, and under some conditions it would spend 100% of the budget at a certain point.) But I was concerned that the older model trivialized the question by assuming we could not spend our money on anything but AI safety research—obviously if that's the only thing we can spend money on, then we should spend lots of money on it. The new model allows for spending money on other things but still reaches the same qualitative conclusion, which is a stronger result.
None of these models is perfect, but some are more realistic than others. Which one is more realistic largely depends on an important question: what happens after we develop AGI? For example:
- Will the AGI behave like a better version of a human, allowing us to do all the stuff we would have done anyway, but at a faster rate? Or will it be so radically better as to make the world unrecognizable?
- Will the AGI be able to prevent all future x-risks, or will we still need to worry about the possibility of extinction?
- Does it matter how much capital we have? If we invest more now, that might give the AGI a useful head start. But the AGI might so radically change the economy that the state of the economy prior to AGI won't matter, or altruistic capital might become (relatively) useless.
The answers to these questions could meaningfully change how much we should be spending on AI safety (or on other forms of x-risk mitigation).
It's at least plausible that the world economy allocates far too little to x-risk, and that thoughtful altruists should therefore spend their entire budgets on x-risk reduction. But the same could be argued for other effective and neglected causes such as farm animal welfare, so you have to decide how to prioritize between neglected causes. And that doesn't get around the problem of determining the optimal spending schedule: even if you should spend your entire budget on x-risk, it doesn't follow that you should spend the whole budget now.
Unless, of course, my model contains the same math error. Which is entirely possible. ↩︎
We could parameterize the distribution using the mean rather than the median, but I find medians a little more intuitive when working with log-normal distributions. ↩︎
Published 2018-01-02. Accessed 2021-08-02. ↩︎
Last year I spent a good 200 hours trying to figure out how to model this problem. Then, after not working on it for a year, I suddenly got an idea and wrote up a working program in an hour. Funny how that works. ↩︎
On the old model, the only downside to spending now rather than later was that you lose out on investment returns, so you can spend less total money. When investments could earn a relatively low return and timelines were short, the model would propose spending a little each decade and then spending the entire remaining budget at a specific point, usually on or shortly before the decade of peak risk. When investments could earn a high return or timelines were long, the model would never spend the whole budget at once, preferring to always save some for later. ↩︎