How do AI timelines affect the urgency of working on AI safety?
It seems plausible that, if artificial general intelligence (AGI) will arrive soon, then we need to spend quickly on AI safety research. And if AGI is still a way off, we can spend more slowly. Are these positions justified? If we have a bunch of capital and we're deciding how quickly to spend it, do we care about AI timelines? Intuitively, it seems like the answer is yes. But is it possible to support this intuition with a mathematical model?
TLDR: Yes. Under plausible model assumptions, there is a direct relationship between AI timelines and how quickly we should spend on AI safety research.
Let's start with some simplifying assumptions. We want to distill reality into a simple framework while still capturing the key features of the world that we care about. We could do this in lots of different ways, but here's an attempt:
We start with some fixed amount of capital that we can choose how to spend. At some point in the future, an artificial general intelligence (AGI) will be developed. This AGI will either be friendly or unfriendly. If it's unfriendly, everyone dies. We don't know exactly when AGI will be developed, but we have at least an expected timeline.
To ensure the AGI is friendly, we will need to do some amount of AI safety research, but we don't know exactly how much. Once per decade, we decide how much to spend on safety research. Any money we don't spend can be invested in the market. Then, after AGI emerges, if it's friendly, we can spend any leftover capital on whatever amazing things will presumably exist at that point.
(That's just a high-level explanation; I'm skipping over the mathy bits. Appendix A contains the full details.)
If we do enough research by the time AGI is developed, everything works out okay. If we don't do enough research, we go extinct. The objective is to choose the spending schedule that maximizes the welfare we get out of our remaining capital after AGI emerges. (If we go extinct, we get no welfare. To meet our objective, we need to spend money on preventing unfriendly AI.)
Philanthropists face this basic tradeoff:
- If we spend more now, we're more likely to get enough research done in time if AGI arrives soon.
- If we spend more later, we earn more return on our investments. That way, (a) we can do a greater total amount of research, and (b) we will have more money left over at the end to spend on good things.
If we run the math on this model, this is what it says to do:
- If AGI is very unlikely to emerge this decade, don't spend any money on research yet. Invest all our capital.
- Once we get close to the median estimated date of AGI (to within a few decades), start spending around 30% of our capital per decade / 3% per year.
- In the decades after the median date of AGI (assuming AGI hasn't emerged yet), reduce the spending rate.
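The per-decade and per-year figures above are related only approximately. Here is a minimal sketch of the conversion, assuming we spend a constant fraction of remaining capital each year and ignoring investment returns within the decade. Both are simplifications of this sketch rather than part of the model, and the function names are hypothetical.

```python
def annual_fraction(decade_fraction: float) -> float:
    """Yearly fraction of remaining capital that compounds to
    `decade_fraction` over ten years (investment returns ignored)."""
    return 1 - (1 - decade_fraction) ** (1 / 10)


def decade_fraction(annual: float) -> float:
    """Fraction of capital spent over ten years when `annual` of the
    remaining capital is spent each year (investment returns ignored)."""
    return 1 - (1 - annual) ** 10


if __name__ == "__main__":
    # Compare simple division by ten with the compounding conversion.
    for per_decade in (0.1, 0.3, 0.5):
        print(per_decade, per_decade / 10, round(annual_fraction(per_decade), 4))
```

Dividing by ten is a reasonable shorthand at these rates. The compounding version comes out slightly higher.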
The model's optimal spending rate varies based on the median date of AGI:
- AI Impacts' review of AI timeline surveys found that survey respondents estimated a 50% chance of AGI by around 2050. Given that timeline, the model recommends a peak spending rate of 3% per year.
- For a much longer median timeline of 100 years, the model suggests spending nothing for the first 50 years, then spending around 1% per year after that.
- If we assume a very short timeline of only one decade, the model says to spend 5% per year for the first decade, and 1–2% per year after that if AGI still hasn't appeared.
Obviously this is a toy model that makes lots of unrealistic simplifications. For instance, you can't instantly cause more research to happen by throwing more money at it. But the model corroborates the intuitive notion that if AI timelines are shorter, then we should spend more quickly.
I have a hard time trusting this intuition on its own. The question of how much to spend now vs. later is really complicated: it's affected by the exponential growth of investments, the decay in expected value of future worlds where extinction is a possibility, and the complex relationship between research spending and productivity. Humans don't have good intuitions around that sort of thing. A lot of times, when you do the math, you realize that your seemingly-reasonable intuition was totally off base. So even though this model has many limitations, it confirms that the intuition is not a mistake arising from a failure to comprehend exponential growth. The intuition could still be wrong, but if so, it's not because of a basic math error.
It's also noteworthy that under this model, even with an aggressive AGI timeline, the optimal spending rate doesn't exceed 5% per year.
So, do short timelines mean we should spend more quickly? Yes. Maybe. If this model is correct. Which it's not. But even if it's wrong, it might still be correct in the ways that matter.
Python source code is available here. It's hard to properly describe the model's output in words, so you might find it more illustrative to download the source code and play with it.
Appendix A: Some properties of this model
This model embeds all of the assumptions listed below, any of which could easily be wrong. This list does not cover every assumption, just the explicit ones plus all the implicit ones I could think of in five minutes.
- We represent all donors supporting AI safety research. We can collectively decide on the optimal spending rate.
- We decide how much to spend once per decade. (This makes the problem tractable. If we could spend on a yearly basis, the model would have too many independent variables for Python's optimization library to handle.)
- We only care about spending decisions for the next two centuries. Ignore anything that happens after that. (Again, this is to make the problem computationally tractable.)
- Prior to the emergence of AGI, we don't want to spend money on anything other than AI safety research.
- After AGI is developed, we get an amount of utility equal to the logarithm of our remaining capital.
- It's possible to instantly convert money into research at any scale.
- The date of AGI follows a log-normal distribution. A log-normal distribution has some relevant properties:
  - It's fat-tailed, which means the longer we go without developing AGI, the more additional time we expect it to take.
  - Unlike, say, an exponential distribution, a log-normal distribution allows for a non-trivial probability that our median estimate is off by an order of magnitude. If our median timeline is 30 years, then we might still think it's plausible that AGI could take 300 years. (Exactly how plausible depends on what standard deviation we use.)
  - On the other hand, unlike, say, a Pareto distribution, our probability quickly diminishes as we move out by more orders of magnitude. For example, if we estimate there's a 50% chance of AGI within 30 years and a 95% chance within 300 years, that implies an extremely confident 99.95% chance of AGI by the year 5000 (roughly 3,000 years out).
- Research spending required to avert catastrophe follows a log-normal distribution, so it also has the properties listed above.
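The log-normal properties listed above can be checked numerically. The sketch below uses only Python's standard library. It reads "95% chance within 300 years" as a one-sided quantile and approximates "by the year 5000" as 3,000 years from now. Both readings are assumptions of this sketch, and the helper names are hypothetical, not taken from the essay's source code.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, std dev 1


def sigma_from_quantile(median, value, prob):
    """Log-space std dev such that P(X <= value) = prob
    for a log-normal X with the given median."""
    return math.log(value / median) / STD_NORMAL.inv_cdf(prob)


def lognormal_cdf(x, median, sigma):
    """P(X <= x) for a log-normal X with the given median and log-space std dev."""
    if x <= 0:
        return 0.0
    return STD_NORMAL.cdf(math.log(x / median) / sigma)


def median_remaining_wait(elapsed, median, sigma):
    """Median additional wait, given no arrival in the first `elapsed` years."""
    if elapsed <= 0:
        return median
    p_elapsed = lognormal_cdf(elapsed, median, sigma)
    # Conditional median: halfway through the remaining probability mass.
    target = p_elapsed + 0.5 * (1 - p_elapsed)
    return median * math.exp(sigma * STD_NORMAL.inv_cdf(target)) - elapsed


if __name__ == "__main__":
    sigma = sigma_from_quantile(median=30, value=300, prob=0.95)
    print(f"log-space sigma: {sigma:.3f}")
    print(f"P(AGI within 3000 years): {lognormal_cdf(3000, 30, sigma):.4%}")
    # Fat tail: the median remaining wait grows as time passes without AGI.
    for elapsed in (0, 30, 100):
        print(elapsed, round(median_remaining_wait(elapsed, 30, sigma), 1))
```

The last loop illustrates the fat-tail property: with these parameters, the median remaining wait gets longer the more years pass without AGI.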
This model has six input variables:
- how much capital we start with
- the investment rate of return
- median research spending required to make AGI safe
- standard deviation of research spending required
- median date of AGI
- standard deviation of date of AGI
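To make the assumptions and inputs above concrete, here is a minimal sketch of how the expected utility of a spending schedule might be computed. It is not the essay's actual source code. The function names, default parameter values, and timing convention (spend at the start of each decade, and value leftover capital as of that moment if AGI arrives during the decade) are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def lognormal_cdf(x, median, sigma):
    """P(X <= x) for a log-normal X with the given median and log-space std dev."""
    if x <= 0:
        return 0.0
    return STD_NORMAL.cdf(math.log(x / median) / sigma)


def expected_utility(
    spend_fractions,        # fraction of remaining capital spent in each decade
    capital=1000.0,         # starting capital (arbitrary units, assumed)
    annual_return=0.05,     # investment rate of return (assumed)
    research_median=500.0,  # median research spending required (assumed)
    research_sigma=1.0,     # log-space std dev of required spending (assumed)
    agi_median=30.0,        # median years until AGI (assumed)
    agi_sigma=1.0,          # log-space std dev of years until AGI (assumed)
):
    """Expected log-utility of leftover capital; extinction counts as zero."""
    decade_growth = (1 + annual_return) ** 10
    research_done = 0.0
    total = 0.0
    for decade, fraction in enumerate(spend_fractions):
        if not 0.0 <= fraction < 1.0:
            raise ValueError("each spending fraction must be in [0, 1)")
        spent = fraction * capital
        research_done += spent
        capital -= spent
        # Probability that AGI arrives during this decade.
        p_agi = (lognormal_cdf(10 * (decade + 1), agi_median, agi_sigma)
                 - lognormal_cdf(10 * decade, agi_median, agi_sigma))
        # Probability that cumulative research meets the unknown requirement.
        p_enough = lognormal_cdf(research_done, research_median, research_sigma)
        total += p_agi * p_enough * math.log(capital)
        capital *= decade_growth  # unspent capital is invested for the decade
    return total


if __name__ == "__main__":
    # Compare a few constant-rate schedules over two centuries (20 decades).
    for rate in (0.0, 0.1, 0.3, 0.5):
        print(rate, round(expected_utility([rate] * 20), 3))
```

An optimizer would then search over `spend_fractions` for the schedule with the highest expected utility. This sketch only evaluates a given schedule.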
Appendix B: Alternative models
Tom Sittler's "The expected value of the long-term future" presents some models that treat x-risk as an ongoing concern that can be reduced by deliberate effort (I will refer to these as "periodic models") rather than a single one-off event that occurs at an unknown time. I find his models more realistic, plus they're more similar to the Ramsey model that comes up a lot in economics literature.
About a year prior to writing this essay, I spent quite some time working with periodic models like the ones Sittler gives. The problem with them is that they're much harder to solve. I couldn't find optimal solutions for any of them, and I couldn't even find approximate solutions with a convex optimization program.
As a way around this, I tried restricting the decision to only two points in time: now or one century from now. This allowed me to preserve the basic structure of these models while making it possible to find an optimal solution. But this restriction is highly limiting, which means the models' optimal solutions tell us little about what we should do in reality.
The new model I presented above has some nice properties. To my knowledge, no previous model achieved all of these:
1. Civilization has a nonzero probability of going extinct in any particular year, but the probability of survival does not quickly approach zero.
2. We must decide how much of our remaining budget to spend in each period. We cannot reduce our decision to a binary "fund x-risk or don't".
3. The optimal spending schedule is feasible to find.
Of the four models that Sittler presented:
- "Trivial model" (section 3.1.2) has property 3, but not 1 or 2;
- "Constant risk, temporary effects" (section 3.1) has property 2, but not 1 or 3;
- "Variable risk, temporary effects" (section 3.2) and "Constant risk, lasting effects" (section 3.3) have properties 1 and 2, but not 3.
The models in my previous essay are similar to Sittler's. They gain solvability (property 3) at the expense of periodic decisionmaking (property 2).
However, the model in this essay does make some fairly specific assumptions, as discussed in Appendix A. Perhaps the most important assumptions are (a) there is only a single potential extinction event and (b) the long-term value of the future is bounded.
In an earlier draft of this essay, my model did not assign value to any capital left over after AGI emerges. It simply tried to minimize the probability of extinction. This older model came to the same basic conclusion—namely, shorter timelines mean we should spend faster. (The difference was that it spent a much larger percentage of the budget each decade, and under some conditions it would spend 100% of the budget at a certain point.) But I was concerned that the older model trivialized the question by assuming we could not spend our money on anything but AI safety research—obviously if that's the only thing we can spend money on, then we should spend lots of money on it. The new model allows for spending money on other things but still reaches the same qualitative conclusion, which is a stronger result.
None of these models is perfect, but some are more realistic than others. Which one is more realistic largely depends on an important question: what happens after we develop AGI? For example:
- Will the AGI behave like a better version of a human, allowing us to do all the stuff we would have done anyway, but at a faster rate? Or will it be so radically better as to make the world unrecognizable?
- Will the AGI be able to prevent all future x-risks, or will we still need to worry about the possibility of extinction?
- Does it matter how much capital we have? If we invest more now, that might give the AGI a useful headstart. But the AGI might so radically change the economy that the state of the economy prior to AGI won't matter, or altruistic capital might become (relatively) useless.
The answers to these questions could meaningfully change how much we should be spending on AI safety (or on other forms of x-risk mitigation).
It's at least plausible that the world economy allocates far too little to x-risk, and that thoughtful altruists should therefore spend their entire budgets on x-risk reduction. But the same could be argued for other effective and neglected causes such as farm animal welfare, so you have to decide how to prioritize between neglected causes. And that doesn't get around the problem of determining the optimal spending schedule: even if you should spend your entire budget on x-risk, it doesn't follow that you should spend the whole budget now.
Unless, of course, my model contains the same math error. Which is entirely possible. ↩︎
We could parameterize the distribution using the mean rather than the median, but I find medians a little more intuitive when working with log-normal distributions. ↩︎
Published 2018-01-02. Accessed 2021-08-02. ↩︎
Last year I spent a good 200 hours trying to figure out how to model this problem. Then, after not working on it for a year, I suddenly get an idea and write up a working program in an hour. Funny how that works. ↩︎
On the old model, the only downside to spending now rather than later was that you lose out on investment returns, so you can spend less total money. When investments could earn a relatively low return and timelines were short, the model would propose spending a little each decade and then spending the entire remaining budget at a specific point, usually on or shortly before the decade of peak risk. When investments could earn a high return or timelines were long, the model would never spend the whole budget at once, preferring to always save some for later. ↩︎
This doesn't seem to me like an appropriate assumption to make if analyzing this from an altruistic perspective.
If friendly AGI is developed, and assuming it can handle all future x-risks for us, then don't we just get utility equal to our cosmic endowment? We get a future of astronomical value. The amount of capital we have left over affects how quickly we can begin colonization by a small amount, but isn't that a pretty trivial effect compared to the probability of actually getting friendly AGI?
It seems to me then that we should roughly be following Bostrom's maxipok rule to "Maximise the probability of an ‘OK outcome’, where an OK outcome is any outcome that avoids existential catastrophe." In toy model terms, this would be maximizing the probability of friendly AGI without regard for how much capital is left over after AGI.
Am I correct that that's not what your model is doing? If so, why do you think doing what your model is doing (with the 5th assumption quoted above) is more appropriate?
(With your assumption, it seems the model will say that we should spend less on AGI than we would with the assumption I'm saying is more appropriate to make (maximize probability of friendly AGI), since your model will accept a marginally higher probability of x-risk in exchange for a sufficiently higher amount of capital remaining after friendly AGI.)
I don't have any well-formed opinions about what the post-AGI world will look like, so I don't think it's obvious that logarithmic utility of capital is more appropriate than simply trying to maximize the probability of a good outcome. The way you describe it is how my model worked originally, but I changed it because I believe the new model gives a stronger result even if the model is not necessarily more accurate. I wrote in a paragraph buried in Appendix B:
Thanks, I only read through Appendix A.
It seems to me that your concern "that the older model trivialized the question by assuming we could not spend our money on anything but AI safety research" could be addressed by dividing existing longtermist or EA capital up into one portion to be spent on AI safety and one portion to be spent on other causes. Each capital stock can then be spent at independent rates according to the value of available giving opportunities in their respective cause areas.
Your model already makes the assumption:
It just seems like a weird constraint to say that with one stock of capital you only want to spend it on one cause (AI safety) before some event but will spend it on any cause after the event.
I'm not sure that I can articulate a specific reason this doesn't make sense right now, but intuitively I think your older model is more reasonable.
The reason I made the model only have one thing to spend on pre-AGI is not because it's realistic (which it isn't), but because it makes the model more tractable. I was primarily interested in answering a simple question: do AI timelines affect giving now vs. later?
This is cool, thanks for posting :) How do you think this generalises to a situation where labor is the key resource rather than money?
I'm a bit more interested in the question 'how much longtermist labor should be directed towards capacity-building vs. 'direct' work (eg. technical AIS research)?' than the question 'how much longtermist money should be directed towards spending now vs. investing to save later?'
I think this is mainly because longtermism, x-risk, and AIS seem to be bumping up against the labor constraint much more than the money constraint. (Or put another way, I think OpenPhil doesn't pick their savings rate based on their timelines, but based on whether they can find good projects. As individuals, our resource allocation problem is to either try to give OpenPhil marginally better direct projects to fund or marginally better capacity-building projects to fund.)
[Also aware that you were just building this model to test whether the claim about AI timelines affecting the savings rate makes sense, and you weren't trying to capture labor-related dynamics.]
That's an interesting question, and I agree with your reasoning on why it's important. My off-the-cuff thoughts:
Labor tradeoffs don't work in the same way as capital tradeoffs because there's no temporal element. With capital, you can spend it now or later, and if you spend later, you get to spend more of it. But there's no way to save up labor to be used later, except in the sense that you can convert labor into capital and then back into labor (although these conversions might not be efficient, e.g., if you can't find enough talented people to do the work you want). So the tradeoff with labor is that you have to choose what to prioritize. This question is more about traditional cause prioritization than about giving now vs. later. This is something EAs have already written a lot about, and it's probably worth more attention overall than the question of giving (money) now vs. later, but I believe the latter question is more neglected and has more low-hanging fruit.
The question of optimal giving rate might be irrelevant if, say, we're confident that the optimal rate is somewhere above 1%, we don't know where, but it's impossible to spend more than 1% due to a lack of funding opportunities. But I don't think we can be that confident that the optimal spending rate is that high. And even if we are, knowing the optimal rate still matters if you expect that we can scale up work capacity in the future.
I'd guess >50% chance that the optimal spending rate is faster than the longtermist community is currently spending, but I also expect the longtermist spending rate to increase a lot in the future due to increasing work capacity plus capital becoming more liquid—according to Ben Todd's estimate, about half of EA capital is currently too illiquid to spend.
I'm talking about longtermism specifically and not all EA because the optimal spending rate for neartermist causes could be pretty different.
Nice, thanks for these thoughts.
Ah sorry I think I was unclear. I meant 'capacity-building' in the narrow sense of 'getting more people to work on AI' eg. by building the EA community, rather than building civilisation's capacity eg. by improving institutional decision-making. Did you think I meant the second one? I think the first one is more analogous to capital as building the EA community looks a bit more like investing (you use some of the resource to make more later)
I think we are falling for the double illusion of transparency: I misunderstood you, and the thing I thought you were saying was even further off than what you thought I thought you were saying. I wasn't even thinking about capacity-building labor as analogous to investment. But now I think I see what you're saying, and the question of laboring on capacity vs. direct value does seem analogous to spending vs. investing money.
At a high level, you can probably model labor in the same way as I describe in OP: you spend some amount of labor on direct research, and the rest on capacity-building efforts that increase the capacity for doing labor in the future. So you can take the model as is and just change some numbers.
Example: If you take the model in OP and assume we currently have an expected (median) 1% of required labor capacity, a rate of return on capacity-building of 20%, and a median AGI date of 2050, then the model recommends exclusively capacity-building until 2050, then spending about 30% of each decade's labor on direct research.
One complication is that this super-easy model treats labor as something that only exists in the present. But in reality, if you have one laborer, that person can work now and can also continue working for some number of decades. The super-easy model assumes that any labor spent on research immediately disappears, when it would be more accurate to say that research labor earns a 0% return (or let's say a -3% return, to account for people retiring or quitting) while capacity-building labor earns a 20% return (or whatever the number is).
This complication is kind of hard to wrap my head around, but I think I can model it with a small change to my program, changing the line in
In that case, the model recommends spending 100% on capacity-building for the next three decades, then about 30% per decade on research from 2050 through 2080, and then spending almost entirely on capacity-building for the rest of time.
But I'm not sure I'm modeling this concept correctly.