The optimal timing of spending on AGI safety work; why we should probably be spending more now

Tristan Cook; Guillaume Corlouer

Comments 12

Hey, I think this sort of work can be really valuable—thanks for doing it, and (Tristan) for reaching out about it the other day!

I wrote up a few pages of comments here (initially just for Tristan but he said he'd be fine with me posting it here). Some of them are about nitpicky typos that probably won't be of interest to anyone but the authors, but I think some will be of general interest.

Despite its length, even this batch of comments just consists of what stood out on a quick skim; there are whole sections (especially of the appendix) that I've barely read. But in short, for whatever it's worth:

I think that a model roughly in this direction is largely on the right track, if you think you can allocate the entire AI safety budget (and think that the behavior of other relevant actors, like AI developers, is independent of what you do). If so, you can frame the problem as an optimization problem, as you have done, and build in lots of complications. If not, though—i.e. if you’re trying to allocate only some part of the AI safety budget, in light of what other actors are doing (and how they might respond to your own decisions)—you have to frame the problem as a game, at which point it quickly loses tractability as you build in complications. (My own approach has been to think about the problem of allocating spending over time as a simple game, and this is part of what accounts for the different conclusions, as noted at the top of the doc.) I don't know if the “only one big actor” simplification holds closely enough in the AI safety case for the "optimization" approach to be a better guide, but it may well be.
That said, I also think that this model currently has mistakes large enough to render the quantitative conclusions unreliable. For example, the value of spending after vs. before the "fire alarm" seems to depend erroneously on the choice of units of money. (This is the second bit of red-highlighted text in the linked Google doc.) So I'd encourage someone interested in quantifying the optimal spending schedule on AI safety to start with this model, but then comb over the details very carefully.

Tristan Cook

Thanks again Phil for taking the time to read this through and for the in-depth feedback.

I hope to take some time to create a follow-up post, working in your suggestions and corrections as well as external updates (e.g. to the parameters of lower total AI risk funding, shorter Metaculus timelines).

I don't know if the “only one big actor” simplification holds closely enough in the AI safety case for the "optimization" approach to be a better guide, but it may well be.

This is a fair point.

The initial motivator for the project was AI s-risk funding, for which there's pretty much one large funder (and not much work is done on AI s-risk reduction by people and organizations outside the effective altruism community), though this result is entirely on AI existential risk, which is less well modeled as a single actor.

My intuition is that the "one big actor" simplification does work sufficiently well for the AI risk community, given the shared goal (avoid an AI existential catastrophe) and my guess that a lot of the AI risk work done by the community doesn't change the behaviour of AI labs much (i.e. it could be that they choose to put more effort into capabilities over safety because of work done by the AI risk community, but I'm pretty sure this isn't happening).

For example, the value of spending after vs. before the "fire alarm" seems to depend erroneously on the choice of units of money. (This is the second bit of red-highlighted text in the linked Google doc.) So I'd encourage someone interested in quantifying the optimal spending schedule on AI safety to start with this model, but then comb over the details very carefully.

To comment on this particular error (though not to say that the other errors Phil points to are not also problematic - I've yet to properly go through them): for what it's worth, the main results of the post suppose zero post-fire-alarm spending^[1] and (fortunately), since in our results we use units of millions of dollars and take the initial capital to be on the order of 1000 $m, I don't think we face this problem of smaller having the reverse than desired effect for

In a future version I expect I'll just take the post-fire alarm returns to spending to use the same returns exponent $η$ from before the fire alarm but have some multiplier - i.e. $x^{η}$ returns to spending before the fire-alarm and $k x^{η}$ afterwards.
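A quick check of why a shared exponent with a multiplier sidesteps the units problem, as a sketch only. The rescaling factor $c$ (e.g. $c = 10^{6}$ when moving from millions of dollars to dollars) and the labels $η_{\text{pre}}$ and $η_{\text{post}}$ are illustrative assumptions, not notation from the post:

$$\frac{k \, (c x)^{η}}{(c x)^{η}} = k, \qquad \frac{(c x)^{η_{\text{post}}}}{(c x)^{η_{\text{pre}}}} = c^{\,η_{\text{post}} - η_{\text{pre}}} \cdot x^{\,η_{\text{post}} - η_{\text{pre}}}$$

With a shared exponent, the value of post- relative to pre-fire-alarm spending is $k$ in any units. With two different exponents the ratio picks up a factor $c^{\,η_{\text{post}} - η_{\text{pre}}}$, which is one way a dependence on the choice of units could arise; whether this is exactly the mechanism flagged in the feedback is not confirmed here.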

^[1] Though if one thinks there will be many good opportunities to spend after a fire alarm, our main no-fire-alarm results would likely be an overestimate.

kokotajlod

In most cases, we find that the optimal spending schedule is between 5% and 15% better than the ‘default’ strategy of just spending the interest one accrues and generally between 5% to 35% better than a naive projection of the community’s current spending rate

I am curious about the right way to think about this. One voice in my head is saying "OMG here is One Simple Trick to get 5-35% extra lifetime impact for the entire EA community--just lower our bar for spending until we are spending X% per year as the schedule indicates!" Another voice in my head is saying "Huh, 35% extra lifetime impact is small potatoes actually, there's probably all sorts of mistakes we are making that, if corrected, would yield similar effects. Maybe I should be trying to identify and correct those mistakes instead."

I lean more towards the first voice than the second currently.

kokotajlod

A third voice is saying "Well, clearly lowering our bar for spending that much would be a terrible idea, therefore something about this model must be wrong -- but which part specifically?"

Dan_Keys

The model assumes gradually diminishing returns to spending within the next year, but the intuitions behind your third voice think that much higher spending would involve marginal returns that are a lot smaller OR ~zero OR negative?

kokotajlod

Huh, now that you mention it, I think the third voice thinks that much higher spending would be negative, not just a lot smaller or zero. So maybe that's what's going on: The third voice intuits that there are backfire risks along the lines of "EA gets a reputation for being ridiculously profligate" that the model doesn't model?

Maybe another thing that's going on is that maybe we literally are funding all the opportunities that seem all-things-net-positive to us. The model assumes an infinite supply of opportunities, of diminishing quality, but in fact maybe there are literally only finitely many and we've exhausted them all.

kokotajlod

A counter to that second thing is: Well we can always just give to GiveDirectly or something like that.

Vasco Grilo🔸

Great work!

Have you considered modelling the inputs as distributions to run a Monte Carlo simulation? Since you are doing an optimisation, I am afraid it would take some time to get results, but maybe 100 or 1 k samples is feasible?

The Monte Carlo simulation would consider the uncertainty of all the inputs simultaneously, so the sensitivity analyses would be simplified.
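To make the suggestion concrete, here is a minimal Python sketch of what sampling the inputs jointly could look like. It is not the authors' model or notebook: `utility` is a toy stand-in objective for a constant spending fraction, and `robust_rate`, the uniform distributions, and the values of `r`, `lam`, `kappa` and `sigma` are all assumptions made for illustration. The sampled ranges loosely follow the ranges the post uses in its sensitivity analysis (returns exponent 0.15 to 0.45, discount rate 0.5% to 5%, median AGI arrival 2030 to 2050).

```python
import math
import random


def utility(spend_frac, eta, d, median_years,
            K0=4000.0, r=0.05, lam=0.1, kappa=0.5, sigma=10.0,
            years=80.0, dt=0.5):
    """Toy objective (an assumption, not the post's model): discounted
    probability of success when spending a constant fraction of capital."""
    hazard = math.log(2.0) / median_years  # exponential AGI-arrival density
    K, C, U, t = K0, 0.0, 0.0, 0.0
    while t < years:
        Q = 1.0 / (1.0 + math.exp(-kappa * (C - sigma)))  # S-shaped success curve
        U += Q * hazard * math.exp(-(hazard + d) * t) * dt
        x = spend_frac * K                 # spending this step
        K += (r * K - x) * dt              # money grows at r, minus spending
        C += (x ** eta - lam * C) * dt     # diminishing returns and depreciation
        t += dt
    return U


def robust_rate(candidates, n_samples=200, seed=0):
    """Average utility of each candidate rate over jointly sampled inputs."""
    rng = random.Random(seed)
    totals = {s: 0.0 for s in candidates}
    for _ in range(n_samples):
        eta = rng.uniform(0.15, 0.45)          # returns exponent
        d = rng.uniform(0.005, 0.05)           # discount rate
        median_years = rng.uniform(8.0, 28.0)  # median AGI arrival, years from 2022
        for s in candidates:
            totals[s] += utility(s, eta, d, median_years)
    means = {s: total / n_samples for s, total in totals.items()}
    best = max(means, key=means.get)
    return best, means


best, means = robust_rate([0.0, 0.02, 0.04, 0.08, 0.16])
print("most robust constant rate among candidates:", best)
```

The full problem optimises a whole spending schedule for each draw rather than comparing a handful of constant rates, which is where the runtime concern about 100 or 1 k samples comes from.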

Tristan Cook

Thanks!

And thanks for the suggestion, I've created a version of the model using a Monte Carlo simulation here :-)

Vasco Grilo🔸

Great, I will have a look!

MichaelDickens

Can you explain how to use the discount rate parameter? From context, it looks like it's meant as the "social discount rate", not the "pure time preference" (which I would think should be 0). But for EAs, I think of the social discount rate as being positive mostly due to x-risk, and your model already separately accounts for AI x-risk, so should the social discount rate exclude discounting due to AI x-risk? In which case I would think it should be pretty low.

Edit: Never mind, I missed that you explained this in the post. Seems like you agree with my interpretation that the discount rate is based on non-AI x-risk.

(In which case 2% seems way too high to me, but that's just details.)

kokotajlod

Further, for all results, we begin maximum crunching after the modal AGI arrival date, which is understandable while the rate of growth of the movement is greater than the rate of decrease in probability of AGI (times the discount factor).

Huh, I find that super counterintuitive. Is this maybe a reductio against the model?

^[1] Todd (2021) claims 1% donated per year for the whole effective altruism community.

The Open Philanthropy project spent $80m in 2021 on AI risk interventions. In 2021 they had approximately $17.8b committed. Supposing that at least one sixth of their budget is committed to AI risk interventions, this gives a spending rate of at most $80m/($17,800m/6) = about 2.7%.

The FTX Future Fund has granted around $31m on AI risk interventions since starting over a year ago. Supposing at least one tenth of their budget is committed to AI risk interventions, this gives a spending rate of at most $31m/($16,600m/10) = 1.9% per year.

For the AI s-risk community, the figure is around 3%.

^[2] This is true when supposing a 4% constant spending rate, which is an overestimate of current spending but may underestimate future spending.

^[3] The second model requires us to split activities into capacity growing and capacity shrinking that increase our probability of success, whereas the first model talks concretely about the spending rate of money.

^[4] The values we choose are discussed here.

^[5] 25% AGI by 2027, 50% by 2030

^[6] 50% AGI by 2040, 75% by 2060

^[7] 50% AGI by 2050, 75% by 2075

^[8] The easy case is operationalised by the inputs in the model: probability of success if AGI this year = 25%, and the steepness of the S-shaped success curve $l = 0.2$.

^[9] The “break even line” is the maximum rate at which the funder can spend and still have their money increase.

^[10] Probability of success if AGI this year = 10%, and $l = 0.1$

^[11] Probability of success if AGI this year = 4%, and $l = 0.05$

^[12] This is a lower bound for two reasons: first, there may be spending schedules better than those that we present. Second, we calculate the utility of the 4% strategy for an arbitrarily long time horizon whereas we compute the optimal spending schedules within the year 2100.

^[13] Added post-publication on 2022-11-29

^[14] We use semi-arbitrary units of ‘quality-adjusted relevant effort units’. We use ‘quality-adjusted’ to imply that an expert's hour of contribution is worth more than a novice’s hour of contribution. We use ‘relevant’ to discount any previously acquired research or influence that is no longer useful. We use ‘effort units’ to mostly account for the time that people have put into working on something.

^[15] We give a more complete account in the technical description. In short, the solution is given as a numerical solution to an expanded set of differential equations.

^[16] The opposite is also plausible if AI risk becomes increasingly salient.

^[17] Taking $δ = 0.4$

^[18] Taking $δ = 0.2$

^[19] We see that even with practically no money we still achieve some utility. This is because both (1) the probability of success is non-zero when we have no research or influence, and (2) we already have some research and influence that will depreciate over time with no further spending.

^[20] $η_{R} = 0.15$, $η_{I} = 0.1$

^[21] As discussed in the appendix, this is $η_{R} = 0.3, η_{I} = 0.2$

^[22] $η_{R} = 0.45, η_{I} = 0.3$

^[23] This weight-adjustment is determined by the $γ$ term in the constant elasticity of substitution function.

^[24] This is done by increasing $ρ$ up to 1.

^[25] This is due to our choice of $γ$

^[26] In the sense that (1) it depreciates at a lower rate and (2) there are lower diminishing marginal returns.

^[27] At a high enough spending rate one marginal unit of influence is cheaper than one marginal unit of influence due to diminishing marginal returns

^[28] We take $ρ = -10$

^[29] We take $ρ = 0.001$

^[30] This has $ρ = 0.3$

^[31] This is $ρ = 1$

^[32] See table in the introduction

^[33] The default strategy is where you spend exactly the amount your money appreciates, and so your money remains constant.

^[34] E.g. thinking that ‘although I think AGI is more likely than not in the next $t$ years, it is intractable to increase the probability of success in the next $t$ years and so I should work on interventions that increase the probability of success in worlds where AGI arrives at some time $t' > t$’

^[35] Probability of success if AGI is this year is 0.1%, and steepness $l = 0.1$

^[36] $ρ = 0.8$

^[37] That is, $ρ$ is low, perhaps less than or equal to zero.

^[38] We take $ρ \approx 0$, so this is approximately the Cobb-Douglas production function.

^[39] Similar to how $c(t)$ makes influence more expensive over time.

^[40] We have $r = 0.1, η_{R} = 0.3, M_{0} = 4000$

^[41] Inputs that are not considered include: historic spending on research and influence, the rate at which the real interest rate changes, and the post-fire alarm returns (which are considered to be the same as the pre-fire alarm returns).

^[42] And supposing a 50:50 split between spending on research and influence

^[43] This notebook is less user-friendly than the notebook used in the main optimal spending result (though not user-unfriendly) - let me know if improvements to the notebook would be useful for you.

^[44] The intermediate steps of the optimiser are here.

[Table of plots, not reproduced here. Columns: median AGI arrival 2030^[5], 2040^[6], 2050^[7]. Rows: Easy^[8]^[9], Medium^[10], Hard^[11].]

How much better the optimal spending schedule is compared to the 2%+2% constant spending schedule (within-model lower bound)^[12]
Difficulty	Median AGI 2030	Median AGI 2040	Median AGI 2050
Easy	37.6%	18.4%	11.8%
Medium	39.3%	14.9%	12.0%
Hard	12.3%	5.85%	1.55%

Starting money multiplier	0.001	0.01	0.1	0.5	1	1.1
Absolute utility	0.031^[19]	0.044	0.092	0.219	0.317	0.332
Multiple of 100% money utility	0.098	0.139	0.290	0.691	1	1.047


Research and influence are very poor substitutes^[28]	Research and influence are poor substitutes^[29]	Our best guess^[30]	Research and influence are perfect substitutes^[31]

Time evolution of things that grow	$\dot{K}(t) = r(t) \cdot K(t) - x(t)$
Time evolution of things that depreciate	$\dot{C}(t) = x(t)^{η} \cdot c(t) - λ_{C} \cdot C(t)$
Post-fire alarm total of things that depreciate	$\hat{C}(t) = C(t) + \left(\frac{K(t) \cdot f}{T_{FA}}\right) \cdot T_{FA}$
The probability of success given AGI at $t$	$Q(t) = \left(1 + \exp\left(-κ \cdot (\hat{C}(t) - σ)\right)\right)^{-1}$
Objective function	$U = \int_{0}^{\infty} Q(t) \cdot p(t) \cdot e^{-d \cdot t} \, dt$
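The equations in this table can be stepped forward numerically. Below is a minimal Python sketch using forward Euler for a constant spending fraction; it is not the authors' solver. The function name `discounted_success`, the exponential form of $p(t)$, setting $c(t) = 1$, truncating the integral at 2100, and the values of `lam`, `kappa`, `sigma`, `C0` and `T_FA` are all assumptions for illustration. The values $r = 0.1$, $η = 0.3$, $M_{0} = 4000$ and $d = 2\%$ are taken from the text, and $f = 0$ reflects zero post-fire-alarm spending.

```python
import math


def discounted_success(spend_frac, years=78.0, dt=0.05,
                       K0=4000.0,  # starting money in $m (M_0 = 4000 in the text)
                       C0=0.0, r=0.1, eta=0.3, lam=0.1,
                       kappa=0.5, sigma=10.0,  # assumed S-curve parameters
                       f=0.0, T_FA=1.0, d=0.02, median_years=18.0):
    """Forward-Euler estimate of U for spending x(t) = spend_frac * K(t)."""
    hazard = math.log(2.0) / median_years  # assumed exponential AGI-arrival density
    K, C, U = K0, C0, 0.0
    for step in range(int(round(years / dt))):
        t = step * dt
        x = spend_frac * K
        # Post-fire-alarm total: C_hat = C + (K * f / T_FA) * T_FA
        C_hat = C + (K * f / T_FA) * T_FA
        # Probability of success given AGI at t
        Q = 1.0 / (1.0 + math.exp(-kappa * (C_hat - sigma)))
        p = hazard * math.exp(-hazard * t)
        # Accumulate the objective: integral of Q * p * exp(-d t)
        U += Q * p * math.exp(-d * t) * dt
        # dK/dt = r K - x
        K += (r * K - x) * dt
        # dC/dt = x^eta * c(t) - lam * C, with c(t) = 1 assumed
        C += (x ** eta * 1.0 - lam * C) * dt
    return U


# Compare no spending, a 4% constant rate, and spending exactly the interest (rate = r)
for s in (0.0, 0.04, 0.1):
    print(s, discounted_success(s))
```

Finding the optimal schedule means letting `spend_frac` vary over time and optimising over that path, which this sketch does not attempt.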


The yearly optimal spending schedule averaged over this decade, 2022-2030 (left), and the next, 2030-2040 (right). For each level of AI safety difficulty (easy, medium and hard columns) and each decade, we report the average spending rates on research and influence as a percentage of the funder’s endowment.


Low discount rate, $d = 0.5\%$	Our guess, $d = 2\%$	High discount rate, $d = 5\%$


Highly pessimistic growth rate: 5% growth rate	Pessimistic growth rate: 10% current growth decreasing to 5% in the next five years^[17].	Our guess: 20% current growth decreasing to 8.5% in the next ten years^[18].


When we have 10% of our current budget of $4000m	When we have 1000% of our current budget


High diminishing returns^[20]	Our guess^[21]	Low diminishing returns^[22]


Research is highly serial ( $α_{R} = - 0.5$ )	Default ( $α_{R} = 0$ )	Research is highly parallelizable ( $α_{R} = 0.5$ )

[Table of plots, not reproduced here. Columns: median AGI arrival 2030, 2040, 2050. Rows: difficulty of AGI success (Easy, Medium, Hard).]

Summary

Introduction

A qualitative description of the model

Research and influence

Money

Preparedness

AGI fire alarm

Success

Solving the model

Results

Optimal spending schedule when varying AGI timelines and difficulty of success

Sensitivity

Discount rate

Growth rate

Current money committed to AGI interventions

Diminishing returns

Parallel vs serial research

Presence of a fire alarm

Substitutability of research and influence

Discussion

Some hot takes derived from the model

The community’s current spending rate is too low

The optimal spending schedule is generally 5 to 15% better than the default strategy^[33]

In most cases, we should not ‘wager’^[34] on long AGI timelines when we believe AGI timelines are short

In some cases, we should ‘wager’ on shorter timelines by spending at a high rate now

Limitations

Research is endogenous

AGI timelines are independent of our spending schedule

Research and capabilities are independent.

Diminishing returns, and other features, are constant

The optimal spending schedule is not always found

Appendix

Further limitations

Many inputs are required from the user

Appreciation of money is continuous and endogenous

The model only maximises the probability of ‘success’ (with constraints given by keeping money and spending non-negative)

Technical description

Fire alarm

Preparedness and success

Solving the model

Model guide

Explaining and estimating the model parameters

AI timelines

Discount rate

Money

Research and influence

Competition effects

Historical spending on research and influence

Fire alarm

Preparedness and probability of success

Alternate model

Formalisation

Estimating parameters

Results

Limitations

Full results from the nine cases

Robust spending schedules by Monte Carlo simulation

Some notes

Author contributions

Acknowledgements
