I am a research analyst at the Center on Long-Term Risk.
I've worked on grabby aliens, the optimal spending schedule for AI risk funders, and evidential cooperation in large worlds.
Some links
Thanks again Phil for taking the time to read this through and for the in-depth feedback.
I hope to take some time to create a follow-up post, working in your suggestions and corrections as well as external updates (e.g. updating the parameters to reflect lower total AI risk funding and shorter Metaculus timelines).
I don't know if the “only one big actor” simplification holds closely enough in the AI safety case for the "optimization" approach to be a better guide, but it may well be.
This is a fair point.
The initial motivation for the project was AI s-risk funding, for which there's pretty much one large funder (and not much work on AI s-risk reduction is done by people and organizations outside the effective altruism community), though this post's results are entirely about AI existential risk, which is less well modeled as a single actor.
My intuition is that the "one big actor" simplification does work sufficiently well for the AI risk community, given the shared goal (avoid an AI existential catastrophe) and my guess that a lot of the AI risk work done by the community doesn't change the behaviour of AI labs much (i.e. it could be that they choose to put more effort into capabilities over safety because of work done by the AI risk community, but I'm pretty sure this isn't happening).
For example, the value of spending after vs. before the "fire alarm" seems to depend erroneously on the choice of units of money. (This is the second bit of red-highlighted text in the linked Google doc.) So I'd encourage someone interested in quantifying the optimal spending schedule on AI safety to start with this model, but then comb over the details very carefully.
To comment on this particular error (not to say that the other errors Phil points to are unproblematic - I've yet to properly go through them): for what it's worth, the main results of the post suppose zero post-fire-alarm spending[1], and (fortunately) since our results use units of millions of dollars and take the initial capital to be on the order of 1000 $m, I don't think we face this problem of the choice of units having the reverse of the desired effect.
In a future version I expect I'll just take the post-fire-alarm returns to spending to use the same returns exponent as before the fire alarm but with some multiplier - i.e. the returns to spending before the fire alarm and afterwards would share an exponent and differ only by a constant factor.
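To sketch why this change would sidestep the units problem (η and c here are just illustrative symbols I'm choosing, not the post's actual parameter names, and everything else in the model is ignored): if the pre- and post-fire-alarm returns use different exponents, the comparison between them shifts whenever money is rescaled, whereas a shared exponent with a multiplier gives a unit-free comparison.

```latex
% Illustrative only: eta_1, eta_2, eta, c are generic symbols, not the post's parameters.
% Different exponents: rescaling money x -> kx (e.g. dollars vs. millions of dollars)
% changes the post/pre comparison by a factor k^(eta_2 - eta_1):
u_{\text{pre}}(x) = x^{\eta_1}, \qquad u_{\text{post}}(x) = x^{\eta_2}
\;\Longrightarrow\;
\frac{u_{\text{post}}(kx)}{u_{\text{pre}}(kx)} = k^{\eta_2 - \eta_1}\,\frac{u_{\text{post}}(x)}{u_{\text{pre}}(x)}.

% Shared exponent with a multiplier: the post/pre ratio is c for any choice of units.
u_{\text{pre}}(x) = x^{\eta}, \qquad u_{\text{post}}(x) = c\,x^{\eta}
\;\Longrightarrow\;
\frac{u_{\text{post}}(kx)}{u_{\text{pre}}(kx)} = c.
```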
Though if one thinks there will be many good opportunities to spend after a fire alarm, our main no-fire-alarm results would likely be an overestimate.
Strong agreement that a global moratorium would be great.
I'm unsure whether a global moratorium is the best thing to aim for, rather than a slowing of the race-like behaviour -- maybe a relevant similar case is whether to aim directly for the abolition of factory farms or just for incremental improvements in welfare standards.
This post from last year - What an actually pessimistic containment strategy looks like - has some good discussion on the topic of slowing down AGI research.
I agree. This lines up with models of optimal spending I worked on, which allowed for a post-fire-alarm "crunch time" in which one can spend a significant fraction of remaining capital.
I think "different timelines don't change the EV of different options very much" plus "personal fit considerations can change the EV of a PhD by a ton" does end up resulting in an argument for the PhD decision not depending much on timelines. I think that you're mostly disagreeing with the first claim, but I'm not entirely sure.
Yep, that's right: I'm disagreeing with the first claim. I think one could argue for the main claim either by:
I think (1) is false, and think that (2) should be qualified by how one's advice would change depending on timelines. (You do briefly discuss (2), e.g. the SOTA comment).
To put my cards on the table, on the object level: I have relatively short timelines and think that fewer people should be doing PhDs on the margin. My highly speculative guess is that this post has the effect of marginally pushing more people towards doing PhDs (given the existing association of shorter timelines => shouldn't do a PhD).
I think you raise some good considerations but want to push back a little.
I agree with your arguments that
- we shouldn't use point estimates (of the median AGI date)
- we shouldn't fully defer to (say) Metaculus estimates
- personal fit is important
But I don't think you've argued that "Whether you should do a PhD doesn't depend much on timelines."
Ideally, as a community, we could estimate the optimal number of people who should do PhDs (factoring in their personal fit etc.) vs other paths.
I don't think this has been done, but since most estimates of AGI timelines have decreased in the past few years it seems very plausible to me that the optimal allocation now has fewer people doing PhDs. This could maybe be framed as raising the 'personal fit bar' to doing a PhD.
I think my worry boils down to thinking that "don't factor in timelines too much" could be overly general and not get us closer to the optimal allocation.
Thanks for the post!
In this post, I'll argue that when counterfactual reasoning is applied the way Effective Altruist decisions and funding occurs in practice, there is a preventable anti-cooperative bias that is being created, and that this is making us as a movement less impactful than we could be.
One case I've previously thought about is that some naive forms of patient philanthropy could be like this - trying to take credit for spending on the "best" interventions.
I've polished an old draft and posted it as a short-form with some discussion of this (in the When patient philanthropy is counterfactual section).
Epistemic status: I’ve done work suggesting that AI risk funders be spending at a higher rate, and I'm confident in this result. The other takes are less informed!
I discuss
In principle I think the effective giving community could be in a situation where we should marginally be saving/investing more than we currently do (being ‘patient’).
However, I don’t think we’re in such a situation and in fact believe the opposite. My main crux is AI timelines; if I thought that AGI was less likely than not to arrive this century, then I would almost certainly believe that the community should marginally be spending less now.
I think patient philanthropy could be thought of as saying one of:
I don’t think we should call (1) patient philanthropy. Large funders (e.g. Open Philanthropy) already do some form of (1) by just not spending all their capital this year. Doing (1) is instrumentally useful for the community and is necessary in any case where the community is not spending all of its capital this year.
I like (2) a lot more. This definition is relative to the community’s current spending rate and could be intuitively ‘impatient’. Throughout, I’ll use ‘patient’ to refer to (2): thinking the community’s current spending rate is too high (and so we do better by saving more now and spending later).
As an aside, thinking that the most ‘influential’ time is ahead of us is not equivalent to being patient. Non-patient funders can also think this but believe their marginal spending this year goes further than it would in any other year.
A potential third definition could be something like “patience is spending 0 to ~2% per year” but I don’t think it is useful to discuss.
Of course, the large funders and the patient philanthropist may have different beliefs that lead them to disagree on the community’s optimal spending rate. If I believed one of the following, I’d likely decrease my guess of the community’s optimal spending rate (and become more patient):
Since it seems likely that there are multiple points of disagreement leading to different spending rates, ‘patient philanthropy’ may be a useful term for the cluster of empirical beliefs that imply the community should be spending less. However, it seems better to be more specific about which particular beliefs are driving this the most.
For example “AI skeptical patient philanthropists” and “better-AI-opportunities-now patient philanthropists” may agree that the community’s current spending rate is too high, but disagree on the optimal (rate of) future spending.
Patient philanthropists can be considered as funders with a very high ‘bar’. That is, they will only spend on opportunities better than some high bar of utils per $, and if none currently exist, they will wait.
Non-patient philanthropists operate similarly but with a lower bar. While the non-patient philanthropist has funds (and funds anything above their lower bar of utils per dollar, including the opportunities that the patient philanthropist would otherwise fund), the patient philanthropist spends nothing. The patient philanthropist reasons that the counterfactual value of funding something the non-patient philanthropist would fund is zero and so chooses to save.
In this setup, the patient philanthropist is looking to fund and take credit for the ‘best’ opportunities and - while the large funder is around - the patient philanthropist is just funging with them. Once the large funder runs out of funds, the patient philanthropist’s funding is counterfactual.[1]
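As a minimal toy sketch of this dynamic (the bars, capital amounts and opportunities below are made up purely for illustration and aren't taken from anything above): the non-patient funder funds everything above its lower bar while it still has capital, so the patient funder only ends up with counterfactual impact on above-bar opportunities that arrive after the non-patient funder has run out.

```python
# Toy illustration only: all numbers are made up, and each opportunity is assumed
# to cost exactly $1. The non-patient funder funds anything above its (lower) bar
# while it has capital; the patient funder only funds (above its higher bar) once
# the non-patient funder has run out, at which point its spending is counterfactual.

opportunities = [12.0, 6.0, 9.0, 4.0, 15.0, 7.0, 11.0]  # utils per $ of successive $1 opportunities

patient_bar, nonpatient_bar = 10.0, 5.0          # patient funder has the higher bar
nonpatient_capital, patient_capital = 3.0, 3.0   # starting capital in $

patient_counterfactual_utils = 0.0
for utils_per_dollar in opportunities:
    if utils_per_dollar >= nonpatient_bar and nonpatient_capital >= 1.0:
        # The non-patient funder takes everything above its bar, including the
        # 'best' opportunities the patient funder would otherwise have funded.
        nonpatient_capital -= 1.0
    elif utils_per_dollar >= patient_bar and patient_capital >= 1.0:
        # Only after the non-patient funder is out of capital does the patient
        # funder spend, and only then is its spending counterfactual.
        patient_capital -= 1.0
        patient_counterfactual_utils += utils_per_dollar

print("Patient funder's counterfactual utils:", patient_counterfactual_utils)
```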
If the large funder and patient philanthropist have differences in values or empirical beliefs, it is unsurprising they have different guesses of the optimal spending rate and 'bar'.
However, this should not happen with value- and belief-aligned funders and patient philanthropists; if the funder is acting 'rationally' and spending at the optimal rate, then (by definition) there are no type-(2) patient philanthropists that have the same beliefs.
There are some opportunities for trade between patient philanthropists and non-patient philanthropists, similar to how people can bet on AI timelines.
Let’s say Alice pledges to give some amount per year from her income and thinks that the community should be spending more now. Let’s say that Bob thinks the community should be spending less and saves some amount per year from his income in order to give it away later. There’s likely an agreement possible (dependent on many factors) where they both benefit. A simple setup could involve:
This example closely follows similar setups suggested for betting on AI timelines.
Unless an amazing opportunity above the patient philanthropist’s bar appears just after the large funder runs out of funds. Where ‘just after’ means within the period for which the large funder could have kept following their existing spending strategy of funding everything above their bar of utils/$, had they been using the patient philanthropist’s funds.
DM = digital mind
Archived version of the post (with no comments at the time of the archive). The post is also available on the Sentience Institute blog
The consequence of this for the "spend now vs spend later" debate is crudely modeled in The optimal timing of spending on AGI safety work, if one expects automated science to directly & predictably precede AGI. (Our model does not model labor, and instead considers [the AI risk community's] stocks of money, research and influence)
We suppose that after a 'fire alarm' funders can spend down their remaining capital, and that the returns to spending on safety research during this period can be higher than spending pre-fire alarm (although our implementation, as Phil Trammell points out, is subtly problematic, and I've not computed the results with a corrected approach).