Are top existential risk estimates 50,000 times too high? An optimizer’s curse model and analysis

Yes, if you assume your errors are log normal distributed, you expect to see big errors.

Your simulation says that the range of actual threats is tightly bounded between 10e-5 and 10e-7 (IMO much too small a range). In contrast, your error estimates span 8 orders of magnitude (IMO likely too large a range).

I really think your choice of parameters fully explains your results.

Arepo

Makes sense that range of threats should be wider (arbitrarily wide I guess, but a function of what scenarios you consider and differentiate between). I don't see why error estimates should be thin though - there are certainly people guessing close to 100% for some risks and various mechanisms that we might not even have considered that we would consider high risk if we knew more about them, which lead to us underestimating the risks of innocuous actions by a huge amount (c.f. the recent upsurge in concern about mirror bacteria)

I agree my claim for tighter error estimates is very weak.

I could say that aggregating many different folks' estimates reduces variance (assuming you believe the estimates have some amount of uncorrelated signal). Individual estimates are noisy, but aggregate estimates are less noisy. This is basically the point discussed in the 'Why do different groups have the same rankings' section of OP's post.
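A minimal sketch of the variance-reduction point, assuming each person's error is an independent log-normal factor and that estimates are pooled with a geometric mean. Both are assumptions made for illustration, not something the post specifies, and the names `pooled_log_error_sd`, `k` and `sigma` are hypothetical.

```python
import random
import statistics


def pooled_log_error_sd(k, sigma, trials=5000, seed=0):
    """Empirical SD (natural-log units) of the error of a geometric mean of k estimates.

    Assumes each estimate = truth * exp(e) with e ~ Normal(0, sigma), independent
    across people. Correlated errors would shrink far less than this.
    """
    rng = random.Random(seed)
    pooled = []
    for _ in range(trials):
        log_errors = [rng.gauss(0.0, sigma) for _ in range(k)]
        pooled.append(sum(log_errors) / k)  # log of the pooled error factor
    return statistics.pstdev(pooled)


single = pooled_log_error_sd(k=1, sigma=2.0)
crowd = pooled_log_error_sd(k=16, sigma=2.0)
print(f"one estimator: {single:.2f}, sixteen pooled: {crowd:.2f}")
```

Under independence the spread falls roughly as sigma / sqrt(k). Any bias shared by all estimators is untouched by pooling, which is why the "uncorrelated signal" caveat matters.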

But frankly I'm largely making a vibes claim (ie, model gives silly results -> model probably wrong).

Clara Torres Latorre 🔸

the OOM of variation in "ground truth" come from alpha and n, not xmin

alpha, we could talk all day, but the model is not extremely sensitive to it

on the other hand, if you say let's have more OOMs in the possible values of ground truth, following the power law, that means jacking n up

and when you jack n up you have even more opportunities for errors to be crazy big, and this effect dominates (at least that's what I read from the OP) and the curse becomes worse

now if we change alpha and n at the same time, idk
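To make the alpha-versus-n trade-off concrete, here is a rough sketch of how many orders of magnitude separate xmin from the typical top threat. It assumes the ground truth is Pareto with survival function (xmin / x) ** alpha; the post may define alpha differently (for example as a density exponent), and the function name is hypothetical.

```python
import math


def median_top_threat_ooms(n, alpha):
    """Orders of magnitude between xmin and the median largest of n Pareto draws.

    Assumes P(X > x) = (xmin / x) ** alpha for x >= xmin, with independent draws.
    """
    # Solve (1 - (xmin / x) ** alpha) ** n = 0.5 for the ratio x / xmin.
    tail_prob = 1.0 - 0.5 ** (1.0 / n)
    return -math.log10(tail_prob) / alpha


for n in (10, 100, 10_000):
    print(n, round(median_top_threat_ooms(n, alpha=1.0), 2),
          round(median_top_threat_ooms(n, alpha=2.0), 2))
```

Under this assumption the spread grows only like log10(n) / alpha, so widening the ground truth by raising n alone means multiplying n by large factors, and every extra candidate is another noisy estimate that can win the ranking.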

my honest opinion is that numbers are just one way to process information, and using them for this is so out of distribution that it's essentially meaningless (as it is when discussing p(doom) and stuff like that)

Sure, I understand where the values come from. I'm saying the distribution created leads to (IMO) clearly wacky results. The difference between the most vs least spooky X-risks is way more than a 100X difference.

I personally get some value out of numbers & sanity checking them like this, but your mileage may vary.

titotal

Here is an example run I did where I tuned down the alpha to 1.5 and tuned the lognormal standard deviation down to 1.5:

Here, the top actual threat really is 7 orders of magnitude more dangerous than the bottom evaluated threat. However, the top apparent threat is overestimated by a factor of 10,000 or so.

If I do a bunch of runs with these settings, the median overestimate is over 100x:

So even if I trust your vibes here (which do not seem to be based on anything), the curse can still hit quite badly. I personally believe that the spread of numbers that people make up is going to be higher than the spread of actual threats: when we look at actual surveys you get a highest-lowest estimate spread of 11 orders of magnitude for some questions.

One thing that might be confusing you is that the power law model assumes that only threats above a certain threshold of actual danger are considered (this is the xmin factor). Obviously nuclear risk is a much greater risk than stubbing your toe, but the latter is not going to show up in the model.
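For readers who want to try settings like these themselves, a minimal sketch of this kind of run follows. It is not the post's code: it assumes actual risks are Pareto with tail exponent alpha above xmin, that each estimate is the actual risk times an independent log-normal factor whose log has SD sigma in natural-log units, and that values are not clipped to 1. The values of `n`, `xmin` and `runs` are hypothetical.

```python
import random
import statistics


def top_pick_overestimate(n, alpha, sigma, xmin, rng):
    """Return apparent / actual risk for the threat with the highest estimate."""
    actual = [xmin * rng.paretovariate(alpha) for _ in range(n)]
    apparent = [a * rng.lognormvariate(0.0, sigma) for a in actual]
    top = max(range(n), key=lambda i: apparent[i])
    return apparent[top] / actual[top]


def median_overestimate(runs=2000, n=100, alpha=1.5, sigma=1.5, xmin=1e-7, seed=0):
    """Median overestimation factor of the top apparent threat across many runs."""
    rng = random.Random(seed)
    ratios = [top_pick_overestimate(n, alpha, sigma, xmin, rng) for _ in range(runs)]
    return statistics.median(ratios)


print(f"median overestimate of the top pick: {median_overestimate():.1f}x")
```

The exact factor depends heavily on the unit convention for sigma and on n, so this sketch should not be expected to reproduce the figures quoted above; it only shows the mechanism by which the top-ranked estimate is selected for positive error.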

Clara Torres Latorre 🔸

The difference between the most vs least spooky X-risks is way more than a 100X difference.

I think I would agree with this, if I had to put a number.

What I mean in my comment is, with this model, if you say okay let's pick a bigger n so that we see bigger differences in OOMs, then you are also introducing more points of failure in the estimation, and that effect dominates.

Do you have an a priori reason to discard this? Besides the conclusion being wacky, which is a good reason to discard a model anyways.

I agree that merely increasing n would not change the OP's conclusion that errors dominate.

My point is more that they picked too big of parameters for error variance and too small of parameters for risk-size variance.

Given the optimizer's curse, how do I optimally pick a cause area? Two observations:

1) In the standard model above, I cannot improve upon picking the cause with the highest estimate. While this cause will likely be overestimated by several OOMs and will likely not be the top x-risk cause, it is still the optimal cause to choose in expectation.
2) But in the extended model with grounded and speculative causes, the logic changes. Here, it can be optimal to pick the cause with the highest x-risk estimate in the grounded class, even if there are causes with higher x-risk estimates in the speculative class.
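The second observation can be checked with a small simulation. This is a sketch under stated assumptions rather than the post's model: both classes draw actual risks from the same Pareto distribution (with xmin set to 1), estimates are actual risk times log-normal noise, and the noise SDs and class sizes are hypothetical.

```python
import math
import random
import statistics


def compare_picks(n_each=50, alpha=1.5, sigma_grounded=0.5, sigma_speculative=4.0,
                  runs=2000, seed=0):
    """Mean log10 actual risk achieved by two selection rules.

    Returns (top estimate overall, top estimate within the grounded class only).
    """
    rng = random.Random(seed)
    overall, grounded_only = [], []
    for _ in range(runs):
        threats = []  # tuples of (apparent, actual, is_grounded)
        for is_grounded, sigma in ((True, sigma_grounded), (False, sigma_speculative)):
            for _ in range(n_each):
                actual = rng.paretovariate(alpha)
                apparent = actual * rng.lognormvariate(0.0, sigma)
                threats.append((apparent, actual, is_grounded))
        best_overall = max(threats)                       # highest apparent risk
        best_grounded = max(t for t in threats if t[2])   # highest apparent, grounded
        overall.append(math.log10(best_overall[1]))
        grounded_only.append(math.log10(best_grounded[1]))
    return statistics.mean(overall), statistics.mean(grounded_only)


overall, grounded = compare_picks()
print(f"top overall: {overall:.2f} OOMs above xmin, top grounded: {grounded:.2f}")
```

With noise this lopsided, the overall winner is almost always a speculative threat chosen close to at random, so restricting attention to the grounded class tends to land on a larger actual risk. With closer noise levels the gap narrows and can reverse.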

I think this has very interesting implications.

2) implies that working in a more grounded cause area (like global health?) can be better than working on speculative x-risk. This is a powerful implication and I think EA should take this very seriously.

1) implies that even if everyone makes an individually optimal cause-prioritization decision, some people's top causes will still look highly implausible to others.

Jack_S🔸

Good question, but I implore you not to take this post too seriously. It's a real phenomenon, but it's a real stretch to claim that this applies in the implied way to cause areas like AI safety.

The model in the article is just a toy model of a world where existential threats are randomly distributed according to a power law, where genuinely high-probability threats are, by assumption, basically absent from the space of possible threats, and where there's no process of updating based on evidence.

A more narrow claim like: "single narrowly defined x-risk estimates of genuinely speculative, unresearched causes are likely to be inflated" might be valid, but that doesn't seem to be what titotal is implying. He's making a claim way beyond anything implied by the model, even if the model were a valid representation of the phenomenon. He seems to believe that almost all of today's concern for AI risk is all downstream of a belief cultivated within a narrow subcommunity subject to the optimiser's curse - an extraordinary claim that requires a lot more evidence than that supplied in the article.

The claim being made is something like:

Some time in the 2000s, Eliezer Yudkowsky and friends made up some numbers for AI risk
Community dynamics (rather than the merits of the arguments) spread these numbers from Bostrom to Tegmark to Sam Harris to Elon Musk to the EA community etc.
These social dynamics had such an effect that multiple seemingly independent and unrelated people/experts from Geoffrey Hinton to Yoshua Bengio to Chinese academics and tech people from completely different intellectual lineages have absorbed this false belief from the cultural milieu (again, not at all based on the merits of the arguments), leading many of these people, as well as specialists across unrelated fields, to make estimates of AI x-risk that remain orders of magnitude too high

Note that this requires some pretty wild, difficult-to-justify assumptions on how this belief has spread.

The opposing narrative (which I would advocate for) is that:

AI x-risk is something that has independently been identified as a risk by multiple people - often far before quantifying or ranking risks.
The spread of these beliefs was obviously affected by community dynamics, but people largely adopted somewhat independent beliefs based on the merits of rational arguments
Quantitative predictions were inherently imprecise because they're so dependent on messy world models, but they were grounded enough in reason and evidence, and composed of enough independent estimates, that any optimiser's curse is massively weakened
As certain predictions came to pass, and current LLMs approach AGI in non-ideal geopolitical and competitive circumstances, this is increasingly being seen by a wider range of thinkers as a >1% x-risk
Even if there were "optimiser curse" risks in initial prioritisation of AI, it's now increasingly recognised that AI will be a massive deal. AI-assisted engineered pandemic uplift work, observed cyber-capabilities, and signs of misalignment/scheming etc. are building on the strong theoretical evidence base that AI-generated catastrophe is possible.

And on your particular question of how to act, even given the optimiser's curse as stated in the toy model, working on the speculative thing could still be optimal. If a highly uncertain intervention seems exceptionally promising or x-risky, the value of information becomes incredibly high, because accurate or well-reasoned research will lead to this intervention being prioritised or not by far more people. If you have a research focus, it's therefore probably more recommended to focus on a more uncertain, "high-risk-high-reward" area. You could also draw up a toy model for explore/exploit based on the optimiser's curse.

Finally, you don't necessarily have to "pick" from a narrow set of pre-defined cause areas. You can also divide or merge risks, cause areas, skill-sets etc. to be more robust, precise, or coherent with your own world model (e.g. focusing on engineered pandemics because you realise this could interact with AI-related x-risk, GCBRs, and global health).

Cause prioritisation is a function of the marginal impact on the outcome per marginal dollar/hour spent or similar.

This adds another layer of complexity because you can’t just eg “integrate out” the existential risk first and then reason about the impact, you kind of need to do it jointly.

This means you also need to include sources of impact uncertainty.
To me, this ~ removes many longtermist cause areas because it drags most of the “impact mass” too close to zero - but people have different opinions on this of course.

Point 1 is interesting - you can do better if you have and incorporate a prior, but the question is where that comes from. I think often it’s easier to have priors over intervention success than existential outcomes per se.

David T

Above all it implies don't focus the vast majority of efforts on one cause.

That might not be practical for career choices,^[1] but it's certainly possible for a funder or movement

^[1]
though a corollary of it is "don't assume that just because you've picked direct work that your career choice is maximally good and stuff like donations and helping others is just a distraction". This is arguably true for speculative career choices even if the optimal cause is the correct one (i.e. even if AI x-risk really does dominate everything, lots of the promising approaches to resolving it that people might choose will have no impact)

I like diversification as a reaction to this type of uncertainty, but it does not trivially follow? I might be missing something - do you have a favourite minimal set of assumptions that rigorously yield diversification as a function of this?

One simple model is:

each person can choose only one cause area
errors are iid across people
the x-risk coming from each cause decreases with each additional person working on this cause

Whether diversification is better (in expectation) depends on how a cause's x-risk decreases as additional people work on this cause. If x-risk decreases linearly (the 1000th person makes the same marginal contribution as the 1st), then diversification is not better in expectation. But if the contribution to x-risk prevention is marginally decreasing in people, diversification is better.

(By diversification I mean each person choosing their top estimated x-risk cause individually. But it can also mean that some people deliberately do not work on the cause with the highest aggregated risk estimate.)
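A toy calculation of the linear versus diminishing-returns case, ignoring estimation error entirely. The two risk values, the headcounts and the square-root form are illustrative assumptions, and all names are hypothetical.

```python
def total_risk_averted(risks, people, returns):
    """Sum of risk averted when people[i] work on a cause with actual risk risks[i]."""
    return sum(r * returns(k) for r, k in zip(risks, people))


def linear(k):
    return float(k)  # the 1000th person contributes as much as the 1st


def diminishing(k):
    return k ** 0.5  # each extra person adds less than the previous one


risks = [3.0, 2.0]      # hypothetical actual risks of two causes (arbitrary units)
concentrated = [10, 0]  # everyone works on the top cause
diversified = [5, 5]    # even split

for name, returns in (("linear", linear), ("diminishing", diminishing)):
    print(name,
          round(total_risk_averted(risks, concentrated, returns), 2),
          round(total_risk_averted(risks, diversified, returns), 2))
```

With linear returns, concentrating wins (30 versus 25 units). With square-root returns, the split wins (about 9.49 versus 11.18), which is the sense in which diversification relies on marginally decreasing contributions.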

I was more referring to the diversification as implied by “don't focus the vast majority of efforts on one cause”, which to me meant more “if you’re a decision maker over some amount of resources, you should diversify the allocation across cause areas”. Which I agree with, but it’s quite hard to really justify.

Yes, via nonlinearity you can get to diversification, but this means making additional assumptions beyond sampling error/publication bias/optimiser curse type effects.
The nonlinearity you’re describing matters on a movement level, but not on an individual decision maker’s level.
”Impact risk aversion” is another mechanism to get diversification which I think can be reasonable in cases where eg low impact reduces the probability of future donations or similar.

One channel I think is under explored and might work well as a justification for diversification in practice is something like this (I haven’t thought about this rigorously though): if I predictably optimise and my objective function is known to others, they will (in the worst case, possibly thru misaligned incentives) feed me biased information to influence my decision, and optimisation is very sensitive to noise, therefore I subject myself to adverse selection. So basically by not optimising but diversifying across good options you reduce the negative impact of this type of adverse selection. In this case, the “errors” are not iid. Hard to say how much diversification that yields.

David T

I think the conclusion that diversification is a good strategy follows trivially from the optimizers' curse: if you focus all your efforts on the apparent biggest threat, you've probably just focused on the cause with the largest risk assessment error and entirely neglected the actual biggest threat. A more diverse allocation is more likely to address the actual biggest threat. If there are diminishing returns to resources allocated to mitigate particular risk areas that makes diversification look better (complex nonlinear returns complicate it). As does the possibility that larger errors in risk assessment for a particular type of risk are inversely correlated with ability to invest in the best mitigation strategy for that type of risk.^[1]

But your point about adverse selection is a good one too. Metrics are gameable, and there are stronger incentives to do so when funding is "winner takes all" rather than "we disburse funds to a wide selection of causes and value rigour and disclosure of uncertainties"

^[1]
I think there are probably exceptions to this, but I think it's generally true. Good understanding of celestial mechanics and early warning systems, for example, are absolutely essential to potentially preventing hypothetical large space rocks colliding with earth, but also mean that we are less likely to overestimate the imminence of destruction by a rogue asteroid than we are for more unpredictable phenomena.

While this sounds intuitively right, I think in the simplest utility maximising setting (iid additive errors with mean zero) your first claim does not seem true? The best looking noisy option is still most likely to be the best?

(I need to think more about the maths, but at least you need some kind of shrinkage to a prior that can change the ranking, which you’re unlikely to get, and if you’re maximising utility the solution is always fully concentrated?)
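One way to probe this question numerically is to count how often the best-looking option is truly the best. The sketch below uses the simple setting named above, with i.i.d. Normal true values and i.i.d. additive mean-zero Normal errors. The specific `n` and `noise_sd` are hypothetical, and this is not the post's power-law model.

```python
import random


def top_pick_hit_rate(n=10, noise_sd=2.0, runs=5000, seed=0):
    """Fraction of runs in which the option with the highest estimate is truly best.

    Assumes true values ~ Normal(0, 1) and estimate = truth + Normal(0, noise_sd),
    all independent.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        truth = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimate = [t + rng.gauss(0.0, noise_sd) for t in truth]
        best_true = max(range(n), key=truth.__getitem__)
        best_seen = max(range(n), key=estimate.__getitem__)
        hits += best_true == best_seen
    return hits / runs


print(f"hit rate: {top_pick_hit_rate():.2f} (chance would be {1 / 10:.2f})")
```

In this symmetric setup the top-looking option is the single most likely candidate to be best, so it beats picking at random, yet it can still be wrong in most runs when the noise is large relative to the spread of true values. Both statements in this exchange can therefore hold at once.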

David T

I'm not sure naive total utility maximization [in a static framework] is the best framework to be thinking about dealing with existential risk over time.^[1]

Assuming the number of risks and error bars are not trivially small, the universal outcome of concentrating all your risk mitigations on one is that most risks continue to be as high as they could possibly be. The modal outcome is that the risks ignored include at least one risk greater than the one all efforts are concentrated on mitigating. Some reasonable assumptions in the article above show this can hold even where the actual biggest risk is orders of magnitude greater than the one targeted. In the diversified approach, less money is devoted to reducing the perceived biggest risk, but the rest is apportioned to reducing other risks. This seems more robust to conventional assumptions like uncertainty and some risks being easier to mitigate than others.

^[1]
And tbh I'm not even seeing an average utility boost from concentrating on the single largest risk as opposed to mitigating lots of risks without ancillary assumptions like increasing returns to risk reduction expenditure or the actual value of many risks under consideration being 0.

Yeah I agree - expected utility maximisation really starts to fall apart in this existential risk regime, even over trajectories rather than applied statically, and it only makes sense “locally” and at the margin.

Personally I’m very happy to bite the bullet and not be rigorously utilitarian, but I’m also a global health focussed “old school EA” thinking about how much to diversify donations across charities ;)

Interesting. Who might these people be who deliberately feed you biased information? How do they benefit from you focusing on cause area y instead of cause area z?

To be clear, it need not be deliberate and they need not benefit personally!

Arepo

I think 1) implies that you should give up some substantial optimisation for the sake of greater versatility (which seems approx titotal's view with reference to overcommitting)

2) feels correct and important to me, also since I've been arguing in the post op linked and elsewhere that treating extinction as special is a heuristic that was useful for initial cause prioritisation but isn't a valid reason for focusing on it two decades later.

Jim Buhler

Say I compare different GHD interventions that help the worst off. I know that most such interventions are not crazy effective and that it's easy to under or overestimate those with little evidence. I have a good reference class. If someone tells me about a new intervention in the area, I don't expect it to be crazy good. It's much more likely to be close to the mean. I have a prior expectation. If my math tells me there's some poorly-studied intervention that beats the one that has consistently proven most effective so far, this weak evidence should not override my prior expectation. The math without accounting for my prior would give an overestimated number, surely.
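The "weak evidence should not override my prior" step can be written as standard Normal-Normal shrinkage. This is a sketch assuming the prior over intervention quality and the estimation noise are both Normal (for instance in log-effectiveness units); the numbers and the function name are hypothetical.

```python
def shrunk_estimate(prior_mean, prior_sd, estimate, noise_sd):
    """Posterior mean for a Normal prior combined with a Normal-noise estimate."""
    # Weight on the new estimate: high when the prior is vague or the estimate precise.
    weight = prior_sd ** 2 / (prior_sd ** 2 + noise_sd ** 2)
    return prior_mean + weight * (estimate - prior_mean)


# A poorly studied intervention whose headline estimate sits 3 units above the mean.
weak_evidence = shrunk_estimate(prior_mean=0.0, prior_sd=1.0, estimate=3.0, noise_sd=3.0)
strong_evidence = shrunk_estimate(prior_mean=0.0, prior_sd=1.0, estimate=3.0, noise_sd=0.5)
print(round(weak_evidence, 2), round(strong_evidence, 2))  # 0.3 2.4
```

The same headline estimate is pulled most of the way back to the reference-class mean when the evidence is noisy, and barely moved when it is precise. Without a reference class there is no `prior_mean` or `prior_sd` to plug in, which is the question raised in the next paragraph.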

But now say someone presents me with an x-risk intervention. I don't have a "cross-cause prior" that says that an intervention from any cause is close to the GHD mean, do I?^[1] If I understand your post correctly, you are implicitly assuming we do have such a cross-cause prior. (Otherwise, there wouldn't be any OC-related reason to downweight the x-risk intervention.)^[2] Is that correct?

^[1]
In fact, given how drastically different x-risks and GHD are, I would be surprised if the x-risk intervention doesn't end up far at the bottom or at the top, here.
^[2]
We might downweight it because of ambiguity aversion or something, but that's a completely different issue. No need to invoke OC.

Ben_West🔸

Interesting post, thanks! On this:

It’s possible that the lack of evidence has been accounted for in other ways. Perhaps someone who initially guesses a 20% chance of extinction is subtly dropping that down to 5% on the grounds of epistemic modesty. But it’s unlikely they are doing so in the exact right way to counteract the effect of the optimizer’s curse.

My understanding is that worldview diversification partially addresses things like this (see this old critique from Holden Karnofsky, which makes a similar point to yours and I think is intellectually upstream of CG's later thinking), in addition to accounting for e.g. normative uncertainty.

I'm not exactly sure how worldview diversification works (maybe someone from CG can comment) but I share your skepticism that it's being done in exactly the right way to counteract these effects.

Mo Putera

In 2021 Ajeya described the practical institutional reasoning behind worldview diversification on the 80K podcast like so, in case useful:

Ajeya Cotra: Yeah. I mean, I don’t know that there’s necessarily something to be said for it on a rigorously philosophical point of view, but I think there’s something to be said for not going all in on what you believe a rigorously philosophical accounting would say to value. So, I think one way you could put it is that Open Phil is — as an institution — trying to place a big bet on this idea of doing utilitarian-ish, thoughtful, deep intellectual philanthropy, which has never been done before, and we want to give that bet its best chance. And we don’t necessarily want to tie that bet — like Open Phil’s value as an institution to the world — to a really hyper-specific notion of what that means.
Ajeya Cotra: So, you can think about the longtermist team as trying to be the best utilitarian philosophers they can be, and trying to philosophy their way into the best goals, and win that way. Where at least moderately good execution on these goals that were identified as good (with a lot of philosophical work) is the bet they’re making, the way they’re trying to win and make their mark on the world. And then the near-termist team is trying to be the best utilitarian economists they can be, trying to be rigorous, and empirical, and quantitative, and smart. And trying to moneyball regular philanthropy, sort of. And they see their competitive advantage as being the economist-y thinking as opposed to the philosopher-y thinking.
Ajeya Cotra: And so when the philosopher takes you to a very weird unintuitive place — and, furthermore, wants you to give up all of the other goals that on other ways of thinking about the world that aren’t philosophical seem like they’re worth pursuing — they’re just like, stop… I sometimes think of it as a train going to crazy town, and the near-termist side is like, I’m going to get off the train before we get to the point where all we’re focusing on is existential risk because of the astronomical waste argument. And then the longtermist side stays on the train, and there may be further stops.
Robert Wiblin: Yeah, interesting. I like the idea that rather than thinking about this as exclusively a philosophical disagreement, think about it as a disagreement on the strategy question of, what’s our edge? What’s our edge over everyone else who’s trying to do good? And one of them is, “Well, we’ll be better at philosophy, and we’ll reach more philosophically rigorous conclusions”. And the other people are like, “We’ll be better in some other way. We’ll be more empirical, or be more careful about thinking about…”
Ajeya Cotra: More quantitative, yeah.
Robert Wiblin: More quantitative, exactly.
Ajeya Cotra: I mean, I actually think the near-termist side of the organisation empirically uses quantitative estimates way, way more than the longtermist side of the organisation does. So, on the longtermist side, we’ve talked ourselves into highly prioritising causes where there are only like 10 people working on them. And so most of our effort is trying to convince potential grantees — potential people who could be helpful in this mission — that it’s reasonable to work on at all. And trying to fund people who are trying to do the basic thing that we want to do — for example, reducing global catastrophic biorisks as opposed to focusing on biorisks in general. And that is where almost all of our selection pressure has to go. But on the near-termist side of things, they’re looking at lists of hundreds of things they could focus on, like air pollution in India, or migration from low-income countries to middle-income countries. And they have a huge list of causes and they’re just doing the math on the number of lives that get better per dollar with each of these options.
Ajeya Cotra: So, the feel of doing near-termist work at Open Phil is definitely much more quantitative and rigorous, and in some sense it feels more like what you would have thought a cartoon EA foundation would feel like, because they have more opportunity to map things out.

I'd be interested to know how much this has changed since then, if at all, especially on the longtermist side.

Questions for clarification:

1) "This means that probability values that are 10 times higher are 10times common." Shouldn't it be probabilities that are 10 times lower are 10 times more common?

2) In the section on speculative bias, you say that "grounded and speculative threats are identical in all ways, except that the speculative threats are much more uncertain". Shouldn't the frequencies of grounded actual (blue) and speculative actual (yellow) look the same then?

titotal

Point 1 was a typo, thanks for pointing it out!

Point 2 is something that confused me at first as well. The reason they are different is that we are looking at the performance of the top apparent threat. If we were perfectly good estimators, this would be the same as the top actual threat, but we aren't: the threat of the top pick is generally going to be lower than the top actual threat due to uncertainty and the curse.

For the grounded estimator, the process of ranking threats gives useful information, and it means that the top threat picked is much higher than you would get from picking at random. Whereas in the speculative case, we are much closer to just picking at random, and that's reflected in the yellow curve, which looks a lot like the power law distribution we are sampling from.

Daniel_Friedrich

I like the high-level idea but for now, I am skeptical that going into the details of the math would make me decrease my AI p(doom) by 50,000 times (sorry, only skimmed it so far). As I understand it (knowing CTT), you're weighing "cause X p(doom)" against the general prior that "your reasoning is partially random and will come to incorrect conclusions".

However, in the case of AI x-risk, I'm not sure in which direction the prior should push me (which I take as a sign that it's integrated within my reasoning). Should I ignore AI as a concept, just singularity scenarios and put more weight on "there won't be more progress from now on"?

For me, this is particularly hard because I don't see AI p(doom) as just one number. E.g. I think there's a 20% chance of a "computronium-maximizer with arbitrary goals" and some chance of scenarios like "utility monster AGI", "global cooperation leading to value-maximizing AI", "global cooperation leading to a fraction of possible value", "AI dictatorship", "AI x-risk via terrorism or war" and then some chance "a significant stall in AI development - e.g. due to a moratorium, pandemic, war and maybe due to AGI being impossibly hard." ^[1]

It seems to me the lesson many readers will take away is putting more weight on the "AGI being impossibly hard" scenario but that, paradoxically, seems like a guess that requires a lot of confidence that the world will take a very different trajectory than what the trends suggest - i.e. such an update would go against the spirit of the prior that prioritizes modesty and uncertainty.

Would this objection disappear if I tried to understand the math more deeply?

^[1]
Also, what is impossibly hard? I agree with Thorstad that if AGI took 1000 more years, other problems should be a priority. If it took 70 more years, I would still think AI alignment research is extremely important, although I wouldn't think the same of AI safety activism.

Okay, so power law here means: Of all the possible causes of event Y, most will have a relatively small probability of causing Y, and only a few will have a higher probability of causing Y. To me, this makes sense, and I would expect this to hold across many domains.

The best cause will disappoint you: An intro to the optimisers curse

Here is a power law pattern for causes of death in the US. 22 causes have a share of total deaths between 1% and 10%, and many more causes have much smaller shares. Makes sense, right?
