
This is the fourth post in a series. Here are the first, second, third, and fifth posts.

Sheltering Within the Herd

There are many ways to handle tiny probabilities of vast utilities; previously I divided them into seven categories. Within each category, there are some ways that undermine long-termist arguments, and some that do not.

However, not all the ways are equally plausible. This section argues that long-termist projects shelter in the herd of ordinary behaviors, meaning that it is difficult to find a well-motivated solution to the steelmanned problem that undermines mainstream long-termist projects without also undermining some ordinary behaviors. (Note that this does not mean that mainstream long-termist projects are safe; there may still be other arguments that undermine them.)

The (claimed) probabilities aren’t that small

Many of the ways to deal with tiny probabilities of vast utility focus on the “tiny probabilities” part (in particular, #1, #2, and #5). So how small are the relevant probabilities for mainstream long-termist projects? Are long-termists launching projects that they think have a one-in-a-million chance of success, or a one-in-a-trillion-trillion?

This subsection argues that mainstream long-termist projects aren’t that different from (some) ordinary projects in this regard. Rather than canvass all mainstream long-termist projects, this article will focus on MIRI-style AI safety research, since it is arguably the most speculative. What goes for MIRI-style AI safety research probably also goes for other long-termist projects like pandemic prevention, disaster response, nuclear war prevention, climate-change mitigation, advocating for large-scale social or political change, and prioritization research.

MIRI aims to help avert a possible existential catastrophe by producing foundational research on AI safety. The probability of preventing such a catastrophe by joining MIRI as a researcher (or donating a significant amount of money to MIRI) varies across many orders of magnitude, depending on what assumptions are made about the probability of there being a problem in the first place, the probability that MIRI’s research will address the problem if there is one, the probability that an additional researcher will make the difference, the probability that the relevant people will read the research, etc.
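To see why such an estimate can span many orders of magnitude, here is a minimal Python sketch that chains the factors listed above. The numeric ranges are hypothetical placeholders chosen purely for illustration; they are not estimates from MIRI, the FHI, or this article, and the sketch assumes each factor can be treated as a conditional probability in a simple chain.

```python
from math import prod, log10

# Hypothetical (low, high) placeholder ranges for each factor named in the text.
# They are illustrative only, not estimates from MIRI, FHI, or the author.
FACTORS = {
    "there is a problem in the first place": (0.01, 0.5),
    "the research addresses the problem": (0.001, 0.1),
    "an additional researcher makes the difference": (0.0001, 0.01),
    "the relevant people read the research": (0.1, 0.9),
}

def chained_probability_range(factors):
    """Multiply the low ends together and the high ends together."""
    low = prod(lo for lo, _ in factors.values())
    high = prod(hi for _, hi in factors.values())
    return low, high

low, high = chained_probability_range(FACTORS)
orders_of_magnitude = log10(high / low)
print(f"low = {low:.1e}, high = {high:.1e}, "
      f"spread = {orders_of_magnitude:.1f} orders of magnitude")
```

Even modest uncertainty in each factor compounds multiplicatively, which is why the overall figure is so sensitive to the assumptions made.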

Much has been written on this topic. It is beyond the scope of this article to run through the arguments for and against the plausibility of AI risk. However, I will say that  “sometimes it’s better to pull numbers out of your ass and use them to get an answer, than to pull an answer out of your ass.” Thinking about the probability of saving the world by joining MIRI, it’s intuitively obvious that the probability is small, but it’s not at all intuitive just how small the probability is—it’s equally intuitive to think that the probability is as it is to think that the probability is . (Matthews, for example, argues this.) But intuitions can sometimes be corrected via comparisons. For example, compare the work-for-MIRI plan with an alternate plan that involves convincing the U.S. President that you are God by having meteors strike the front lawn precisely when & where you point. The probability that three distinct meteors will happen to land exactly when and where you want them to is . While the work-for-MIRI plan is unlikely to succeed, surely it is more likely to succeed than that! The lesson is that when the numbers get extreme, putting a little effort into making some back-of-the-envelope estimates and models with made-up parameters is more reliable than just trusting your intuition.

MIRI and the Future of Humanity Institute each created models for calculating the probability that a new researcher joining MIRI will avert existential catastrophe. [CORRECTION: This is an error; both models were created by the Oxford Prioritisation Project, with input from MIRI and the FHI.] MIRI’s model puts it at between and , while the FHI estimates between and . 80,000 Hours, when they looked into it, didn’t give an explicit probability estimate, but they thought there was about $10M of annual funding for AI safety work in 2015 and that doubling the total AI safety effort would have a chance of preventing “serious catastrophe.”

If those estimates are accurate, then even MIRI-style AI safety research involves probabilities that are comparable to those of ordinary things like your vote affecting an election (~), or your choice to buckle your seatbelt saving your life ( for a 100-mile drive).

True, voting and fastening your seatbelt take much less effort than devoting your life to a cause. Perhaps this is a relevant difference; perhaps we should be less willing to consider low-probability scenarios the more costly our decision is. Even if this is so, we can think of examples that are comparable in effort to devoting your career to AI safety. For example, consider nuclear plant safety: Existing nuclear plants have an annual chance of core damage of , but many work-hours and thousands of dollars are spent ensuring that the latest designs lower the probability to . And that’s just the chance of core damage, which might not even hurt anyone if it happens. The chance of a Chernobyl-style disaster is even lower. (That said, there’s an arguably important disanalogy between these cases and long-termist projects, having to do with the law of large numbers. See appendix 8.4.1 for discussion.)

The point is, long-termists who prioritize AI safety as a cause agree that the probabilities involved are low, but they believe that the probabilities involved are comparable to those involved in various ordinary behaviors.

Perhaps they are significantly overconfident; perhaps the probabilities involved are much smaller than they think.

If so, fair enough; then there will be some solutions to the problem of tiny probabilities of vast utilities that undermine AI safety arguments without undermining ordinary behaviors. But in that case, the disagreement isn’t a philosophical disagreement about how to handle tiny probabilities of vast utilities; it’s an ordinary empirical disagreement about what the probabilities are.

What goes for AI safety should go for other mainstream long-termist projects also. Climate change reduction, nuclear war and pandemic prevention, and general long-termist community-building (to name a few) are widely regarded to be on much firmer footing, with more evidence and consensus in support of them. So even if you think that AI safety research is too speculative to be worthwhile (according to your preferred method of handling tiny probabilities of vast utilities) there’s a good chance you’ll still endorse these other projects.

This subsection has focused on probabilities, but much the same thing could be said for other properties like “being evidence-based” or “being endorsed by experts.” Even AI safety research has those things to some extent—and certainly the long-termists who prioritize it think that it does. Things are even better for the long-termists when the “cancelling out” strategy (#6) is used: As we discussed in section 3, there are good reasons to think that (for example) the possibility of preventing extinction by joining MIRI is not cancelled out by an equal-or-greater probability of preventing extinction by donating to AMF.    

Unusually high utilities should not count against a project

This subsection argues that, with respect to solutions #3, #4, and #7, long-termist projects also shelter within the herd. 

First, let’s focus on solutions #3 and #4: those that involve bounded or otherwise restricted utility functions. These solutions are like taking a set of funnel-shaped action profiles and squishing them down systematically, so that they become bullet-shaped. Stars that begin on top of each other will end on top of each other, though the distance between them may be less. So if project A is better than project B before the squish, it will also be better afterwards unless the probabilities of good outcomes for A are smaller than the probabilities of good outcomes for B. So if long-termist projects and ordinary behaviors involve similar probabilities, it will be hard to undermine one without undermining the other.

Even if long-termist projects involve significantly lower probabilities than ordinary behaviors (even if, for example, the probability of preventing catastrophe by joining MIRI is ), the overall shelter-within-the-herd point might still stand. Arguably, when bounding the utility function, the resulting rate of diminishing returns should be smooth and steady rather than spiked around the current population of Earth. It would be suspiciously convenient if helping people (e.g. by preventing world war three) is almost times better than helping one person, yet helping people is only (say) times better than helping one person. So either we say that helping people is significantly less than times better than helping one person, or we say that helping people is significantly more than times better than helping one person. If we take the second option, then (for example) preventing existential catastrophe will still seem promising even if the probability of success is . If we take the first option, we start to undermine some ordinary decisions that involve politics and society: cases like voting in major elections, in which swinging the election affects hundreds of millions, and perhaps billions, of people. (Of course, you might be fine with this. Some people think that the reasons we have to vote don’t have much to do with influencing the election; rather, it’s about showing support, or fulfilling our civic duty.)
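The “squish” described above can be illustrated with a small Python sketch. The saturating utility function, its bound, and every project number below are hypothetical choices made for illustration; the article does not commit to any particular bounded utility function. The sketch shows that bounding preserves the ranking of two projects with equal probabilities, and can flip it only when the higher-payoff project also has the lower probability.

```python
import math

def linear_utility(lives_saved):
    """Unbounded utility: saving N lives has utility N."""
    return float(lives_saved)

def bounded_utility(lives_saved, bound=1e10):
    """Hypothetical saturating utility: near-linear for small inputs,
    never exceeding `bound`. expm1 keeps precision for small ratios."""
    return bound * -math.expm1(-lives_saved / bound)

def expected_utility(probability, lives_saved, utility):
    """Expected utility of a single-payoff gamble (zero utility otherwise)."""
    return probability * utility(lives_saved)

# Hypothetical projects as (probability of success, lives saved on success).
project_a = (1e-6, 1e15)        # vast payoff
project_b = (1e-6, 1e6)         # moderate payoff, same probability as A
project_a_rare = (1e-12, 1e15)  # vast payoff, much lower probability

for name, utility in [("linear", linear_utility), ("bounded", bounded_utility)]:
    eu_a = expected_utility(*project_a, utility)
    eu_b = expected_utility(*project_b, utility)
    eu_rare = expected_utility(*project_a_rare, utility)
    print(f"{name}: A={eu_a:.3g}, B={eu_b:.3g}, A_rare={eu_rare:.3g}")
```

With equal probabilities, A stays ahead of B under both utility functions. Only the much less probable variant of A drops behind B once the utility function is bounded.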

What about the “Go with your gut” approach? This category of solution seems quite likely to undermine long-termist projects without undermining ordinary behaviors—all it takes is to have a gut that dislikes one and likes the other. However, this category of solution risks running afoul of the plausible principle that when probabilities are equal, higher utilities are better. Because both Pascal's Mugging and saving the world from extinction involve gigantic payoffs, it's intuitive to put them in the same category. But the probability of saving the world with a long-termist donation may be no lower than the probability of swaying a US presidential election with a donation in the same dollar amount. The danger is that “go with your gut” solutions might make you unfairly penalize actions that have really good potential outcomes, compared to actions that have moderately good potential outcomes of similar probability. If we make sure to avoid this danger somehow, then the “go with your gut” approach may well have similar implications to the other approaches discussed in this subsection.

Finally, note that this article so far has evaluated outcomes relative to some sort of default point for your decision, perhaps what would happen if you do nothing: In the early diagrams, saving N lives has utility N, and doing something with no effect (like voting in an election and not making a difference) has utility 0. But this is a potentially problematic way to do things. Perhaps we should take a less self-centered perspective; I argue that if we do, the “shelter within the herd” point becomes even stronger. (See appendix 8.5.)

6.3: Concluding this section

We saw before that saying “Even if the probability of extinction is one in a billion and the probability of preventing it is one in a trillion, we should still prioritize x-risk reduction over everything else…” is naive.

But it is also naive to say things like “Long-termists are victims of Pascal’s Mugging.” People who prioritize long-termist causes like x-risk reduction are not trying to Mug you, nor are they victims of a Mugging. Long-termist prioritization decisions are consistent with a wide range of solutions, and the solutions which undermine long-termist projects tend to undermine various ordinary behaviors as well. Insofar as long-termists are victims of the Mugging, so are we all.


The initial worry of Pascal’s Mugging—does expected utility theory recommend giving money to the Mugger?—can be dispelled. The more general issue of how to handle tiny probabilities of vast utilities is much more problematic: Standard utility functions, when combined with standard probability functions and expected utility maximization, lead to incoherence or paralysis in everyday cases. There are many ways to fix this, but they all are controversial.

Amidst this controversy, it would be naive to say things like “Even if the probability of extinction is one in a billion and the probability of preventing it is one in a trillion, we should still prioritize x-risk reduction over everything else…” This might be true, but it is a very controversial philosophical claim; some of the contending solutions support it and others don’t.

Yet it would also be naive to say things like “Long-termists are victims of Pascal’s Mugging.” For one thing, as the response to the initial worry shows, there’s an important difference between the Mugger’s proposal and mainstream long-termist projects. More importantly, mainstream long-termist projects shelter within the herd of ordinary behaviors.

Of course, this is just one line of argument related to long-termism. There are many other arguments for long-termism, and there are other arguments that undermine mainstream long-termist projects. In particular, if you think that the probabilities of success are several orders of magnitude lower than long-termists think, then (for you) those projects do not shelter within the herd, opening up the possibility of a solution that undermines them without undermining ordinary behaviors. And if you think that the probabilities of success are many orders of magnitude lower than long-termists think (e.g. or so) then for you they are sufficiently far outside the herd that it would be unwise to support them. The point is that, in our current state of uncertainty about how to solve this problem, whether or not to prioritize long-termist projects hinges on what the relevant probabilities are. And that is how it should be.



30:  80,000 Hours has done overviews of several of these projects, for example generic AI safety research, pandemic prevention, nuclear security, and climate change. See http://allfed.info/ for work into disaster response, and https://80000hours.org/2015/07/effective-altruists-love-systemic-change/ for some examples of political and social change advocacy.

31:  There are 65,379,987,902,693,376,000 square feet on Earth’s surface. Assume you are pointing to a 10-by-10 foot area on the White House lawn for five seconds. 500 meteors hit Earth each year. The calculation is roughly (5x100x500/31536000)/65,379,987,902,693,376,000 ≈ 10^-22. A better version of this plan would involve only two meteors and would use the remaining probability to e.g. guess what the President had for breakfast that morning, guess the names of his bodyguards, etc., but I didn’t bother to look up the odds of those things. Also, technically Matthews’s example was for donating $1000 to MIRI, not working for them. But for our purposes it doesn’t really matter: It only changes the estimate by a couple orders of magnitude.
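The arithmetic in this footnote can be reproduced with a short Python sketch. All inputs are taken as-is from the footnote and have not been independently checked. Cubing the result for three meteors assumes the strikes are independent, which is an assumption of this sketch and is not stated in the footnote.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, as in the footnote

def single_meteor_probability(
    window_seconds=5,
    target_sq_ft=100,           # a 10-by-10 foot patch of lawn
    meteors_per_year=500,
    earth_sq_ft=65_379_987_902_693_376_000,  # figure quoted in the footnote
):
    """Follow the footnote's formula: (5 x 100 x 500 / 31536000) / surface area."""
    numerator = window_seconds * target_sq_ft * meteors_per_year / SECONDS_PER_YEAR
    return numerator / earth_sq_ft

p_one = single_meteor_probability()
# Assumption of this sketch: three strikes treated as independent events.
p_three = p_one ** 3

print(f"one meteor:    {p_one:.2e}")
print(f"three meteors: {p_three:.2e}")
```

Because each input is a keyword parameter, it is easy to see how strongly the result depends on any one of them, for example by passing a different surface-area figure.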

32:  Both models can be seen, combined, here: https://www.getguesstimate.com/models/8789

33:  https://80000hours.org/career-reviews/artificial-intelligence-risk-research/ Note that their estimate is for AI safety research more generally rather than MIRI in particular.

34:  The voting probability is for US Presidential elections, as estimated by Gelman, Silver, and Edlin (2009). The road safety stats come from https://en.wikipedia.org/wiki/Traffic_collision and https://en.wikipedia.org/wiki/Transportation_safety_in_the_United_States.

For the latter, the estimate is that not wearing a seatbelt doubles your risk of death, which is ~10 deaths per billion miles in the USA.
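As a rough check on the seatbelt figure, here is a minimal Python sketch of the calculation this footnote describes. It assumes that the ~10 deaths per billion miles is the rate when belted and that going unbelted exactly doubles it. That reading of the footnote is an assumption, and the function and parameter names are hypothetical.

```python
def extra_death_risk(miles, belted_deaths_per_billion_miles=10.0,
                     unbelted_multiplier=2.0):
    """Added probability of death from not buckling up over `miles` of driving.

    Assumes a constant per-mile risk, so the added risk scales linearly
    with distance (a reasonable approximation while the risk is tiny).
    """
    belted_per_mile = belted_deaths_per_billion_miles / 1e9
    unbelted_per_mile = belted_per_mile * unbelted_multiplier
    return (unbelted_per_mile - belted_per_mile) * miles

print(f"100-mile drive: {extra_death_risk(100):.1e}")
# 10,000 miles is the annual mileage figure that appears in the comments below.
print(f"10,000 miles:   {extra_death_risk(10_000):.1e}")
```

Under these assumptions the per-trip figure and the per-year figure differ by two orders of magnitude, so the comparison depends heavily on which unit of decision is chosen.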

35:  This isn’t obvious; in fact, one could argue the opposite. Caring about this difference in either direction leads to some interesting puzzles. Many big decisions can be thought of as composed of millions of little decisions, and vice versa. For example, the decision to smoke can be thought of as a million decisions to take a puff, each one of which has a less-than-one-in-a-million chance of causing cancer (see https://www.cancer.org/healthy/stay-away-from-tobacco/cigarette-calculator.html). The nuclear plant safety standards can be thought of as a single decision to make all nuclear plants safe, carried out by hundreds of separate plants; thinking about it that way, the probability isn’t that low at all. But then if we thought about AI safety as a single decision to be extra careful made by all AI builders and policy-makers… then the probability wouldn’t be that low either.

36: http://www.world-nuclear.org/information-library/safety-and-security/safety-of-plants/safety-of-nuclear-power-reactors.aspx

37:  Of course, if we account for model uncertainty, perhaps the probability of some sort of catastrophe should be higher. But the same could be said for various existential risks.

38:  And/or the probabilities of bad outcomes for A are greater than the probabilities of bad outcomes for B.

39:  Caveat: If you take the person-affecting view, then the value of preventing existential catastrophe is probably less than even without any diminishing returns, because making new people doesn’t count, morally. However, even on the person-affecting view, the value of preventing existential catastrophe may be astronomically high, since currently-existing people could survive for billions and maybe trillions of years with the right technology.

40:  More precisely, given two possibilities of equal probability, the one with higher utility should count for more in decision-making; all else equal an action that might lead to the first possibility is preferable to an action that might lead to the second.

41:  For many utility functions (such as those with diminishing returns to scale) it leads to some inconsistencies between what you should do as an individual and as a member of a group. For example, if a plague is detected in one country and the borders get closed to create a quarantine, it might be best for each individual border guard to let every refugee through (because that way they are saving several lives for sure vs. having a small chance of saving millions of lives) and yet if all the border guards do this millions of people will die for sure. We’d have the strange result that it is right for the government to order the guards to close the border, and yet wrong for the guards to obey the order.

42:  You may be wondering why “sheltering within the herd” is agent-relative—why it makes sense to say a project can shelter within the herd for me but not for you. Why not instead say that the true probability of success for long-termist projects is what it is, and if it is high, then they shelter within the herd, and if it is low, they don’t? The answer is that I am evaluating accusations that long-termists are “victims of Pascal’s Mugging.” For that, it makes sense to use their internal perspective. Analogy: If my friend calculates the odds of winning a certain lottery to be high enough that the expected utility of playing is positive, and accordingly buys a ticket, and I think my friend made an error in her calculation of the odds… it would be wrong for me to accuse her of failing to maximize expected utility. Rather, she’s maximizing expected utility but with a bad probability function. Similarly, if you think that long-termists would be correct to focus on the long-term if they were correct about how likely they are to succeed, it’s misleading to accuse them of falling prey to Pascal’s Mugging. Rather, you should simply say that they are overconfident.


MIRI and the Future of Humanity Institute each created models for calculating the probability that a new researcher joining MIRI will avert existential catastrophe. MIRI’s model puts it at between and , while the FHI estimates between and .

The wording here makes it seem like MIRI/FHI created the model, but the link in the footnote indicates that the model was created by the Oxford Prioritisation Project. I looked at their blog post for the MIRI model but it looks like MIRI wasn't involved in creating the model (although the post author seems to have sent it to MIRI before publishing the post). I wonder if I'm missing something though, or misinterpreting what you wrote.

Thanks for this! It's been a long time since I wrote this so I don't remember why I thought it was from MIRI/FHI. I think it's because the guesstimate model has two sub-models, one titled "the MIRI method" and one titled "The community method (developed by Owen CB and Daniel Dewey)", who were at the time associated with FHI I believe. So I must have figured the first model came from MIRI and the second model came from FHI.

I'll correct the error.


Ok I see, thanks for the clarification! I didn't notice the use of the phrase "the MIRI method", which does sound like an odd way to phrase it (if MIRI was in fact not involved in coming up with the model).

The average American drives 10^4 miles per year.  The seatbelt comparison is off.

I think the same sheltering happens if you talk about ignoring small probabilities, even if the probability of the x-risk is in fact extremely small.

The probability that $3000 to AMF saves a life is significant. But the probability that it saves the life of any one particular individual is extremely low. We can divide up the possibility space any number of ways. To me it seems like this is a pretty damning problem for the idea of ignoring small probabilities.

We can say that the outcome of the AMF donation has lower variance than the outcome of an x-risk donation, assuming equal EV. So we could talk about preferring low variance, or being averse to having no impact. But I don't know if that will seem as intuitively reasonable when we circle our new framework back to more everyday, tangible thought experiments.

Hmmm, good point: If we carve up the space of possibilities finely enough, then every possibility will have a too-low probability. So to make an "ignore small probabilities" solution work, we'd need to include some sort of rule for how to carve up the possibilities. And yeah, this seems like an unpromising way to go...

I think the best way to do it would be to say "We lump all possibilities together that have the same utility." The resulting profile of dots would be like a hollow bullet or funnel. If we combined that with an "ignore all possibilities below probability p" rule, it would work. It would still have problems, of course.
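A minimal Python sketch of the rule proposed in this comment: lump together all possibilities with the same utility, then ignore any lump whose total probability falls below p. The donation example and every number in it are hypothetical, chosen only to show how fine-grained carving defeats a naive threshold while lumping avoids it.

```python
from collections import defaultdict

def lump_by_utility(outcomes):
    """Merge (probability, utility) pairs that share the same utility."""
    lumped = defaultdict(float)
    for probability, utility in outcomes:
        lumped[utility] += probability
    return dict(lumped)

def naive_truncated_eu(outcomes, p_min):
    """Ignore small probabilities without lumping first."""
    return sum(p * u for p, u in outcomes if p >= p_min)

def lumped_truncated_eu(outcomes, p_min):
    """Lump by utility, then ignore lumps with total probability below p_min."""
    return sum(u * p for u, p in lump_by_utility(outcomes).items() if p >= p_min)

# Hypothetical donation: a 50% chance of saving one life, carved into 1000
# equally likely "saves person i" branches, plus a 50% chance of saving no one.
outcomes = [(0.0005, 1.0)] * 1000 + [(0.5, 0.0)]

print("naive: ", naive_truncated_eu(outcomes, p_min=0.001))
print("lumped:", lumped_truncated_eu(outcomes, p_min=0.001))
```

Without lumping, every life-saving branch falls under the threshold and the donation looks worthless. With lumping, the branches combine into a single outcome well above the threshold.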
