
People should stop falling back on the argument that working on AI safety research is a 'Pascal's Mugging', because that doesn't address the actual reasons people who work on AI safety think we should work on AI safety today.

Most people who work on AI think the chances of affecting the outcome are not infinitesimal, but rather entirely macroscopic, the same way that voting in an election has a low but real chance of changing the outcome, or having an extra researcher has a low but real chance of causing us to invent a cure for malaria sooner, or having an extra person on Ebola containment makes it less likely to become a pandemic.

For example someone involved might believe:

i) There's a 10% chance of humanity creating a 'superintelligence' within the next 100 years.

ii) There's a 30% chance that the problem can be solved if we work on it harder and earlier.

iii) A research team of five suitable people starting work on safety today and continuing through their working lives would raise the odds of solving the problem by 1% of that (0.3 percentage points). (This passes a sanity check, as they would represent a 20% increase in the effort being made today.)

iv) Collectively they therefore have a 0.03% chance (10% × 0.3%) of making an AI significantly more aligned with human values in the next 100 years, such that each individual person involved has a 0.006 percentage point share.
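As a rough check on the arithmetic in i) to iv), here is a minimal Python sketch. The variable names are my own, and treating the three figures as simple multiplicative factors is an assumption about how the example is meant to be read, not a model anyone has endorsed.

```python
# Illustrative inputs copied from points i) to iii); all names are hypothetical.
p_superintelligence = 0.10  # i) chance of a superintelligence within 100 years
p_effort_matters = 0.30     # ii) chance harder, earlier work solves the problem
team_fraction = 0.01        # iii) the team contributes 1% of that 30%
team_size = 5               # iii) five researchers

# iii) The team raises the odds of a solution by 1% of 30%.
team_boost = p_effort_matters * team_fraction

# iv) That boost only matters in worlds where superintelligence arrives.
collective_chance = p_superintelligence * team_boost
per_person_share = collective_chance / team_size

print(f"Team boost:        {team_boost:.2%}")
print(f"Collective chance: {collective_chance:.3%}")
print(f"Per-person share:  {per_person_share:.4%}")
```

The point of the exercise is that every factor is small but ordinary, which is what separates this from a case driven by one enormous, arbitrary payoff.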

Note that the case presented here has nothing to do with there being some enormous and arbitrary value available if you succeed, which is central to the weirdness of the Pascal's Mugging case.

Do you think the numbers in this calculation are way over-optimistic? OK - that's completely reasonable!

Do you think we can't predict whether the sign of the work we do now is positive or negative? Is it better to wait and work on the problem later? There are strong arguments for that as well!

But those are the arguments that should be made and substantiated with evidence and analysis, not quick dismissals that people are falling for a 'Pascal's Mugging', which they mostly are not.

Given the beliefs of this person, this is no more a Pascal's Mugging than working on any basic science research, or campaigning for an outsider political campaign, or trying to reform a political institution. These all have unknown but probably very low chances of making a breakthrough, but could nevertheless be completely reasonable things to try to do.

Here's a similar thing I wrote years ago: If elections aren't a Pascal's Mugging, existential risk work shouldn't be either.


As far as I can see all of these are open possibilities:

1) Solving the AI safety problem will turn out to be unnecessary, and our fears today are founded on misunderstandings about the problem.

2) Solving the AI safety problem will turn out to be relatively straightforward on the timeline available.

3) It will be a close call whether we manage to solve it in time - it will depend on how hard we work and when we start.

4) Solving the AI safety problem is almost impossible and we would have to be extremely lucky to do so before creating a super-intelligent machine. We are therefore probably screwed.

We collectively haven't put enough focussed work into the problem yet to have a good idea where we stand. But that's hardly a compelling reason to assume 1), 2) or 4) and not work on it now.


It might not be a strong response to the whole cause area, but isn't it the only response to the Bostrom-style arguments linked below? In my experience, those cover the majority of the arguments I hear in favour of x-risk work.

Very few one line arguments are strong responses to whole world views that smart people actually believe, so I sort of feel like there's nothing to see here.

I asked Bostrom about this and he said he never even made this argument in this way to the journalist. Given my experience of the media misrepresenting everything you say and wanting to put sexy ideas into their pieces, I believe him.

The New Yorker writer got it straight out of this paper of Bostrom's (paragraph starting "Even if we use the most conservative of these estimates"). I've seen a couple of people report that Bostrom made a similar argument at EA Global.

Look, no doubt the argument has been made by people in the past, including Bostrom who wrote it up for consideration as a counterargument. I do think the 'astronomical waste' argument should be considered, and it's far from obvious that 'this is a Pascal's Mugging' is enough to overcome its strength.

But it's also not the main reason, the only reason, or the best reason on which many people who work on these problems could ground their choice to do so.

So if you dismiss this argument, before you dismiss the work, move on to look at what you think is the strongest argument, not the weakest.

I actually think there's an appropriate sense in which it is the strongest argument -- not in that it's the most robust, but in that it has the strongest implications. I think this is why it gets brought up (and that it's appropriate to do so).

Agreed - despite being counterintuitive, it's not obviously a flawed argument.

If I were debating you on the topic, it would be wrong to say that you think it's a Pascal's mugging. But I read your post as being a commentary on the broader public debate over AI risk research, trying to shift it away from "tiny probability of gigantic benefit" in the way that you (and others) have tried to shift perceptions of EA as a whole or the focus of 80k. And in that broader debate, Bostrom gets cited repeatedly as the respectable, mainstream academic who puts the subject on a solid intellectual footing.

(This is in contrast to MIRI, which as SIAI was utterly woeful and which in its current incarnation still didn't look like a research institute worthy of the name when I last checked in during the great Tumblr debate of 2014; maybe they're better now, I don't know.)

In that context, you'll have to keep politely telling people that you think the case is stronger than the position your most prominent academic supporter argues from, because the "Pascal's mugging" thing isn't going to disappear from the public debate.

I have no opinion on what Bostrom did or didn't say, to be clear. I've never even spoken to him. Which is why I said 'Bostrom-style'. But I have heard this argument, in person, from many of the AI risk advocates I've spoken to.

Look, any group in any area can present a primary argument X, be met by (narrow) counterargument Y, and then say 'but Y doesn't answer our other arguments A, B, C!'. I can understand why that sequence might be frustrating if you believe A, B, C and don't personally put much weight on X, but I just feel like that's not an interesting interaction.

It seems like Rob is arguing against people using Y (the Pascal's Mugging analogy) as a general argument against working on AI safety, rather than as a narrow response to X.

Presumably we can all agree with him on that. But I'm just not sure I've seen people do this. Rob, I guess you have?

I get what you're saying, but, e.g., in the recent profile of Nick Bostrom in the New Yorker:

No matter how improbable extinction may be, Bostrom argues, its consequences are near-infinitely bad; thus, even the tiniest step toward reducing the chance that it will happen is near-infinitely valuable. At times, he uses arithmetical sketches to illustrate this point. Imagining one of his utopian scenarios—trillions of digital minds thriving across the cosmos—he reasons that, if there is even a one-per-cent chance of this happening, the expected value of reducing an existential threat by a billionth of a billionth of one per cent would be worth a hundred billion times the value of a billion present-day lives. Put more simply: he believes that his work could dwarf the moral importance of anything else.

While the most prominent advocate in the respectable-academic part of that side of the debate is making Pascal-like arguments, there's going to be some pushback about Pascal's mugging.

I've also seen Eliezer (the person who came up with the term Pascal's mugging) give talks where he explicitly disavows this argument.

Three things:

i) I bet Bostrom thinks the odds of a collective AI safety effort achieving its goal are better than 1%, which would itself be enough to avoid the Pascal's Mugging situation.

ii) This is a fallback position from which you can defend the work if someone thinks it almost certainly won't work. I don't think we should do that; instead we should argue that we can likely solve the problem. But I see the temptation.

iii) I don't think it's clear you should always reject a Pascal's Mugging (or if you should, it may only be because there are more promising options for enormous returns than giving it to the mugger).

Yes! Thank you for this. Pascal's Muggings have to posit paranormal/supernatural mechanisms to work. But x-risk isn't like that. Big difference which people seem to overlook. And Pascal's Muggings involve many orders of magnitude smaller chances than even the most pessimistic x-risk outlooks.

I agree with your second point but not your first. Also it's possible you mean "optimistic" in your second point: if x-risks themselves are very small, that's one way for the change in probability as a result of our actions to be very small.

I mean pessimism about the importance of x-risk research, which is more or less equivalent to optimism about the future of humanity. Similar idea.

Btw., this article series of yours convinced me of the importance of AI safety work. Thank you and good work!
