# St. Petersburg Demon – a thought experiment that makes me doubt Longtermism

by wuschel · 2 min read · 23 May 2022


Epistemic status: Just a thought that I have, nothing too rigorous

The reason Longtermism is so enticing (to me at least) is that the existence of so many future lives hangs in the balance right now. It just seems to be a pretty good deed to me to bring 10^52 people (or whatever the real number will turn out to be) into existence.

This hinges on the belief that Utility scales linearly with the number of QUALYs, so that twice as many people are also twice as morally valuable. My belief in this was recently shaken by this thought experiment:

***

You are a traveling EA on a trip to St. Petersburg. In a dark alley, you meet a Demon with the ability to create universes and a serious gambling addiction. He says he was about to create a universe with 10 happy people. But he gives you three fair dice and offers you a bet: you can throw the three dice, and if they all come up 6, he refrains from creating a universe. If you roll anything else, he will double the number of people in the universe he will create.

You do the expected value calculation and figure out that by throwing the dice you will create 696,8 QUALYs in expectation. You take the bet and congratulate yourself on your ethical decision.

After the good deed is done, and the demon has now committed to creating 20 happy people, he offers you the same bet again. Roll the 3 dice: he won't create a universe at 6,6,6 and doubles it at anything else. The demon tells you that he will offer you the same bet repeatedly. You do your calculations and throw the dice again and again, until, eventually, you throw all sixes, and the demon vanishes, without having to create any universe, in a cloud of sulfury mist and leaves you wondering if you should have done anything differently.

***

There are a few ways to weasel out of the demon's bet. You could say that the strategy “always take the demon's bet” has an expected value of 0 QUALYs, and so you should go with some tactic like “Take the first 20 bets, then call it a day”. But I think if you refuse a bet, you should be able to reject this bet without taking into account what bets you have taken in the past or are still taking in the future.
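To see why the "take the first n bets, then stop" tactics pull in opposite directions, here is a minimal sketch of the two quantities involved. It counts people rather than QUALYs, starts from the demon's 10 happy people, and the names `P_WIN`, `survival_probability` and `expected_people` are just illustrative choices, not anything from the post.

```python
from fractions import Fraction

P_WIN = Fraction(215, 216)  # chance that a single roll is NOT 6,6,6
START_PEOPLE = 10           # size of the universe before any bet is taken


def survival_probability(n_bets: int) -> Fraction:
    """Probability that all of the first n_bets rolls avoid triple six."""
    return P_WIN ** n_bets


def expected_people(n_bets: int) -> Fraction:
    """Expected universe size for 'take the first n_bets bets, then stop'."""
    return START_PEOPLE * 2 ** n_bets * survival_probability(n_bets)


# The expectation keeps growing with n while the chance of
# ending up with any universe at all keeps shrinking.
for n in (0, 1, 20, 150, 500):
    print(n, float(survival_probability(n)), float(expected_people(n)))
```

Under these assumptions each extra bet multiplies the expectation by 430/216, while the chance of walking away with anything drops below one half at around 150 bets.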

I think the only consistent way to refuse the Demon's bets at some point is to have a bounded utility function. You might think it would be enough to have a utility function that does not scale linearly with the number of QUALYs, but logarithmically or something. But in that case, the demon can offer to double the amount of utility, instead of doubling the amount of QUALYs, and we are back in the paradox. At some point, you have to be able to say: “There is no possible universe that is twice as good as the one you have promised me already”. So at some point, adding more happy people to the universe must have a negligible ethical effect. And once we accept that that must happen at some point, how confident are we that 10^52 people are that much better than 8 billion?

Overall I am still pretty confused about this subject and would love to hear more arguments/perspectives.


Expected utility maximization with unbounded utility functions is in principle vulnerable to Dutch books/money pumps, too, with a pair of binary choices in sequence, but which have infinitely many possible outcomes. See https://www.lesswrong.com/posts/gJxHRxnuFudzBFPuu/better-impossibility-result-for-unbounded-utilities?commentId=hrsLNxxhsXGRH9SRx

To me it seems the main concern is with using expected value maximization, not with longtermism. Rather than being rationally required to take an action with the highest expected value, I think you are probably only rationally required not to take any action resulting in a world that is worse than an alternative at every percentile of the probability distribution. So in this case you would not have to take the bet because at the 0.1st percentile of the probability distribution taking the bet has a lower value than status quo, while at the 99th percentile it has a higher value.

In practice, this still ends up looking approximately like expected value maximization for most EA decisions because of the huge background uncertainty about what the world will look like. (My current understanding is that you can think of this as an extended version of "if everyone in EA took risky high EV options, then the aggregate result will pretty consistently/with low risk be near the total expected value")

See this episode of the 80,000 hours podcast for a good description of this "stochastic dominance" framework: https://80000hours.org/podcast/episodes/christian-tarsney-future-bias-fanaticism/.
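As a rough illustration of the rule described above, here is a small sketch of a first-order stochastic dominance check applied to a single round of the demon's bet. It ignores the background uncertainty mentioned above, measures outcomes in people, and the helper names (`cdf`, `dominates`) and example distributions are mine.

```python
from fractions import Fraction


def cdf(dist: dict, x) -> Fraction:
    """P(outcome <= x) for a discrete distribution {value: probability}."""
    return sum((p for value, p in dist.items() if value <= x), Fraction(0))


def dominates(a: dict, b: dict) -> bool:
    """True if a first-order stochastically dominates b.

    a is at least as good as b at every quantile (its CDF is never higher)
    and strictly better at some quantile.
    """
    points = sorted(set(a) | set(b))
    never_worse = all(cdf(a, x) <= cdf(b, x) for x in points)
    somewhere_better = any(cdf(a, x) < cdf(b, x) for x in points)
    return never_worse and somewhere_better


status_quo = {10: Fraction(1)}                            # decline the bet
take_bet = {0: Fraction(1, 216), 20: Fraction(215, 216)}  # roll once
sure_twenty = {20: Fraction(1)}                           # a riskless doubling

print(dominates(take_bet, status_quo))     # neither option dominates the other
print(dominates(status_quo, take_bet))
print(dominates(sure_twenty, status_quo))  # a free doubling would dominate
```

Because neither the bet nor the status quo dominates the other, a rule that only forbids dominated options leaves you free to decline.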

(Note: I've made several important additions to this comment within the first ~30 minutes of posting it, plus some more minor edits after.)

I think this is an important point, so I've given you a strong upvote. Still, I think total utilitarians aren't rationally required to endorse EV maximization or longtermism, even approximately, except under certain other assumptions.

Tarsney has also written that stochastic dominance doesn't lead to EV maximization or longtermism under total utilitarianism, if the probabilities (probability differences) are low enough, and has said it's plausible the probabilities are in fact that low (not that he said it's his best guess they're that low). See "The epistemic challenge to longtermism", and especially footnote 41.

It's also not clear to me that we shouldn't just ignore background noise that's unaffected by our actions or generally balance other concerns against stochastic dominance, like risk aversion or ambiguity aversion, particularly with respect to the difference one makes, as discussed in "The case for strong longtermism" by Greaves and MacAskill in section 7.5. Greaves and MacAskill do argue that ambiguity aversion with respect to the outcomes doesn't point against existential risk reduction, and if I recall correctly from following citations, that ambiguity aversion with respect to the difference one makes is too agent-relative.

On the other hand, using your own precise subjective probabilities to define rational requirement seems pretty agent-relative to me, too. Surely, if the correct ethics is fully agent-neutral, you should be required to do what actually maximizes value among available options, regardless of your own particular beliefs about what's best. Or, at least, precise subjective probabilities seem hard to defend as agent-neutral, when different rational agents could have different beliefs even with access to the same information, due to different priors or because they weigh evidence differently.

Plus, without separability (ignoring what's unaffected) in the first place, the case for utilitarianism itself seems much weaker, since the representation theorems that imply utilitarianism, like Harsanyi's (and generalization here) and the deterministic ones like the one here, require separability or something similar.

FWIW, stochastic dominance is a bit stronger than you write here, since you can allow A to strictly beat B at only some quantiles, but equality at the rest, and then A dominates B.

Kei
1y20

Relevant: The von Neumann-Morgenstern utility theorem shows that under certain reasonable-seeming axioms, a rational agent should act as to maximize expected value of their value function: https://en.wikipedia.org/wiki/Von_Neumann%E2%80%93Morgenstern_utility_theorem

There have of course been arguments people have raised against some of the axioms - I think most commonly people argue against axioms 3 and 4 from the link.

Thank you for pointing me to that and getting me to think critically about it. I think  I agree with all the axioms.

a rational agent should act as to maximize expected value of their value function

I think this is misleading. The VNM theorem only says that there exists a function $u$ such that a rational agent's actions maximize $\mathbb{E}[u]$. But $u$ does not have to be "their value function."

Consider a scenario in which there are 3 possible outcomes:  = enormous suffering,  = neutral, = mild joy. Let's say my value function is , and , in the intuitive sense of the word "value."

When I work through the proof you sent in this example, I am forced to prefer  for some probability , but this probability does not have to be 0.1, so I don't have to maximize my expected value. In reality, I would be "risk averse" and assign  or something. See 4.1Automatic consideration of risk aversion.

More details of how I filled in the proof:

We normalize my value function so  and . Then we define .

Let , then , and I am indifferent between  and. However, nowhere did I specify what  is, so "there exists a function u such that I'm maximizing the expectation of it" is not that meaningful, because it does not have to align with the value I assign to the event.

I think you are probably only rationally required not to take any action resulting in a world that is worse than an alternative at every percentile of the probability distribution

I think this is probably wrong, and I view stochastic dominance as a backup decision rule, not as a total replacement for expected value. Some thoughts here.

Why try to maximize EV at all, though?

I think Dutch book/money pump arguments require you to rank unrealistic hypotheticals (e.g. where your subjective probabilities in, say, extinction risk are predictably manipulated by an adversary), and the laws of large numbers and central limit theorems can have limited applicability, if there are too few statistically independent outcomes.

Even much of our uncertainty should be correlated across agents in a multiverse, e.g. uncertainty about logical implications, facts or tendencies about the world. We can condition on some of those uncertain possibilities separately, apply the LLN or CLT to each across the multiverse, and then aggregate over the conditions, but I'm not convinced this works out to give you EV maximization.

Thanks for writing this up!

I'm not sure about the implications, but I just want to register that deciding to roll repeatedly, after each roll for a total of n rolls, is not the same as committing to n rolls at the beginning. The latter is equivalent in expected value to rolling every trial at the same time: the former has a much higher expected value. It is still positive, though.

Emrik
10mo10

The cumulative EV of  decisions to roll repeatedly is

(where  is the initial utility of 10, and  stays constant at )

whereas the EV of committing  to roll up to  times is

Which is much-much lower than , as you point out.

But then again, for larger values of , you're very unlikely to be allowed to roll for  times. The EV of  decisions to roll () times the probability of getting to the th roll (), is

In other words,  collapses to  if you don't assume any special luck. Which is to say that committing to a strategy has the same EV ex ante as fumbling into the same path unknowingly. This isn't very surprising, I suppose, but the relevancy is that if you have a revealed tendency  to accept one-off St. Petersburg Paradox bets, that tendency has the same expected utility as deliberately committing to accept the same number of SPPs. If the former seems higher, then that's because your expectancy is wrong.

More generally, this means that it's important to try to evaluate one-off decisions as clues to what revealed decision rules you have. When you consider making a one-off decision, and that decision seems better than deliberately committing to using the decision-rules that spawned it, for all the times you expect to be in a similar situation, then you are fooling yourself and you should update.

If you can predict that the cumulative sum of the EV you assign to each one-off decision individually as you go along will be higher, compared to the EV you'd assign ex ante to the same string of decisions, then something has gone wrong in one of your one-off predictions and you should update.

I've been puzzling over this comment from time to time, and this has been helpfully clarifying for me, thank you. I've long been operating like this, but never entirely grokked why as clearly as now.

"You never make decisions, you only ever decide between strategies."

Similar issues come up in poker - if you bet everything you have on one bet, you tend to lose everything too fast, even if that one bet considered alone was positive EV.

I think you have to consider expected value an approximation. There is some real, ideal morality out there, and we imperfect people have not found it yet. But, like Newtonian physics, we have a pretty good approximation. Expected value of utility.

Yeah, in thought experiments with 10^52 things, it sometimes seems to break down. Just like Newtonian physics breaks down when analyzing a black hole. Nevertheless, expected value is the best tool we have for analyzing moral outcomes.

Maybe we want to be maximizing log(x) here, or maybe that’s just an epicycle and someone will figure out a better moral theory. Either way, the logical principle that a human life in ten years shouldn’t be worth less than a human life today seems like a plausible foundational principle.

Nevertheless, expected value is the best tool we have for analyzing moral outcomes

Expected value is only one parameter of the (consequentialist) evaluation of an action. There are more, e.g. risk minimisation.

It would be a massive understatement to say that not all philosophical or ethical theories so far boil down to "maximise the expected value of your actions".

This case (with our own universe, not a new one) appears in a Tyler Cowen interview of Sam Bankman-Fried:

COWEN: Should a Benthamite be risk-neutral with regard to social welfare?

BANKMAN-FRIED: Yes, that I feel very strongly about.

COWEN: Okay, but let’s say there’s a game: 51 percent, you double the Earth out somewhere else; 49 percent, it all disappears. Would you play that game? And would you keep on playing that, double or nothing?

BANKMAN-FRIED: With one caveat. Let me give the caveat first, just to be a party pooper, which is, I’m assuming these are noninteracting universes. Is that right? Because to the extent they’re in the same universe, then maybe duplicating doesn’t actually double the value because maybe they would have colonized the other one anyway, eventually.

COWEN: But holding all that constant, you’re actually getting two Earths, but you’re risking a 49 percent chance of it all disappearing.

BANKMAN-FRIED: Again, I feel compelled to say caveats here, like, “How do you really know that’s what’s happening?” Blah, blah, blah, whatever. But that aside, take the pure hypothetical.

COWEN: Then you keep on playing the game. So, what’s the chance we’re left with anything? Don’t I just St. Petersburg paradox you into nonexistence?

BANKMAN-FRIED: Well, not necessarily. Maybe you St. Petersburg paradox into an enormously valuable existence. That’s the other option.

COWEN: Are there implications of Benthamite utilitarianism where you yourself feel like that can’t be right; you’re not willing to accept them? What are those limits, if any?

BANKMAN-FRIED: I’m not going to quite give you a limit because my answer is somewhere between “I don’t believe them” and “if I did, I would want to have a long, hard look at myself.” But I will give you something a little weaker than that, which is an area where I think things get really wacky and weird and hard to think about, and it’s not clear what the right framework is, which is infinity.

All this math works really nicely as long as all the numbers are finite. As soon as you say, “What are the odds that there’s a way to be infinitely happy? What if infinite utility is a possibility?” You can figure out what that would do to expected values. Now, all of a sudden, we’re comparing hierarchies of infinity. Linearity breaks down a little bit here. Adding two things together doesn’t work so well. A lot of really nasty things happen when you go to infinite numbers from an expected-value point of view.

There are some people who have thought about this. To my knowledge, no one has thought about this and come away feeling good about where they ended. People generally think about this and come away feeling more confused.

Daniel Kokotajlo has a great sequence on the topic. I think the second post is going to be most relevant.

In my mind that’s no more a challenge to longtermism than general relativity (or the apparent position of stars around the sun during an eclipse) was a challenge to physics. But everyone seems to have their own subtly different take on what longtermism is. 🤷

You could say that the strategy “always take the demon's bet” has an expected value of 0 QUALYs

• This is not true. The expected value of this strategy is undefined. [Edit: commenters reasonably disagree.]

• So maybe we want to keep our normal expected-utility-maximizing behavior for some nicely-behaved prospects, where "nicely-behaved" includes conditions like "expected values are defined," and accept that we might have to throw up our hands otherwise.

• That said, I agree that thought experiments like this give us at least some reason to disfavor simple expected-utility-maximizing, but I caution against the jump to "expected-utility-maximizing is wrong" since (in my opinion) other (non-nihilistic) theories are even worse. It's easy to criticize a particular theory; if you offered a particular alternative, there would be strong reasons against that theory.

• Regardless, note that you can have a bounded utility function and be a longtermist, depending on the specifics of the theory. And it's prima facie the case that (say) x-risk reduction is very good even if you only care about (say) the next billion years.

• (QUALY should be QALY)

The expected value of this strategy is undefined.

If you follow the strategy, won't you eventually get three sixes? Which means that any run through this procedure will end up with a value of zero, no?

Does the undefined come from the chance that you will never get three sixes, combined with the absolutely enormous utility of that extremely improbable eventuality?

Medium answer: observe that the value after winning n bets is at least 10×2^n, and the limit of (215/216)^n × 10×2^n as n→∞ is undefined (infinite). Or let ω be an infinite number and observe that the infinitesimal probability (215/216)^ω times the infinite value 10×2^ω is 10×(2×215/216)^ω which is infinite. (Note also that the EV of the strategy "bet with probability p" goes to ∞ as p→1.)

Edit: hmm, in response to comments, I'd rephrase as follows.

Yes, the "always bet" strategy has value 0 with probability 1. But if a random variable is 0 with probability measure 1 and is undefined with probability measure 0, we can't just say it's identical to the zero random variable or that it has expected value zero (I think, happy to be corrected with a link to a math source). And while 'infinite bets' doesn't really make sense, if we have to think of 'the state after infinite bets' I think we could describe its value with the aforementioned random variable.

While you will never actually create a world with this strategy, I don't think the expected value is defined because 'after infinite bets' you could still be talking to the demon (with probability 0, but still possibly, and talking-with-the-demon-after-winning-infinitely-many-bets seems significant even with probability 0).

But if a random variable is 0 with probability measure 1 and is undefined with probability measure 0, we can't just say it's identical to the zero random variable or that it has expected value zero (I think, happy to be corrected with a link to a math source).

The definition of expected value is $E[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx$. If the set of discontinuities of a function has measure zero, then it is still Riemann integrable. So the integral exists despite not being identical to the zero random variable, and the value is zero. In the general case you have to use measure theory, but I don't think it's needed here.

Also, there's no reason our intuitions about the goodness of the infinite sequence of bets has to match the expected value.

Saying that the expected value of this strategy is undefined seems like underselling it.  The expected value is positive infinity since the cumulative reward is increasing strictly faster than the cumulative probability of getting nothing.

sawyer
1y70

This might be a disagreement about whether or not it's appropriate to use "infinity" as a number (i.e. a value). Mathematically, if a function approaches infinity as the input approaches infinity, I think typically you're supposed to say the limit is "undefined", as opposed to saying the limit is "infinity". So whether this is (a) underselling it or (b) just writing accurately depends on the audience.

I agree with you that the limit of the EV of "bet until you win n times" is infinite as n→∞. But I agree with Guy Raveh that we probably can't just take this limit and call it the EV of "always bet." Maybe it depends on what precise question we're asking...

(parent comment edited)

The expected value of this strategy is undefined

It looks to me like there's some confusion in the other comments regarding this. The expected value is, in fact, defined, and it is zero. The problem is that if you look at a sequence of n bets and take n to infinity, that expected value does go to positive infinity. So thinking in terms of adding one bet each time is actually deceiving.

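In general, a sequence of pointwise converging random variables does not have to converge in expected value to the expected value of the limit variable. That requires extra conditions, such as uniform convergence or a single integrable bound on the whole sequence (dominated convergence).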
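To make that concrete for the demon's bet, here is a short worked version. The notation is mine, not the post's: $X_n$ is the number of people created under the policy "take the first $n$ bets, then stop", and $T$ is the roll on which three sixes first appear.

$$
\Pr\left(X_n = 10\cdot 2^n\right) = \left(\tfrac{215}{216}\right)^n,
\qquad
\Pr\left(X_n = 0\right) = 1 - \left(\tfrac{215}{216}\right)^n
$$

$$
\mathbb{E}[X_n] = 10\cdot 2^n \left(\tfrac{215}{216}\right)^n = 10\left(\tfrac{430}{216}\right)^n \longrightarrow \infty
\quad (n \to \infty)
$$

$$
\Pr(T < \infty) = 1
\ \text{ and }\
X_n = 0 \text{ for all } n \ge T
\ \Longrightarrow\
\mathbb{E}\left[\lim_{n\to\infty} X_n\right] = 0 \ne \lim_{n\to\infty} \mathbb{E}[X_n]
$$

So the limit of the expectations and the expectation of the limit come apart, which is possible here because the $X_n$ are not bounded by any single integrable variable.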

Infinities sometimes break our intuitions. Luckily, our lives and the universe's "life" are both finite.

The expected value is, in fact, defined, and it is zero.

Is the random variable you're thinking of, whose expectation is zero, just the random variable that's uniformly zero? That doesn't seem to me to be the right way to describe the "bet" strategy; I would prefer to say the random variable is undefined. (But calling it zero certainly doesn't seem to be a crazy convention.)

It's zero on the event "three sixes are rolled at some point" and infinity on the event that they're never rolled. The probability of that second event is zero, though. So the expected value is zero.

Wouldn't the EV be 0?
My reasoning:
- The condition for the universe to be created is when you stop betting.
- Under this strategy, the only way where you stop betting is when you get three sixes.

So there is no plausible scenario where any amount of QALY is being created. Am I missing something?

Well, what value do we assign a scenario in which you're still talking with the demon? If those get value 0, then sure, the EV is 0. But calling those scenarios 0 seems problematic, I think.

I don't have a confident opinion about the implications to longtermism, but from a purely mathematical perspective, this is an example of the following fact: the EV of the limit of an infinite sequence of policies (say yes to all bets; EV=0) doesn't necessarily equal the limit of the EVs of each policy (no, yes no, yes yes no, ...; EV goes to infinity).

In fact, either or both quantities need not converge. Suppose that bet 1 is worth -$1, bet 2 is worth +$2, bet k is worth (-1)^k × k dollars, and you must either accept all bets or reject all bets. The EV of rejecting all bets is zero. The limit of EV of accepting the first k bets is undefined. The EV of accepting all bets depends on the distribution of outcomes of each bet and might also diverge.
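A tiny sketch of why that limit fails to exist, assuming bet k is worth (-1)^k × k dollars (an assumption on my part that fits the two values given, -$1 and +$2) and treating each bet's worth as its expected value. The function names are illustrative.

```python
from itertools import accumulate


def bet_value(k: int) -> int:
    """Hypothetical worth of bet k in dollars: -1, +2, -3, +4, ..."""
    return (-1) ** k * k


def value_of_first(k_max: int) -> list:
    """Total worth of accepting bets 1..k, for each k from 1 to k_max."""
    return list(accumulate(bet_value(k) for k in range(1, k_max + 1)))


# The running totals swing between ever larger negative and positive values,
# so they settle on no limit, finite or infinite.
print(value_of_first(8))
```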

The intuition I get from this is that infinity is actually pretty weird. The idea that if you accept 1 bet, you should accept infinite identical bets should not necessarily be taken as an axiom.

Sometimes, but very few times, you keep taking that bet, having created 2^(80 000 hours × 60 mins / hour × 1 doubling per minute) = 2^4800000 happy people, until you die of old age, happy but befuddled to have achieved as much.

Coafos
1y90

You say the first throw has an expected value of 693,5 (=700•215/216 -700•1/216) QALY, but it is not precise. The first throw has an expected value of 693,5 QALY if your policy is to stop after the first throw.

If you continue, then the QALY gained from these new people might decrease, because in the future there is a greater chance that these 10 new people disappear, therefore decreasing the value of creating them.
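Spelling out the arithmetic for a single throw, taking the 700 QALYs at stake from the formula in the comment above (the post itself does not say how many QALYs each of the 10 people counts for):

$$
\frac{215}{216}\cdot 700 \;-\; \frac{1}{216}\cdot 700 \;=\; 700\cdot\frac{214}{216} \;\approx\; 693.5
$$

Counting only the upside gives $700\cdot\frac{215}{216}\approx 696.8$, which matches the figure in the post; whether that is how it was computed is a guess.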

This proves too much.

You could use the same argument to prove that you don't actually care about money, or [money squared] or even literally utility.

In your post, replace the word "people" with "money" (or one of the other examples) and see what happens.

wdyt?

I agree. You could substitute "happy people" with anything.

But I don't think it proves too much. I don't want to have as much money as possible, I don't want to have as much ice cream as possible. There seems to be in general some amount that is enough.

With happy people/QALYs, it is a bit trickier. I sure am happy for anyone who leads a happy life, but I don't, on a gut level, think that a world with 10^50 happy people is worse than a world with 10^51 happy people.

Total Utilitarianism made me override my intuitions in the past and think that 10^51 people would be way better than 10^50, but this thought experiment made me less confident of that.

Though I do agree with the other commenters here, that just because total Utilitarianism breaks down in the extreme cases, that does not mean we have to doubt more common sense beliefs like "future people matter as much as current people".

What if the devil would offer you \$1, and the same "infinite" offer to double it [by an amount that would mean, to you personally, twice the utility. This might mean doubling it 10x] with a 90% probability [and a 10% probability that you get nothing].

Would you stop doubling at some point?

If so - would you say this proves that at some point you don't want any more utility?

My hint was: After reading this thought experiment I did not feel less confused.

(this is the closest post I have in mind)

Well, the marginal value of money does go to zero eventually, once you have enough money to buy everything,  hire everyone and take over the world. It's trickier with QALYs.

I think I have a decent solution for this interesting paradox.

First, imagine a bit of a revised scenario. Instead of waiting until you finish the bet, the demon first creates a universe and then adds people to it. In this case, I would claim the optimal strategy is betting forever.
I think the intuition here is similar to the quantum immortality thought experiment. In most universes, you will end up with 0 people created. Still, there will be one universe with infinitesimal probability where you will never get three sixes. You and the demon will forever sit throwing the dice while countless people will be living happily in this universe. And in terms of EV, it will beat stopping at any point.

But in the original formulation, this strategy has an EV of 0 - and that's not due to weaseling out, but because the only condition that will create a universe with any amount of people is when you stop betting, and this will only happen after you get three sixes. So the 0 EV is a trivial conclusion in this case.

The way out of this trap is to add a random stopping mechanism. Before every throw of the dice, you will roll a random number generator, and if you hit a very specific number (The odds to hit it will be infinitesimal, e.g., 1/(9^^^9)), you will stop betting. To maximize EV, you should use the lowest probability to stop you can practically generate while making sure it's > 0.
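A rough worked version of the EV of such a randomized stopping rule, under my own notation: $q$ is the chance of stopping before each throw, $p = \tfrac{215}{216}$ is the chance a throw avoids three sixes, and the payoff is counted in people, starting from 10.

$$
\Pr(\text{stop after exactly } k \text{ wins}) = q\,\big((1-q)\,p\big)^k,
\qquad
\text{payoff} = 10\cdot 2^k
$$

$$
\mathbb{E} = 10\,q \sum_{k=0}^{\infty} \big(2p(1-q)\big)^k =
\begin{cases}
\dfrac{10\,q}{1 - 2p(1-q)} & \text{if } q > 1 - \dfrac{1}{2p} = \dfrac{107}{215} \approx 0.498,\\[2ex]
+\infty & \text{otherwise.}
\end{cases}
$$

If this sketch is right, the expectation is already infinite for any stopping chance below roughly one half, so shrinking $q$ further does not raise the EV in any well-defined sense.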

Whenever your expected value calculation relies on infinity—especially if it relies on the assumption that an infinite outcome will only occur when given infinite attempts—your calculation is going to end up screwy. In this case, though, an infinite outcome is impossible: as others have pointed out, the EV of infinitely taking the bet is 0.

Relatedly, I think that at some point moral uncertainty might kick in and save the day.

In David Deutsch's The Beginning of Infinity: Explanations That Transform the World  there is a chapter about infinity in which he discusses many aspects of infinity. He also talks about the hypothetical scenario that David Hilbert proposed of an infinity hotel with infinite guests, infinite rooms, etc. I don't know which parts of the hypothetical scenario are Hilbert's original idea and which are Deutsch's modifications/additions/etc.

In the hypothetical infinity hotel, to accommodate a train full of infinite passengers, all existing guests are asked to move to a room number that is double the number of their current room number. Therefore, all the odd numbered rooms will be available for the new guests. There are as many odd numbered rooms (infinity) as there are even numbered rooms (infinity).

If an infinite number of trains filled with infinite passengers arrive, all existing guests with room number n are given the following instructions: Move to room n(n+1)/2. The train passengers are given the following instructions: the nth passenger from the mth train goes to room number ((n+m)^2 + n − m)/2. (I don't know if I wrote that equation correctly. I have the audio book and don't know how it is written.)
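For what it's worth, one assignment that does work sends guest n to room n(n+1)/2 and passenger n of train m to room ((n+m)^2 + n − m)/2. I believe this is the version in the book, but treat that attribution as an assumption. The sketch below only checks, on a finite prefix, that nobody collides and no room is skipped; the function names are made up for the example.

```python
def guest_room(n: int) -> int:
    """New room for the existing guest currently in room n (n >= 1)."""
    return n * (n + 1) // 2


def passenger_room(n: int, m: int) -> int:
    """Room for passenger n of train m (n, m >= 1)."""
    # (n + m)^2 + n - m is always even, so integer division is exact.
    return ((n + m) ** 2 + n - m) // 2


def assigned_rooms(limit: int) -> list:
    """Rooms used by guests 1..limit and by all passengers with n + m <= limit."""
    rooms = [guest_room(n) for n in range(1, limit + 1)]
    rooms += [
        passenger_room(n, total - n)
        for total in range(2, limit + 1)
        for n in range(1, total)
    ]
    return rooms


limit = 50
rooms = assigned_rooms(limit)
# Every room from 1 up to guest_room(limit) is used exactly once.
print(sorted(rooms) == list(range(1, guest_room(limit) + 1)))
```

The guests land on the triangular numbers 1, 3, 6, 10, ... and the train passengers fill the gaps between them.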

All of the hotel guests' trash will disappear into nowhere if the guests are given these instructions: Within a minute, bag up their trash and give it to the room that is one number higher than the number of their room. If a guest receives a bag of trash within that minute, then pass it on in the same manner within a half minute. If a guest receives a bag of trash within that half minute, then pass it on within a quarter minute, and so on. Furthermore, if a guest accidentally put something of value to them in the trash, they will not be able to retrieve it after the two minutes. If they were somehow able to retrieve it, to account for the retrieval would involve explaining it with an infinite regress.

Some other things about infinity that he notes in the chapter:

It may be thought that the set of natural numbers involves nothing infinite. It merely involves a finite rule that brings you from one number to a higher number. However, if there is one natural number that is the largest, then such a finite rule doesn't work (since it doesn't take you to a number higher than that number). If it doesn't exist, then the set of natural numbers must be infinite.

To think of infinity, the intuition that a set of numbers has a highest number must be dropped.

According to Cantor, there are countable infinities. The infinite points in a line or in all of space and time are not countable and do not have a one to one correspondence with the infinite set of natural numbers. However, in theory, the infinite set of natural numbers can be counted.

The set of all possible permutations that can be performed with an infinite set of natural numbers is uncountable.

Intuitive notions like average, typical, common, proportion, and rare don't apply to infinite sets. For example, it might be thought that proportion applies to an infinite set of natural numbers because you can say that there are an equal number of odd and even numbers. However, if the set is rearranged so that, after 1, odd numbers appear after every 2 even numbers, the apparent proportion of odd and even numbers would look different.

Zeno noted that there are an infinite number of points between two points of space. Deutsch said Zeno is misapplying the idea of infinity. Motion is possible because it is consistent with physics. (I am not sure I completely followed what he said the mistake Zeno made here was.)

This post reminds me of Ord's mention in The Precipice about the possibility of creating infinite value being a game changer.

[Parent comment deleted, so I integrated this comment into its grandparent]