
TL;DR: People often use the thought experiment of a coin flip with a 50% chance of huge gain and a 50% chance of losing everything to argue that maximizing expected utility is bad. But the real problem is that our intuitions on this topic are terrible, and there's no real paradox if you adopt the premise in full.

Epistemic status: confident, but too lazy to write out the math

There's a thought experiment that I've sometimes heard as a counterargument to strict utilitarianism. A god/alien/whatever offers to flip a coin. Heads, it slightly-more-than-doubles the expected utility in the world. Tails, it obliterates the universe. An expected-utility maximizer, the argument goes, keeps taking this bet until the universe goes poof. Bad deal.
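
For concreteness, here's the bare arithmetic behind "keeps taking this bet" (my own illustration; the numbers are placeholders): as long as the heads payoff multiplies value by anything more than 2, each flip has positive expected value, which is why a naive expected-utility maximizer never stops.

```python
# Sketch of why each flip looks good to a naive EV maximizer (illustrative numbers).
V = 1.0   # current value of the universe, in arbitrary units
m = 2.1   # "slightly more than doubles" -- any multiplier above 2 works

ev_flip = 0.5 * 0 + 0.5 * m * V   # tails: everything is lost; heads: m * V
print(ev_flip)                     # 1.05 > 1.0, so the single flip is +EV
```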

People seem to love citing this thought experiment when talking about Sam Bankman-Fried. We should have known he was wrong in the head, critics sigh, when he said he'd bet the universe on a coinflip. They have a point; SBF apparently talked about this a lot, and it came up in his trial. I'm not fully convinced he understood the implications, and he certainly had a reckless and toxic attitude towards risk.

But today I'm here to argue that, despite his many, many flaws, SBF got this one right.

There is a lot of value in the universe

Suppose I'm a utilitarian. I value things like the easing of suffering and the flourishing of sapient creatures. Some mischievous all-powerful entity offers me the coinflip deal. On one side is the end of the world. On the other is "slightly more than double everything I value." What does that actually mean?

It turns out the world is pretty big. There is a lot of flourishing in it. There's also a lot of suffering, but I happen to arrange my preference-ordering such that the net utility of the world continuing to exist is extremely large. To make this coinflip an appealing trade, the Cosmic Flipper has to offer me something whose value is commensurate with that of the entire world and everyone in it, plus all the potential future value in humanity's light cone.

That's a big freaking deal.

The number of offers that weigh heavily enough on the other side of the scale is pretty darn small. "Double the number of people in the world" doesn't begin to come close; neither does "make everyone twice as happy." A more appropriate offer IMO might look more like "everyone becomes unaging, doesn't need to eat or drink except for fun, grows two standard deviations smarter and wiser, and is basically immune to suffering."

That's a bet I'd at least consider taking. Odds are, you might feel that way too. 

(If you don't, that's okay, but it means the Cosmic Flipper still isn't offering you enough. What would need to be on the table for you, personally, to actually consider wagering the fate of the universe on a coinflip? What would the Cosmic Flipper have to offer? How much better does the world have to be, in the "heads" case, that you would be tempted?)

Suppose I do take the bet, and get lucky. How do you double that? Now we're talking something on the order of "all animals everywhere also stop suffering" and I don't even know what else.

By the time we get to flipping the coin five, ten, or a hundred times, I literally can't even conceive of what sort of offer it would take to make a 50% chance of imploding utopia seem like a price worth paying. It's incredibly difficult to wrap our brains around what "doubling the value in the world" actually means. And that's just the tip of the iceberg.

We already court apocalypse 

The thought experiment gets even more complicated when you factor in existing risks.

If you buy the arguments about threats from artificial superintelligence - which I do, for the record - then our world most likely has only a few years or decades left before we're eaten by an unaligned machine. If you don't buy those arguments, there's still the 1 in 10,000 chance per year that we all nuke ourselves to death (or into the Stone Age), which is similar to the odds that you die this year in a car crash (if you're in the US). Even if humanity never invents another superweapon, there's still the chance that Earth gets hit by a meteor or Mother Nature slaughters our civilization with the next Black Death before we get our collective shit together.
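
As a rough sanity check on that comparison (ballpark figures I'm assuming, not sourced from anywhere in particular):

```python
# Rough sanity check on the car-crash comparison (approximate, assumed figures).
us_road_deaths_per_year = 40_000
us_population = 330_000_000

p_die_in_car_crash = us_road_deaths_per_year / us_population
print(p_die_in_car_crash)   # ~1.2e-4, i.e. about 1 in 8,000 per year --
                            # the same order of magnitude as a 1-in-10,000 annual nuclear risk
```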

What does it mean to "double the expected value of the universe" given the threat of possible extinction? I genuinely don't know. And we can't just say "well, holding x-risk constant..." because any change to the world that's big enough to double its expected utility is going to massively affect the odds of human extinction.

When it comes to thought experiments like this, we can't just rely on what first pops into our head when we hear the phrase "double expected value." For the bargain to make sense to a true expected-utility maximizer, it has to still sound like a good deal even after all these considerations are factored in.

Everything breaks down at infinity

OK, so maybe it's a good idea to flip the coin once or twice, or even many times. But if you take this bet an infinite number of times, then you're guaranteed to destroy the universe. Right?

Firstly, lots of math breaks down at infinity. Infinity is weird like that. I don't think there exists a value system that can't be tied in knots by some contrived thought experiment involving infinite regression, and even if one did exist, I doubt it would be a system I'd want to endorse.

Secondly, and more importantly, I question whether it is possible even in theory to produce infinite expected value. At some point you've created every possible flourishing mind in every conceivable permutation of eudaimonia, satisfaction, and bliss, and the added value of another instance of any of them is basically nil. In reality I would expect to reach a point where the universe is so damn good that there is literally nothing the Cosmic Flipper could offer me that would be worth risking it all.

And given the nature of exponential growth, it probably wouldn't even take that many flips to get to "the universe is approximately perfect". Sounds like a pretty good deal.
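
To put rough numbers on that escalation (a continuation of the earlier sketch, with the same assumed "slightly more than double" multiplier): the chance of surviving n flips halves every time, while the prize conditional on survival more than doubles, so after only ten wins the surviving world is already more than a thousand times better than today, which is roughly where "approximately perfect" and "nothing left to offer" kick in.

```python
# How fast the stakes escalate under repeated flips (same assumed multiplier as above).
m = 2.1   # "slightly more than double" per winning flip

for n in (1, 5, 10, 20):
    p_survive = 0.5 ** n        # chance the universe still exists after n flips
    payoff_if_lucky = m ** n    # how much better the surviving world would be
    ev = p_survive * payoff_if_lucky
    print(n, round(p_survive, 6), round(payoff_if_lucky), round(ev, 2))
# After 10 flips: ~0.1% survival odds, a ~1,700x-better world if you're lucky,
# and a headline EV of only about 1.6x the status quo.
```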

Conclusion

The point I'm hoping to make is that this coinflip thought experiment suffers from a gap between the mathematical ideal of "maximizing the expected value in the universe" and our intuitions about it. 

On a more specific level, I wish people would stop saying "Of course SBF had a terrible understanding of risk, he took EV seriously!" as though SBF's primary failing was being a utilitarian rather than being reckless and hopelessly blinkered about the real-world consequences of his actions.

Comments



I agree that there's been a phenomenon of people suddenly deciding that all of SBF's opinions were wrong post-FTX collapse. So I appreciate the effort to make the case for taking the deal, and to portray the choice as not completely obvious.

To the extent that you're hoping to save "maximizing utility via maximizing expected value," I think it's still an uphill battle. I like Beckstead and Thomas's "A paradox for tiny probabilities and enormous values" on this, which runs essentially the same thought experiment as "flip the coin many times," except with the coin weighted to 99.9% heads (and only your own life in play, not the universe). They point out that both positions, "timidity" and "recklessness", have implausible conclusions.

I'm ultimately quite philosophically troubled by this "concentrating all the value into narrow regions of probability space" feature of EV maximization as a result (but I don't have a better alternative on hand!). This makes me, in particular, not confident enough in EV-maximization to wager the universe on it. So while I'm more sympathetic than most to the position that the coin flip might be justifiable, I'm still pretty far from wanting to bite that bullet.

It's often in the nature of thought experiments to try to reduce complicated things to simple choices. In reality, humans rarely know enough to do an explicit EV calculation about a decision correctly. It can still be an ideal that helps guide our decisions, such that "this seems like a poor trade of EV" is a red flag in the same way that "oh, I notice I could be Dutch-booked by this set of preferences" is a sign there may be a flaw in our thinking somewhere.

I take your claim in the post not to be "the fact that an offer is +EV is one strong reason to be in favor of it," but rather "you ought to take the cosmic coin flip, risking the universe, just because it is +EV." (Because being +EV definitionally means the good scenario is super unbelievably good, much better than most people considering the thought experiment are probably imagining.)

But even within the thought experiment, abstracting away all empirical uncertainties, I have enough philosophical uncertainty about EV maximization that I don't want to take the bet.
