
Introduction

This post seeks to estimate how much we should expect a highly cost-effective charity to spend on reducing existential risk by a certain amount. By setting a threshold for cost-effectiveness, we can be selective about which longtermist charities to recommend to donors.

We appreciate feedback. We would like this post to be the first in a sequence about cost-effectiveness thresholds for giving, and your feedback will help us write better posts.

How many beings does extinction destroy?

This chart gives six estimates for the size of the moral universe that would be lost in an extinction event on Earth this century. There is a truly incredible range in the possible size of the moral universe, and the value you see in the future depends on the moral weights you put on different types of moral patients.

(Click this link to explore the value of the future. You can change the moral weights on humans, vertebrates, invertebrates, and beings of the future; the number of years you expect humans and animals to live without inevitable extinction; and the number of animals and humans you expect to exist at one time, from conservative to liberal estimates. The value at the bottom, T, shows you the total value you put on the future. You can then see how changing your moral weights radically influences the value you put on the future – and you can recalculate the cost-effectiveness thresholds below based on that.)

If the universe could have between 8 x 10^9 and 5 x 10^55 morally valuable beings, then a 0.01% absolute reduction in cumulative x-risk is “equivalent” (given several assumptions) to saving 8 x 10^5 to 5 x 10^51 lives. This is a reduction in risk of one basis point, or bp. Note that this is a much greater accomplishment than reducing per-century x-risk by one basis point, which we will discuss later.

How much should we pay to prevent this destruction?

Using near-termist thresholds as a starting point

Using the SoGive Gold Standard for cost per life saved in a near-termist framework (£5,000), the "good cost" of a 0.01% absolute reduction in the cumulative risk of extinction would be between £4 billion and £2.5 x 10^55. For context, all the money in the world is probably between £10^13 and £10^14.
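
To make the arithmetic explicit, here is a minimal sketch of the calculation above in Python (the only inputs are the £5,000 Gold Standard and the two moral-universe sizes quoted earlier; the variable names are ours):

```python
# Minimal sketch: "good cost" of a 1 bp (0.01%) absolute reduction in cumulative
# x-risk, valued at the SoGive near-termist Gold Standard of £5,000 per life saved.
GOLD_STANDARD_GBP = 5_000
RISK_REDUCTION = 1e-4  # one basis point

for label, beings in [("low end", 8e9), ("high end", 5e55)]:
    lives_saved = beings * RISK_REDUCTION          # lives saved in expectation
    good_cost = lives_saved * GOLD_STANDARD_GBP    # implied "good cost" in GBP
    print(f"{label}: {lives_saved:.1e} lives -> £{good_cost:.1e}")

# low end:  8.0e+05 lives -> £4.0e+09  (£4 billion)
# high end: 5.0e+51 lives -> £2.5e+55
```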

Is either of these the threshold we should be using? The cost-effectiveness of longtermist interventions differs from that of near-termist interventions:

  • The interventions we are analyzing are of a different character.
  • The evidence of their cost-effectiveness is more speculative.
  • It might be much cheaper, or much more expensive, to prevent the loss of future life than to save a person now. It would not be useful to set a threshold that ~all highly effective longtermist charities would be above or below, when the purpose of a threshold is to help us select the top tier of existing cost-effective interventions within a cause area.

Furthermore, we might never have enough evidence to say whether an intervention has reduced cumulative x-risk by a certain amount. It might be more manageable to set a threshold based on reduction in per-century x-risk.

Per-century risk

Epistemic status: Uncertain. Please leave feedback and help us improve the math here.

Briefly looking at this, let’s assume that humans will last some time into the future, and that those humans are our moral patients. (If we didn’t put value on future humans, then cumulative risk would be equal to the risk of extinction for present beings. This would give us a lower bound on a good cost for a 0.01% absolute reduction in per-century risk of £4 billion.)

Let's also assume for now that we are not in a time of perils, so that we have a simpler relationship between cumulative risk and per-century risk. This will give us a lower bound for what we should spend on a reduction in per-century risk under our assumption above. (Given the chance we are in a time of perils, a "good cost" for a reduction in per-century risk this century would only be higher. If we are in an extreme time of perils where a reduction in per-century risk is about equivalent to a reduction in cumulative risk, then we already have our upper bound on a good cost for a 0.01% reduction in per-century risk – £2.5 x 10^55.)

According to David Thorstad, assuming a future of 1 billion years, an absolute reduction in cumulative x-risk of only 1/100,000,000 requires reducing per-century x-risk from its current level (estimated by Toby Ord at 1/6) to below 16/10,000,000.[1]

Let's try to use the same math. On the low end of the size of the future, suppose humans could exist for a maximum of 800,000 more years before inevitable extinction. Then, an absolute reduction in cumulative risk of 1/10,000 (1 bp) would require us to drive per-century risk to below 11/10,000.

Bringing per-century risk from ⅙ to 11/10,000 is an absolute reduction of 1656 basis points. We might say that if the future is 800,000 years and there are 10^14 moral patients (low end) that can exist in that time, then we should pay 10^14 * 0.01% * £5,000 / 1656 = £30.19 billion[2] to reduce per-century x-risk by 1 bp.

On the high end of the size of the future, suppose humans could exist for a maximum of 100 billion years before inevitable extinction. Then, an absolute reduction in cumulative risk of 1/10,000 (1 bp) would require us to drive per-century risk to below 92/10,000,000,000.

Bringing per-century risk from ⅙ to 92/10,000,000,000 is an absolute reduction of 1667 basis points. We might say that if the future is 100 billion years and there are 5 x 10^55 moral patients (high end) that can exist in that time, then we should pay 5 x 10^55 * 0.01% * £5,000 / 1667 = £1.50 x 10^52 to reduce per-century x-risk by 1 bp.
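
For readers who want to check or adapt this math, here is a minimal sketch in Python using only the figures stated in this section (current per-century risk of 1/6, the £5,000 Gold Standard, and the two scenarios for the length and population of the future); exact outputs differ from the rounded figures above by a fraction of a percent:

```python
# Sketch: per-century risk needed for a 1 bp cumulative reduction, and the implied
# "good cost" per bp of per-century risk reduction. Assumes cumulative risk falls
# from ~1 to 0.9999 (see footnote 1) and constant per-century risk (no time of perils).
CURRENT_PER_CENTURY_RISK = 1 / 6   # Toby Ord's estimate
GOLD_STANDARD_GBP = 5_000
CUMULATIVE_REDUCTION = 1e-4        # one basis point

def required_per_century_risk(centuries: int) -> float:
    # Need (1 - r)^centuries = 1e-4, i.e. a 1-in-10,000 chance of surviving them all.
    return 1 - CUMULATIVE_REDUCTION ** (1 / centuries)

scenarios = [("low end", 8_000, 1e14), ("high end", 1_000_000_000, 5e55)]
for label, centuries, moral_patients in scenarios:
    r = required_per_century_risk(centuries)
    reduction_bp = (CURRENT_PER_CENTURY_RISK - r) * 10_000
    cost_per_bp = moral_patients * CUMULATIVE_REDUCTION * GOLD_STANDARD_GBP / reduction_bp
    print(f"{label}: per-century risk must fall to {r:.2e} "
          f"(~{reduction_bp:.0f} bp of reduction); ~£{cost_per_bp:.2e} per bp")

# low end:  risk to ~1.15e-03 (~11/10,000), ~1655 bp, ~£3.0e+10 (≈ £30 billion) per bp
# high end: risk to ~9.21e-09 (~92/10^10), ~1667 bp, ~£1.5e+52 per bp
```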

Clearly, this is a very difficult task in either situation. What does this mean for our threshold? Do we think that reducing cumulative x-risk by one basis point is worth a different amount, when we know that it means reducing per-century x-risk so extremely?

  • Should we tolerate a higher cost threshold for a more difficult task?
  • Should we reduce our tolerance for the cost, since the task may be less tractable?

Overall, our range for what we should spend on a 0.01% absolute reduction in x-risk for this century is:

  • £4 billion, if you only put moral value on present humans;
  • £30.19 billion, if you believe the future is small, the class of moral patients is small, and we are not in a time of perils;
  • £1.50 x 10^52, if you predict a very large future and a large class of moral patients, and believe we are not in a time of perils; and
  • £2.5 x 10^55, if you predict a very large future and a large class of moral patients, and believe we are in an extreme time of perils.

Using benchmarks for cost-effectiveness from current longtermist charities

We are currently seeking estimates of how much longtermist charities are reducing x-risk. These estimates can give us a benchmark for cost-effectiveness. Benchmarking is another method we can use to set a threshold. However, estimates we get from charities for their own cost-effectiveness are likely to be biased.

Using the estimates of others

Now let’s look at cost-effectiveness thresholds from others, taken from this post by Linch and this post by Vasco and their comments sections, which use USD. The table below is adapted and expanded from Vasco’s post.

Most commenters seem to have been thinking of an absolute reduction of one basis point this century (it’s unclear) – but one commenter, NunoSempere, gave a range for an absolute reduction by one basis point in the yearly x-risk for one century, and another range for an absolute reduction by one basis point in the x-risk this century. The commenters used a range of methods to answer the question, but their answers clustered around $1 billion.

Please note Linch’s disclaimer, which likely also applies to the estimates of others:

EDIT 2022/09/21: The 100M-1B estimates are relatively off-the-cuff and very not robust, I think there are good arguments to go higher or lower. I think the numbers aren't crazy, partially because others independently come to similar numbers (but some people I respect have different numbers). I don't think it's crazy to make decisions/defer roughly based on these numbers given limited time and attention. However, I'm worried about having too much secondary literature/large decisions based on my numbers, since it will likely result in information cascades. My current tentative guess as of 2022/09/21 is that there are more reasons to go higher (think averting x-risk is more expensive) than lower. However, overspending on marginal interventions is more -EV than underspending, which pushes us to bias towards conservatism.

(Each entry gives the original table's three columns: "highly cost-effective"; "middling", the default where a commenter did not specify; and "upper bound".)

  • Linch
    Highly cost-effective: $100 million per bp
    Middling: $300 million per bp to $1 billion per bp
    Upper bound: $10 billion per bp
  • Ajeya Cotra
    Highly cost-effective: "AI risk is something that we think has a currently higher cost effectiveness"
    Middling: "$200 trillion per world saved", or $20 billion per bp
  • Oliver Habryka
    Highly cost-effective: "I think we are currently funding projects that are definitely more cost-effective than that"
    Middling: "My very rough gut estimate says something like" $1 billion per bp
    Upper bound: "Probably more on the margin"
  • NunoSempere
    Highly cost-effective: $99 billion per bp, lower bound assuming a "one-off 0.01% existential risk reduction over a century"
    Middling: $16 trillion per bp, lower bound per yearly reduction of 1 bp, to $330T per bp, upper bound assuming a "one-off 0.01% existential risk reduction over a century"
    Upper bound: $3.8 x 10^15 per bp, upper bound per yearly reduction of 1 bp
  • Anonymous person from Vasco's post
    Highly cost-effective: $1 trillion per existential catastrophe averted, or $100 million per bp
    Middling: $100 trillion per existential catastrophe averted, or $10 billion per bp
  • Simon Skade
    Highly cost-effective: "Note that I do think that there are even more effective ways to reduce x-risk, and in fact I suspect most things longtermist EA is currently funding have a higher expected x-risk reduction than 0.01% per 154M$."
    Middling: $154 million per bp
    Upper bound: "I just don't think that it is likely that the 50 billionth 2021 dollar EA spends has a much higher effectiveness than 0.01% per 154M$, so I think we should grant everything that has a higher expected effectiveness."
  • William Kiely
    Highly cost-effective: $34 million per bp ("I thought it would be interesting to answer this using a wrong method (writing the bottom line first).")
    Middling: $100 million per bp
    Upper bound: $340 million per bp
  • Zach Stein-Perlman
    Highly cost-effective: $25 million per bp
    Middling: $50 million per bp
    Upper bound: $100 million per bp ("Unlike Linch, I would be quite sad about trading $100M for a single measly basis point — $100M (granted reasonably well) would make a bigger difference, I think.")
  • Median
    Highly cost-effective: $100 million per bp
    Middling: $1 billion per bp
    Upper bound: $5.17 billion per bp

This suggests that a reasonable range for the cost of reducing absolute x-risk by 0.01% this century could be between USD $100 million and $5.17 billion, or £75 million and £3.88 billion.[3]

Earlier, we put a "good cost" of a 1 bp absolute reduction in per-century x-risk, based on SoGive's near-termist Gold Standard, between £4 billion and £2.5 x 10^55, depending on how much you value the future, which parts of the future you value, and your forecast for the future. Note that the estimates given in this table are all below that range.

Why is that?

  • Were these commenters expecting it to be much cheaper to save a life by preventing the loss of potential in an extinction, than to save a life using near-termist interventions?
  • Were they envisioning a relatively small future?
  • Were they discounting the future?

More likely, the commenters were simply applying an implicit haircut to their estimates to account for uncertainty.

What does this say for our threshold?

Conclusion

For near-termist interventions, when SoGive has high confidence that the intervention is effective, we use a Gold Standard. A tentative threshold for a "SoGive Longtermist Gold Standard" could be £750 million per bp. This is the median of the commenters' middling estimates above ($1 billion per bp), converted to GBP.

Having a threshold for cost-effectiveness can help us decide whether to give now or give later. If all the available funding opportunities cost more than our threshold, then we can hold onto our donations until a more cost-effective opportunity comes along. We may come back to the nuanced relationship between thresholds and give now/give later in a future post.

There remains a practical problem with using a cost-effectiveness threshold for deciding where to give within longtermism. Most longtermist projects do not publish cost-effectiveness estimates – and if they did, they could be as flawed as misleading cost-effectiveness estimates from near-termist charities. For third-party evaluators, there is no clearly reliable method for estimating how much x-risk has been reduced by one project.

Thus, a threshold for longtermist cost-effectiveness could be better applied to the average cost-effectiveness of categories, e.g., “AI risk-reducing projects in Europe,” rather than to individual projects. (It is not clear that cost-effectiveness for projects varies less within geographic regions than across regions – this is just an example of a category of funding opportunities.) Analyzing cost-effectiveness over a larger category of funding opportunities, instead of per-intervention or per-project, could help us avoid false precision.

These estimates are not robust enough to make the most important decisions we face. We recommend conducting a survey of funders, charities, and experts to get a stronger picture of what the standard should be and the cost-effectiveness of different types of work.

 

  1. ^

    This represents a reduction from a cumulative risk of 1 to a cumulative risk of 0.9999. If we are starting with a cumulative risk below 1, then we need to reduce per-century risk somewhat more, but likely not orders of magnitude more.

  2. ^

     (number of moral patients) * (% saved in expectation given 0.01% reduction in cumulative risk) * (“good cost” to save a life) / (conversion from 0.01% reduction in cumulative risk to 0.01% reduction in per-century risk)

  3. ^

     Converting at an average exchange rate of 0.75.

Comments (36)

This was a difficult post, and my first post for SoGive as the Lead Researcher & Philanthropy Advisor! I hope it can be useful to our discussions on cost-effectiveness.

I hope my uncertainty comes through. I haven't been thinking about the size of the future for a very long time, but I learned a lot from writing this. As I mentioned at the beginning, please leave feedback on my assumptions, math, and methods, so I can write better posts about thresholds in the future.

It might be a while, but I'd like to do some writing about cost-effectiveness thresholds for animal advocacy and multipliers as well. Feel free to leave your thoughts about those as a reply to this comment as well.

Interesting read, and a tricky topic! A few thoughts:

  1. What were the reasons for tentatively suggesting using the median estimate of the commenters, rather than being consistent with the SoGive neartermist threshold?
  2. One reason against using the very high-end of the range is the plausible existence of alien civilisations. If humanity goes extinct, but there are many other potential civilisations and we think they have similar moral value to humans, then preventing human extinction is less valuable.
    1. You could try using an adapted version of the Drake equation to estimate how many civilisations there might be (some of the parameters would have to be changed to take into account the different context, i.e. you're not just estimating current civilizations that could currently communicate with us in the Milky Way, but the number there could ever be in the Local Supercluster)
  3. I'm still not entirely sure what the purpose of the threshold would be.
    1. The most obvious reason is to compare longtermist causes with neartermist ones, to understand the opportunity cost - in which case I think this threshold should be consistent with the other SoGive benchmarks/thresholds (i.e. what you did with your initial calculations).
      1. Indeed the lower end estimate (only valuing existing life) would be useful for donors who take a completely neartermist perspective, but who aren't set on supporting (e.g.) health and development charities
    2. If the aim is to be selective amongst longtermist causes so that you're not just funding all (or none) of them, then why not just donate to the most cost-effective causes (starting with the most cost-effective) until your funding runs out?
      1. I suppose this is where the giving now vs giving later point comes in. But in this case I'm not sure how you could try to set a threshold a priori
        1. It seems like you need some estimates of cost-effectiveness first. Then (e.g.) choose to fund the top x% of interventions in one year, and use this to inform the threshold in subsequent years. Depending on the apparent distribution of the initial cost-effectiveness estimates, you might decide 'actually, we think there are plenty of interventions out there that are better than all the ones we have seen so far, if only we search a little bit harder'
  4. Trying to incentivise more robust thinking around the cost-effectiveness of individual longtermist projects seems really valuable! I'd like to see more engagement by those working on such projects. Perhaps SoGive can help enable such engagement :)

Thanks Matt!

  1. My estimate was just one estimate. I could have included it in the table but when I did the table it seemed like such an outlier, and done with a totally different method as well, perhaps useful for a different purpose... It might be worth adding it into the table? Not sure.
  2. Interesting consideration! If we expect humanity to at one point technologize the LS, and extinction prevents that, don't we still lose all those lives? It would not eradicate all life if there were aliens, but still the same amount of life in total. (I'm not endorsing any one prediction for how large the future will be.) My formulas here don't quantify how much worse it is to lose 100% of life than 99% of life.
  3. Sure, you could set your threshold differently depending on your purpose. I could have made this clearer!
    1. Exactly as you say, comparing across cause areas, you might want to keep the cost you're willing to pay for an outcome (a life) consistent.
    2. If you've decided on a worldview diversification strategy that gives you separate buckets for different cause areas (e.g. by credence instead of by stakes), then you'd want to set your threshold separately for different cause areas, and use each threshold to compare within a cause area. If you set a threshold for what you're willing to pay for a life within longtermist interventions, and fewer funding opportunities live up to that compared to the amount of money you have available, you can save some of your money in that bucket and donate it later, in the hopes that new opportunities that meet your threshold can arise. For an example of giving later based on a threshold, Open Philanthropy wants to give money each year to projects that are more cost-effective than what they will spend their "last dollar" on.
  4. Thanks, me too!

Re 2 - ah yeah, I was assuming that at least one alien civilisation would aim to 'technologize the Local Supercluster' if humans didn't. If they all just decided to stick to their own solar system or not spread sentience/digital minds, then of course that would be a loss of experiences.

Thanks for clarifying 1 and 3!

Great that SoGive is setting these benchmarks. I'm of the opinion that we are very much in a time of perils, and indeed the entire future may well hinge on the next few years (or less), with regard to AI x-risk. I also think that work aimed at slowing down AI is woefully neglected. Right now, I think such work (examples) could easily be as cost effective as $1M per bp of x-risk reduction (this century at least).

Note also that if timelines are short, then the cost effectiveness of saving lives via traditional neartermist interventions decreases. IIRC, GiveWell equates a "life saved" with ~40 years extra life (or 40 QALYs). So if we only have 5 years left before extinction, then the cost effectiveness estimates worsen by a factor of 8 (40/5). So instead of £5,000 to save a life, neartermist interventions would need to clear a bar of £625 to save a life. Although I think at this point we might be better off just thinking in terms of £/QALY. And interventions aimed at averting severe depression ("£200 to avert one year of severe depression") or chicken suffering ("£5 to avert the suffering of one chicken who is living in very poor conditions") seem more promising.

Interesting point about how any extinction timelines less than the length of a human life change the thresholds we should be using for neartermism as well! Thank you, Greg. I'll read what you linked.

Also, for this reason I think that x-risk reduction should very much not be lumped in with longtermism. It is, in fact, very near term, now.

A couple of nitpicks on the calculations for the size of the moral universe:

  • The average span of a mammalian species seems a somewhat spurious metric to use for humanity, given that we have already largely nullified the natural environmental and evolutionary drivers of such a bound (via technology and civilisation[1]).
  • Perhaps the length of the Stelliferous Era (~100T years) is a more natural bound than the lifetime of the Sun (~5B years). It seems unlikely that we would colonise the solar system and last for billions of years without successfully crossing interstellar space.
  • Open Individualism suggests that we should be counting person-years (or animal-years) rather than individual animals. Intuitively, this seems more natural especially for animals with lower complexity brains who are unlikely to have much of a sense of self (even if they are sentient and can suffer).
  1. ^

    This is not to say that we won't go extinct from artificial environmental and evolutionary causes of our own making. Just that bounding to a number (1M years) that originates from natural (nonanthropogenic) processes seems highly arbitrary.

Directionally, I agree with your points. On the last one, I'll note that counting person-years (or animal-years) falls naturally out of empty individualism as well as open individualism, and so the point goes through under the (substantively) weaker claim of “either open or empty individualism is true”.[1]

(You may be interested in David Pearce's take on closed, empty, and open individualism.)

  1. ^

    For the casual reader: The three candidate theories of personal identity are empty, open, and closed individualism. Closed is the common sense view, but most people who have thought seriously about personal identity—e.g., Parfit—have concluded that it must be false (tl;dr: because nothing, not memory in particular, can “carry” identity in the way that's needed for closed individualism to make sense). Of the remaining two candidates, open appears to be the fringe view—supporters include Kolak, Johnson, Vinding, and Gomez-Emilsson (although Kolak's response to Cornwall makes it unclear to what extent he is indeed a supporter). Proponents of (what we now call) empty individualism include Parfit, Nozick, Shoemaker, and Hume.

Agree. I find Empty Individualism pretty depressing to think about though. And Open Individualism seems more natural, from (my) subjective experience. 

If we're looking at upper bounds, even the Stelliferous Era is highly conservative. The Black Hole Era could last up to 10^100 years (https://en.wikipedia.org/wiki/The_Five_Ages_of_the_Universe), and it's at least conceivable under known physics that we could farm their rotational energy or, still more speculatively, their Hawking radiation.

Usual Pascalian reasoning applies in that this would allow such a ridiculously large number of person-years that even with an implausibly low credence in its possibility the expectation dwarfs the whole stellar era.

The problem with this position is that the Black Hole Era—at least, the way the "Five Ages of the Universe" article you link to defines it—only starts after proton decay has run to (effective) completion,[1] which means that all matter will be in black holes, which means that conscious beings will not exist to farm black holes for their energy. (I do, however, agree that life is in theory not dependent on luminous stars, and so life could continue beyond the Stelliferous Era and into the Degenerate Era, which adds many years.)

  1. ^

    Whether proton decay will actually happen is still a major open question in physics. See, for example, Hadhazy (2021) or Siegel (2020).

    (Additionally, if proton decay does happen, there's then the question of “could a technologically mature civilization stop proton decay?”. My money would be on “no”, but of course our current understanding of particle decay physics could be incorrect, or an advanced civilization might find an ingenious workaround.)

As you say, whether proton decay will happen seems to be an open question. If you're feeling highly confident you could knock off another couple of zeroes to represent that credence and still end up with a number that eclipses everything else.

Hi Spencer,

You may be interested in Brian Tomasik's analysis on How the Simulation Argument Dampens Future Fanaticism. I think its essence is well captured in this short comment by Pablo Stafforini:

According to the simulation argument, either (1) humanity will soon become extinct; (2) posthumanity will never run ancestor simulations; or (3) we are almost certainly living in a simulation. Suppose (1) is true. Then the classical utilitarian case for focusing on existential risk reduction loses much of its force, since we are by assumption doomed to perish quickly anyway. Now suppose (3) is true. Here it seems plausible that the simulators will restart the simulation very quickly after the sims manage to kill themselves. So the case for focusing on existential risk is also weakened considerably. It is only on the second of the three scenarios that extinction is (roughly) as bad as classical utilitarians take it to be. So we can conclude: if you think there is a chance that posthumanity will run ancestor simulations (~2), the prospect of human extinction is much less serious than you thought it was.

If I recall correctly, the argument goes more or less as follows. The larger the future, the greater the likelihood of us being in a short-lived simulation, thus having negligible influence in the far future. The 2 effects cancel out, and therefore the ratio between far future value and near future value does not depend on the size of the future. The ratio is roughly inversely proportional to the fraction of resources going towards simulations, i.e. 2 times as much resources going to simulations means the far future is half as valuable relative to the near term future.

The possibility of us living in a short-lived simulation isn't enough to count much against longtermism, because it's also possible we could live in a long-lived simulation or a long-lived world, and those possibilities will be much higher stakes, so still dominate expected value calculations unless we assign them tiny probability together.

I think the argument crucially depends on the assumption that simulations will be disproportionately short-lived, and we have acausal influence over agents in other simulations. If for each long-running world (simulated or otherwise) with moral agents and moral patients, there are N short-lived worlds with (moral) agents and moral patients, and our actions are correlated with those of agents across worlds, then we get to decide for more agents in short-lived worlds than long-lived ones. Basically, acausal influence will boost the expected value of all interventions, but if moral patients are disproportionately in short-lived simulations with agents whose decisions we're correlated with relative to long-run simulations with agents whose decisions we're correlated with (or more skewed towards the short-lived than it seems for our own world), acausal influence will disproportionately boost the expected value of neartermist interventions relative to longtermist ones.

Also, ~all of the expected value will be acausal if we fully count the value of acausal influence, based on the evidentialist's wager and similar, given the possibility of very large or even infinite numbers of agents with whom we're correlated.

Thanks for clarifying, Michael!

I think the argument crucially depends on the assumption that simulations will be disproportionately short-lived

Yes, the argument depends on Brian's parameter F not being super small. F is "fraction of all computational sent-years spent non-solipsishly simulating almost-space-colonizing ancestral planets (both the most intelligent and also less intelligent creatures on those planets)". "A non-solipsish simulation is one in which most or all of the people and animals who seem to exist on Earth are actually being simulated to a non-trivial level of detail". Brian guessed F = 10^-6, but it feels like it should be much smaller to me. If the value of the future is e.g. 10^30 times the value of this century, it is maybe reasonable to assume that the vast vast majority of computational sent-years are also simulations of the far future, as opposed to simulations of almost-space-colonizing ancestral planets.

I find this argument unconvincing. The vast majority of 'simulations' humans run are very unlike our actual history. The modal simulated entity to date is probably an NPC from World of Warcraft, a zergling from Starcraft or similar. This makes it incredibly speculative to imagine what our supposed simulators might be like, what resources they might have available and what their motivations might be.

Also the vast majority of 'simulations' focus on 'exciting' moments - pitched Team Fortress battles, epic RPG narratives, or at least active interaction with the simulators. If you and your workmates are just tapping away in your office on your keyboard doing theoretical existential risk research, the probability that someone like us has spent their precious resources to (re)create you seems radically lower than if you're (say) fighting a pitched battle.

Thanks for commenting!

I find this argument unconvincing. The vast majority of 'simulations' humans run are very unlike our actual history. The modal simulated entity to date is probably an NPC from World of Warcraft, a zergling from Starcraft or similar. This makes it incredibly speculative to imagine what our supposed simulators might be like, what resources they might have available and what their motivations might be.

Agreed, but could you explain why that would be an objection to Brian's argument?

Also the vast majority of 'simulations' focus on 'exciting' moments - pitched Team Fortress battles, epic RPG narratives, or at least active interaction with the simulators. If you and your workmates are just tapping away in your office on your keyboard doing theoretical existential risk research, the probability that someone like us has spent their precious resources to (re)create you seem radically lowered than if you're (say) fighting a pitched battle.

I do not know, because I agree with your 1st paragraph about it being quite hard to predict future simulated entities based on past history.

I mainly had in mind Pablo's summary. It's been a long time since I read Brian's essay, and I don't have bandwidth to review it now, so if he says something substantially different there, my argument might not apply. But basically every argument I remember hearing about how the simulation argument implies we should modify our behaviour presupposes that we have some level of inferential knowledge of our simulators (this presupposition being hidden in the assumption that simulations would be primarily ancestor simulations). This presupposition seems basically false to me, because, for example:

a. A zergling would struggle to gain much inferential knowledge of its simulators' motivations.

b. A zergling looking around at the scope and complexity of its universe would typically observe that it itself is 2-dimensional (albeit with some quasi-3D properties), and is made from approx 38x94 'atoms'. Perhaps more advanced simulations would both be more numerous (and hence a higher proportion of simulationspace) and more complex, but it still seems hard to imagine they'll average to anything like the same level of complexity as we see in our universe, or have a consistent difference from it.

c. If the simulation argument is correct for a single layer of reality, it seems (to the degree permitted by a and b) far more likely that it's correct for multiple, perhaps vast numbers of layers of reality (insert 'spawn more Overlords' joke here). Thus the people whose decisions and motivations a zergling is trying to ultimately guess at is not ours, but someone whose distance from us is approx , where n is the number of layers. It's hard to imagine the zergling - or us - could make any intelligible assumptions at all about them at that level of removal. 

To show this in Pablo's argument:

Now suppose (3) is true. Here it seems plausible that the simulators will restart the simulation very quickly after the sims manage to kill themselves. 

For this to be 'plausible' is to assert that we know our simulators' motivations well enough to know that whatever they hoped to gain by running us will 'plausibly' be motivating enough for them to do it a second time in much the same form, and that their simulators will at least permit it, and so on. 

Another version of the anti-x-risk argument from simulation I've heard (and which I confess with hindsight I was conflating Pablo's with - maybe it's part of Brian's argument?) is that the simulators will likely switch off our universe if it expands beyond a certain size due to resource constraints. Again, this argument implies IMO vastly too high confidence in both their motivation and resource limits.

Thanks for explaining that!

Brian concludes that L/S = T*D/F, where:

  • L is the cost-effectiveness of longtermist interventions.
  • S is the cost-effectiveness of neartermist interventions.
  • T "represent[s] how much more important it is to influence a unit of sentience by the average future digital agent than a present-day biological one for these reasons ["future, simulated human might have much higher intensity of experience per unit time, and we may have much greater control over the quality of his experience"]".
  • D is "a discount representing how much harder it is to actually end up helping a being in the far future than in the near term, due to both uncertainty and the muted effects of our actions now on what happens later on".
  • F is "the fraction of all computational sent-years spent non-solipsishly simulating almost-space-colonizing ancestral planets (both the most intelligent and also less intelligent creatures on those planets)". "A non-solipsish simulation is one in which most or all of the people and animals who seem to exist on Earth are actually being simulated to a non-trivial level of detail".

Brian guesses T = 10^4, D = 10^-3, and F = 10^-6, thus concluding L/S = 10^7. I guess you are saying with your comment just above that F should be much lower than 10^-6? For reference, here is Brian's motivation for F = 10^-6:

It's very unclear how many simulations of almost-space-colonizing planets superintelligences would run. The fraction of all computing resources spent on this might be close to 100% or might be below 10^-15. It's hard to predict resource allocation by advanced civilizations. But I set this parameter based on assuming that ~10^-4 of sent-years will go toward ancestor simulations of some sort (this is probably too high, but it's biased upward in expectation, since, e.g., maybe there's a 0.05% chance that post-humans devote 20% of sent-years to ancestor simulations), and only 1% of those simulations will be of the almost-space-colonizing period (since there might also be many simulations of the origin of life, prehistory, and the early years after a planet's "singularity"). If we think that simulations contain more sentience per petaflop of computation than do other number-crunching calculations, then 10^-4 of sent-years devoted to ancestor simulations of some kind may mean less than 10^-4 of all raw petaflops devoted to such simulations.

The informality of that equation makes it hard for me to know how to reason about it. For eg, 

  • T, D and F seem heavily interdependent.
  • I'm just not sure how to parse 'computational sent-years spent non-solipsishly simulating almost-space-colonizing ancestral planets'. What does it mean for a year of sentient life to be spent simulating something? Do you think he means what fraction of experienced years exist in ancestor simulations? I'm still confused by this after reading the last paragraph.
  • I'm not sure what the expression's value represents. Are we supposed to multiply some further estimate we have of longtermist work by 10^7? (if so, what estimate is it that's so low that 10^7 isn't enough of a multiplier to make it still eclipse all short termist work?)

If you feel like you understand it, maybe you could give me a concrete example of how to apply this reasoning?

For what it's worth, I have much more prosaic reasons for doubting the value of explicitly longtermist work both in practice (the stuff I've discussed with you before that makes me feel like it's misprioritised) and in principle (my instinct is that in situations that reduce to a kind of Pascalian mugging, xP(x) where x is a counterfactual payoff increase and P(x) is the probability of that payoff increase, approaches 0 as x tends to infinity).

T, D and F seem heavily dependent.

I agree.

I'm just not sure how to parse 'computational sent-years spent non-solipsishly simulating almost-space-colonizing ancestral planets'.

I think F = "sent-years respecting the simulations of the beings in almost-space-colonizing ancestral planets"/"all sent-years of the universe". Brian defines sent-years as follows:

I'll define 1 sent-year as the amount of complexity-weighted experience of one life-year of a typical biological human. That is, consider the sentience over time experienced in a year by the median biological human on Earth right now. Then, a computational process that has 46 times this much subjective experience has 46 sent-years of computation. Computations with a higher density of sentience may have more sents even if they have fewer FLOPS.

I said Brian concluded that L/S = T*D/F, but this was after simplifying L/S = T*D/(E/N + F), where:

  • E is "the amount of sentience on Earth in the near term (say, the next century or two)".
  • "On average, these civilizations ["that are about to colonize space"] will run computations whose sentience is equivalent to that of N human-years".

Then Brian says:

Everyone agrees that E/N is very small, perhaps less than 10^-30 or something, because the far future could contain astronomical amounts of sentience [see e.g. Table 1 of Newberry 2021]. If F is not nearly as small (and I would guess that it's not), then we can approximate L/S as T * D / F.

The simulation argument dampening future fanaticism comes from Brian assuming that E/N << F, in which case L/S = T*D/F, and therefore prioritising the future no longer depends on its size. However, for the reasons you mentioned (we are not simulating our ancestors much), I feel like we should a priori expect E/N and F to be similar, and correlated, in which case L/S will still be huge unless it is countered by a very small D (i.e. if the typical low tractability argument against longtermism goes through).
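
To make the role of F concrete, here is a small sketch plugging Brian's illustrative numbers (T = 10^4, D = 10^-3, F = 10^-6, and E/N = 10^-30) into the formula above; the second call, where F is assumed to be as small as E/N, is our own illustration of the point that the conclusion then hinges on D:

```python
def longtermist_to_neartermist_ratio(T, D, E_over_N, F):
    # Brian Tomasik's simplified model: L/S = T * D / (E/N + F)
    return T * D / (E_over_N + F)

# Brian's illustrative inputs: the simulation argument caps L/S at ~10^7,
# independently of how large the future (N) is.
print(longtermist_to_neartermist_ratio(T=1e4, D=1e-3, E_over_N=1e-30, F=1e-6))   # ~1e7

# If F were instead comparable to E/N, the dampening disappears and L/S is huge
# unless D (the tractability of influencing the far future) is tiny.
print(longtermist_to_neartermist_ratio(T=1e4, D=1e-3, E_over_N=1e-30, F=1e-30))  # ~5e30
```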

I'm not sure what the expression's value represents. Are we supposed to multiply some further estimate we have of longtermist work by 10^7? (if so, what estimate is it that's so low that 10^7 isn't enough of a multiplier to make it still eclipse all short termist work?)

I think L/S is just supposed to be a heuristic for how much to prioritise longtermist actions relative to neartermist ones. Brian's inputs lead to 10^7, but they were mainly illustrative:

This [L/S = 10^7] happens to be bigger than 1, which suggests that targeting the far future is still ~10 million times better than targeting the short term. But this calculation could have come out as less than 1 using other possible inputs. Combined with general model uncertainty, it seems premature to conclude that far-future-focused actions dominate short-term helping. It's likely that the far future will still dominate after more thorough analysis, but by much less than a naive future fanatic would have thought.

However, it seems to me that, even if one thinks that both E/N and F are super small, L/S could still be smaller than 1 due to super small D. This relates to your point that:

my instinct is that in situations that reduce to a kind of Pascalian mugging, xP(x) where x is a payoff size and P(x) is the probability of that payoff, approaches 0 as x tends to infinity

I share your instinct. I think David Thorstad calls that rapid diminution.

If you feel like you understand it, maybe you could give me a concrete example of how to apply this reasoning?

I think Brian's reasoning works more or less as follows. Neglecting the simulation argument, if I save one life, I am only saving one life. However, if F = 10^-16[1] of sentience-years are spent simulating situations like my own, and the future contains N = 10^30 sentience-years, then me saving a life will imply saving F*N = 10^14 copies of the person I saved. I do not think the argument goes through because I would expect F to be super small in this case, such that F*N is similar to 1.

  1. ^

    Brian's F = 10^-6 divided by the human population of 10^10.

Appreciate the patient breakdown :)

This [L/S = 10^7] happens to be bigger than 1, which suggests that targeting the far future is still ~10 million times better than targeting the short term. But this calculation could have come out as less than 1 using other possible inputs. Combined with general model uncertainty, it seems premature to conclude that far-future-focused actions dominate short-term helping. It's likely that the far future will still dominate after more thorough analysis, but by much less than a naive future fanatic would have thought.

This is more of a sidenote, but given all the empirical and model uncertainty in any  far-future oriented work, it doesn't seem like adding a highly speculative counterargument with its own radical uncertainties should meaningfully shift anyone's priors. It seems like a strong longtermist could accept Brian's views at face value and say 'but the possibility of L/S being vastly bigger than 1 means we should just accept the Pascalian reasoning and plow ahead regardless', while a sceptic could point to rapid diminution and say no simulationy weirdness is necessary to reject these views.

(Sidesidenote: I wonder whether anyone has investigated the maths of this in any detail? I can imagine there being some possible proof by contradiction of RD, along the lines of 'if there were some minimum amount that it was rational for the muggee to accept, a dishonest mugger could learn that and raise the offer beyond it whereas an honest mugger might not be able to, and therefore, when the mugger's epistemics are taken into account, you should not be willing to accept that amount. Though I can also imagine this might just end up as an awkward integral that you have to choose your values for somewhat arbitrarily)

I think Brian's reasoning works more or less as follows. Neglecting the simulation argument, if I save one life, I am only saving one life. However, if F = 10^-16[1] of sentience-years are spent simulating situations like my own, and the future contains N = 10^30 sentience-years, then me saving a life will imply saving F*N = 10^14 copies of the person I saved. I do not think the argument goes through because I would expect F to be super small in this case, such that F*N is similar to 1.

For the record, this kind of thing is why I love Brian (aside from him being a wonderful human) - I disagree with him vigorously on almost every point of detail on reflection, but he always come up with some weird take. I had either forgotten or never saw this version of the argument, and was imagining the version closer to Pablo's that talks about the limited value of the far future rather than the increased near-term value.

That said, I still think I can basically C&P my objection. It's maybe less that I think F is likely to be super small, and more that, given our inability to make any intelligible statements about our purported simulators' nature or intentions it feels basically undefined (or, if you like, any statement whatsoever about its value is ultimately going to be predicated on arbitrary assumptions), making the equation just not parse (or not output any value that could guide our behaviour).

Interesting. But how soon is "soon"? And even if we are a simulation, to all intents and purposes it is real to us. It doesn't seem like much of a consolation that the simulators might restart the simulation after we go extinct (any more than the Many Worlds interpretation of Quantum Mechanics gives solace over many universes still existing nearby in probability space in the multiverse). 

Maybe the simulators will stage an intervention over us reaching the Singularity. I don't think we can rely on this though (indeed, this is part of the exotic scenarios that make up the ~10% chance that I think we aren't doomed from AGI by default).

Thanks for engaging, Greg!

But how soon is "soon"?

I seem to remember a comment from Carl Shulman saying the risk of simulation shut-down should not be assumed to be less than 1 in 1 M per year (or maybe it was per century). This suggests there is still a long way before it happens. On the other hand, I would intuitively think the risk to be higher if the time we are in really is special. I do not remember whether the comment was taking that into account.

And even if we are a simulation, to all intents and purposes it is real to us. It doesn't seem like much of a consolation that the simulators might restart the simulation after we go extinct (any more than the Many Worlds interpretation of Quantum Mechanics gives solace over many universes still existing nearby in probability space in the multiverse).

Yes, it is not a consolation. It is an argument for focussing more on interventions which have nearterm benefits, like corporate campaigns for chicken welfare, instead of ones whose benefits may not be realised due to simulation shut-down.

Yes, it is not a consolation. It is an argument for focussing more on interventions which have nearterm benefits, like corporate campaigns for chicken welfare, instead of ones whose benefits may not be realised due to simulation shut-down

I still don't think this goes through either. I'm saying we should care about our world going extinct just as much as if it were the only world (given we can't causally influence the others).

Agreed, but if the lifespan of the only world is much shorter due to risk of simulation shut-down, the loss of value due to extinction is smaller. In any case, this argument should be weighted together with many others. I personally still direct 100 % of my donations to the Long-Term Future Fund, which is essentially funding AI safety work. Thanks for your work in this space!

Thanks for your donations to the LTFF. I think they need to start funding stuff aimed at slowing AI down (/pushing for a global moratorium on AGI development). There's not enough time for AI Safety work to bear fruit otherwise.

Thank you Vasco! This seems hard to model, but worthwhile. I'll think on it.

Hi Spencer,

You mention the following methods to estimate thresholds:

Using near-termist thresholds as a starting point

Using benchmarks for cost-effectiveness from current longtermist charities

Using the estimates [guesses] of others

I think the 2nd is the most promising. It is the one employed to establish thresholds for neartermist charities, which are usually benchmarked against GiveWell's top charities. Open Philanthropy is arguably the largest funder of longtermist projects, so I think it would be valuable to:

  • Know what their marginal longtermist grants are (in theory, the marginal grants in each cause area should be equally cost-effective, but it would be better to pick scalable interventions whose marginal cost-effectiveness would not decrease much for additional resources).
  • Then try to estimate how cost-effectively they reduce e.g. extinction risk until 2050. Maybe this can already be done using the quantitative models of Open Philanthropy's bio team.

In addition, I believe it is worth estimating the cost-effectiveness of longtermist charities in terms of a neartermist metric like DALYs averted per $ without accounting for future generations, and then see how they compare with GiveWell's top charities.

Cool! One point from a quick skim - the number of animals wouldn't be lost in many kinds of human extinction events or existential risks. Only a subset would erase the entire biosphere - e.g. a resource-maximising rogue AI, vacuum decay, etc. Presumably with extinction of just humans the animal density of reclaimed land would be higher than current, so the number of animals would rise (assuming it outweighs the end of factory farming). 

The implications of human existential risks for animals are interesting, and I can see some points either way depending on the moral theory (e.g. end of factory farming in human extinction, but rise of wild animal suffering; total number and quality of animal lives in a beyond-Earth humanity; potential of a completely re-wilded Earth in a beyond-Earth humanity; risks of astronomical suffering if a beyond-Earth humanity retains the equivalent of factory farming...)

Thanks Ben! I totally agree. The math in this post was trying to get at upper and lower bounds and a median -- but for setting one's personal thresholds, the nuance you mention is incredibly important. I hope this post, and the Desmos tool I linked, can help people play with these numbers and set their own thresholds!

Were these commenters expecting it to be much cheaper to save a life by preventing the loss of potential in an extinction, than to save a life using near-termist interventions?


I think that commenters are looking at the cost-effectiveness they could reach with current budget constraints. If we had way more money for longtermism, we could go to a higher cost per basis point. That is different than the value of reducing a basis point, which very well could be astronomical, given GiveWell costs for saving a life (though to be consistent, one should try to estimate the long-term impacts of a GiveWell intervention as well).

Nice post, Spencer!

0.01% absolute reduction

Nitpick, an absolute reduction of 10^-4 is often indicated as 0.01 pp (0.01 percentage points).

Furthermore, we might never have enough evidence to say whether an intervention has reduced cumulative x-risk by a certain amount. It might be more manageable to set a threshold based on reduction in per-century x-risk.

I would go a little further, and say that we might never have enough evidence to say whether non-extinction x-risk this century was reduced, as I think the evidence base for non-extinction value lock-in is quite poor (related links here). So I believe it is better to focus on extinction risk, or probability of a given population loss, as forecasted in the Existential-Risk Persuasion Tournament.

Ajeya Cotra: "AI risk is something that we think has a currently higher cost effectiveness"; "$200 trillion per world saved", or $20 billion per bp

To clarify, the above estimate is a conservative cost-effectiveness bar for Open Philanthropy's longtermist grants. In this section of episode 90 of The 80,000 Hours Podcast, Ajeya says it concerns "meta R&D to make responses to new pathogens faster", and "[Open Philanthropy] were aiming for this to be conservative".

Were these commenters expecting it to be much cheaper to save a life by preventing the loss of potential in an extinction, than to save a life using near-termist interventions?

I guess so.

These estimates are not robust enough to make the most important decisions we face. We recommend conducting a survey of funders, charities, and experts to get a stronger picture of what the standard should be and the cost-effectiveness of different types of work.

I would say better quantitative models would also be needed to more reliably estimate the cost-effectiveness of interventions aiming to decrease extinction risk.

Thanks Vasco! This helps my understanding.
