Red teaming a model for estimating the value of longtermist interventions - A critique of Tarsney's "The Epistemic Challenge to Longtermism"

AF; Chris Lonsberry; Bryce Woodworth

This was written by Anjay Friedman, Bryce Woodworth, and Chris Lonsberry. From May through July of 2022, our team participated in Training for Good's Red Team Challenge.^[1] Our team is critiquing Christian Tarsney's work-in-progress paper, which he posted to the EA forums in the following linkpost: The Epistemic Challenge to Longtermism (Tarsney, 2020). The latest version of the paper can be found on GPI's website.

Summary (2 mins)

Introduction
- Tarsney develops a model to estimate the expected value of longtermist interventions that aim to make an impact by increasing the probability we are in a persistent state in the future. The key intuition from the model is that the higher the probability of some external event (either positive or negative) resetting the world, the lower the value of the intervention. (See A brief introduction to Tarsney's paper and model)
- This paper could provide evidence that (x-risk mitigation) longtermist interventions can have very large benefits, but this result depends on a number of assumptions that serve to reduce the scope under which the evidence applies. (See Assumptions)
Critiques inside the model
- Tarsney aims to model epistemic persistence concerns. While persistence concerns have clearly been modeled, we are not confident that strong epistemic skeptics will be satisfied with the treatment. In particular, the model only begins to work with persistence concerns after crossing the threshold into the long-term future, 1,000 years from now, which seems unlikely to be an acceptable assumption to a persistence skeptic. (See A priori, the skeptical position seems strong)
- We also argue that Tarsney, in attempting to incorporate uncertainty over the values of key parameters, makes a difficult-to-justify assumption about the expected value of space settlement, leading him to arrive at a biased set of expected values under which the case for longtermism looks stronger than it is in reality. (See Biased incorporation of uncertainty)
Critiques outside the model
- We believe there are some issues with the decision-theoretic framework used in this paper. We argue that using “expectational utilitarianism” implies an acceptance of fanaticism, akin to accepting Pascal’s mugging, which seems unjustified. (See Challenging the usage of expectational utilitarianism)
- We also have concerns about the first-order-approximation style of expected value reasoning in the paper, which doesn’t account for negative second-order effects of focusing our current resources primarily on longtermism. (See Suspicion of negative side-effects.)
Relation to AI risk and the longtermist movement
- Ultimately, we think that, although this model attempts to cover certain longtermist interventions focused on having a persistent impact, there are many plausible worldviews under which focusing on risks from advanced technologies such as AI, bioengineering, and nuclear weapons makes sense, and these worldviews are neither addressed by this analysis nor negated by our criticisms.
Conclusion

Introduction

A brief introduction to Tarsney's paper and model

Tarsney opens the paper by showing that the case for longtermism stems from the fact that there is so much future: there's both a lot of time left on the clock and also a lot of territory we have not yet explored. The possibility of the future being very large necessarily leads to very large estimates of the potential value of the future of our civilization. For example, in estimating the value of x-risk prevention, one estimates the value of the future and multiplies that by the probability that the intervention prevents extinction. The value of the future is commonly estimated as the value of an average human life times the number of persons we expect to live in all of the future. Unsurprisingly, such EV estimates are enormous. Tarsney argues that our declining capacity to predict the impact of our intervention into the future will necessarily deflate such estimates: "The case for longtermism depends not just on the intrinsic importance of the far future but also on our ability to predictably influence it for the better" (p. 2).

Therefore, Tarsney aims to apply some analytical rigor to naive longtermist estimates by incorporating epistemic concerns. His intention in the paper is to "(i) provide a model that can be used to estimate the [expected value (EV)] of lots of longtermist interventions, and (ii) actually apply that model to x-risk mitigation". The focus of our analysis centers on the application of the model to predict the expected value of a specific longtermist intervention L, namely x-risk mitigation, into the far future. There are two variants: one in which humanity remains in roughly its present state on Earth, and another in which humans engage in large-scale space colonization, leading to cubic growth. Tarsney assigns a probability, p, that the intervention successfully prevents human extinction within the next 1,000 years. At the end of 1,000 years, the long run future begins. From that time forward, the persistence of the (counterfactual) difference made by the intervention is under attack from exogenous nullifying events (ENEs). As the name implies, an ENE nullifies the intervention by resetting the world and removing the impact of the intervention.

Tarsney's model takes the integral over time of the product of two factors. The first factor is the value of civilization at time t and it is multiplied by the second factor, a discount factor that accounts for the probability that an ENE has taken place by time t. As t increases, the probability of an ENE approaches 1 and thus the discount factor approaches zero, wiping out the value of the intervention. Thus the model sets up a race between the increasing value of civilization over time and the deflation of value caused by the ENE discount factor. Tarsney concludes that the case for longtermism is robust to this epistemic challenge, though depending on one’s empirical position, this defense relies on an acceptance of Pascalian fanaticism. Particularly important empirical questions include the rate of ENEs, the chance of averting extinction with a marginal donation, and the likelihood that humanity will eventually be able to extract vast amounts of value from settled star-systems.

We found Tarsney's model useful in illustrating how we might think about the impact of persistence on an x-risk intervention. The key intuition from the model is that the higher the probability of some external event (either positive or negative) resetting the world, the lower the value of the intervention. ^[2]
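
The intuition can be reproduced with a rough numerical sketch. This is not Tarsney's actual implementation: it assumes ENEs arrive at a constant rate r per year, so that the chance that no ENE has occurred by time t is exp(-r t), and it measures t from the start of the long-run future. The function names, the unit value streams, and the finite integration horizon are illustrative assumptions.

```python
import math


def survival(t, r):
    """Chance that no ENE has occurred by time t (assumes a constant rate r per year)."""
    return math.exp(-r * t)


def expected_value(value_at, r, p, horizon, steps=100_000):
    """Midpoint-rule estimate of p * integral from 0 to horizon of value_at(t) * survival(t, r) dt."""
    dt = horizon / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += value_at(t) * survival(t, r) * dt
    return p * total


def steady_state(t):
    """Hypothetical value stream: one unit of value per year, forever."""
    return 1.0


def cubic_growth(t):
    """Hypothetical value stream growing with the cube of time (space settlement)."""
    return t ** 3


if __name__ == "__main__":
    for r in (1e-3, 1e-5):
        horizon = 20 / r  # long enough that the discount factor is close to zero
        print(r, expected_value(steady_state, r, p=1.0, horizon=horizon))
```

Under these assumptions, a higher ENE rate shrinks the result for both value streams, and the cubic stream is far more sensitive to r than the steady one.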

Anecdotal illustration

To illustrate, we can imagine a case where Larry Longtermist^[3] has spent resources to mitigate risk from nuclear war. We can imagine many ENEs that might reset the world, but let us illustrate with one positive and one negative. In the negative case, despite Larry's work reducing the risk of nuclear war, a large asteroid strikes the Earth and wipes out advanced civilization. In the positive case, humans go extinct, but another intelligent civilization evolves on Earth. In either case, the world has been "reset" by events unrelated to Larry's work and in the new state of the world, Larry's intervention has stopped producing benefits.^[4]

The key determinants of the benefits that accrue to Larry's intervention are the likelihood that it succeeds in its aims and, if it does, how long the world is in a state where we benefit from his work. In worlds where an ENE occurs very quickly after the intervention, the expected value is near zero. In worlds where it takes millions or billions of years for an ENE to occur, the intervention can have very high value.

Tarsney's model can also help us test how different beliefs about the probability of successful intervention or the rate of ENEs might change the value that accrues to longtermist interventions. We found it instructive to see how different parameter values (i.e. different assumptions about how the future will unfold) impacted the expected value (and its distribution over time) of x-risk interventions.

Assumptions

This paper could provide evidence that [x-risk mitigation] longtermist interventions can have very large benefits, but this result depends on a number of assumptions that serve to reduce the scope under which the evidence applies.

First, Tarsney only considers a single moral decision-making framework, which he calls “expectational utilitarianism”. Tarsney defines this as incorporating 3 components:

precise probabilities assigned to decision-relevant possibilities constrained by agent's evidence,
a total welfarist consequentialist normative framework, and
a decision-theoretic framework of maximizing EV.

This framing is intended to be favorable to longtermism, so that we can evaluate whether longtermism is defeated by the persistence challenge itself, without requiring unfavorable normative positions. Someone who rejected this framing might still reject longtermism. One particularly noteworthy potential objection is to the framing’s acceptance of fanaticism; it is willing to make decisions primarily on minuscule chances of astronomically positive outcomes. The case presented in the paper is significantly weaker in framings that do not accept fanaticism.

Additionally, the argument requires that either the rate of ENEs is low (e.g. less than 1 per hundred million years), or that it is moderate (e.g. less than 1 per ten thousand years) and that humans will engage in large-scale space colonization. Much of the probability mass in the model comes from assuming a non-negligible chance of outcomes at least as good as the construction of Dyson spheres around every nearby star, which will be used to simulate the maximum number of happy people.

Critiques inside the model

A priori, the skeptical position seems strong

Tarsney sums up the main problem of empirical skepticism concisely: "If our ability to predict the long-term effects of our present choices is poor enough, then even if the far future is overwhelmingly important, the main determinants of what we presently ought to do might lie mainly in the near future." This seems like an accurate description of the position held by epistemic skeptics.

Is forecasting required for predictable influence?

One theory of influence states that we must first be able to predict the future accurately before we can predict the impacts of our actions on the future. If we cannot predict the impacts of our actions, then we can have very little confidence that our actions will achieve the desired outcomes. Further, we must contend with the possibility that our ignorance will lead us to take actions that run counter to our desired outcomes.

Indeed, it seems reasonable to be skeptical, based on strong empirical evidence that humans are generally bad at predicting the future. Tarsney discusses some problems with our capacity to predict the future, finding that "the existing empirical literature on political and economic forecasting finds that human predictors—even well-qualified experts—often perform very poorly, in some contexts doing little better than chance" (p. 2). Further, in footnote 4 of the manuscript, he notes he was unable to find any data whatsoever on truly long-term forecasts (where long-term is defined as timescales greater than a century). Again, where data exists, the track record seems to be bad: "[T]here is no evidence that geopolitical or economic forecasters can predict anything ten years out beyond the excruciatingly obvious—‘there will be conflicts’—and the odd lucky hits that are inevitable whenever lots of forecasters make lots of forecasts. These limits on predictability are the predictable results of the butterfly dynamics of nonlinear systems. In my [Expert Political Judgment] research, the accuracy of expert predictions declined toward chance five years out" (Tetlock and Gardner, 2015).

In summary, concerning predictions that can be clearly evaluated, the upper time bound on our capacity to predict the future seems to be less than a decade. Further, it seems humans have very little data for evaluating forecasts of geopolitical and economic events over even medium timescales (defining medium timescales as 10 to 100 years). It is conceivable that a more scientific approach to forecasting could push the bounds outward, but for the time being, 10 years seems to be a plausible maximum predictability horizon for events of broad complexity.^[5]

It also seems reasonable to infer that the chances of successfully influencing the future toward a desired outcome should generally be considered to be lower than the chances of predicting said outcome.^[6] This is because some interventions could fail to have an impact, or even backfire. Thus, if we limit our maximum chances of successful influence to be smaller than our chances of prediction, the chances of successfully implementing a strategy to achieve a specific outcome in 1,000 years begin to look infinitesimally small. In other words, a true skeptic would assign a prior probability of being able to influence the future far (far) beyond the horizon of predictability to be near to zero.^[7]

As illustrated in the following section, our hypothetical skeptic is convinced that a successful intervention could have long lasting consequences. The main point of the discussion above is to suggest that explicitly targeting a consequence 1,000 years into the future seems problematic. A more conservative approach might be to spend our longtermist resources on avoiding extinction within the next few decades and repeating this over time.

The current form of the model assumes 1,000 years of persistence

The following fictional dialog is intended to convey how we imagine a skeptic might argue against Tarsney's model. It's important to note that the skeptic below does not limit herself to Tarsney's empirical epistemic persistence skepticism, since we have doubts that true skeptics will have strong opinions on precisely why humans are not very good at predicting the long term consequences of their actions.

Larry: So, Sandy, you would describe yourself as a skeptic on longtermism.
Sandy: That's right. Given that we don't seem to be able to predict important things about the future 10 years from now, it's hard to see how we could have confidence that our efforts will have an impact in 100 or 1,000 years.
Larry: But don't you think if we made incredible efforts to mitigate x-risk today, it could increase the likelihood that humans are still alive in 1,000 years?
Sandy: It seems clear to me that humans still being here 10 years from now is a prerequisite for humans being here in 100 years or 1,000 years. To that extent, there is a clear causal link to the far future. So yes, if we reduce x-risk today and during the next 10 years, then we seem to increase the chances of human civilization surviving 1,000 years. But it seems that understanding the background chances of human civilization surviving 1,000 years are also very important here.
Larry: Would you be willing to say that spending $1M today on x-risk mitigation would improve the chances of human civilization existing--that is, not having gone extinct--in 1,000 years by (as compared to doing nothing)?
Sandy: That feels oddly specific. I'm thinking there's a lot to unpack there.
Larry: OK, well just imagine that all of humanity spends 100% of their resources over the next 1,000 years to mitigate x-risk--
Sandy: Slow down, Larry. First of all, if all of humanity managed to perfectly coordinate themselves to ensure the survival of our species, we would already be living in a utopia and I'm not sure using a hypothetical utopia is the best place to start our reasoning about how the real world is going to work. Second, if they are working over the course of 1,000 years, there's a decreasing foreknowledge problem over time. In year 500, our actors are no longer trying to predict a future that's 1,000 years away, now they only have to worry about a future that is 500 years away. Third, what are these people eating?^[8]
Larry: Well, hear me out on this. I thought if humanity was spending all their resources on this one thing, we could get away with saying that the chances of our civilization still existing would increase by 1% and that seemed like a reasonable and conservative thing to say, given human capacity to learn. Then I'll just scale that down by the amount of resources I can purchase with $1M.
Sandy: OK, I think I see where you're trying to get to and I agree there is an element of conservatism on the surface, but the way you've made this estimate seems problematic on deeper analysis. I'm not sure I can be comfortable with that number without knowing a lot more about the chances of this risk. Also, have you considered that you might make wrong decisions with your $1M? There must be at least a small chance you could make things worse, rather than better.^[9]
Larry: I think my estimate of risk reduction is so small that it should be uncontroversial. This point can be a little confusing, since within the EA community risks are often expressed over different timescales, which makes them difficult to compare.

In order to convert, we can use the formula $1 - R = (1 - r_{p})^{P}$ where
- $r_{p}$ is the risk in each period (e.g. one year)
- $R$ is the risk over the full timeframe, which consists of
- $P$ periods

For the sake of comparability, let's convert to annual risk. Setting $P = 1{,}000$ and $R = 1\%$, we can calculate $r_{p}(\text{annual}) \approx 1 \times 10^{-5}$.

Since x-risk is often expressed in timescales of centuries, let's calculate the century risk as well (setting $P = 10$): $r_{p}(\text{century}) \approx 1 \times 10^{-3}$ (or 0.1%).

I think, within the context of the EA movement, all x-risks of concern have chances far enough above 0.1% per century that my estimated probability reduction would not mean complete elimination of the risk (which would be a clear indicator that I was not conservative in selecting my probability).
Sandy: OK, for the record, I was operating from the assumption that you cannot calculate the probability of your action being able to prevent X without estimating the likelihood of X (lest we fall prey to base rate neglect in our thinking, which would be problematic to a Bayesian). If I understand you correctly, you're saying the probability of the x-risk you're trying to prevent is much higher than 0.1% per century, in which case we can safely accept a 0.1% reduction as conservative?
Larry: Yes, and next we'll get on to talking about whether or not the intervention was persistent.
Sandy: Wait. What? You're telling me that the intervention persists for 1,000 years *before* we start dealing with persistence?
Larry: Well, yes…
Sandy: I think that's going to be a very big hurdle to overcome for anyone with a strong form of persistence skepticism.
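
Larry's conversion between risk timescales in the dialog above can be checked numerically. The sketch below simply inverts the formula $1 - R = (1 - r_{p})^{P}$; the function names are illustrative, and it assumes, as the formula does, that the per-period risk is constant and independent across periods.

```python
def per_period_risk(total_risk, periods):
    """Solve 1 - R = (1 - r_p)^P for r_p, given R and P."""
    return 1.0 - (1.0 - total_risk) ** (1.0 / periods)


def total_risk(per_period, periods):
    """Compute R from r_p and P using 1 - R = (1 - r_p)^P."""
    return 1.0 - (1.0 - per_period) ** periods


# Larry's figures: a 1% risk reduction over 1,000 years.
annual = per_period_risk(0.01, 1000)   # P = 1,000 one-year periods
century = per_period_risk(0.01, 10)    # P = 10 one-century periods

print(f"annual:  {annual:.3e}")
print(f"century: {century:.3e}")
```

Both results land very close to the rounded values Larry quotes, roughly one in 100,000 per year and 0.1% per century.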

As the dialog suggests, there seems to be an implicit assumption of being able to predict the results of our actions. It's not so much that the model helps us answer the question of whether or not our actions can have predictable results in the far future. Instead, the model helps us envision how we might select actions under different assumptions about our capacity to influence the far future.

The following dialog lists some epistemic concerns related to human biases and the fact that our estimates of expected value are likely to be biased in favor of projects we like.

Larry: The model is attempting to show that longtermism is robust to a type of skepticism that I will call epistemic persistence skepticism, which claims that it is "prohibitively difficult to identify potentially persistent differences and strategies for ensuring their persistence".
Sandy: So you're saying that the main way humans make mistakes in influencing the future is by overestimating how long their actions will have an impact?
Larry: Not the only way, but for those concerned about longtermism, that seems to be the most important.
Sandy: It seems to me that there are innumerable ways people are wrong about if and how their actions will influence the future when trying to act on timescales much shorter than this. In particular, psychological research into the planning fallacy suggests that humans tend to "underestimate the time, costs, and risks of future actions and at the same time overestimate the benefits of the same actions". This seems particularly disturbing in cases where evidence of success is very difficult to gather.
Larry: Why would evidence of success be difficult to gather?
Sandy: There's no counterfactual world we can observe in which we decide not to undertake the intervention and then observe whether or not humans go extinct as a result.

Finally, the discussion takes a turn toward trying to estimate short-term costs and benefits, which seem to be on better epistemic footing. Once we estimate the disvalue of killing 7 billion people, the estimate of the intervention's value hinges primarily on the chances of preventing that from occurring.

Sandy: It looks like you're substituting a feature of the world (how often it resets) for my concerns about the human capacity to know how long our actions will last.
Sandy: If you tell me you've spent $1M on public health, I have a variety of studies that I can look at that will tell me what to expect in terms of benefits. In addition, if I had enough time I could go visit the recipients over time and check on them. On the other hand, if you tell me you've spent $1M to make the future go better 1,000 years from now, what kind of evidence can you offer that your $1M was well spent?
Larry: [Provides some plausible evidence that the risk of nuclear war will decline by x% over the coming decades.]
Sandy: So you're really trying to prevent us from killing ourselves in the next couple of decades?
Larry: Yes.
Sandy: Then why are we talking about what happens 1,000 years from now?
Larry: Well, that's where most of the calculated benefit comes into existence.
Sandy: There are roughly 7B people alive today, with a median age of roughly 30. If we assume perfect quality of life and a life expectancy of 70 years, that would give us a benefit of roughly 280B QALYs (7B people times 40 remaining years each) for preventing the lives lost in a human extinction event. You guessed that the value of spending $1M on public health was 10,000 QALYs. Your intervention, if successful, would be 28 million times more effective than the benchmark. It's hard to see why I'm using my energy trying to convince you that what's likely to happen 1,000 years from now is largely unknowable.
Larry: But that's only in the event that my intervention actually prevents the apocalypse.
Sandy: That's true! So we're back to asking ourselves what are the chances (1) that the risk that you're trying to prevent occurs, and (2) that your intervention is successful in preventing it.

The upshot of the dialog above is also covered below under "The conclusions require a potentially-contentious degree of fanaticism", where reference is made to Scott Alexander's forum post contrasting longtermism and thoughtful short-termism. The argument is not fully elaborated above, but our hope is that the reader appreciates that arguments for longtermist interventions need not rely on exceptionally high value futures.

Biased incorporation of uncertainty

We argue that Tarsney, in attempting to incorporate uncertainty over the values of key parameters, makes an unjustified assumption about the expected value of space settlement, leading him to arrive at an upward-biased set of expected values under which the case for longtermism looks stronger than it is in reality.

Summary of section 6.1

In Section 6 of the paper, titled "Uncertainty and fanaticism", the author attempts to incorporate uncertainty “about the values of several key parameters, and that uncertainty is very consequential for the expected value of L” (where L is the longtermist intervention whose expected value the model aims to calculate, as compared to N, the neartermist intervention used as a benchmark). Rather than specify a full uncertainty distribution for the key parameters r (rate of ENEs), s (speed of interstellar settlement), and V_s (expected value per star per unit time^[10]), which is an impossible task, he adopts the approach of calculating a minimum expected value of L and using this in the expected value calculation. The approach relies on asserting that “any distribution that didn’t assign at least X % credence to values at least this favorable to longtermism would be overconfident”, from which he calculates the minimum expected value of L.^[11] In this post, we call this method the "minimum credences" method.

Tarsney calculates the minimum expected value of L with a set of lower bound parameter estimates. First he considers them each independently and then he also considers them jointly. To do so, he also proposes a minimum of a 0.1% chance of space expansion happening (the cubic growth model). His parameter estimates are:

At least 0.1% chance that r is less than $10^{- 6}$ ENEs/yr
At least 1% chance that s is greater than 0.8c
At least 0.0001% chance (1 in a million) that V_s is greater than $10^{20}$ V/yr/star

The results of the above analysis are unambiguously in favor of longtermism (predicated on small probabilities of high impact, which can give rise to fanaticism as discussed later in this post). In fact, combining any two of the lower bound parameter estimates of r, s and V_s guarantees that EV(L) > EV(N). Combining uncertainty across all of the parameters according to the minimum credences method gives an EV of the longtermist intervention that is 11 OoMs larger than that of the neartermist intervention.^[12]

An unjustified assumption in the author’s minimum expected value reasoning

The underlying assumption that makes this minimum expected value reasoning possible is the claim that there are no negative tails in the uncertainty distributions of r, s and V_s. While r and s cannot be negative, V_s (the expected value per star per unit time of space settlement) certainly can. The author attempts to handle this in footnote 34, where he contends: “in the case of V_s, which can take negative values, we must also assume that its expected value conditional on its being less than the ‘Dyson Sphere’ value of 10²⁰ (V/yr)/star is non-negative.”

This assumption is crucial and does not seem obvious enough to be made without explanation. If this conditional expected value is negative, then the negative tail might outweigh the positive tail.^[13] Even if the negative tail is skinnier, it could still dramatically curtail the total EV.
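
To see why the assumption carries so much weight, it may help to write out the decomposition behind a minimum-credence bound. The following is a sketch that treats V_s in isolation rather than reproducing the paper's full calculation; it uses the law of total expectation together with the 1-in-a-million credence that V_s exceeds the ‘Dyson Sphere’ value.

```latex
\begin{aligned}
\mathbb{E}[V_s]
  &= \Pr(V_s \ge 10^{20})\,\mathbb{E}[V_s \mid V_s \ge 10^{20}]
   + \Pr(V_s < 10^{20})\,\mathbb{E}[V_s \mid V_s < 10^{20}] \\
  &\ge 10^{-6} \cdot 10^{20}
   + \Pr(V_s < 10^{20})\,\mathbb{E}[V_s \mid V_s < 10^{20}]
\end{aligned}
```

The first term is at least $10^{14}$, but the overall bound $\mathbb{E}[V_s] \ge 10^{14}$ only follows if the second term is non-negative, which is exactly the condition stated in footnote 34.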

Evaluating the assumption: The key question is, how likely is the expected value of V_s, conditional on it being less than the utopic ‘Dyson Sphere’ value of 10²⁰ (V/yr)/star, to be non-negative?

It seems unlikely to us. Consider the following scenarios in which V_s < 10²⁰ (analogous to the scenarios presented in the paper):

Positive space opera scenario^[14] (as referenced in the paper): 10⁴ (V/yr)/star
Negative space opera scenario: -10⁴ (V/yr)/star
- Essentially, a scenario where we populate the universe with negative consequences in a physical planetary way.
- One plausible scenario: we bring factory farming wherever we go^[15]
- Uncertainties: It seems like this number should be higher in magnitude, since suffering seems easier to cause than happiness and, in cases where it is due to animals, there are usually at least an order of magnitude more animals than humans
Negative ‘Dyson Sphere’ scenarios: -10²⁰ (V/yr)/star
- Essentially, scenarios where we populate the universe with Dyson Spheres with net-negative lives or suffering
- Plausible scenarios: severe AI s-risk scenarios involving simulations of suffering life, simulated uncontrolled ecosystems, or simulated human lives that are net-negative

The flipside of using expected value reasoning with heavy tails is that we have to use it with negative tails too^[16]: if the probability of an outcome at least as bad as -10²⁰ (V/yr)/star, an s-risk scenario on the scale of “Dyson Spheres”, is more than just 10⁻¹⁴ as much as the likelihood of the positive space opera scenario (10⁴ (V/yr)/star), it causes the expected value of V_s to become negative, conditional on V_s < 10²⁰ (V/yr)/star. It seems hard to reasonably reject this possibility.
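
A small sketch can make this sensitivity concrete. It deliberately simplifies to just two scenarios, the positive space opera and the negative ‘Dyson Sphere’ case, and ignores everything else in the distribution; the function and constant names are illustrative. In this two-scenario simplification the break-even probability ratio is 10⁴ / 10²⁰ = 10⁻¹⁶, so a ratio of 10⁻¹⁴ is already well past the point where the conditional expected value turns negative.

```python
from fractions import Fraction

# Scenario values in (V/yr)/star, taken from the list above.
POSITIVE_SPACE_OPERA = Fraction(10) ** 4
NEGATIVE_DYSON_SPHERE = -(Fraction(10) ** 20)


def conditional_ev(p_positive, p_negative):
    """Expected V_s over the two scenarios only, with their probabilities renormalized."""
    total = p_positive + p_negative
    weighted = p_positive * POSITIVE_SPACE_OPERA + p_negative * NEGATIVE_DYSON_SPHERE
    return weighted / total


# Ratio p_negative / p_positive at which the two contributions cancel exactly.
break_even_ratio = POSITIVE_SPACE_OPERA / -NEGATIVE_DYSON_SPHERE

for exponent in (17, 16, 14):
    ratio = Fraction(1, 10 ** exponent)
    sign = conditional_ev(Fraction(1), ratio)
    print(f"ratio 1e-{exponent}: EV is", "negative" if sign < 0 else "non-negative")
```

Exact fractions are used instead of floats so that quantities differing by twenty or more orders of magnitude are compared without rounding error.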

Implications and S-risks

It is certainly possible, and even probable, that the positive tail of V_s outweighs the negative tail of V_s. This is a question partially addressed in The expected value of extinction reduction is positive (which is also being red-teamed). If true, the case for longtermism would be strengthened by including uncertainty, compared to considering only the point estimates of expected value.^[17]
The existence of negative tails of V_s (s-risks) can strengthen the case for working on longtermist interventions, if they are aimed at reducing the likelihood of these outcomes occurring or at increasing the expected value of V_s.

For the reasons just shared, we do not believe that the unjustified assumption discussed in this section changes the outcome of the paper. However, we think that the paper should still have addressed these nuances, since they could be crucial to some readers.

Critiques outside the model

Challenging the usage of expectational utilitarianism

The main purpose of this paper is to provide a quantitative model that accounts for persistence challenges to longtermist interventions. For tractability, only a single moral decision-making framework is evaluated (expectational utilitarianism, which we will argue also implies fanaticism). Tarsney claims that “I choose this set of assumptions partly because they represent a widely held package of views, and partly because I find them plausible.” Expectational utilitarianism is also implied to be a favorable choice to longtermists, so that we can evaluate the strength of the epistemic challenge on its own.^[18] We claim that this framework, taken seriously, implies a level of weirdness which is not captured in the model, and that it does not seem to accurately capture the views of almost any real EAs. Since the assumption of expectational utilitarianism is baked into all aspects of the model, this presents a serious issue for the conclusions of the paper. There are potential ways that one could try to weaken the acceptance of fanaticism implicit in expectational utilitarianism, but despite implications in the abstract, this paper does not explicitly attempt to do so.^[19]

The given model implicitly assumes bounded fanaticism

Fanaticism (as used in this context) is “the apparent problem faced by moral theories that rank a minuscule probability of an arbitrarily large value above a guaranteed modest amount of value”. The archetypical thought-experiment is Pascal’s Mugging, in which someone claims to have supernatural powers which they will use to provide arbitrarily-high amounts of utility/disutility depending on whether you give them some money. A proper Bayesian should assign at least some nonzero probability ε to the possibility that the mugger is telling the truth.^[20] The mugger can set the promised payout to some astronomically high value V so that the expected value of paying the mugger, $V \cdot ε$, is in turn also astronomical.

The “expectational utilitarianism” framing that Tarsney uses implies unbounded fanaticism, and thus that we ought to pay the mugger.^[21] This has implications which run counter to the core assumptions of Tarsney’s quantitative model.

The primary source of expected value in Tarsney’s model is the possibility that human-originating civilization will support a large number of happy lives, either by lasting a long time or by building something like Matrioshka brains to simulate happy people. But consider the possibility of getting mugged by someone claiming to be an Operator of the Seventh Dimension, promising Graham’s number of happy-life-equivalents. If the chance of getting mugged by a truthful Operator is not almost exactly 0,^[22] then this will dominate every other factor of your expected value calculation. If your decision-making framework takes these kinds of possibilities at face value, then it is completely irrelevant whether humanity colonizes the stars, or does so at 0.1c or 0.9c, or can survive 10,000 vs 10,000,000 years, or winds up building Matrioshka brains. All the EV is instead concentrated in mugging scenarios.
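
The way a mugging scenario swamps everything else can be sketched with a toy calculation. All of the numbers below are hypothetical placeholders chosen for illustration, not estimates from the paper: Graham's number is far too large to represent, so 10¹⁰⁰ stands in for the promised payout, and the probabilities and the space-settlement value are invented.

```python
from fractions import Fraction


def ev_shares(scenarios):
    """Return each scenario's share of total expected value.

    scenarios maps a name to a (probability, value) pair.
    """
    contributions = {name: p * v for name, (p, v) in scenarios.items()}
    total = sum(contributions.values())
    return {name: c / total for name, c in contributions.items()}


# Hypothetical inputs: (probability, value in happy-life-equivalents).
scenarios = {
    "space settlement": (Fraction(1, 1000), Fraction(10) ** 30),
    "truthful mugger": (Fraction(1, 10 ** 50), Fraction(10) ** 100),
}

shares = ev_shares(scenarios)
for name, share in shares.items():
    print(f"{name}: {float(share):.3g} of total EV")
```

Even with a credence as small as 10⁻⁵⁰ in the mugger, the promised payout accounts for essentially all of the expected value in this toy setup, which is the sense in which the other parameters stop mattering.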

For a more common example, consider religion and Pascal’s original wager. Many religions claim literally infinite utility or disutility depending on whether or not you believe in them. There have been a number of arguments against Pascal’s wager over the years, but if you accept fanaticism then it would require a seemingly-implausibly-precise balancing of factors for an expectational utilitarian to avoid being forced into one of the following beliefs:

- We should spend all our resources to convince ourselves and others to believe in the most plausible religion
- We should spend all our resources studying religion to get an infinitesimally higher chance of identifying the correct one
- We should spend all our resources trying to prevent new people from being born, in order to minimize the expected number of people who wind up in various hells
- We should spend all our resources attempting to deconvert religious people and destroy the propagation of religion, in case there is a god who only rewards atheists

There are many other possibilities for extremely-unlikely but astronomically-valuable scenarios. These can have either a positive or a negative value, and should be the dominant factor in the expected-value calculations of a fanatical agent. Tarsney’s model uses the fanatical implications of expectational utilitarianism in a limited way, to imply our decisions should be dominated by the possibility of interstellar Matrioshka brains, without considering the much-weirder expected-value implications of actually agreeing with Pascal’s mugging.^[23] It thus seems that a bounded acceptance of fanaticism is a prerequisite for accepting the model in this paper, though this is never explicitly addressed.

The conclusions require a potentially-contentious degree of fanaticism

While the model implicitly assumes an upper bound on our acceptance of fanaticism, it more explicitly requires a lower bound as well. The strength of the conclusion depends on whether you, as the reader, accept the degree of fanaticism required in the expected value estimation. We expect that most EAs will find the lower-bound probabilities used in the paper to be well beyond what they consider tolerable, leading the conclusions to hinge upon questions of tractability that are explicitly outside the scope of the paper.

Tarsney attempts to be pessimistic in his choice of model parameters, leading to a conclusion that plausibly hinges “on a conjunction of improbable assumptions with joint probability on the order of (say) 10^-18 or less”. Certainly a number of EAs are indeed willing to accept that level of fanaticism, but we believe it is a concerningly low probability.

In one of the most highly-rated EA forum posts of all time, Scott Alexander writes “I don't think long-termists are actually asking for $30 million to make the apocalypse 0.0001% less likely - both because we can't reliably calculate numbers that low, and because if you had $30 million you could probably do much better than 0.0001%.” This has two implications about Alexander’s model of longtermists: that they think 10^-6 chances are too low to be trustworthy, and that they think the real probability of success is much higher. This matches our own understanding of large parts of the community as well. The per-dollar tractability estimate in Tarsney’s paper is almost two million times lower than the one in Alexander’s example, and Tarsney’s argument is only robust to empirical beliefs about things like ENE rates if we further accept an uncertainty argument requiring an additional several-orders-of-magnitude decrease in probability.
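As a rough check on the scale of this gap, the sketch below computes the per-dollar probability change in Alexander's example and then back-computes the per-dollar figure implied by the "almost two million times lower" comparison. It assumes the probability change scales linearly with dollars spent, rounds the factor up to exactly two million, and does not quote any figure from Tarsney's paper directly. The variable names are illustrative.

```python
def per_dollar(probability_change, cost_dollars):
    """Probability change bought per dollar, assuming linear scaling (an assumption)."""
    return probability_change / cost_dollars


# Alexander's example: $30 million to make the apocalypse 0.0001% less likely.
alexander_probability = 0.0001 / 100   # 0.0001% expressed as a probability
alexander_cost = 30_000_000
alexander_per_dollar = per_dollar(alexander_probability, alexander_cost)

# "Almost two million times lower", rounded up to two million (an assumption).
GAP_FACTOR = 2_000_000
implied_per_dollar = alexander_per_dollar / GAP_FACTOR

print(f"Alexander's example: {alexander_per_dollar:.2e} per dollar")
print(f"Implied by the two-million-fold gap: {implied_per_dollar:.2e} per dollar")
```

Under these assumptions, Alexander's example corresponds to roughly 3 × 10^-14 per dollar, and the implied estimate lands on the order of 10^-20 per dollar, about six orders of magnitude smaller.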

If you as the reader find this an unacceptable reliance on fanaticism, then the conclusions will depend on whether you believe that the real probability of success for longtermist interventions is much higher than the estimate in the paper. The key sources of uncertainty are the tractability of longtermist interventions, the empirical ENE rate, the likelihood of humanity colonizing the stars, and the likelihood that we will be able to support a much denser amount of value per star than we can currently produce. Most previous discussions we have seen primarily focus on tractability, which is largely outside the scope of this paper.

Accepting Tarsney’s conclusion depends on having your tolerance for fanaticism exceed your empirical beliefs about the amount of uncertainty. If you have previously thought about this issue without considering the rate of ENEs, then this argument may well cause you to update away from longtermism.

Suspicion of negative side-effects in first-order expected value calculations

Tarsney’s model assumes that the possibility of negative side-effects is unimportant: “I will assume (as seems to be true of most real-world debates) that the primary disagreement between longtermists and empirical skeptics is not about the expected value of available neartermist interventions (i.e., how much good we can do in the near term) nor about harmful side-effects of longtermist interventions…” This is a sensible assumption given the difficulty of estimating side-effects, but it still leads to a major potential blind spot in the analysis. We suspect some readers will be generally suspicious of these kinds of expected-value arguments for unusual positions, which have a significant possibility of negative side effects.

There is a common thought experiment in the ethics literature about whether we ought to kill people in order to donate their organs, if donating the organs would save more than one life in expectation. There is a similar thought experiment about whether we ought to steal money in order to donate to the impoverished. Most people reject these arguments even when the first-order expected value estimate is positive (a position we personally agree with). Even EAs, who tend to favor consequentialism, generally reject this kind of reasoning. A common counter-argument is that such actions carry a significant risk of negative second-order effects, such as eroded social trust, which are more important than the positive first-order effects. Even if you aren’t such a hardcore consequentialist that you view a decision to eschew lifesaving neartermist interventions as morally equivalent to murder, these examples imply that negative side-effects can easily flip your expected value estimation.

You might consider this a significant weakness in the model if you:

- Believe there are concrete negative side-effects of working on longtermist interventions that outweigh the benefits. (As an example, our impression is that a number of community members feel like longtermism is true, but that it is over-emphasized within EA to a degree that is unhealthy for the movement.)
- Believe it is not appropriate to use uncertain EV-style reasoning to support causing concrete near-term harm, and additionally believe that failing to focus on neartermist interventions is sufficiently similar to causing harm
- Are generally distrustful of uncertain EV arguments of this style

Relation to AI risk and the longtermist movement

The Effective Altruism movement in recent years has pivoted to focusing more on longtermist interventions and causes, including biosecurity, technical AI alignment research, AI governance, and others. Billions of dollars of longtermist funding is planned to be allocated in the next few years and many believe that we are likely in the most important century. So how does this model relate to many of the interventions people are focused on and what are the implications of our findings for the community?

We think that, although this model can be and is used to evaluate certain longtermist interventions focused on having a persistent impact, there are many plausible worldviews, not covered by this paper, under which focusing on risks from advanced technologies such as AI, bioengineering and nuclear weapons makes sense. For example, if you believe that the likelihood of an existential catastrophe this century is significant, you do not need the justification of Dyson Spheres to support working on this issue.^[24]

One global issue that many are worried about is risk from advanced artificial intelligence. While advanced AI could pose an existential risk, there are some important differences between work on AI risk and the longtermist intervention considered in the paper:

- Some AI failure scenarios look like they may have the property of complement persistence, whereby the universe could be placed into a very persistent negative state.^[25] Interventions that reduce the likelihood of such s-risks are not covered by the examples in the paper, but this would likely strengthen the case for having a persistent impact, if we can reliably reduce the probability of these events.
- (Claim) A superintelligent AI-enabled civilization would likely have a lower rate of extinction events, causing the rate of ENEs to be lower if we succeed in creating an aligned AI, thereby increasing the expected value of work towards this.
- With many forecasting that AGI will be developed this century, it seems likely that p is higher than 1% if all of humanity dedicates all of their resources over the next 1000 years.
- The problem of reducing risks from AI might not look like an all-or-nothing problem: alignment could be a spectrum, and a marginally more aligned AI could be valuable in itself.^[26] This reduces the concerns around fanaticism because we don’t have to rely on small probabilities of success; rather, any work that can nudge the state of our world in a better direction is impactful. Said another way, AI alignment research might have the quality of being able to slightly improve the future, rather than just increasing the probability of a future utopia by a small amount.

Conclusion

Tarsney has given us a model to estimate the expected value of longtermist interventions that aim to make an impact by increasing the probability we are in a persistent state in the future, showing that the higher the probability of some external event (either positive or negative) resetting the world, the lower the value of the intervention. As a result, some longtermists might conclude there is strong evidence that (x-risk mitigation) longtermist interventions can have very large benefits. However, we raised concerns on several fronts.

First, we have concerns about the form of the model and the methods of accounting for uncertainty within the parameters of the model.

- The inclusion of a 1,000 year delay before ENEs (the key level for modeling persistence) begin requires, in essence, that we assume persistence of the intervention for at least 1,000 years. This seems likely to be a significant problem for persistence skeptics.
- If we attempt to adjust the parameter p within the model to account for the above concern, it introduces complexity that is difficult to justify.
- There is nothing within the model that is specific to epistemic concerns. Rather, it models persistence. Thus, we find it arbitrary to say that the epistemic concerns were modeled.
- Tarsney's method of accounting for uncertainty (which we've called the minimum credences approach) leads to results that favor longtermism because they include the possibility of enormous positive futures, but exclude by assumption similarly large negative futures.

In addition to the concerns we have around decisions made within the model, we also believe there are some issues with the decision-theoretic framework used in this paper.

- Adopting “expectational utilitarianism” and EV maximization produces scenarios where the bulk of the expected value is borne by low-probability, high-value tail scenarios, leading to decisions based on fanaticism (i.e. an acceptance of Pascal's mugging scenarios). We argue that most EAs do not think this way.
- The first-order-approximation style of expected value reasoning in the paper doesn’t account for negative second-order effects of focusing our current resources primarily on longtermism. To put it bluntly, the opportunity cost of working on longtermist interventions is that we allow poor people to die preventable deaths today.

Finally, we've explored how Tarsney's model might apply to a specific, popular longtermist intervention: AI safety.

In conclusion, we are pleased that Tarsney has created this model to allow us to apply analytical reasoning to naive BOTEC estimates of the EV of longtermist interventions. Ultimately, it seems to us that the results of the model imply that longtermists should curtail their estimated value of longtermist interventions. Tarsney's use of "minimum credences" reasoning to attempt to help longtermism recover from the blow dealt by the initial run of the model was not entirely convincing, primarily due to heavy tails and fanaticism.

We would like to see if future iterations of the model produce useful results while eliminating the 1,000 year waiting time between the intervention and the long-term future. This would allow persistence concerns to be encapsulated more succinctly by r (the rate of ENEs).

Thank you to Training for Good and all those that provided comments and feedback, especially Ronja Lutz, Cillian Crosson and Christian Tarsney.

^[1]
See Apply for Red Team Challenge [May 7 - June 4] for the intro post.
^[2]
Another aspect of persistence which does not seem to be explored in this paper is the possibility that the impact of the intervention could fade out on its own in time. The reason this matters is that the methods Tarsney uses to estimate r do not seem to account for such fadeout effects.
^[3]
To be clear, although we use the phrasing Larry Longtermist and the example of working on x-risk reduction, one does not have to hold longtermist views to prioritize existential risk reduction.

^[4]

To illustrate an example of fade-out (see footnote above) based on the same example, perhaps Larry's work to reduce nuke risk tends to fade away over time. Imagine he did a lot of work to educate leaders. As a result that particular generation was particularly good at deescalating conflicts. But as they are replaced or retire, the new leadership has effectively "forgotten" the lessons Larry taught. In order to maintain ongoing vigilance within the leadership, he would have to continue his educational work every few years. This feels qualitatively different from ENEs as Tarsney describes them. [What type of skepticism is this? Epistemic persistence]

^[5]

Chris: By maximum predictability horizon, I mean the point at which forecasters no longer do better than chance. For the purposes of the model analyzed here, we could just as easily say 100 years because the key timescale is the 1,000 year period between the intervention and the beginning of the long-term future.

Further, to avoid stretching the evidence on forecasting too far, it should be noted that the literature on forecasting is focused on events and world-states of a different nature than we are dealing with here. Predicting whether or not human civilization will still exist in 2122 is very different (and not just in timescale) than forecasting the price of a given commodity in a decade or whether an armed conflict will take place in a given region in the next ten years.

^[6]

Chris: That is: if successful predictable influence is always contingent on successful prediction, then the chances of predictable influence must, by definition, be lower than the chances of prediction. Epistemic status is exploratory on this point. In particular, I have trouble envisioning how my chances of predictable influence will ever be better than my chances at prediction. Since my objective is to select the intervention with the maximum expected value, the influence must be predictable and attributable to my intervention.

^[7]

Chris: This is a more straightforward and practical application of Bayesian reasoning than the work of assigning credibility bounds to various futures that may come to pass in thousands or millions of years.

^[8]

Meaning it's difficult to imagine a world in which 100% of human resources go toward a specific intervention for the basic reason that maintaining ourselves and our society consumes a considerable chunk of total resources.

^[9]

One could imagine scenarios. There have allegedly been outbreaks of deadly disease caused by agents escaping from a lab whose mission is to save humans from those very agents. Alternatively, in advertising the risk to humanity of bioengineered threats (or advanced AI, or military use of drone swarms), one could inadvertently alert a malicious actor to the possibility.

^[10]

Technically, V_s represents the difference in expected value per star per unit time between worlds where we are in S versus the complement state NS.

^[11]

More precisely, min EV(L) = X% * EV(L|favorable values, point estimates for others) + (100-X)% * EV(L|Conservative steady-state model in 4.2)

^[12]

Orders of Magnitude. Mathematically: $EV(L) > 10^{11} \cdot EV(N)$.

^[13]

This assumption ensures that the minimum EV calculated is in fact a minimum. The minimum EV is the appropriately weighted sum of two terms: the expected value of L when V_s > 10²⁰ (V/yr)/star, and the expected value of L when V_s = 0 (the conservative steady-state scenario in which we remain on Earth indefinitely). This sum is strictly less than the true expected value only if the expected value of V_s, conditional on V_s < 10²⁰ (V/yr)/star, is positive.

^[14]

This refers to a scenario where we remain a physical species and live on planets.

^[15]

Another scenario is the spread of uncontrolled ecosystems to other planets in which wild animal suffering outweighs other flourishing. It also seems possible for humanity to spread into the universe but have net-negative lives on average due to societal reasons like totalitarian regimes, widespread poverty, etc.

^[16]

Of course, this is also just one of multiple arguments that could be made on why Tarsney’s assumption is unjustified.

^[17]

It is possible for the conditional EV of V_s to be negative but for the EV of L in the non-optimist scenarios to still be positive if the steady-state (we remain on Earth indefinitely) EV outweighs the negative cubic space colonization EV. (i.e. V_e outweighs V_s).

^[18]

Tarsney: "I will call any challenge to longtermism that does not require rejecting expectational utilitarianism an empirical challenge, since it does not rely on normative claims unfavorable to longtermism" (p. 4).

^[19]

Tarsney: "The case for longtermism may depend either on plausible but non-obvious empirical claims or on a tolerance for Pascalian fanaticism" (abstract).

^[20]

Note that the relevant probability is actually the probability that they are telling the truth, minus the probability that they will actually swap the rewards and pay out only if you deny them. It still seems like the probability that they are truthful should be higher, even if only by a very small amount.

^[21]

One could disagree on the grounds that our credence should decrease superlinearly in the proposed payoff, but assuming that the mugger would actually pay out Graham's number of utils, it seems implausible to believe they have less than a 10% chance of paying out Graham's number * 10.

One could further believe that having a decision procedure that leads to paying the mugger would in itself cause enough muggings to be net negative in expectation. If one actually believed that mugging scenarios could provide more EV than the entire future of humanity would otherwise generate, then this view is implausible as well.

^[22]

In this example, the probability would have to be higher than the inverse of Graham's number times the EV of the rest of the model. Given the magnitude of Graham's number, we contend that this is effectively 0 probability, and that it would be unreasonably overconfident to assert the non-existence of Operators of the Seventh Dimension with such probability.

^[23]

Tarsney was kind enough to provide some additional comments on this: “My own tentative view here is described in my paper "Exceeding Expectations" -- in short, and very roughly, I think that the only real requirement of ethical decision-making under risk is first-order stochastic dominance; that in virtue of background risk, this requires us to be de facto EV-maximizers, more or less, when the relevant probabilities aren't too small”.

^[24]

Anjay: I personally equate ‘significantly’ with a greater than 10% chance of extinction but others likely have different intuitions. This is for the reason that if the likelihood is very high, the expected value can outweigh interventions focused on the nearterm just considering the people alive today.

^[25]

Anjay: This is based on the claim that a truly misaligned, superintelligent power-seeking AI would likely be very persistent and thus the rate of ENEs that remove it from this state would likely be very very low.

^[26]

Anjay: One example here could be something like the “You get what you measure” scenario from Paul Christiano’s What Failure Looks Like, where alignment efforts can help make our measures slightly better and more correlated with what we actually care about, despite still falling short of a flourishing future.
