Abstract from the paper
Longtermists claim that what we ought to do is mainly determined by how our actions might affect the very long-run future. A natural objection to longtermism is that these effects may be nearly impossible to predict— perhaps so close to impossible that, despite the astronomical importance of the far future, the expected value of our present options is mainly determined by short-term considerations. This paper aims to precisify and evaluate (a version of) this epistemic objection to longtermism. To that end, I develop two simple models for comparing “longtermist” and “short-termist” interventions, incorporating the idea that, as we look further into the future, the effects of any present intervention become progressively harder to predict. These models yield mixed conclusions: If we simply aim to maximize expected value, and don’t mind premising our choices on minuscule probabilities of astronomical payoffs, the case for longtermism looks robust. But on some prima facie plausible empirical worldviews, the expectational superiority of longtermist interventions depends heavily on these “Pascalian” probabilities. So the case for longtermism may depend either on plausible but non-obvious empirical claims or on a tolerance for Pascalian fanaticism.
Why I'm making this linkpost
- I want to draw a bit more attention to this great paper
- I think this is one of the best sources for people interested in arguments for and against longtermism
- For people who are interested in learning about longtermism and are open to reading (sometimes somewhat technical) philosophy papers, I think the main two things I'd recommend they read are The Case for Strong Longtermism and this paper
- Other leading contenders are The Precipice, Existential Risk Prevention as Global Priority, and some of the posts tagged Longtermism
- I want to make it possible to tag the post so that people see it later when it's relevant to what they're looking for via tags (e.g., I'd want people who check out the Longtermism tag to see a pointer to this paper to come up prominently)
- I want to make it easier for people to get a quick sense of whether it's worth their time to engage with this paper, given their goals (because people can check this post's karma, comments, and/or tags)
- I want to give people a space to discuss the paper in a way that other people can see and build on
- I'll share a bunch of my own comments below
- (I'll try to start each one with a tl;dr for that comment)
In case anyone is interested, Rob Wiblin will be interviewing Tarsney on the 80,000 Hours podcast next week. Rob is accepting question suggestions on Facebook (I think you can submit questions to Rob on Twitter or by email too).
tl;dr: Tarsney's model updates me towards thinking reducing non-extinction existential risks should be a little less of a priority than I previously thought.
Here's a quote from Tarsney (which makes more sense after reading the rest of the paper):
See also Greaves and MacAskill's concept of "attractor states".
This indeed seems like an interesting implication of Tarsney's model, and indeed updates me towards placing a bit less emphasis on reducing non-extinction existential risks - e.g., reducing the chance of lock-in of a bad governmental system or set of values.
(I already considered this a lower priority from longtermists as a whole than reducing extinction risks. But I also thought that longtermists should prioritise investigating this potential priority more than they currently do. I still think that, but now with a bit lower confidence.)
---
That said, I also think Tarsney's phrasing is a bit misleading. He compares "interventions focused on reducing existential risk" to "interventions aimed at reforming institutions or changing social values". But interventions may be aimed at doing the latter as a means to doing the former; one could try to change institutions or social values with the primary goal of ultimately reducing existential risk (or extinction risk specifically). And Tarsney's model doesn't seem to push against those interventions relative to other means of reducing existential risk.
I think Tarsney really wants to compare interventions aimed at reducing extinction risk to interventions ultimately aimed at changing aspects of the long-term future other than whether humanity goes extinct - e.g., again, reducing the chance of lock-in of a bad governmental system or set of values.
This highlights another way in which Tarsney's phrasing seems a bit misleading: existential risk itself already includes non-extinction existential risk. So I think Tarsney should use the term "extinction risk" here.
Surely "lock-in" implies stability and persistence?
Greaves and MacAskill introduce the concept of the 'non-extinction attractor state' to capture interventions that can achieve the persistence Tarsney says is so important, but that don't rely on extinction to do so.
This includes institutional reform:
Yeah, definitely. I see now that I didn't clearly explain what I meant. It's not that I changed my views on how important the difference between lock-in of a bad governmental system or set of values and a future without such a lock-in is.
It's more like I somewhat updated my views regarding:
And as a result, I somewhat updated my views regarding how much we should focus on preventing these outcomes. Analogous to how I'd update my prioritisation of biorisk if I learned the relevant catastrophes were less likely than I thought, even if no less bad.
(I'm still not sure that explanation is 100% clear.)
And yeah, Greaves and MacAskill's "non-extinction attractor state" concept is relevant here, and I liked that section of their paper :)
OK that's clearer, although I'm not immediately sure why the paper would have achieved the following:
I think Tarsney implies that institutional reform is less likely to be a true lock-in, but he doesn't really back this up with much argument. He just implies that this point is somewhat obvious. Under this assumption, I can understand why his model would lead to the following update:
In other words, if Tarsney had engaged in a discussion about why institutional change isn't actually likely to be stable/persistent, providing object-level reasons for why (which may involve disagreeing with Greaves and MacAskill's points), I think I too would update away from thinking institutional change is that important, but I don't think he really engages in this discussion.
I should say that I haven't properly read through the whole paper (I have mainly relied on watching the video and skimming through the paper), so it's possible I'm missing some things.
[Writing this comment quickly]
I think it makes sense to be a bit confused about what claim I'm making and why. I read the paper and made the initial version of these notes a few weeks ago, so my memory of what the paper said and how it changed my views is slightly hazy.
But I think the key point is essentially the arguably obvious point that the rate of ENEs can be really important, and that that rate seems likely to be much higher when the target state is something like "a very good system of government or set of values" or "a very bad system of government or set of values" (compared to when the target state is whether an intelligent civilization exists). It does seem much more obvious that extinction or non-extinction are each stronger attractor states than particularly good or particularly bad non-extinction outcomes are.
This is basically something I already knew, but I think Tarsney's models and analysis made the point a bit more salient, and also made it clearer how important it is (since the rate of ENEs seems like probably one of the most important factors influencing the case for longtermism).
But what I've said above kind-of implicitly accepts Tarsney's focus (for the sake of his working example) on simply whether there is an intelligent civilization around, rather than what it's doing. In reality, I think that what the civilization is doing is likely also very important.[1] So the above point about particularly good or particularly bad non-extinction outcomes maybe being only weak attractor states might also undermine the significance of keeping an intelligent civilization around.
But here's one way that might not be true: Maybe we think it's easier to have a lock-in of - or natural trends that maintain - a good non-extinction outcome than a bad non-extinction outcome. (I think Ord essentially implies this in The Precipice. I might soon post something related to this. It's also been discussed in some other places, e.g. here.) If so, then the point about the rate of ENEs suggests the case for avoiding unrecoverable dystopias and unrecoverable collapses might be weak, but it wouldn't as strongly suggest the case for avoiding extinction is weak.
...but this all seems rather complicated, and I'm still not sure my thinking is clear, and even less sure my explanation is clear!
[1] Tarsney does acknowledge roughly this point later in the paper:
OK thanks I think that is clearer now.
tl;dr: Tarsney seems to me to understate the likelihood that accounting for non-human animals would substantially affect the case for longtermism.
Tarsney includes a helpful appendix listing the simplifications made in his model/paper, and the rationales for these simplifications. Here's a passage from that:
I appreciate Tarsney's caveat that "this is far from obvious", and, given that caveat, I don't strongly disagree with this sentence. But it seems quite plausible to me[1] that considering those effects would strengthen or weaken the case for paradigmatic longtermist interventions by more than 1-2 orders of magnitude, or even that it would flip the sign of the expected value of those interventions.
Relatedly, I also think that considering those effects should plausibly change which longtermist interventions we support (not just whether we support them vs non-longtermist interventions).
(I'm not sure how likely I see these things as, so maybe I actually agree with Tarsney that this "seems unlikely [but with that being far from obvious]".)
See also Non-Humans and the Long-Term Future.
[1] We could operationalise "it seems quite plausible to me that X" as something like "there's at least a 20% chance that I would think X if I spent another 100 hours of thinking about the topic".
On his estimate of the difference in probability we can achieve by promoting one state over its complement: it's worth mentioning that this does not consider the possibility of doing more harm than good, e.g. AI safety work advancing AGI more than it aligns it. And with the very low (but, in his view, extremely conservative) probabilities that he uses in his argument, the possibility of backfire effects outweighing them becomes more plausible.
Furthermore, it does not argue that we can effectively predict that any particular state is better than its complement, e.g. is extinction good or bad? How should we deal with moral uncertainty, especially around population ethics?
For these reasons, it may be difficult to justifiably identify robustly positive expected value longtermist interventions ahead of time, which the case for longtermism depends on. I mean this even with subjective probabilities, since such probabilities supporting longtermist interventions tend to be particularly poorly-informed (largely for absence of good evidence) and so seem more prone to biases and whims, e.g. wishful thinking and the non-rational particulars of people's brains and priors. This is just deep uncertainty and moral cluelessness.
For what it's worth, I don't think it makes much sense for this paper to address such issues in detail given its current length already, although they seem worth mentioning.
(Also, I read the paper a while ago, so maybe it did discuss these issues and I missed it.)
In line with your comment:
But Tarsney does acknowledge roughly that second point in one place:
He says "low-value" rather than "negative value", but I assume he actually meant negative value, because random wandering between high and low positive values wouldn't produce an EV (for civilization existing rather than not existing) of close to 0.
tl;dr: Tarsney writes "resources committed at earlier time should have greater impact, all else being equal". I think that this is misleading and an oversimplification. See Crucial questions about optimal timing of work and donations and other posts tagged Timing of Philanthropy.
(But that claim was not necessary for any of Tarsney's arguments; he just gave it as one reason why the actual case for longtermism might be stronger than his deliberately conservative estimates suggest.)
Context and explanation:
A core part of Tarsney's model is - roughly speaking - the amount by which spending $1 million on mitigating existential risks changes the probability of being in the target state at a given time, relative to the probability that would occur if the short-termist intervention was used. This parameter is represented by p. The target state means something like "The accessible region of the Universe contains an intelligent civilization".
Tarsney makes:
Tarsney writes that "This is an extremely conservative lower bound", and that "I think it would be justifiable to adjust p upward from this lower-bound estimate by a several-order-of-magnitude “fudge factor”, if we were so inclined" (though he doesn't do this for his paper). He gives two reasons for this.
The first has to do with diminishing marginal returns and the fact that we'll by default spend far less than all our collective time and resources over the next 1000 years on reducing existential risk. Thus, spending an extra $1 million on the current margin will probably achieve far more than one would expect "simply by computing the fraction of humanity’s resources over the next thousand years that can be bought for 1 million". This argument makes sense to me, and I do think it suggests Tarsney's estimate for p is a very conservative one (as he intends).
But then he writes:
It's definitely true that there are many reasons why resources committed at an earlier time could have a greater impact. And the reason Tarsney raises is a valid one; we could describe this as discounting for the possibility that the later use of resources would be "too late". This is an extreme example of how we might miss "windows of opportunity" if we wait too long.
But there are also many reasons why resources committed at a later time could have a greater impact. This is especially true if we don't count resources as committed to a problem when they're used in an investment-like way in order to generate more resources that can be committed later, but it's even true if we do count resources as already committed to a problem when they're "merely invested".
In particular, it's possible that "leverage over the future" (or hingyness, pivotality, etc.) will increase in future. This could occur if:
(For explanation and discussion of the above points, see here.)
Of course, the opposite effects could also occur. My point is merely that "resources committed at [an] earlier time should have greater impact, all else being equal" seems to be either false or misleading.
(I think it'd be reasonable for Tarsney to merely claim that his all-things-considered view is that resources committed at an earlier time will in practice probably have a greater impact. But this more uncertain stance would then weaken the case for a several-order-of-magnitude upwards adjustment of p.)
A final quick thing that came to mind: I think that Ord's concept of existential security could be represented in Tarsney's models as the value of r asymptotically decreasing towards 0 over time. I'd be interested to hear people's thoughts on whether that seems accurate and, if so:
(I haven't tried to think this through myself yet.)
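One way to make this concrete, as a minimal sketch rather than anything taken from the paper: treat r as a hazard rate for ENEs, so that the probability of no ENE by time t is the exponential of minus the accumulated hazard. The decaying form `r(t) = r0 * exp(-k * t)` is just one assumed shape for "existential security", and `r0`, `k`, and the numbers below are hypothetical.

```python
import math


def survival_constant(r: float, t: float) -> float:
    """P(no ENE by time t) under a constant hazard rate r."""
    return math.exp(-r * t)


def survival_decaying(r0: float, k: float, t: float) -> float:
    """P(no ENE by time t) when the hazard decays as r(t) = r0 * exp(-k * t).

    The accumulated hazard is the integral of r(t) from 0 to t,
    which equals (r0 / k) * (1 - exp(-k * t)).
    """
    cumulative = (r0 / k) * (1.0 - math.exp(-k * t))
    return math.exp(-cumulative)


def long_run_survival(r0: float, k: float) -> float:
    """Limit of survival_decaying as t grows without bound: exp(-r0 / k)."""
    return math.exp(-r0 / k)


if __name__ == "__main__":
    r0, k = 0.001, 0.001  # hypothetical per-year values, for illustration only
    for t in (1e3, 1e4, 1e5, 1e6):
        print(t, survival_constant(r0, t), survival_decaying(r0, k, t))
    print("long-run limit with decaying hazard:", long_run_survival(r0, k))
```

Under these assumptions, the constant-rate survival probability heads to zero while the decaying-rate one levels off at a positive value. Note that this relies on the accumulated hazard staying finite: a rate that falls towards 0 too slowly (e.g. proportional to 1/t) would still drive long-run survival to zero.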
I think it'd be interesting to run a sensitivity analysis on Tarsney's model(s), and to think about the value of information we'd get from further investigation of:
It seems like the value of information from that might be very high, at least if we think we don't want to accept fanaticism. This is because Tarsney's paper suggests reasonable empirical views could either support the case for longtermism without requiring fanaticism or only support the case for longtermism if we accept fanaticism. So further research on these models, alternative models, and these parameters could perhaps give us a much better sense of how robust the case for longtermism is.
To some extent, this comment can be boiled down to something that was obvious already: "The case for longtermism seems plausible but uncertain, and whether it's true seems very decision-relevant, so maybe investigating whether it's true would be really valuable." But I think Tarsney's paper highlights specific points to look into, and that it would allow for (rough) quantitative estimates of the value of information to be gained by investigating each point.
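As a rough sketch of what a one-at-a-time sensitivity analysis could look like: the model function below is a made-up stand-in (a constant value stream `v`, discounted by a constant ENE rate `r` over a finite `horizon`, scaled by `p`), not either of Tarsney's actual models, and every number is hypothetical. The point is only the mechanics of varying one parameter while holding the others fixed.

```python
import math


def toy_longtermist_ev(p, v, r, horizon):
    """Toy expected value: p times the integral from 0 to horizon of v * exp(-r * t).

    Hypothetical stand-in model, not taken from the paper.
    """
    return p * v * (1.0 - math.exp(-r * horizon)) / r


def one_at_a_time_sensitivity(model, baseline, factor=10.0):
    """Scale each parameter down and up by `factor`, holding the others fixed.

    Returns {parameter name: (output with low value, output with high value)}.
    """
    results = {}
    for name in baseline:
        low = dict(baseline, **{name: baseline[name] / factor})
        high = dict(baseline, **{name: baseline[name] * factor})
        results[name] = (model(**low), model(**high))
    return results


if __name__ == "__main__":
    # Hypothetical baseline values, chosen only to exercise the code.
    baseline = {"p": 1e-14, "v": 1e10, "r": 1e-4, "horizon": 1e6}
    base_ev = toy_longtermist_ev(**baseline)
    table = one_at_a_time_sensitivity(toy_longtermist_ev, baseline)
    for name, (low, high) in table.items():
        print(f"{name:8s} low/base = {low / base_ev:.3g}, high/base = {high / base_ev:.3g}")
```

A fuller analysis would replace the fixed factor with probability distributions over each parameter, which is closer to what a value-of-information estimate needs.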
For a quick and non-quantitative example, it seems that the probability of interstellar settlement has a very large bearing on the results of the model, and it also seems like we should be quite uncertain about that probability.
Some caveats to that:
There's also a talk. https://globalprioritiesinstitute.org/christian-tarsney-the-epistemic-challenge-to-longtermism/
When I reference work by GPI, I usually link to the page with both the talk and the pdf.
Good point, thanks! I'll edit this post to link to that page instead :)
Just a nitpick
I think that this particular sentence is false or misleading. As Tarsney notes earlier and later, his model and parameter estimates[1] suggest that the case for longtermism survives given either acceptance of fanaticism or plausible but non-obvious empirical views. That is, on some plausible empirical views, longtermism doesn't require an appeal to minuscule probabilities of astronomical quantities of value.
(Tarsney's sentence may still be technically accurate, since he says potentially-minuscule. But it seems at least a bit misleading to me.)
[1] Along with certain ethical and decision-theoretic assumptions, e.g. total utilitarianism.
I agree with you that Tarsney hasn't been clear, but I think you've got it the wrong way around (please tell me if you think I'm wrong though). The abstract to the paper says:
These two sentences seem to say different things, as you have outlined. The first implies that you need fanaticism, whilst the second implies you need either fanaticism or non-obvious but plausible empirical views. Counter to you I think the former is actually correct.
Tarsney initially runs his model using point estimates for the parameters and concludes that the case for longtermism is "plausible-but-uncertain" if we assume that humanity will eventually spread to the stars, and "extremely demanding" if we don't make that assumption. Therefore longtermism doesn't really "survive the epistemic challenge" when using point estimates.
Tarsney says however that "The ideal Bayesian approach would be to treat all the model parameters as random variables rather than point estimates". So if we're Bayesians we can pretty much ignore the conclusions so far and everything is still to play for.
When Tarsney does incorporate uncertainty for all parameters, the expectational superiority of longtermism becomes clear because "the potential upside of longtermist interventions is so enormous". In other words the use of random variables allows for fanaticism to take over and demonstrates the superiority of longtermism.
So it seems to me that it really is fanaticism that is doing the work here. Would be interested to hear your thoughts.
EDIT: On a closer look at his paper, Tarsney does say that it isn't clear how Pascalian the superiority of longtermism is because of the "tremendous room for reasonable disagreement about the relevant probabilities". Perhaps this is what you're getting at, Michael?
I actually think that those two sentences are consistent with each other. And I think that, as Tarsney says, his models and estimates do not show that fanaticism is necessarily required for the case for longtermism to hold.
Basically (from memory and re-skimming), Tarsney gives two model structures, some point estimates for most of the parameters, and then later some probability distributions for the parameters. He intends both models to represent plausible empirical views. He intends his point estimates and probability distributions to represent beliefs that are reasonable but at the pessimistic end for longtermism (so it's not crazy to think those things, but his all-things-considered beliefs about those parameters would probably be more favourable to longtermism). And he finds that the case for longtermism holds given the following assumptions:
(There are various complications, caveats, and additional points, but this stuff is key.)
So his reasoning is consistent with it being the case that the most reasonable empirical position would support longtermism without requiring any minuscule probabilities of extremely huge payoffs, or with that not being the case.
E.g., that could be the case if we should have a non-minuscule credence in the cubic growth model and that "prima facie plausible" value for the long-run rate of ENEs.
Incorporating uncertainty, and this suggesting that the potential upside of one thing makes that the thing we should go for, doesn't necessarily mean fanaticism is involved. E.g., I made many job applications that I expected would turn out to have not been worth the time they took, due to the potential upside, and without having a clear point estimate for my odds of getting the job or how valuable that'd be (so I sort-of implicitly had a probability distribution over possible credences). This'd only be fanatical if the probabilities involved were minuscule and the payoffs huge enough to "make up for that", and Tarsney's analysis suggests that that may or may not be the case when it comes to longtermism.
Here's a relevant section from the paper:
I think maybe a useful framing to have in mind is that Tarsney's paper was not aimed at actually working out the likelihood of each model structure relative to the other, or working out what precise parameter estimates would be most appropriate. And those are things we should be very uncertain about.
So perhaps our 90% credible interval (or something like that) for what we'd believe after some years of further research should include both probability estimates/distributions in which the case for longtermism survives without fanaticism and probability estimates/distributions in which the case for longtermism would survive only if we accept fanaticism.
Thanks yeah, I saw this section of the paper after I posted my original comment. I might be wrong but I don't think he really engages in this sort of discussion in the video, and I had only watched the video and skimmed through the paper.
So overall I think you may be right in your critique. It might be interesting to ask Tarsney about this (although it might be a fairly specific question to ask).
Yeah, I plan to suggest some questions for Rob to ask Tarsney later today. Perhaps this'll be one of them :)
tl;dr: The paper ignores 2 factors that could strengthen the case for longtermism - namely, possible increases in how efficiently resources are used and in what extremes of experiences can be reached.
Tarsney writes:
I essentially agree with all those points. Furthermore, given my current moral and empirical views, I think those factors are probably the main factors driving the case for longtermism.
But I think there are at least two other factors that are relevant and that might substantially add to the case for longtermism. (Though it's possible that they add so little relative to the other factors that they won't really be decision-relevant.)
---
The first factor is possible increases in efficiency of resource usage. For a given quantity and type of matter or energy, future civilizations may be able to more efficiently convert that into moral value or disvalue than current civilization can. For example, if we can create simulated humans or animals (or artificial sentiences) that are morally relevant, these may be able to experience the same pleasures or pains we can with substantially less energy required.
Thus, the factor by which total quantity of moral (dis)value in the long-term future is expected to be larger than that in the present + near-term future may be even larger than one would think if one considered only the duration, spatial extent, and resources used in the future.
(Tarsney's term "resource utilization" might seem like it should capture this idea, but his description suggests that he has in mind only changes in how much resources we use, not changes in how efficiently we use them.)
---
The second factor is possible increases in the extremes of experience that can be reached. It seems plausible that future civilizations will be able to create experiences more extremely good or bad than experiences that we can create today or that are experienced in nature. If so, this might increase the importance of the long-term future, if either of the following things are true:
I'd guess that this factor is much less important than the efficiency factor, but it seems very hard to say.
The same basic point might also apply to non-experience things that might be morally good or bad. (E.g., if art has intrinsic moral value, perhaps future civilization could create art that is more extremely good than current art.)
---
I've seen roughly those ideas discussed in various places before, though I can't remember precisely where. The concept of hedonium can be seen as a special case of the efficiency factor.
Chapter 8 of The Precipice, on "Our Potential", is also relevant here. Ord splits that chapter into discussion of the future's potential duration, its potential scale, and its potential quality. I imagine that the points I raised above were covered in that chapter, but I can't remember for sure (I read the book a year ago, and foolishly enough I had not yet converted to using Anki as I read).
---
I think it'd be interesting for someone to think about how Tarsney's models or parameter estimates could be tweaked to account for these factors, and maybe to see how much difference this makes (after plugging in some reasonable-seeming distributions for the parameters).
I think these would basically be just constant factors multiplying the whole impacts, assuming we remain near the peaks for far longer than we spend making significant moves towards the peaks.
The difference between intentionally optimizing for hedonistic welfare and a default with human-like minds could itself be on the scale of an existential catastrophe for a classical utilitarian, and more important than extinction, although it could also be far less tractable and not really an attractor state at all if it's not stable/persistent. This could also generalize to other theories of welfare, just with different targets.
tl;dr: Tarsney slightly misrepresents an existential risk estimate.
Tarsney writes:
But what Rees actually writes is:
(Here's one online source quoting Rees. I've seen the same quote elsewhere too.)
Whether "our present civilisation on Earth" survives is very different from whether humanity survives. I haven't read Rees' book, so I don't know what he intended that quote to mean, but I'd guess he'd include things like a major population collapse that lasts a few decades as "our present civilisation on Earth not surviving". Arguably, his forecast could even be seen as capturing the chance that we just very substantially change our political, cultural, and economic systems, in the same way that Europe in the 1900s was arguably a "different civilisation" to Europe in the year 100 CE.
Also, Rees doesn't give a 0.5 probability; he gives a probability no better than 0.5.
Also, Tarsney writes:
I think it'd be better to direct people to the appendix of Beard et al. (2020), since that's more comprehensive and up-to-date. (I also really like the article itself.)
Perhaps unsurprisingly, I also think it'd be even better-er to direct people to my database, since that's even more comprehensive and up-to-date (and people can and do make suggestions to it, which I process, such that it should presumably remain the most comprehensive resource, rather than being frozen in time). But I can understand Tarsney preferring to refer readers to an academic source.
(Incidentally, if there's anyone who'd in theory like to cite my database, but can't do so unless it's hosted somewhere else - e.g., a preprint server - or needs it to look different, please let me know and I'll see what I can do.)
tl;dr: I'm aware of 1-3 other things that might count as more pessimistic estimates of near-term existential risk in the academic literature.
Specifically:
For further details and sources, see my Database of existential risk estimates (or similar) (see here for the accompanying post).
But:
So Tarsney's claim is reasonable on this front; I'm just adding some extra info.
There are also some more pessimistic estimates in sources that aren't academic but do seem about as worth paying attention to as Rees' estimate; see my database.
Thanks for posting this! Your linkpost actually got me to watch the talk for the first time, even though I was aware of this paper for a while.
I think some variant of the cubic growth model could be useful for figuring out whether trying to reduce x-risk is better than trying to make durable changes to the long-term "trajectory" of the social welfare curve. I spent some time a few months ago trying to address this by modeling the trajectory of humanity, so I appreciate this paper for proposing an even simpler toy model.
I have rough thoughts about how the utility from economic growth could be incorporated: Assume that each star system has a growth rate $g$ that the residents of that star system can influence (e.g. through policy). The economy of each star system tends to grow exponentially, but GDP per capita has logarithmic utility, so the utility of the star system $u_s(t)$ grows roughly linearly.
If the economy of each star system starts at a steady state, then grows exponentially at rate $g$ starting at time $t_0$, the time at which humanity arrives at the star system, we get $u_s(t) = u_0 + \max(0, g(t - t_0))$. If the star system's GDP is capped at $\exp(u_{\max})$, then we get $u_s(t) = u_0 + \min(\max(0, g(t - t_0)), u_{\max})$.
To incorporate economic growth into the trajectory model used in the paper, we can replace $n(s \cdot (t - t_\ell))$ with the cross-correlation of $u_s(t)$ and $n(s \cdot (t - t_\ell))$ (this assumes that all star systems have the same growth rate). Since $u_s(t)$ is piecewise linear and $n(s \cdot (t - t_\ell))$ is cubic, the cross-correlation is piecewise quintic (it's the integral of a cubic function times a linear function). My gut tells me that having a piecewise quintic term in the trajectory function instead of a cubic term isn't going to change much about the implications of the model.
Note: I realize that by using GDP per capita, I'm leaving out the population of each star system. This would result in multiplying $u_s(t)$ by a function that models the population over time, starting at time $t_0$.
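For anyone who wants to play with this, here is a minimal sketch of the per-star-system utility formula above, plus a numerical check of the degree-counting step (a linear function times a cubic, integrated, scales like the fifth power of the horizon). The function names and the sample numbers are hypothetical, and the check deliberately drops the expansion speed, the launch time, the cap, and population, so it does not reproduce the paper's trajectory function.

```python
def star_system_utility(t, t0, g, u0=0.0, u_max=None):
    """Utility of one star system at time t, given arrival time t0.

    Implements u0 + max(0, g * (t - t0)), with the growth term
    optionally capped at u_max.
    """
    growth = max(0.0, g * (t - t0))
    if u_max is not None:
        growth = min(growth, u_max)
    return u0 + growth


def linear_times_cubic_integral(T, g, steps=20_000):
    """Midpoint-rule estimate of the integral over tau in [0, T] of
    (g * tau) * (T - tau) ** 3.

    The closed form is g * T ** 5 / 20, i.e. degree five in T.
    """
    h = T / steps
    total = 0.0
    for i in range(steps):
        tau = (i + 0.5) * h
        total += (g * tau) * (T - tau) ** 3
    return total * h


if __name__ == "__main__":
    g = 0.02  # hypothetical growth rate
    print(star_system_utility(t=110.0, t0=10.0, g=g))
    print(star_system_utility(t=110.0, t0=10.0, g=g, u_max=1.5))
    # Doubling the horizon should multiply the integral by about 2 ** 5.
    print(linear_times_cubic_integral(20.0, g) / linear_times_cubic_integral(10.0, g))
```

Once the cap kicks in, the linear piece flattens out, which is where the "piecewise" part of the claim comes from.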