Working to reduce extreme suffering for all sentient beings. Author of 'Suffering-Focused Ethics: Defense and Implications'.

Are we "trending toward" transformative AI? (How would we know?)

I must admit that I’m quite confused about some of the key definitions employed in this series, and, in part for that reason, I’m often confused about what claims are being made. Specifically, I’m confused about the definitions of “transformative AI” and “PASTA”, and find them to be more vague and/or less well-chosen than what sometimes seems assumed here. I'll try to explain below.

1. Transformative AI (TAI)

1.1 The simple definition

The simple definition of TAI used here is "AI powerful enough to bring us into a new, qualitatively different future". This definition seems quite problematic given how vague it is. Not that it is entirely meaningless, of course, as it surely does give some indication as to what we are talking about, yet it is far from meeting the bar that someone like Tetlock would require for us to track predictions, as a lot of things could be argued to (not) count as “a new, qualitatively different future.”

1.2 The Industrial Revolution definition

A slightly more elaborate definition found elsewhere, and referred to in a footnote in this series, is “software (i.e. a computer program or collection of computer programs) that has at least as profound an impact on the world’s trajectory as the Industrial Revolution did.” Alternative version of this definition: “AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution.”

This might be a bit more specific, but it again seems to fall short of the Tetlock bar: what exactly do we mean by the term “the world’s trajectory”, and how would we measure an impact on it that is “at least as profound” as that of the Industrial Revolution?

For example, the Industrial Revolution occurred (by some definitions) roughly from 1760 to 1840, about 80 years during which the world economy got almost three times bigger, and we began to see the emergence of a new superpower, the United States. This may be compared to the last 80 years, from 1940 to 2020, what we may call “The Age of the Computer”, during which the economy has doubled almost five times (i.e. it’s roughly 30 times bigger). (In fact, by DeLong’s estimates, the economy more than tripled, i.e. surpassed the relative economic growth of the IR, in just the 25 years from 1940 to 1965.) And we saw the fall of a superpower, the Soviet Union; the rise of a new one, China; and the emergence of international institutions such as the EU and the UN.
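To make the comparison above a bit more concrete, here is a rough sketch of the arithmetic. The growth factors are just the rounded figures from the paragraph (3x treated as an upper bound for "almost three times bigger", and 30x for "roughly 30 times bigger"), and the function names are my own, purely for illustration.

```python
import math


def annual_growth_rate(total_factor: float, years: float) -> float:
    """Constant annual growth rate that compounds to `total_factor` over `years`."""
    return total_factor ** (1.0 / years) - 1.0


def doublings(total_factor: float) -> float:
    """Number of doublings needed to reach `total_factor`."""
    return math.log2(total_factor)


# Rounded figures taken from the text above (assumptions, not precise data).
IR_FACTOR = 3.0         # 1760-1840: "almost three times bigger"
COMPUTER_FACTOR = 30.0  # 1940-2020: "roughly 30 times bigger"
YEARS = 80

ir_rate = annual_growth_rate(IR_FACTOR, YEARS)
computer_rate = annual_growth_rate(COMPUTER_FACTOR, YEARS)
# The tripling that reportedly took only 25 years (1940-1965).
fast_tripling_rate = annual_growth_rate(3.0, 25)

print(f"Industrial Revolution: {ir_rate:.2%} per year, {doublings(IR_FACTOR):.2f} doublings")
print(f"Age of the Computer:   {computer_rate:.2%} per year, {doublings(COMPUTER_FACTOR):.2f} doublings")
print(f"Tripling in 25 years:  {fast_tripling_rate:.2%} per year")
```

On these rounded inputs, the implied average growth rate is somewhat under 1.5 percent a year for the Industrial Revolution period and a bit over 4 percent a year for the last 80 years, which is the sense in which the latter period can be said to match or exceed the former.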

So doesn’t “The Age of the Computer” already have a plausible claim to having had “at least as profound an impact on the world’s trajectory as the Industrial Revolution did”, even if no further growth were to occur? And by extension, could one not argue that the software of this age already has a plausible claim to having “precipitated” a transition comparable to this revolution? (This hints at the difficulty of specifying what counts as sufficient “precipitation” relative to the definition above: after all, we could not have grown the economy as much as we have over the last 80 years were it not for software, so existing software has clearly been a necessary and even a major component; yet it has still just been one among a number of factors accounting for this growth.)

1.3 The growth definition

A definition that seems more precise, and which has been presented as an operationalization of the previous definition, is phrased in terms of growth of the world economy, namely as “software which causes a tenfold acceleration in the rate of growth of the world economy (assuming that it is used everywhere [and] that it would be economically profitable to use it).”

I think this definition is also problematic, in that it fails in significant ways to capture what people are often worried about in relation to AI.

First, there is the relatively minor point that it is unclear in what cases we could be justified in attributing a tenfold acceleration in the economy to software (were such an acceleration to occur), rather than to a number of different factors that may all be similarly important, as was arguably the case in the Industrial Revolution.

For instance, if the rate of economic growth were to increase tenfold without software coming to play a significantly larger role in the economy than it does today, i.e. if its share of the world economy were to remain roughly constant, yet with software still being a critical component for this growth, would this software qualify as TAI by the definition above? (Note that our software can get a lot more advanced in an absolute sense even as its relative role in the economy remains largely the same.) It’s not entirely clear. (Not even if we consult the more elaborate “Definition #2” of TAI provided here.) And it’s not entirely irrelevant either, since economic growth appears to have been driven by an interplay of many different factors historically, and so the same seems likely to be true in the future.

But more critical, I think, is that the growth definition seems to exclude a large class of scenarios that would appear to qualify as “transformative AI” in the qualitative sense mentioned above, and scenarios that many concerned about AI would consider “transformative” and important. It is, after all, entirely conceivable, and arguably plausible, that we could get software that “would bring us into a new, qualitatively different future" without growth rates changing much. Indeed, growth rates could decline significantly, such that the world economy only grows by, e.g., one percent a year, and we could still — if such growth were to play out for another, say, 150 years — end up with “transformative AI” in the sense(s) that people are most worried about, and which could in principle entail a “value drift” and “lock-in” just as much as more rapidly developed AI.
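As a small illustration of the slow-growth scenario just mentioned, the following sketch compounds one percent annual growth over 150 years. The one percent rate and the 150-year horizon are the example figures from the paragraph above; nothing here is a forecast, and the function name is only illustrative.

```python
import math


def cumulative_factor(annual_rate: float, years: int) -> float:
    """Total growth factor after compounding `annual_rate` for `years` years."""
    return (1.0 + annual_rate) ** years


# Example figures from the paragraph above.
slow_factor = cumulative_factor(0.01, 150)
slow_doublings = math.log2(slow_factor)

print(f"1% per year for 150 years: {slow_factor:.2f}x bigger ({slow_doublings:.2f} doublings)")
```

So even in this slow scenario the economy would end up more than four times its current size, with no acceleration in the growth rate at any point, which is why a growth-rate criterion would never flag it.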

I guess a reply might be that these are just very rough definitions and operationalizations, and that one shouldn’t take them to be more than that. But it seems that they often are taken to be more than that; for instance, the earlier-cited document that provides the growth definition appears to say about it that it “best captures what we ultimately care about as philanthropists”.

I think it is worth being clear that the definitions discussed above are in fact very vague and/or that they diverge in large and important ways from the AI scenarios people often worry about, including many of the scenarios that seem most plausible.


2. PASTA

PASTA was defined as: “AI systems that can essentially automate all of the human activities needed to speed up scientific and technological advancement.”

This leaves open how much of a speed-up we are talking about. It could be just a marginal speed-up (relative to previous growth rates), or it could be a speed-up by orders of magnitude. But in some places it seems that the latter is implicitly assumed.

One might, of course, argue that automating all human activities related to scientific and technological progress would have to imply a rapid speed-up, but this is not necessarily the case. It is conceivable, and in my view quite likely, that such automation could happen very gradually, and that we could transition to fully or mostly automated science in a manner that implies growth rates that are similar to those we see today.

We have, after all, automated/outsourced much of science today, to an extent that past scientists might say that we have, relative to their perspective, already automated the vast majority of science, with scientifically-related calculations, illustrations, simulations, manufacturing, etc. that are, by their standards, mostly done by computers and other machines. And this trend could well continue without being more explosive than the growth we have seen so far. In particular, the step from 90 percent to 99 percent automated science (or across any similar interval) could happen over years, at a familiar and fairly steady growth rate.

I think it’s worth being clear that the intuition that fully automated science is in some sense inevitable (assuming continued technological progress) does not imply that a growth explosion is inevitable, or even that such an explosion is more likely to happen than not.

Forecasting transformative AI: what's the burden of proof?

Thanks for your reply :-)

Most of your post seems to be arguing that current economic trends don't suggest a coming growth explosion.

That's not quite how I'd summarize it: four of the six main points/sections (the last four) are about scientific/technological progress in particular. So I don't think the reasons listed are mostly a matter of economic trends in general. (And I think "reasons listed" is an apt way to put it, since my post mostly lists some reasons to be skeptical of a future growth explosion — and links to some relevant sources — as opposed to making much of an argument.)

This post is arguing not "Current economic trends suggest a growth explosion is near" but rather "A growth explosion is plausible enough (and not strongly enough contraindicated by current economic trends) that we shouldn't too heavily discount separate estimates implying that transformative AI will be developed in the coming decades."

I get that :-) But again, most of the sections in the cited post were in fact about scientific and technological trends in particular, and I think these trends do support significantly lower credences in a future growth explosion than the ones you hold. For example, the observation that new scientific insights per human have declined rapidly suggests that even getting digital people might not be enough to get us to a growth explosion, as most of the insights may have been plucked already. (I make some similar remarks here.)

Additionally, one of the things I had in mind with my remark in the earlier comment relates to the section on economic growth, which says:

My main response is that the picture of steady growth - "the world economy growing at a few percent per year" - gets a lot more complicated when we pull back and look at all of economic history, as opposed to just the last couple of centuries. From that perspective, economic growth has mostly been accelerating, and projecting the acceleration forward could lead to very rapid economic growth in the coming decades.

In relation to this point in particular, I think the observation mentioned in the second section of my post seems both highly relevant and overlooked, namely that if we take a nerd-dive into the data and look at doublings, we have actually seen an unprecedented deceleration (in terms of how the growth rate has changed across doublings). And while this does not by any means rule out a future growth explosion, I think it is an observation that should be taken into account, and it is perhaps the main reason to be skeptical of a future growth explosion at the level of long-run growth trends. So that would be the kind of reason I think should ideally have been discussed in that section. Hope that helps clarify a bit where I was coming from.

Forecasting transformative AI: what's the burden of proof?

I don't feel this post engages with the strongest reasons to be skeptical of a growth explosion. The following post outlines what I would consider some of the strongest such reasons:

MagnusVinding's Shortform

An argument in favor of (fanatical) short-termism?

[Warning: potentially crazy-making idea.]

Section 5 in Guth, 2007 presents an interesting, if unsettling idea: on some inflationary models, new universes continuously emerge at an enormous rate, which in turn means (maybe?) that the grander ensemble of pocket universes consists disproportionally of young universes.

More precisely, Guth writes that, "in each second the number of pocket universes that exist is multiplied by a factor of exp{10^37}." Thus, naively, we should expect earlier points in a given pocket universe's timeline to vastly outnumber later points — by a factor of exp{10^37} per second!

(A potentially useful way to visualize the picture Guth draws is in terms of a branching tree, where for each older branch, there are many more young ones, and this keeps being true as the new, young branches grow and spawn new branches.)

If this were true, or even if there were a far weaker universe generation process to this effect (say, one that multiplied the number of pocket universes by two for each year or decade), it would seem that we should, for acausal reasons, mostly prioritize the short-term future, perhaps even the very short-term future.
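To get a feel for the weaker "doubling" version of this picture, here is a toy sketch. It rests on simplifying assumptions of my own that are not in Guth's paper: time comes in discrete periods, the number of pocket universes has been multiplying by a fixed factor each period for a very long time, and universes never disappear. Under those assumptions, the share of universes of a given age falls off geometrically with age.

```python
import math


def age_share(age: int, growth_factor: float = 2.0) -> float:
    """Share of all existing universes that are exactly `age` whole periods old.

    Toy assumption: the count of universes multiplies by `growth_factor`
    every period, indefinitely, so ages follow a geometric distribution.
    """
    return (1.0 - 1.0 / growth_factor) * growth_factor ** (-age)


def share_younger_than(periods: int, growth_factor: float = 2.0) -> float:
    """Share of all existing universes younger than `periods` periods."""
    return 1.0 - growth_factor ** (-periods)


newborn_share = age_share(0)            # universes in their first period
young_share = share_younger_than(10)    # universes under ten periods old

# Guth's factor of exp{10^37} per second is far too large to evaluate
# directly, so we only compute its base-10 logarithm.
log10_guth_factor = 1e37 * math.log10(math.e)

print(f"Share in first period (doubling):  {newborn_share:.3f}")
print(f"Share under ten periods (doubling): {young_share:.6f}")
print(f"log10 of Guth's per-second factor:  {log10_guth_factor:.3e}")
```

Even with mere doubling, half of all universes are in their first period and the overwhelming majority are under ten periods old, which is the sense in which such a process would weight the short-term future so heavily.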

Guth tentatively speculates whether this could be a resolution of sorts to the Fermi paradox, though he also notes that he is skeptical of the framework that motivates his discussion:

Perhaps this argument explains why SETI has not found any signals from alien civilizations [because if there were an earlier civ at our stage, we would be far more likely to be in that civ], but I find it more plausible that it is merely a symptom that the synchronous gauge probability distribution is not the right one.

I'm not claiming that the picture Guth outlines is likely to be correct. It's highly speculative, as he himself hints, and there are potentially many ways to avoid it — for example, contra Guth's preferred model, it may be that inflation eventually stops, cf. Hawking & Hertog, 2018, and thus that each point in a pocket universe's timeline will have equal density in the end; or it might be that inflationary models are not actually right after all.

That said, one could still argue that the implication Guth explores — which is potentially a consequence of a wide variety of (eternal) inflationary models — is a weak reason, among many other reasons, to give more weight to short-term stuff (after all, in EV terms, the enormous rate of universe generation suggested by Guth would mean that even extremely small credences in something like his framework could still be significant). And perhaps it's also a weak reason to update in favor of thinking that as yet unknown unknowns will favor a short(er)-term priority to a greater extent than we had hitherto expected, cf. Brian Tomasik's discussion of how we might model unknown unknowns.

AMA: Tobias Baumann, Center for Reducing Suffering

Concerning how EA views on this compare to the views of the general population, I suspect they aren’t all that different. Two bits of weak evidence:


Brian Tomasik did a small, admittedly unrepresentative and imperfect Mechanical Turk survey in which he asked people the following:

At the end of your life, you'll get an additional X years of happy, youthful, and interesting life if you first agree to be covered in gasoline and burned in flames for one minute. How big would X have to be before you'd accept the deal?

More than 40 percent said that they would not accept it “regardless of how many extra years of life” they would get (see the link for some discussion of possible problems with the survey).


The Future of Life Institute did a Superintelligence survey in which they asked, “What should a future civilization strive for?” A clear plurality (roughly a third) answered “minimize suffering” — a rather different question, to be sure, but it does suggest that a strong emphasis on reducing suffering is very common.

1. Do you know about any good articles etc. that make the case for such views?

I’ve tried to defend such views in chapters 4 and 5 here (with replies to some objections in chapter 8). Brian Tomasik has outlined such a view here and here.

But many authors have in fact defended such views about extreme suffering. Among them are Ingemar Hedenius (see Knutsson, 2019); Ohlsson, 1979 (review); Mendola, 1990; 2006; Mayerfeld, 1999, p. 148, p. 178; Ryder, 2001; Leighton, 2011, ch. 9; Gloor, 2016, II.

And many more have defended views according to which happiness and suffering are, as it were, morally orthogonal.

2. Do you think such or similar views are necessary to prioritize S-Risks?

As Tobias said: No. Many other views can support such a priority. Some of them are reviewed in chapters 1, 6, and 14 here.

3. Do you think most people would/should vote in such a way if they had enough time to consider the arguments?

I say a bit on this in footnote 23 in chapter 1 and in section 4.5 here.

4. For me it seems like people constantly trade happiness for suffering ... Those are reasons for me to believe that most people ... are also far from expecting 1:10^17 returns or even stating there is no return which potentially could compensate any kind of suffering.

Many things to say on this. First, as Tobias hinted, acceptable intrapersonal tradeoffs cannot necessarily be generalized to moral interpersonal ones (cf. sections 3.2 and 6.4 here). Second, there is the point Jonas made, which is discussed a bit in section 2.4 in ibid. Third, tradeoffs concerning mild forms of suffering that a person agrees to undergo do not necessarily say much about tradeoffs concerning states of extreme suffering that the sufferer finds unbearable and is unable to consent to (e.g. one may endorse lexicality between very mild and very intense suffering, cf. Klocksiem, 2016, or think that voluntarily endured suffering occupies a different moral dimension than does suffering that is unbearable and which cannot be voluntarily endured). More considerations of this sort are reviewed in section 14.3, “The Astronomical Atrocity Problem”, here.

AMA: Tobias Baumann, Center for Reducing Suffering

[Warning: potentially disturbing discussion of suicide and extreme suffering.]

I agree with many of the points made by Anthony. It is important to control for these other confounding factors, and to make clear in this thought experiment that the person in question cannot reduce more suffering for others, and that the suicide would cause less suffering in expectation (which is plausibly false in the real world, also considering the potential for suicide attempts to go horribly wrong, Humphry, 1991, “Bizarre ways to die”). (So to be clear, and as hinted by Jonas, even given pure NU, trying to commit suicide is likely very bad in most cases, Vinding, 2020, 8.2.)

Another point one may raise is that our intuitions cannot necessarily be trusted when it comes to these issues, e.g. because we have an optimism bias (which suggests that we may, at an intuitive level, wholly disregard these tail risks); because we evolved to prefer existence almost no matter the (expected) costs (Vinding, 2020, 7.11); and because we intuitively have a very poor sense of how bad the states of suffering in question are (cf. ibid., 8.12).

Intuitions also differ on this matter. One EA told me that he thinks we are absolutely crazy for staying alive (disregarding our potential to reduce suffering), especially since we have no off-switch in case things go terribly wrong. This may be a reason to be less sure of one's immediate intuitions on this matter, regardless of what those intuitions might be.

I also think it is important to highlight, as Tobias does, that there are many alternative views that can accommodate the intuition that the suicide in question would be bad, apart from a symmetry between happiness and suffering, or upside-focused views more generally. For example, there is a wide variety of harm-focused views, including but not restricted to negative consequentialist views in particular, that will deem such a suicide bad, and they may do so for many different reasons, e.g. because they consider one or more of the following an even greater harm (in expectation) than the expected suffering averted: the frustration of preferences, premature death, lost potential, the loss of hard-won knowledge, etc. (I say a bit more about this here and here.)

Relatedly, one should be careful about drawing overly general conclusions from this case. For example, the case of suicide does not necessarily say much about different population-ethical views, nor about the moral importance of creating happiness vs. reducing suffering in general. After all, as Tobias notes, quite a number of views will say that premature deaths are mostly bad while still endorsing the Asymmetry in population ethics, e.g. due to conditional interests (St. Jules, 2019; Frick, 2020). And some views that reject a symmetry between suffering and happiness will still consider death very bad on the basis of pluralist moral values (cf. Wolf, 1997, VIII; Mayerfeld, 1996, “Life and Death”; 1999, p. 160; Gloor, 2017; 1, 4.3, 5).

Similar points can be made about intra- vs. interpersonal tradeoffs: one may think that it is acceptable to risk extreme suffering for oneself without thinking that it is acceptable to expose others to such a risk for the sake of creating a positive good for them, such as happiness (Shiffrin, 1999; Ryder, 2001; Benatar & Wasserman, 2015, “The Risk of Serious Harm”; Harnad, 2016; Vinding, 2020, 3.2).

(Edit: And note that a purely welfarist view entailing a moral symmetry between happiness and suffering would actually be a rather fragile basis on which to rest the intuition in question, since it would imply that people should painlessly end their lives if their expected future well-being were just below "hedonic zero", even if they very much wanted to keep on living (e.g. because of a strong drive to accomplish a given goal). Another counterintuitive theoretical implication of such a view is that one would be obliged to end one's life, even in the most excruciating way, if it in turn created a new, sufficiently happy being, cf. the replacement argument discussed in Jamieson, 1984; Pluhar, 1990. I believe many would find these implications implausible as well, even on a purely theoretical level, suggesting that what is counterintuitive here is the complete reliance on a purely welfarist view — not necessarily the focus on reducing suffering over increasing happiness.)

The case of the missing cause prioritisation research

Thanks for writing this post! :-)

Two points:

i. On how we think about cause prioritization, and what comes before

2. Consideration of different views and ethics and how this affects what causes might be most important.

It’s not quite clear to me what this means. But it seems related to a broader point that I think is generally under-appreciated, or at least rarely acknowledged, namely that cause prioritization is highly value relative.

The causes and interventions that are optimal relative to one value system are unlikely to be optimal relative to another value system (which isn't to say that there aren't some causes and interventions that are robustly good on many different value systems, as there plausibly are, and identifying novel such causes and interventions would be a great win for everyone; but then it is also commensurately difficult to identify new such causes and have much confidence in them given both our great empirical uncertainty and the necessarily tight constraints).

I think it makes sense that people do cause prioritization based on the values, or the rough class of values, that they find most plausible. Provided, of course, that those values have been reflected on quite carefully in the first place, and scrutinized in light of the strongest counterarguments and alternative views on offer.

This is where I see a somewhat mysterious gap in EA, more fundamental and even more gaping than the cause prioritization gap highlighted here: there is surprisingly little reflection on and discussion of values (something I also noted in this post, along with some speculations as to what might explain this gap).

After all, cause prioritization depends crucially on the fundamental values based on which one is trying to prioritize (a crude illustration), so this is, in a sense, the very first step on the path toward thoroughly reasoned cause prioritization.

ii. On the apparent lack of progress

As hinted in Zoe's post, it seems that much (most?) cutting-edge cause prioritization research is found in non-public documents these days, which makes it appear as though there is much less research than there in fact is.

This is admittedly problematic in that it makes it difficult to get good critiques of the research in question, especially from skeptical outsiders, and it also makes it difficult for outsiders to know what in fact animates the priorities of different EA agents and orgs. It may well be that it is best to keep most research secret, all things considered, but I think it’s worth being transparent about the fact that there is a lot that is non-public, and that this does pose problems, in various ways, including epistemically.

Moral Anti-Realism Sequence #2: Why Realists and Anti-Realists Disagree

The way I think about it, when I'm suffering, this is my brain subjectively "disvaluing" (in the sense of wanting to end or change it) the state it's currently in.

This is where I see a dualism of sorts, at least in the way it's phrased. There is the brain disvaluing (as an evaluating subject) the state it's in (where this state is conceived of as an evaluated object of sorts). But the way I think about it, there is just the state your mind-brain is in, and the disvaluing is part of that mind-brain state. (What else could it be?)

This may just seem semantic, but I think it's key: the disvaluing, or sense of disvalue, is intrinsic to that state. It relates back to your statement that reality simply is, and interpretation adds something to it. To which I'd still say that interpretations, including disvaluing in particular, are integral parts of reality. They are intrinsic to the subset of reality that is our mind-brains.

This is not the same as saying that there exists a state of the world that is objectively to be disvalued.

I think it's worth clarifying what the term "objectively" means here. Cf. my point above, I think it's true to say that there is a state of the world that is disvalued, and hence disvaluable according to that state itself. And this is true no matter where in the universe this state is instantiated. In this sense, it is objectively (i.e. universally) disvaluable. And I don't think things change when we introduce "other" individuals into the picture, as we discussed in the comments on your first post in this sequence (I also defended this view at greater length in the second part of my book You Are Them).

I talk about notions like 'life goals' (which sort of consequentialist am I?), 'integrity' (what type of person do I want to be?), 'cooperation/respect' (how do I think of the relation between my life goals and other people's life goals?), 'reflective equilibrium' (part of philosophical methodology), 'valuing reflection' (the anti-realist notion of normative uncertainty), etc.

Ah, I think we've talked a bit past each other here. My question about bedrock concepts was mostly about why you would question them in general (as you seem to do in the text), and what you think the alternative is. For example, it seems to me that the notions you consider foundational in your ethical perspective in particular do in turn rest on bedrock concepts that you can't really explain more reductively, i.e. with anything but synonymous concepts ("goals" arguably being an example).

From one of your replies to MichaelA:

I should have chosen a more nuanced framing in my comment. Instead of saying, "Sure, we can agree about that," the anti-realist should have said "Sure, that seems like a reasonable way to use words. I'm happy to go along with using moral terms like 'worse' or 'better' in ways where this is universally considered self-evident. But it seems to me that you think you are also saying that for every moral question, there's a single correct answer [...]"

It seems to me your conception of moral realism conflates two separate issues:

1. Whether there is such a thing as (truly) morally significant states, and

2. Whether there is a single correct answer for every moral question.

I think these are very different questions, and an affirmative answer to the former need not imply an affirmative answer to the latter. That is, one can be a realist about 1. while being a non-realist about 2.

For example, one can plausibly maintain that a given state of suffering is intrinsically bad and ought not exist without thinking that there is a clear answer, even in principle, concerning whether it is more important to alleviate this state or some other state of similarly severe suffering. As Jamie Mayerfeld notes, even if we think states of suffering occupy a continuum of (genuine) moral importance, the location of any given state of suffering on this continuum "may not be a precise point" (Mayerfeld, 1999, p. 29). Thus, one can be a moral realist and still embrace vagueness in many ways.

I think it would be good if this distinction were more clear in this discussion, and if these different varieties of realism were acknowledged. After all, you seem quite sympathetic to some of them yourself.

New book — "Suffering-Focused Ethics: Defense and Implications"

Thanks for sharing your reflections :-)

This is because of imagining and seeing examples as in the book and here.

Just wanted to add a couple of extra references like this:

The Seriousness of Suffering: Supplement

The Horror of Suffering

Preventing Extreme Suffering Has Moral Priority

To be more specific, I think that one second of the most extreme suffering (without subsequent consequences) would be better than, say, a broken leg.

Just want to note, also for other readers, that I say a bit about such sentiments involving "one second of the most extreme suffering" in section 8.12 in my book. One point I make is that our intuitions about a single second of extreme suffering may not be reliable. For example, we probably tend not to assign great significance, intuitively, to any amount of one-second long chunks of experience. This is a reason to think that the intuition that one second of extreme suffering can't matter that much may not say all that much about extreme suffering in particular.

If that holds, then any extreme suffering can be overcome by mild suffering.

I think this is a little too quick, at least in the way you've phrased it. A broken leg hardly results in merely mild suffering, at least by any common definition. And a lexical threshold has, for example, been defended between "mere discomfort" and "genuine pain" (see Klocksiem, 2016), where a broken leg would clearly entail the latter.

There are also other reasons why this argument (i.e. "one second of extreme suffering can be outweighed by mild suffering, hence any amount of extreme suffering can") isn't valid.

Note also that even if one thinks that aggregates of milder forms of suffering can be more important than extreme suffering in principle, one may still hold that extreme suffering dominates profusely in practice, given its prevalence.

Now, many people would trade mild tradeoff for other things they hold important.

I just want to flag here that the examples you give seem to be intrapersonal ones, and the permissibility of intrapersonal tradeoffs like these (which is widely endorsed) does not imply the permissibility of similar tradeoffs in the interpersonal case (which more people would reject, and which there are many arguments against, cf. chapter 3).

The following is neither a request nor a complaint, but in relation to the positions you express, I see little in the way of counterarguments to, or engagement with, the arguments I've put forth in my book, such as in chapters 3 and 4, for example. In other words, I don't really see the arguments I present in my book addressed here (to be clear, I'm not claiming you set out to do that), and I'm still keen to see some replies to them.

New book — "Suffering-Focused Ethics: Defense and Implications"

Thanks for your comment. I appreciate it! :-)

In relation to counterintuitions and counterarguments, I can honestly say that I've spent a lot of time searching for good ones, and tried to include as many as I could in a charitable way (especially in chapter 8).

I'm still keen to find more opposing arguments and intuitions, and to see them explored in depth. As hinted in the post, I hope my book can provoke people to reflect on these issues and to present the strongest case for their views, which I'd really like to see. I believe such arguments can help advance the views of all of us toward greater levels of nuance and sophistication.
