Cross-posted from Foxy Scout

# Overview

What We Owe The Future (WWOTF) by Will MacAskill has recently been released with much fanfare. While I strongly agree that future people matter morally and we should act based on this, I think the book isn’t clear enough about MacAskill’s views on longtermist priorities, and to the extent it is, it presents a mistaken view of the most promising longtermist interventions.

I argue that the book:

1. Underestimates risk of misaligned AI takeover.
2. Overestimates risk from stagnation.
3. Isn’t clear enough about longtermist priorities.

I highlight and expand on these disagreements in part to contribute to the debate on these topics, but also to make a practical recommendation.

While I like some aspects of the book, I think The Precipice is a substantially better introduction for potential longtermist direct workers, e.g. as a book given away to talented university students. For instance, I’m worried people will feel bait-and-switched if they get into EA via WWOTF then do an 80,000 Hours call or hang out around their EA university group and realize most people think AI risk is the biggest longtermist priority, many thinking this by a large margin.[1]

# What I disagree with[2]

## Underestimating risk of misaligned AI takeover

### Overall probability of takeover

In endnote 2.22 (p. 274), MacAskill writes [emphasis mine]:

I put that possibility [of misaligned AI takeover] at around 3 percent this century… I think most of the risk we face comes from scenarios where there is a hot or cold war between great powers.

I think a 3% chance of misaligned AI takeover this century is too low, with 90% confidence.[3] The claim that most of the risk comes from scenarios with hot or cold great power wars may be technically true if one thinks a war between the US and China is >50% likely soon, which might be reasonable under a loose definition of cold war. That being said, I strongly think MacAskill’s claim about great power war gives the wrong impression of the most probable AI takeover threat models.

My credence on misaligned AI takeover is 35%[4] this century, of which not much depends on a great power war scenario.

Below I’ll explain why my best-guess credence is 35%: the biggest input is a report on power-seeking AI, but I’ll also list some other inputs then aggregate the inputs.

#### Power-seeking AI report

The best analysis estimating the chance of existential risk (x-risk) from misaligned AI takeover that I’m aware of is Is Power-Seeking AI an Existential Risk? by Joseph Carlsmith.[5]

Carlsmith decomposes a possible existential catastrophe from AI into 6 steps, each conditional on the previous ones:

1. Timelines: By 2070, it will be possible and financially feasible to build APS-AI: systems with advanced capabilities (outperform humans at tasks important for gaining power), agentic planning (make plans then act on them), and strategic awareness (their plans are based on models of the world good enough to overpower humans).
2. Incentives: There will be strong incentives to build and deploy APS-AI.
3. Alignment difficulty: It will be much harder to build APS-AI systems that don’t seek power in unintended ways, than ones that would seek power but are superficially attractive to deploy.
4. High-impact failures: Some deployed APS-AI systems will seek power in unintended and high-impact ways, collectively causing >$1 trillion in damage.
5. Disempowerment: Some of the power-seeking will in aggregate permanently disempower all of humanity.
6. Catastrophe: The disempowerment will constitute an existential catastrophe.

I’ll first discuss my component probabilities for a catastrophe by 2100 rather than 2070[6], then discuss the implications of Carlsmith’s own assessment as well as reviewers of his report.

1. Timelines: By 2100, it will be possible and financially feasible to build APS-AI: systems with advanced capabilities (outperform humans at tasks important for gaining power), agentic planning (make plans then act on them), and strategic awareness (their plans are based on models of the world good enough to overpower humans). 80%
    1. I explain this probability for Transformative AI (TAI) below. I don’t think my probability changes much between TAI and APS-AI.[7]
2. Incentives: There will be strong incentives to build and deploy APS-AI. 85%
    1. I think it’s very likely that APS systems will be much more useful than non-APS systems, as expanded upon in Section 3.1 of the report. It seems like so far systems that are closer to APS and more general have been more effective, and I only see reasons for this incentive gradient to become stronger over time.
3. Alignment difficulty: It will be much harder to build APS-AI systems that don’t seek power in unintended ways, than ones that would seek power but are superficially attractive to deploy. 75%
    1. Fundamentally, controlling an agent much more capable than yourself feels very hard to me; I like the analogy of a child having to hire an adult to be their company’s CEO described here. I don’t see much reason for hope based on the progress of existing technical alignment strategies. My current biggest hope is that we can use non-APS AIs in various ways to help automate alignment research and figure out how to align APS-AIs. But I’m not sure how much mileage we can get with this; see here for more.
4. High-impact failures: Some deployed APS-AI systems will seek power in unintended and high-impact ways, collectively causing >$1 trillion in damage. 90%
    1. Once misaligned APS-AI systems are being deployed, I think we’re in a pretty scary place. If at least one is deployed probably many will be deployed (if the first one doesn’t disempower us) due to correlation on how hard alignment is, and even if we’re very careful at first the systems will get smarter over time and there will be more of them; high-impact failures feel inevitable.
5. Disempowerment: Some of the power-seeking will in aggregate permanently disempower all of humanity. 80%
    1. Seems like a narrow “capabilities target” to get something that causes a high-impact failure but doesn’t disempower us, relative to the range of possible capabilities of AI systems. But I have some hope for a huge warning shot that wakes people up, or that killing everyone turns out to be really really hard.
6. Catastrophe: The disempowerment will constitute an existential catastrophe. 95%
    1. Conditional on unintentional disempowerment of humanity, it’s likely that almost all possible value in the future would be lost as there’s a large possible space of values, and most of them being optimized lead to ~value-less worlds from the perspective of human values (see also Value is Fragile). I basically agree with Carlsmith’s reasoning in the report.

This gives me a ~35% chance of existential risk from misaligned AI takeover by 2100, based on my rough personal credences.
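As a quick check on the arithmetic, here is a minimal sketch that multiplies my six component probabilities together. Since each step is conditional on the previous ones, the overall estimate is just their product. The dictionary keys are illustrative labels I made up for this sketch, not terms from Carlsmith’s report.

```python
from math import prod

# My rough credences for each of the six steps (by 2100), as listed above.
# Each step is conditional on all the previous ones holding.
steps = {
    "timelines": 0.80,
    "incentives": 0.85,
    "alignment_difficulty": 0.75,
    "high_impact_failures": 0.90,
    "disempowerment": 0.80,
    "catastrophe": 0.95,
}

# Chain rule: the joint probability of all six steps is the product
# of the conditional probabilities.
p_takeover = prod(steps.values())
print(f"{p_takeover:.3f}")  # prints 0.349
```

The product comes out just under 35%, which is where the headline figure comes from. Note how sensitive it is to each input: dropping any single step by 10 percentage points moves the final number by several points.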

Carlsmith, the author of the report, originally ended up with 5% risk. As of May 2022 he is up to >10%.

I’ve read all of the reviews and found the ones from Nate Soares (in particular, the sections on alignment difficulty and misalignment outcomes) and Daniel Kokotajlo to be the most compelling.[8] They have p(AI doom by 2070) at >77% and 65% respectively. Some of the points that resonated the most with me:

1. By Soares: The AI may not even need to look all that superficially good, because the actors will be under various pressures to persuade themselves that things look good. Soares expects the world to look more derpy than competent, see our COVID response.
2. By Soares: ‘I suspect I think that the capability band "do a trillion dollars worth of damage, but don't Kill All Humans" is narrower / harder to hit and I suspect we disagree about how much warning shots help civilization get its act together and do better next time.’
3. By Kokotajlo: ‘Beware isolated demands for rigor. Imagine someone in 1960 saying “Some people thought battleships would beat carriers. Others thought that the entire war would be won from the air. Predicting this stuff is hard; we shouldn’t be confident. Therefore, we shouldn’t assign more than 90% credence to the claim that computers will be militarily useful, e.g. in weapon guidance systems or submarine sensor suites. Maybe it’ll turn out that it’s cheaper and more effective to just use humans, or bio-engineered dogs, or whatever. Or maybe there’ll be anti-computer weapons that render them useless. Who knows. The future is hard to predict.” This is what the author sounds like to me; I want to say “Battleships vs. carriers was a relatively hard prediction problem; whether computers would be militarily useful was an easy one. It’s obvious that APS systems will be powerful and useful for some important niches, just like how it was obvious in 1960 that computers would have at least a few important military applications.”’ (and in general I think the whole Incentives section of Kokotajlo’s review is great)

#### Other inputs

A few more pieces that have informed my views:

1. Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover by Cotra: I found this fairly compelling because it spells out a plausible story with fairly short AI timelines and an AI takeover. In sum: an AI is trained using human feedback on diverse tasks, e.g. generally doing useful things on a computer. The AI learns general skills, deceives humans during the training phase to get higher rewards than an honest strategy would, then after many copies are deployed takes over from humanity to either “maximize reward forever” or pursue whatever its actual goals are. I think (as does Ajeya Cotra, the author) that something close to the exact story is unlikely to play out, but I think the piece is very useful nonetheless, as the xkcd linked in the post indicates.
2. AGI Ruin: A List of Lethalities by Yudkowsky: While I’m more optimistic about our chances than Yudkowsky, I think the majority of the things he’s pointing at are real, non-trivial difficulties that we need to keep in mind and could easily fail to address. The rest of the 2022 MIRI Alignment discussion seems informative as well, though I haven’t had a chance to read most of it in-depth. Replies to AGI Ruin from Paul Christiano and the DeepMind alignment team are also great reads: it’s important to note how much about the situation all 3 parties agree on, despite some important disagreements.

Finally, I’ve updated some based on my experience with Samotsvety forecasters when discussing AI risk. We’ve primarily selected forecasters to invite for having good scores and leaving reasonable comments, rather than through any ideological tests. When we discussed the report on power-seeking AI, I expected tons of skepticism but in fact almost all forecasters seemed to give >=5% to disempowerment by power-seeking AI by 2070, with many giving >=10%. I’m curious to see the results of the Tetlock group’s x-risk forecasting tournament as well. To be clear, I don’t think we should update too much based on generalist forecasters as opposed to those who have engaged for years of their life and have good judgment; but I do think it’s a relevant data point.

[Edited to add]: I've now published Samotsvety aggregate forecasts here:

A few of the headline aggregate forecasts are:

1. 25% chance of misaligned AI takeover by 2100, barring pre-APS-AI catastrophe
2. 81% chance of Transformative AI (TAI) by 2100, barring pre-TAI catastrophe
3. 32% chance of AGI being developed in the next 20 years

#### Aggregating inputs

So I end up with something like:

1. A ~35% chance of misaligned AI takeover this century based on my independent impression.
2. Many people who I respect are lower, but some are substantially higher, including Kokotajlo and Soares.

I’m going to stick to 35% for now, but it’s a very tough question and I could see ending up anywhere between 10% and 90% on further reflection and discussion.

### Treatment of timelines

In endnote 2.22 (p. 274), MacAskill gives 30% to faster-than-exponential growth. My understanding is that this is almost all due to scenarios involving TAI, so MacAskill’s credence on TAI by 2100 is approximately 30% + the 3% from AI takeover = 33%. I think 33% is too low with 85% confidence.

My credence in Transformative AI (TAI) by 2100 is ~80% barring pre-TAI catastrophe.

Some reasoning informing my view:

1. As explained here, the bio anchors report includes an evolution anchor forecasting when we would create TAI if we had to do as many computations as all animals in history combined, to copy natural selection. It finds that even under this conservative assumption, there is a 50% chance of TAI by 2100.[9] While I don’t put that much weight on the report as a whole because all individual AI forecasting methods have serious limitations, I think it’s the best self-contained AI timelines argument out there and does provide some evidence for 50% TAI by 2100 as a soft lower bound.

2. Human feedback on diverse tasks (HFDT) as described here already feels like a somewhat plausible story for TAI; while I am skeptical it will just work without many hiccups (I give ~10-15% to TAI within 10 years), 80 years is a really long time to refine techniques and scale up architecture + data.

3. AI research already is sped up by AI a little, and may soon be sped up substantially more, leading to faster progress.

    1. This spreadsheet linked from this post collects “Examples of AI improving AI” including NVIDIA using AI to optimize their GPU designs and Google using AI to optimize AI accelerator chips.

    2. One story that seems plausible is: language model (LM) tools are already improving the productivity of Google developers a little. We don’t appear to yet be hitting diminishing returns to better LMs: scale, data, and simple low-hanging fruit like chain-of-thought and collecting a better dataset[10] are yielding huge improvements. Therefore, we should expect LM tools to get better and better at speeding up AI research which will lead to faster progress.

    3. AI research might be easy to automate relative to other tasks since humans aren’t very optimized for it by natural selection (see also Moravec’s paradox, point stolen from Ngo in MIRI conversations).

4. For a range of opinions on the chance of APS-AI by 2070, see here. Most reviewers are above 50%, even ones who are skeptical of large risks from misaligned AI.

5. Holden Karnofsky gave ~67% to TAI by 2100, before Ajeya Cotra (author of the bio anchors report) moved her timelines closer.

    1. My impression is that Karnofsky has longer timelines than almost all of the people closest to being AI forecasting “experts” (e.g. Cotra, Kokotajlo, Carlsmith) though there are selection effects here.

So overall, the picture looks to me like: 50% is a conservative soft lower bound for TAI by 2100, I have some inside view reasons to think TAI might arrive much sooner, and many of the most reasonable people who have thought the most about this subject tend to give at least 67%. On the other hand, I give some weight to us being very mistaken. Mashing all these intuitions together gives me a best guess of 80%, though I think I could end up anywhere between 60% and 90% on further reflection and discussion.

Significantly shorter AI timelines dramatically increase the importance of reducing AI risk relative to other interventions[11], especially preventing stagnation as discussed below.

### Treatment of alignment

The section “AI and Entrenchment” (beginning p. 83) focuses on risks from aligned AIs, and comes before “AI Takeover”. This seems like a mistake of emphasis as the risks of misalignment feel much larger, given that right now we don't know how to align an AGI.[12]

Often the risk of AI takeover is bundled with other risks of human extinction. But this is a mistake. First, not all AI takeover scenarios would result in human extinction.

I agree this is technically true, but I think in practice it’s probably not a substantial consideration. I think with 90% confidence that the vast majority (>90%) of misaligned AI takeovers would cause extinction or would practically have the same effect, given that they would destroy humanity’s potential.

From a moral perspective AI takeover looks very different from other extinction risks… AI agents would continue civilisation… It’s an open question how good or bad such a civilisation would be… What’s at stake when navigating the transition to a world with advanced AI, then, is not whether civilisation continues but which civilisation continues.

I think it’s very likely (>=95%) that the civilisation would be bad conditional on AI misalignment, as described above (see proposition 6: Catastrophe).

Since MacAskill doesn’t give a precise credence in this section it’s hard to say with what confidence I disagree, but I’d guess ~80%.

## Overestimating risk from stagnation

In Chapter 7, MacAskill discusses stagnation from a longtermist perspective. I’ll argue that he overestimates both the chance of stagnation and the expected length of stagnation.

Since I give a ~7x lower chance of stagnation and a ~1.5x shorter expected length, I think MacAskill overrates by ~10x the extinction risk that comes from stagnation extending the time of perils.[13]
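To make the ~10x figure concrete, here is a rough sketch of the arithmetic. It takes the two stagnation credences discussed later in this section (35% for MacAskill, 5% for me) and the ~1.5x length ratio stated above, and assumes, as a simplification of my own, that the added risk scales with chance of stagnation times expected length. The variable names are just labels for this sketch.

```python
# Credences in stagnation this century (MacAskill's from endnote 2.22, then mine).
p_stagnation_macaskill = 0.35
p_stagnation_mine = 0.05

# Rough ratio of expected stagnation lengths (MacAskill's vs. mine), as stated above.
length_ratio = 1.5

# Simplifying assumption: added risk is proportional to
# (chance of stagnation) * (expected length of stagnation).
chance_ratio = p_stagnation_macaskill / p_stagnation_mine
overall_ratio = chance_ratio * length_ratio

print(f"{chance_ratio:.1f}x * {length_ratio}x = {overall_ratio:.1f}x")
```

This gives roughly 7x from the difference in chance and roughly 10x overall. If the risk doesn’t scale linearly with the length of stagnation, the overall ratio would differ.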

I then explain why this matters.

### Overestimating chance of stagnation

In endnote 2.22 (p. 274), MacAskill writes [emphasis mine]:

This century, the world could take one of approximately four trajectories… I think the stagnation scenario is most likely, followed by the faster-than-exponential growth scenario… If I had to give precise credences, I’d say: 35 percent[14]

Note that stagnation is defined here as GDP growth slowing, separate from global catastrophes.[15]

Kudos to MacAskill for giving a credence here! I’ll argue that the credence is an extremely implausible overestimate. My credence in stagnation this century is 5%, and the aggregated forecast of 5 Samotsvety forecasters (one of whom is me) is also 5%, with a range of 1-10%. I think MacAskill’s 35% credence is too high with 95% confidence.

Below I’ll explain some intuitions behind why I think MacAskill’s 35% credence is way too high: AI timelines, non-AI tech, and overestimating the rate of decline in researcher productivity.

#### AI timelines

As discussed above, my credence on TAI by 2100 barring catastrophe is 80%, while MacAskill’s is ~33%. My TAI timelines put an upper bound of 20% on stagnation this century, so this is driving a substantial portion of the disagreement here.

However, there is some remaining disagreement from non-TAI worlds as well. MacAskill gives 35% to stagnation this century and ~33% to TAI this century, implying 35/67 = ~52% chance of stagnation in non-TAI worlds. I think MacAskill gives too much credence to stagnation in non-TAI worlds, with 75% confidence.

I give 5/20 = 25% to stagnation in non-TAI worlds; in the next 2 sections I’ll explore the reasons behind this disagreement.
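The two conditional figures above can be reproduced with a short sketch. It assumes, as the argument above does, that stagnation only happens in worlds without TAI, so the conditional probability is the stagnation credence divided by the credence in no TAI. The function name is hypothetical, chosen for this illustration.

```python
def p_stagnation_given_no_tai(p_stagnation: float, p_tai: float) -> float:
    """P(stagnation | no TAI), assuming stagnation only occurs in non-TAI worlds."""
    return p_stagnation / (1.0 - p_tai)


# MacAskill: 35% stagnation, ~33% TAI this century (implied by endnote 2.22).
macaskill = p_stagnation_given_no_tai(0.35, 0.33)

# Me: 5% stagnation, 80% TAI this century.
mine = p_stagnation_given_no_tai(0.05, 0.80)

print(f"MacAskill (implied): {macaskill:.0%}")  # prints MacAskill (implied): 52%
print(f"Mine: {mine:.0%}")  # prints Mine: 25%
```

So even after setting the disagreement about TAI timelines aside, the implied credences in stagnation differ by about a factor of two.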

#### Non-AI tech

It seems like there are other technologies, already feasible or close to it, that could get us out of stagnation and are mainly not applied due to taboos: in particular human cloning[16] and genetic enhancement[17]. I think if stagnation started to look likely, at least some countries would experiment with these technologies. I’d guess there would be strong incentives to do so, as any country that did would have the chance at becoming a world power.

#### Overestimating rate of decline in research productivity

Beginning on p. 150, MacAskill discusses two competing effects regarding whether scientific progress is getting easier or harder:

There are two competing effects. On the one hand, we “stand on the shoulders of giants”: previous discoveries can make future progress easier. The invention of the internet made researching this book, for example, much easier than it would have been in the past. On the other hand, we “pick the low-hanging fruit”: we make the easy discoveries first, so those that remain are more difficult. You can only invent the wheel once, and once you have, it’s harder to find a similarly important invention.

MacAskill then argues that “picking the low-hanging fruit” predominates, meaning past progress makes future progress harder. I agree that it almost certainly dominates, but I am significantly less confident about the extent to which it still does in the Internet age, which leads me to expect stagnation to occur more gradually.

Qualitatively, MacAskill gives the examples of Einstein and the Large Hadron Collider:

In 1905, his “miracle year,” Albert Einstein revolutionized physics… while working as a patent clerk. Compared to Einstein’s day, progress in physics is now much harder to achieve. The Large Hadron Collider cost about $5 billion, and thousands of people were involved.

While there may be diminishing returns within physics in a somewhat similar paradigm, an important point left out here is paradigm shifts: there may very often be strong diminishing returns to research effort within a paradigm, but when a new paradigm arises many new low-hanging fruit pop up.[18] We can see this effect recently in deep learning: since the GPT-3 paper came out, many low-hanging fruit have been picked quickly such as chain-of-thought prompting and correcting language model scaling laws.

The stronger argument that MacAskill presents is quantitative. He argues on p.151 and in endnote 7.26 that since 1800, research effort has grown by at least 500x while productivity growth rates have stayed the same or declined a bit. This would mean research productivity has declined at least 500x.

I roughly agree that research effort has grown at least 500x. The key question in my mind is to what extent “effective” productivity growth rates have actually approximately stayed the same (I agree they very likely haven’t increased a corresponding 500x, but 5x-50x seems plausible), due to measurement issues.

How much can we trust statistics like TFP and even GDP to capture research productivity in the Internet age? My current take is that they’re useful, but we should take them with a huge grain of salt. I have two concerns: (a) lack of capturing the most talented people’s output, and (b) lags between research productivity and impact on GDP.

First, many of the most talented people now work in sectors such as software where their output is mostly not captured by either GDP or TFP.
For example, I’d predict many of the most productive people would rather give up 50% of their salary than give up internet access or even search engines (see this paper which studies this empirically but not targeted at the most productive people); but this consumer surplus is barely counted in GDP statistics.[19]

MacAskill attempts to address considerations like this in endnote 7.10. I don’t find his argument that GDP was likely mismeasuring progress even more greatly before 1970 than it is now convincing but am open to having my mind changed. I’d guess that the Internet and services on top of it are a much bigger deal in terms of research productivity than e.g. the telephone, providing a bigger surplus than previous technologies.[20]

Second, there’s likely a significant lag between research productivity and impact on GDP.[21] The Internet is still relatively new, but we can see with e.g. AI research that big things seem to be on the horizon that haven’t yet made a dent in GDP.

### Overestimating length of stagnation

On p.159, MacAskill suggests that the expected length of stagnation is likely over 1,000 years:

Even if you think it's 90% likely that stagnation would last only a couple of centuries and just 10 percent likely that it would last ten thousand years, then the expected length of stagnation is still over a thousand years.

I disagree. I think the expected length of stagnation is <1,000 years with 70% confidence. I’d put the expected length of stagnation at 600 years, and give <2% to a stagnation of at least 10,000 years.

In the section “How Long Would Stagnation Last” (beginning p. 156), MacAskill gives a few arguments for why stagnation might “last for centuries or even millennia”.

Getting out of stagnation requires only that one country at one point in time, is able to reboot sustained technological progress.
And if there are a diversity of societies, with evolving cultures and institutional arrangements over time, then it seems likely that one will manage to restart growth… However, … to a significant extent we are already living in a global culture. If that culture develops into one that is not conducive to technological progress, that could make stagnation more persistent. We’ve already seen the homogenising force of modern secular culture for some high-fertility religious groups… A single global culture could be especially opposed to science and technology if there were a world government.

I agree that our culture is much more globally homogeneous than it used to be; but it still feels fairly heterogeneous to me, and becoming more homogeneous seems to be strongly correlated with more transformative technologies. Additionally, one needs to think not only that society will become homogeneous but that it will stay homogeneous for millennia; I’d give this <5%, especially conditioning on stagnation having occurred which would make technologies enabling enforcement of homogeneity less powerful. This includes the possibility of strong world governments; I’m very skeptical of their likelihood or stability without transformative technologies.

A second reason why stagnation might last a long time is population decline… In this situation, the bar for an outlier culture to restart technological progress is much higher.

This point feels stronger, but I think I likely still assign less weight to it than MacAskill since persistent population decline feels more likely in worlds which are very homogeneous. If the world is somewhat heterogeneous, all you need are a few remaining cultures with high fertility rates which will dominate over time. Thus this point is very correlated to the above point.

The world population could also decrease dramatically as a result of a global catastrophe… Perhaps a stagnant future is characterised by recurrent global catastrophes that repeatedly inhibit escape from stagnation.

This scenario feels the most plausible to me. But it seems like the best interventions to prevent this type of long stagnation are targeted at avoiding catastrophes or rebuilding after them (e.g. civilizational refuges), rather than trying to ensure that tech progress will persist through the catastrophes.

### Why this matters

MacAskill argues that risk of stagnation is high so contributing to technological progress is good and is unsure if even speeding up AI progress would be good or bad, e.g. on p.244:

We need to ensure that the engine of technological progress keeps running.

On p.224:

I just don’t know… Is it good or bad to accelerate AI development… speeding it up could help reduce the risk of technological stagnation.

I think contributing to general tech progress is ~neutral and speeding up AI progress is bad (also related to my disagreements about AI risk).

Another implication of differences in AI and future tech timelines specifically is that it discounts the value of working on stuff aimed at very long-run effects substantially. For example, MacAskill discusses how long surface coal will last on p. 140, concluding that it will “probably last longer” than ~50-300 years depending on the region. He writes “But from a longterm point of view, we need to take these sorts of timescales seriously.” I think AI + other tech will very likely (>90%) have changed the world so massively by then that it mostly doesn’t make much sense to think about from this sort of lens.

Similarly, I put much less weight than MacAskill on the importance of increasing fertility rates (though it seems net positive).
## Lack of clarity on longtermist priorities

My impression is that if someone read through this book, they would get a mistaken impression of what longtermist priorities actually are and should be. For example, as far as I remember there is little in the book that makes it clear that reducing AI and bio-risk is likely a higher priority on the margin than maintaining technological progress or fighting climate change. This is in contrast to The Precipice, which has a table summarizing the author’s estimations of the magnitude of risks.

MacAskill at times seemed reluctant to quantify his best-guess credences, especially in the main text.

1. His estimates for the likelihood of stagnation vs. catastrophe vs. growth scenarios as well as AI risk and biorisk are buried in the middle of endnote 2.22. There are some bio-risk estimates on p. 113, but these are citing others rather than giving MacAskill’s own view.
2. MacAskill doesn’t directly state his best guess credence regarding AI timelines, instead putting it in endnote 2.22 in a fairly indirect way (credence on faster-than-exponential growth, “perhaps driven by advances in artificial intelligence”); but as mentioned above this is an extremely important parameter for prioritization.
3. He introduces the SPC framework but in few places provides numerical estimates of significance, persistence, and contingency, preferring to make qualitative arguments.

I think this will leave readers with a mistaken sense of longtermist priorities and also makes it harder to productively disagree with the book and continue the debate. For example, I realized via feedback on this draft that MacAskill’s AI timelines actually are indirectly given in endnote 2.22, but at first I missed this which made it harder to identify cruxes.

For further thoughts on the extent to which longtermist prioritization is emphasized in the book framed as a reply to MacAskill’s comments regarding his reasoning, see the appendix.
# What I like

## Discussing the importance of value lock-in from AI

While as described above I think the current biggest problem we have to solve is figuring out how to reliably align a powerful AI at all, I did appreciate the discussion of value lock-in possibilities even when it’s aligned.

I liked the section “Building a Morally Exploratory World” (beginning p.97) and would be concerned about AI safety researchers who think of alignment more as getting the AGI to have our current values rather than something like making the AI help us figure out what our ideal values are before implementing them.

I think this has been discussed for a while among AI safety researchers, e.g. the idea of Coherent Extrapolated Volition. But it sometimes gets lost in current discussions and I think it’s a significant point in favor of AI safety researchers and policymakers being altruistically motivated.[22]

## Introducing the SPC framework

In the section “A Framework for Thinking About the Future” (beginning p. 31), MacAskill introduces the SPC (Significance, Persistence, Contingency) Framework for assessing the long-term value of an action:

1. Significance: What’s the average value added by bringing about a certain state of affairs?
2. Persistence: How long will this state of affairs last once it has been brought about?
3. Contingency: If not for the action under consideration, how briefly would the world have been in this state of affairs (if ever)?

I found the framework interesting and it seems like a fairly useful decomposition of the value of longtermist interventions, particularly ones that aren’t aimed at reducing x-risk.

## Presenting an emotionally compelling case for longtermism

I personally find the idea that future people matter equally very intuitive, but I still found the case for longtermism in Chapter 1 (especially the first section of the book “The Silent Billions”) relatively emotionally compelling and have heard others also did.
I also loved the passage on page 72 about being a moral entrepreneur:

Lay was the paradigm of a moral entrepreneur: someone who thought deeply about morality, took it very seriously, was utterly willing to act in accordance with his convictions, and was regarded as an eccentric, a weirdo, for that reason. We should aspire to be weirdos like him. Others may mock you for being concerned about people who will be born in thousands of years’ time. But many at the time mocked the abolitionists. We are very far from creating the perfect society, and until then, in order to drive forward, moral progress, we need morally motivated heretics who are able to endure ridicule from those who wish to preserve the status quo.

# Why I prefer The Precipice for potential longtermist direct workers

I would strongly prefer to give The Precipice rather than WWOTF as an introduction to potential longtermist direct workers. An example of a potential longtermist direct worker is a talented university student.

I’m not making strong claims about other groups like the general public or policymakers, though I’d tentatively weakly prefer The Precipice for them as well since I think it’s more accurate and we need strong reasons to sacrifice accuracy.

I prefer The Precipice because:

1. It generally seems more likely to make correct claims, especially on AI risk.
    1. Even though I think it still underestimates AI risk, it focuses to a larger extent on misalignment.
    2. While I don’t think accuracy is the only desirable quality in a book that’s someone’s intro to EA, I think it’s fairly important. And The Precipice doesn’t seem much worse on other important axes.
2. It makes the author’s view on longtermist priorities much more clear.
    1. Someone could read WWOTF and come out having little preference between reducing AI risk, maintaining tech progress, speeding up clean energy, etc.
    2. I predict ~25% of people will feel bait-and-switched if they get into EA via WWOTF then do an 80,000 Hours call or hang out around their EA university group and realize most people think AI risk is the biggest longtermist priority, many thinking this by a large margin.
3. I’d push back against a counter-argument that it’s nice to have a gentle introduction for people uncomfortable with subjective probabilities, prioritization, and AI takeover scenarios.
    1. Prioritization and a willingness to use numbers even when they feel arbitrary (while not taking them too seriously) are virtues close to the core of EA, that I’d want many longtermist direct workers to have.
    2. I’d much prefer people who are willing to take ideas that initially seem weird seriously, I think we should usually filter for these types of people when doing outreach.[23]
    3. [Edited to add]: I think being up front about our beliefs and prioritization mindset might be important for maintaining and improving community epistemics, which I think is an extremely important goal. See this comment for further thoughts.
4. One big difference between The Precipice and WWOTF that I’m not as sure about is the framing of reducing x-risks as interventions as opposed to trajectory changes and safeguarding civilization. I lean toward The Precipice and x-risks here but this belief isn’t very resilient.
    1. See the appendix for more on the likelihood of the best possible future which is one crux on whether it makes sense to primarily use the x-risk framing, though another important one is which is more intuitive/motivating to potential high-impact contributors.

# Acknowledgments

Thanks to Nuño Sempere, Misha Yagudin, Alex Lawsen, Michel Justen, Max Daniel, Charlie Griffin, and others for very helpful feedback. Thanks to Tolga Bilge, Greg Justice, Jonathan Mann, and Aaron Ho for forecasts on stagnation likelihood. Thanks to a few others for conversations that helped me refine my views on the book.
The claims in this piece ultimately represent my views and not necessarily those of anyone else listed above unless explicitly stated otherwise.

# Appendix

## Defining confidence in unresolvable claims

Throughout the review I argue for several unresolvable (i.e. we’ll never find out the correct answer), directional claims like:

MacAskill arrives at [credence/estimate] X for claim Y, and I think this is too [high/low] with Z% confidence

At first I wasn’t going to do this because this type of confidence is often poorly defined and misinterpreted, but I decided it was worth it to propose a definition and go ahead and use as many made-up numbers as possible for transparency.

I define Z% confidence that X is too [high/low] as meaning: I have a credence of Z% that an ideal reasoning process instantiated in today’s world working with our current evidence would end up with a best-guess estimate for claim Y that is [lower/higher] than X.

The exact instantiation of an ideal reasoning process is open to interpretation/debate, but I’ll gesture at something like “take some combination of many (e.g. 100) very reasonable people (e.g. generalist forecasters with a great track record) and many domain experts who have a scout mindset, freeze time for everyone else, then give them a very large amount of time (e.g. 1000 years) to figure out their aggregated best guess”.

The people doing the reasoning can only deliberate about current evidence rather than acquire new evidence (by e.g. doing object-level work on AI to better understand AI timelines).

A property of making directional claims like this is that MacAskill always has 50% confidence in the claim I’m making, since I’m claiming that his best-guess estimate is too high/low.
(Edit: this actually isn't right, see this comment for why)

## Thoughts on likelihood of the best possible future

In endnote 2.22, MacAskill writes [emphasis mine]:

I think that the expected value of the continued survival of civilisation is positive, but it’s very far from the best possible future. If I had to put numbers on it, I’d say that the expected value of civilisation’s continuation is less than 1 percent that of the best possible future (where “best possible” means “best we could feasibly achieve”).

The biggest difference between us regards how good we expect the future to be. Toby thinks that, if we avoid major catastrophe over the next few centuries, then we have something like a fifty-fifty chance of achieving something close to the best possible future. I think the odds are much lower. Primarily for this reason, I prefer not to use the language of “existential risk” (for reasons I spell out in Appendix 1) and prefer to distinguish between improving the future conditional on survival (“trajectory changes,” like avoiding bad value lock-in) and extending the lifespan of civilisation (“civilisational safeguarding,” like reducing extinction risks).

Despite generally agreeing with The Precipice much more than WWOTF, I’m less sure who I agree with on this point and therefore whether the x-risk framing is better. I lean toward Ord and would give a ~15% chance of achieving the best possible future if we avoid catastrophe, but this credence has low resilience.

### AMA post and responses

MacAskill wrote a post announcing WWOTF and doing an AMA. In the post, he writes:

The primary aim is to introduce the idea of longtermism to a broader audience, but I think there are hopefully some things that’ll be of interest to engaged EAs, too: there are deep dives on moral contingency, value lock-in, civilisation collapse and recovery, stagnation, population ethics and the value of the future.
It also tries to bring a historical perspective to bear on these issues more often than is usual in the standard discussions. I think there are some things of interest to engaged EAs, but as I’ve argued I think the book isn’t a good introduction for potential highly engaged EAs. I understand the appeal in gentle introductions (I got in through Doing Good Better), but I think it’s 90% likely I would have also gotten very interested if I had gone straight to The Precipice and so would most highly engaged longtermist EAs. MacAskill wrote in some comments: Highlighting one aspect of it: I agree that being generally silent on prioritization across recommended actions is a way in which WWOTF lacks EA-helpfulness that it could have had. This is just a matter of time and space constraints. For chapters 2-7, my main aim was to respond to someone who says, “You’re saying we can improve the long-term future?!? That’s crazy!”, where my response is “Agree it seems crazy, but actually we can improve the long-term future in lots of ways!” I wasn’t aiming to respond to someone who says “Ok, I buy that we can improve the long-term future. But what’s top-priority?” That would take another few books to do (e.g. one book alone on the magnitude of AI x-risk), and would also be less “timeless”, as our priorities might well change over the coming years. It would be a very different book if the audience had been EAs. There would have been a lot more on prioritisation (see response to Berger thread above), a lot more numbers and back-of-the-envelope calculations, a lot more on AI, a lot more deep philosophy arguments, and generally more of a willingness to engage in more speculative arguments. I’d have had more of the philosophy essay “In this chapter I argue that..” style, and I’d have put less effort into “bringing the ideas to life” via metaphors and case studies. 
Chapters 8 and 9, on population ethics and on the value of the future, are the chapters that are most similar to how I’d have written the book if it were written for EAs - but even so, they’d still have been pretty different.

I don’t think it’s obvious, but I’m still pretty skeptical of this sort of reasoning. What do we expect the people who are very skeptical about prioritizing and putting numbers on things to actually do to have a large impact? And I’m still concerned about bait-and-switches as mentioned above; even Doing Good Better talked heavily about prioritization, while WWOTF might leave people feeling weird when they actually join the community and realize that there are strong prioritization opinions not discussed in the book.

It seems important for EA’s community health that we can be relatively clear about our beliefs and what they imply. To the extent we’re not, I think we’ll potentially turn off some of the most talented potential contributors. If I try to put myself in the shoes of someone getting into EA through WWOTF then realizing that in fact there is a ton of emphasis in the longtermist movement on a relatively small portion of actions described in the book, I think I’d be somewhat turned off. I don’t think this is worth potentially appealing a bit more to the general public.

### Response to Scott Alexander

Similarly, MacAskill wrote a forum comment in response to Scott Alexander which might provide some insight into the framing of the book. Some reactions to the theme I found most interesting:

message testing from Rethink suggests that longtermism and existential risk have similarly-good reactions from the educated general public, and AI risk doesn’t do great. A lot of people just hate the idea of AI risk (cf Twitter), thinking of it as a tech bro issue, or doomsday cultism. This has been coming up in the twitter response to WWOTF, too, even though existential risk from AI takeover is only a small part of the book.
And this is important, because I’d think that the median view among people working on x-risk (including me) is that the large majority of the risk comes from AI rather than bio or other sources. So “holy shit, x-risk” is mainly, “holy shit, AI risk”.

I’m skeptical that we should give much weight to message testing with the “educated general public” or the reaction of people on Twitter, at least when writing for an audience including lots of potential direct work contributors. I think impact is heavy-tailed and we should target talented people with a scout mindset who are willing to take weird ideas seriously.

And as mentioned above, it seems healthy to have a community that’s open about our beliefs including the weird ones, and especially about the cause area that is considered by many (including perhaps MacAskill?) to be the top longtermist priority. Being less open about weird beliefs may turn off some of the people with the most potential.

## Thoughts on the balance of positive and negative value in current lives

In the section “How Many People Have Positive Wellbeing” (beginning p. 195), MacAskill uses self-reports to attempt to answer whether “the world is better than nothing for the human beings alive today.”

First, I’m very skeptical of self-reports as evidence here and think that it’s extremely hard for us to answer this question given our current understanding. When I think about some good experiences I had and some bad experiences I had and whether I’d prefer both of them to non-existence, I mostly think “idk? seems really hard to weigh these against each other.” I’m skeptical that others really have much more clarity here.[24]

Second, even given the results of the self-reports, I find it puzzling that on p. 201 MacAskill concludes that, if he were given the option to live a randomly chosen life today, he would.
The most convincing evidence to me that he describes is a survey that asked people whether they’d skip parts of their day if they could (discussed on p. 198). MacAskill writes: Taking both duration and intensity into account, the negative experiences were only bad enough to cancel out 58 percent of people’s positive experiences… the right conclusion is actually more pessimistic… participants in these studies mainly lived in the United States or in other countries with comparatively high income levels and levels of happiness MacAskill then goes on to describe a survey of people in both India and the United States whether their whole life has been net good. Due to the difficulty weighing experiences I described above and rosy retrospection biases, I put very little weight on this follow-up study. If we mainly focus on the “skipping” study, I think the result should make us somewhat pessimistic about the overall value of the current world. In addition to the consideration MacAskill brought up, the study likely didn’t include many examples of extreme suffering, which seem quite bad compared to the range of positive experiences accessible to us today even through a classical utilitarian lens. I feel like I basically have no idea, but if I had to guess I’d say ~40% of current human lives are net-negative, and the world as a whole is worse than nothing for humans alive today because extreme suffering is pretty bad compared to currently achievable positive states. This does not mean that I think this trend will continue into the future; I think the future has positive EV due to AI + future tech. 1. Edited in for clarity: my concern is not that people won't toe the "party line" of longtermism and think AI is the most important; I'm very in favor of people forming their own views and encouraging new people to share their perspectives. 
My primary concern here is the effects of the lack of transparency in WWOTF about MacAskill's views on longtermist prioritization (and to the extent people interpret the book as representing longtermism in some sense, lack of clarity on the movement's opinions). ↩︎ 2. In this section I do my best to give my all-things-considered beliefs (belief after updating on what other people believe), rather than or in addition to my (explicitly flagged when given) independent impressions. That being said, I think it’s pretty hard to separate out independent impressions vs. all-things-considered beliefs on complex topics when you’re weighing many forms of evidence, and beliefs are formed over a long period of time. When initially writing this review I thought MacAskill was attempting to give his all-things-considered credences in WWOTF, but from discussing with reviewers it seems MacAskill is giving something closer to his independent impression when possible, or something like: “independent impressions and [MacAskill is] confused about how to update in light of peer disagreement". Though note that MacAskill shares a similar sentiment about it being difficult to separate between these, and his credences should likely be interpreted as somewhere in between all-things-considered beliefs and independent impressions. To the extent MacAskill isn’t trying to take into account peer disagreement, he isn’t necessarily trying to predict what an ideal reasoning process would output as described in the appendix. ↩︎ 3. See this appendix section for how I’m defining confidence in directional, unresolvable claims. ↩︎ 4. Formerly had 40% in this post, corrected to 35% due to correcting the mistake of failing to multiply 6 probabilities together correctly. ↩︎ 5. I strongly recommend reading it or, if you’re short on time, watching this presentation by the author. ↩︎ 6. I’m intending to think more about this and flesh these out further in a post, hopefully within 1-2 months. ↩︎ 7. 
The 80% below was assuming no catastrophes; I’ll also assume no other catastrophes here for simplicity, and because I think people often do this when estimating non-AI risks so it seems good to be consistent. ↩︎ 8. MacAskill mentions in a forum comment he liked Ben Garfinkel’s review of the report. I personally didn’t find it that persuasive and generally agreed with Carlsmith’s counterpoints more than Garfinkel’s points, but it might be a good source for those who want the best arguments that Carlsmith is overestimating rather than underestimating AI risk. ↩︎ 9. See also A concern about the “evolutionary anchor” of Ajeya Cotra’s report on AI timelines for some pushback and discussion. ↩︎ 10. See also AI Forecasting: One Year In: “Specifically, progress on ML benchmarks happened significantly faster than forecasters expected. But forecasters predicted faster progress than I did personally, and my sense is that I expect somewhat faster progress than the median ML researcher does.” ↩︎ 11. See also this comment by Carl Shulman: “There are very expensive interventions that are financially constrained… so that e.g. twice the probability of AGI in the next 10 years justifies spending twice as much for a given result by doubling the chance the result gets to be applied” ↩︎ 12. I think there’s some chance aligning AGIs turns out to not be that hard (e.g. if we just tell it to do good stuff and not to do bad stuff plus somewhat intensive adversarial training + red-teaming, it mostly works) but it’s <50%. ↩︎ 13. In earlier drafts I gave a 10x lower chance of stagnation and a 2.5x shorter expected length for 25x less weight overall, but after some discussion I now think I was underselling the evidence in the book for decline in researcher productivity (I previously thought it perhaps hadn’t declined at all, and now just disagree about the degree) as well as the arguments for long expected length of stagnation. 
While I still place significantly less weight on stagnation than MacAskill, I’m less skeptical than I initially was (and differing AI timelines are doing about half of the work on my skepticism). ↩︎ 14. Another place where this is brought up is p.162: “Even if stagnation has only a one-in-three chance of occurring…” I’m not sure where the jump from a best guess of 35% to a lower reasonable bound of 33% comes from. ↩︎ 15. Though I’m a bit confused as the definition in the endnote seems to conflict with this quote on p.159, also discussed below: “Perhaps a stagnant future is characterized by recurrent global catastrophes that repeatedly inhibit escape from stagnation” ↩︎ 16. This is mentioned in the book on p. 156. See endnote 7.50: ‘Mu-Ming Poo, said in 2018 that “technically, there is no barrier to human cloning”’ ↩︎ 17. See Predicting Polygenic Selection for IQ by Beck ↩︎ 18. This point is inspired by this section of a rebuttal essay to Bloom et al.'s "Are Ideas Getting Harder to Find". ↩︎ 19. This point taken from this section of a rebuttal to Bloom et al. ↩︎ 20. A reviewer mentioned The Rise and Fall of American Growth: The U.S. Standard of Living Since the Civil War as a further reference defending the argument in endnote 7.10. I didn’t get a chance to look into the book. ↩︎ 21. I’m particularly unsure about this point, feel free to tear it to shreds if it’s invalid. ↩︎ 22. As opposed to e.g. wanting to avoid their own death. I think this is fine to have as a supplementary motivation if it helps increase productivity and grasp the urgency of the problem, but on reflection the motivation being altruistic seems good to me. That being said, if someone is really talented and not that altruistic but wants to work on AI safety I’d likely still be excited about them working on it, given how neglected the problem is. ↩︎ 23. 
To be clear, I’m excited about people who aren’t quickly convinced by weird ideas like a substantial probability of AI takeover in the next 50 years! But I’d prefer people who engage critically and curiously with weird ideas rather than disengage. ↩︎
24. On why I think this objection is reasonable even though I don’t really provide a great alternative: Relying on self-reports to a significant extent feels like the streetlight fallacy to me. I don't think we should update much based on self-reports, especially the type that involves asking people if their whole life has been good or bad rather than looking moment to moment. I don't think I need to provide a better alternative besides intuition / philosophical reasoning to make this claim. MacAskill does caveat self-reports some but I think the vibe is much more authoritative than I'd prefer, and the caveats much less strongly worded. I'd go for the vibe of "we have no idea wtf is going on here, here's the best empirical evidence we have but DON'T TAKE IT SERIOUSLY AT ALL IT'S REALLY SHITTY" ↩︎

# Comments (51)

Hi Eli, thank you so much for writing this! I’m very overloaded at the moment, so I’m very sorry I’m not going to be able to engage fully with this. I just wanted to make the most important comment, though, which is a meta one: that I think this is an excellent example of constructive critical engagement. I’m glad that you’ve stated your disagreements so clearly, and I also appreciate that you reached out in advance to share a draft.

Thanks Will! My dad just sent me a video of the Yom Kippur sermon this year (relevant portion starting roughly here) at the congregation I grew up in. It was inspired by longtermism and specifically your writing on it, which is pretty cool. This updates me emotionally toward your broad strategy here, though I'm not sure how much I should update rationally.

Hi Will, really hope you can find time to engage.
I think the points discussed are pretty cruxy for overall EA strategy! 3% chance of AI takeover, and 33% chance of TAI, by 2100, seems like it would put you in contention for winning your own FTX AI Worldview Prize[1] arguing for <7% chance of P(misalignment x-risk|AGI by 2070) (assuming ~2 of the 9% [3%/33%] risk is in the 2070-2100 window). 1. ^ If you were eligible Thanks for writing this! One thing I really agreed with. For instance, I’m worried people will feel bait-and-switched if they get into EA via WWOTF then do an 80,000 Hours call or hang out around their EA university group and realize most people think AI risk is the biggest longtermist priority, many thinking this by a large margin. I particularly appreciate your point about avoiding 'bait-and-switch' dynamics. I appreciate that it's important to build broad support for a movement, but I ultimately think that it's crucial to be transparent about what the key considerations and motivations are within longtermism. If, for example, the prospect of 'digital minds' is an essential part of how leading people in the movement think about the future, then I think that should be part of public outreach, notwithstanding how offputting or unintuitive it may be. (MacAskill has a comment about excluding the subject here). One thing I disagreed with. MacAskill at times seemed reluctant to quantify his best-guess credences, especially in the main text. I agree it's good to be transparent about priorities, including regarding the weight placed on AI risk within the movement. But I tend to disagree that it's so important to share subjective numerical credences and it sometimes has real downsides, especially for extremely speculative subjects. Making implicit beliefs explicit is helpful. But it also causes people to anchor on what may ultimately be an extremely shaky and speculative guess, hindering further independent analysis and leading to long citation trails. 
For example, I think the "1-in-6" estimate from The Precipice may have led to premature anchoring on that figure, and likely is relied upon too much relative to how speculative it necessarily is.

I appreciate that there are many benefits of sharing numerical credences and you seem like an avid proponent of sharing subjective credences (you do a great job at it in this post!), so we don't have to agree. I just wanted to highlight one substantial downside of the practice.

Hey Joshua, appreciate you sharing your thoughts (strong upvoted)! I think we actually agree about the effects of sharing numerical credences more than you might think, but disagree about the solution.

But it also causes people to anchor on what may ultimately be an extremely shaky and speculative guess, hindering further independent analysis and leading to long citation trails. For example, I think the "1-in-6" estimate from The Precipice may have led to premature anchoring on that figure, and likely is relied upon too much relative to how speculative it necessarily is.

I agree that this is a substantial downside of sharing numerical credences. I saw it first-hand with people taking the numbers in my previous post more seriously than I had intended (as you also mentioned!).

However, I think there are large benefits to sharing numerical credences, such that the solution isn't to share credences less but instead to improve the culture around them. I think we should shift EA's culture to be more favorable toward sharing numerical credences even (especially!) when everyone involved knows they're tentative, brittle, etc. And we should be able to have discussions involving credences and worry less that others will take them too seriously.

I've hopefully been contributing to this some, e.g. by describing my motivation for including confidence numbers as: "I decided it was worth it to propose a definition and go ahead and use as many made-up numbers as possible for transparency."
And I've attempted to push back when I've perceived others as having taken credences/BOTECs I've given too seriously in the past. Some more ideas for shifting the culture around numerical credences: 1. Use resilience to demonstrate how brittle your beliefs are. 2. Highlight how much other reasonable people disagree with your credences. 3. Openly change your mind and publicly shift your credences when new evidence comes in, or someone presents a good counter-argument. 4. Explicitly encourage others not to cite your numbers, if you believe they are too brittle (you mention this in your other comment). I'd love to get others' ideas for shifting the culture here! Oh, and I also quite liked your section on 'the balance of positive vs negative value in current lives'! This is a very thoughtful critique. What do you make of the argument that The Precipice and WWOTF work well together as a partnership that target different markets and could be introduced at different stages as people get into EA? Thanks John! My first-order reaction is that I'm somewhat skeptical but would need to hear the claim and argument for it fleshed out a little more to have a strong opinion. Below I'll list some reasons I'm initially skeptical (I maybe buy WWOTF could be better for like ~10-20% of people), though let me know if I'm misunderstanding your question as I don't understand the details. First, repeating a line from the post: I’m not making strong claims about other groups like the general public or policymakers, though I’d tentatively weakly prefer The Precipice for them as well since I think it’s more accurate and we need strong reasons to sacrifice accuracy. Note that this argument doesn't apply if you think WWOTF is more accurate than The Precipice, which I somewhat confidently believe is wrong but I know some reasonable people disagree. 
Second, I'm not sure what "target different markets" means exactly (which markets and how will they contribute to longtermist impact?), and am somewhat skeptical that it would outweigh the benefits of transparency and accuracy. I identify as consequentialist but have always had pretty strong intuitions toward transparency, being very up-front about things, etc. which could potentially be biasing my consequentialist assessment here. Third, on "introduced at different stages as people get into EA", first I'll repeat a line from the appendix: I think there are some things of interest to engaged EAs, but as I’ve argued I think the book isn’t a good introduction for potential highly engaged EAs. I understand the appeal in gentle introductions (I got in through Doing Good Better), but I think it’s 90% likely I would have also gotten very interested if I had gone straight to The Precipice and so would most highly engaged longtermist EAs. I'll also expand a bit on my personal experience here a bit to give a sense of what's informing my intuition. To caveat the below, I'm not claiming that MacAskill wasn't clear about his beliefs at the time in Doing Good Better (DGB) as I'm not sure if that's the case. I'll also caveat that my episodic memory isn't that great so some of this might be revisionist/inaccurate. I read DGB as required reading for a class in college in Spring 2018, and loved it. I was very excited to save many lives by earning to give, and started cutting out high-suffering foods from my diet. I then quickly started listening to 80,000 Hours podcasts and found out more about the diversity of causes in EA, and read some books that were discussed on the podcast which I really enjoyed (ones I remember are Superforecasting, Elephant in The Brain, The Case Against Education, and Superintelligence). 
I attended my first EAG in summer 2019, and it was overall a positive experience but I was somewhat struck by the prevalence of AI risk, compared to the diversity of DGB and the 80,000 Hours podcast. I also chatted with at least one person there who was very critical of the focus on AI risk, which made me pretty hesitant about it. I came out of the EAG mostly more excited but also a bit hesitant about AI stuff. I also went vegan after it. Over the course of the last 3 years, I've gotten progressively more convinced (especially on a gut level, but also intellectual) that it's actually reasonable to worry about high levels of AI risk and this might just actually be the most important thing in the world to work on. See my recent post, especially this section for more background on my experience here. So basically, overall I understand the appeal of gentle introductions then ramping up. But my personal experience makes me feel like I could have gotten on board with the arguments and implications a bit faster if they had been more straightforwardly introduced to me earlier. I still think a period of skepticism is healthy, but I think I could have easily been a bit more turned off and felt more bait-and-switched than I did and disengaged from EA and AI risk. And I worry that other promising people who tend to be skeptical but interested in claims that sound wild/weird might be turned off to a larger extent. Enjoyed the post but I'd like to mention a potential issue with points like these: I’m skeptical that we should give much weight to message testing with the “educated general public” or the reaction of people on Twitter, at least when writing for an audience including lots of potential direct work contributors. I think impact is heavy-tailed and we should target talented people with a scout mindset who are willing to take weird ideas seriously. 
I would put nontrivial weight on this claim: the support of the general public matters a lot in TAI worlds, e.g., during 'crunch time' or when trying to handle value lock-in. If this is true and WWOTF helps achieve this, it can justify writing a book focusing less on people who are already prone to react in ways we typically assoicate with a scout mindset. Increasing direct work in the usual sense is one thing to optimise for; another is creating an enviroment receptive to proposals and cooperation with those who do direct work. So although I understand that you're not making strong claims about other groups like the general public or policymakers, I think it's worth mentioning that "I'd rather recommend The Precipice to people who might do impactful work" and "WWOTF should have been written differently" are very importantly distinct claims. So although I understand that you're not making strong claims about other groups like the general public or policymakers, I think it's worth mentioning that "I'd rather recommend The Precipice to people who might do impactful work" and "WWOTF should have been written differently" are very importantly distinct claims. I agree with this. The part you quoted is from the appendix and an ideal world it would be more rigorously argued with the claims you identified separated more cleanly. But in practice it should probably be thought more of as "stream-of-consciousness reactions from Eli as he read Will's posts/comments" (which is part of why I put it in the appendix). I would put nontrivial weight on this claim: the support of the general public matters a lot in TAI worlds, e.g., during 'crunch time' or when trying to handle value lock-in. If this is true and WWOTF helps achieve this, it can justify writing a book focusing less on people who are already prone to react in ways we typically assoicate with a scout mindset. 
Increasing direct work in the usual sense is one thing to optimise for; another is creating an environment receptive to proposals and cooperation with those who do direct work.

Epistemic status: speculation about something I haven't thought about that much (TAI governance and public opinion)

I appreciate you making the benefits more concrete. However, I'm still not really sure I fully understand the scenario where WWOTF moves the needle here and how it will help much compared to alternatives. I'll list my best guess as to more explicit steps on the path to impact (let me know if I'm assuming wrong, a lot of this is guessing!), and my skepticisms about each step:

1. Many in the general public read WWOTF, and over time, through ideas spreading in various ways, many people become much more on board with the general idea of longtermism.
   1. I'm skeptical that > ~25% of the general public both (a) has the bandwidth/slack to care about the long-term future as opposed to their current issues and (b) is philosophically inclined enough to think about morality in this way. Maybe this could happen as a cultural shift over the course of several generations, but it feels like <5% to me in <40 year timelines worlds.
2. We either (a) convince the general public to care a specifically large amount about misaligned AI risk and elect politicians who care about it, or (b) get politicians on board with general longtermist platforms but actually the thing we care about most is misaligned AI risk.
   1. My skepticism about (a) is that if you really believe the general public is savvy enough to get on board with a large amount of misaligned AI risk, I feel like you should also believe they're savvy enough to feel bait-and-switched by this two-step conversion process rather than us being more upfront about our beliefs.
   2. My skepticism about (b) is that it feels intellectually dishonest to not be upfront about what we actually care the most about by far, and this will probably backfire in some way even if it is hard to predict how in advance (one possibility is the most savvy journalists figuring out what's going on, then writing hit pieces and turning the public).
3. The politicians who care about AI risk then help avoid value lock-in during TAI crunch time.
   1. This seems good if we can actually achieve the first two steps and I'm wrong. I'm uncertain about how good; not sure how much influence politicians will have here vs. people at top AI labs.

Some alternatives to the 3-step plan I've interpreted above that feel higher EV per effort spent, and often more direct:

1. Outreach to ML academics, like Vael Gates is doing.
2. Write a book containing a very high-quality treatment of alignment, to get some of the most savvy public intellectuals / journalists on board with alignment in particular.
3. Make higher quality resources to convince ML researchers in top industry labs, like Richard Ngo is doing.

This was helpful; I agree with most of the problems you raise, but I think they're objecting to something a bit different than what I have in mind.

Agreement: 1a, 1b, 2a

• I am also very sceptical that >25% of the general public satisfies (1a) or (1b). I don't think these are the main mechanisms through which the general public could matter regarding TAI. The same applies to (2a).

Differences: 2b, 3a, alternatives

• On (2b): I'm a bit sceptical that politicians or policymakers are sufficiently nitpicky for this to be a big issue, but I'm not confident here. WWOTF might just have the effect of bringing certain issues closer to the edges of the Overton window. I find it plausible that the most effective way to make AI risk one of these issues is in the way WWOTF does it: get mainstream public figures and magazines talking about it in a very positive way.
I could see how this might've been far harder with a book that allows people to brush it off as tech-bro BS more easily.

On there being intellectual dishonesty: I worry a bit about this, but maybe Will is just providing his perspective and that's fine. We can still have others in the longtermist community disagree on various estimates. Will for one has explicitly tried not to be seen as a leader of a movement of people who just follow his ideas. I'd be surprised if differences within the community become widely seen as intellectual dishonesty from the outside (though of course isolated claims like these have been made already).

So, maybe what we want from politicians and policymakers during important moments is for them to be receptive to good ideas. The perceived prioritisation of AI within longtermist writing might just not turn out to be that crucial. I'm open to changing my mind on this but I don't expect there to be much conflict between different longtermist priorities such that policymakers will in fact need to choose between them. That's a reason I'd expect that the best we can do is to make certain problems more palatable so that when an organisation tells policymakers "we need policy X, else we raise the risk of AI catastrophe" they are more likely to listen.

• On (3a): I'm also very uncertain here but conditional on some kind of intent alignment, it becomes a lot more plausible to me that coordination with the world outside top labs becomes valuable, e.g., on values, managing transitions, etc. (especially if takeoff is slow).

• On alternative uses of time: Those three projects seem great and might be better EV per effort spent, but that's consistent with great writers and speakers like Will having a comparative advantage in writing WWOTF.

The mechanism I have in mind is a bit nebulous. It's in the vein of my response to (2a), i.e., creating intellectual precedent, making odd ideas seem more normal, etc.
to create an environment (e.g., in politics) more receptive to proposals and collaboration. This doesn't have to be through widespread understanding of the topics. One (unresearched) analogue might be antibiotic resistance. People in general, including myself, know next to nothing about it, but this weird concept has become respectable enough that when a policymaker Googles it, they know it's not just some kooky fear that nobody outside strangely named research centres worries about or respectfully engages with.

Background on my views on the EA community and epistemics

Epistemic status: Passionate rant

I think protecting and improving the EA community's epistemics is extremely important and we should be very very careful about taking actions that could hurt it to improve on other dimensions.

First, I think that the EA community's epistemic advantage over the rest of the world, in terms of both getting to true beliefs via a scout mindset and taking the implications seriously, is extremely important for the EA community's impact. I think it might be even more important than the moral difference between EA and the rest of the world. See Ngo and Kwa for more here. In particular, it seems like we're very bottlenecked on epistemics in AI safety, perhaps the most important cause area. See Muehlhauser and MIRI conversations.

Second, I think the EA community's epistemic culture is an extremely important thing to maintain as an attractor for people with a scout mindset and taking-ideas-seriously mentality. This is a huge reason that I, and I'm guessing many others, love spending time with others in the community, and I'm very very wary about sacrificing it at all. This includes people being transparent and upfront about their beliefs and the implications.

Third, the EA community's epistemic advantage and culture are extremely rare and fragile. By default, they will erode over time as ~all cultures and institutions do. We need to try really hard to maintain them.
Fourth, I think we really need to be pushing the epistemic culture to improve rather than erode! There is so much room for improvement in quantification of cost-effectiveness, making progress on long-standing debates, making it more socially acceptable and common to critique influential organizations and people, etc. There's a long way to go and we need to move forward not backwards.

On (2b): I'm a bit sceptical that politicians or policymakers are sufficiently nitpicky for this to be a big issue, but I'm not confident here. WWOTF might just have the effect of bringing certain issues closer to the edges of the Overton window. I find it plausible that the most effective way to make AI risk one of these issues is in the way WWOTF does it: get mainstream public figures and magazines talking about it in a very positive way. I could see how this might've been far harder with a book that allows people to brush it off as tech-bro BS more easily.

I think this is a fair point, but even if it's right I'm worried about trading off some community epistemic health to appear more palatable to this crowd. I think it's very hard to consistently present your views in a fairly different way publicly than they are presented in internal conversations, and it hinders intellectual progress of the movement. I think we need to be going in the other direction; Rob Bensinger has a Twitter thread on how we need to be much more open and less scared of saying weird things in public, to make faster progress.

On there being intellectual dishonesty: I worry a bit about this, but maybe Will is just providing his perspective and that's fine. We can still have others in the longtermist community disagree on various estimates. Will for one has explicitly tried not to be seen as a leader of a movement of people who just follow his ideas.
I'd be surprised if differences within the community become widely seen as intellectual dishonesty from the outside (though of course isolated claims like these have been made already).

Sorry if I wasn't clear here: I'm most worried about Will not being fully upfront about the implications of his own views.

On alternative uses of time: Those three projects seem great and might be better EV per effort spent, but that's consistent with great writers and speakers like Will having a comparative advantage in writing WWOTF.

Seems plausible, though I'm concerned about community epistemic health from the book and the corresponding big media push. If a lot of EAs get interested via WWOTF they may come in with a very different mindset about prioritization, quantification, etc.

The mechanism I have in mind is a bit nebulous. It's in the vein of my response to (2a), i.e., creating intellectual precedent, making odd ideas seem more normal, etc. to create an environment (e.g., in politics) more receptive to proposals and collaboration. This doesn't have to be through widespread understanding of the topics. One (unresearched) analogue might be antibiotic resistance. People in general, including myself, know next to nothing about it, but this weird concept has become respectable enough that when a policymaker Googles it, they know it's not just some kooky fear that nobody outside strangely named research centres worries about or respectfully engages with.

Seems plausible to me, though I'd strongly prefer if we could do it in a way where we're also very transparent about our priorities.

(Also, sorry for just bringing up the community epistemic health thing now. Ideally I would have brought it up earlier in this thread and discussed it more in the post, but I have been just fleshing out my thoughts on it yesterday and today.)

Nodding profusely while reading; thanks for the rant.
I'm unsure if there's much disagreement left to unpack here, so I'll just note this: • If Will was in fact not being fully honest about the implications of his own views, then I doubt pretty strongly that this could be worth any potential benefit. (I also doubt there'd be much upside anyway given what's already in the book.) • If the claim is purely about framing, I can see very plausible stories for costs regarding people entering the EA community, but I can also see stories for the benefits I mentioned before. I find it non-obvious that a lack of prioritisation/quantification in WWOTF leads to a notably lower-quality EA community as misconceptions may be largely corrected when people try to engage with the existing community. Though I could very easily change my mind on this; e.g., it would worry me to see lots of new members with similar misconceptions enter at the same time. The magnitude of the pros and cons of the framing seems like an interestingly tough empirical question. Roughly agree with both of these bullet points! I want to be very clear that I have no reason to believe that Will wasn't being honest and on the contrary believe he very likely was, my concerns are about framing. And I agree the balance of costs and benefits regarding framing aren't super obvious but I am pretty concerned about the possible costs. Not sure the maths is right on those 6 probabilities? 0.8 * 0.85 * 0.75 * 0.9 * 0.8 * 0.95 = 35% chance. Sorry if I've misunderstood what you were trying to do there. I think it's pretty embarrassing for us as a community that this post got to >200 karma and ~30 comments before such a simple math error was found. (I'm a culprit here too; I upvoted this post upon skimming it when it first came out, without even doing the most basic of sanity checks) Yeah I’m personally a bit embarrassed by not catching it myself as well. I’m thankful that at least it changed very little about the thrust of the post. 
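
As a quick check of the arithmetic raised in the comment above, the six step probabilities can be multiplied directly. This is only a minimal sketch: the variable names are illustrative, and the only inputs are the six numbers quoted in that comment.

```python
import math

# The six conjunctive step probabilities quoted in the comment above.
step_probabilities = [0.8, 0.85, 0.75, 0.9, 0.8, 0.95]

# Treating each number as the probability of a step given the previous ones,
# the overall estimate is the product of the six.
overall = math.prod(step_probabilities)
print(f"overall estimate: {overall:.1%}")

# For comparison: the same product with one of the 0.8 factors left out.
without_one_factor = overall / 0.8
print(f"with one 0.8 factor omitted: {without_one_factor:.1%}")
```

The full product comes to roughly 35%, matching the commenter's figure, while leaving out a single 0.8 factor gives roughly 44%.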
I don't think you need be embarrassed - you responded politely & not defensively. That's definitely rarer on the internet than minor arithmetic errors! Really appreciate the sanity check. You're totally right, oops. Looks like when I multiplied I somehow missed the .8. I will edit the post. [For people reading this after I've edited: I previously had 44% from my independent impression due to messing up the multiplication, then adjusted down to 40% for my all-things-considered view. Now I have 35% for both independent impression and all-things-considered view.] (Disclaimer: I didn't read the whole post.) It sounds like at least part of your argument could be summarized as: "Will MacAskill underrates x-risk relative to most longtermists; The Precipice is a better introduction because it focuses on x-risk." IMO, x-risk reduction is not the only (potential) way of influencing the long-term future, so it's good for an introductory book on longtermism to be fairly agnostic on whether to prioritize x-risk reduction. (More on the object level, I believe the longtermist community has been too quick to settle on x-risk as the main thing worth working on, and it would be good to have more work on other areas, although I still think x-risk should be the top longtermist priority.) It sounds like at least part of your argument could be summarized as: "Will MacAskill underrates x-risk relative to most longtermists; The Precipice is a better introduction because it focuses on x-risk." I don't have a strong view about the focus on x-risk in general. I care most about the lack of clarity on which areas are highest priority, and what I believe is a mistake in not focusing on AI enough (and the wrong emphasis within AI). In the post I wrote: One big difference between The Precipice and WWOTF that I’m not as sure about is the framing of reducing x-risks as interventions as opposed to trajectory changes and safeguarding civilization. 
I lean toward The Precipice and x-risks here but this belief isn’t very resilient. I also discussed this further in the appendix. IMO, x-risk reduction is not the only (potential) way of influencing the long-term future, so it's good for an introductory book on longtermism to be fairly agnostic on whether to prioritize x-risk reduction. (More on the object level, I believe the longtermist community has been too quick to settle on x-risk as the main thing worth working on, and it would be good to have more work on other areas, although I still think x-risk should be the top longtermist priority.) I think we have an object-level disagreement here, likely due to different beliefs about AI. I agree reducing x-risk isn't the only possible longtermist intervention, but I am not convinced that many others are relatively promising. I think influencing AI stuff directly or indirectly is likely the most important lever for influencing the future, whether this is framed as x-risk or s-risk or a trajectory change or safeguarding civilization (x-risk still seems most natural to me, but I'm fine with other phrasings and am also concerned about value lock-in and s-risks though I think these can be thought of as a class of x-risks). We also might have different beliefs about the level of agnosticism appropriate in an introductory book. I agree it shouldn't give too strong vibes of "we've figured out the most effective things and they are A, B, and C" but I think it's valuable to be pretty clear about our current best guesses. I'm fine with other phrasings and am also concerned about value lock-in and s-risks though I think these can be thought of as a class of x-risks I'm not keen on classifying s-risks as x-risks because, for better or worse, most people really just seem to mean "extinction or permanent human disempowerment" when they talk about "x-risks." 
I worry that a motte-and-bailey can happen here, where (1) people include s-risks within x-risks when trying to get people on board with focusing on x-risks, but then (2) their further discussion of x-risks basically equates them with non-s-x-risks. The fact that the "dictionary definition" of x-risks would include s-risks doesn't solve this problem. I think this is a valid concern. Separately, it's not clear that all s-risks are x-risks, depending on how "astronomical suffering" and "human potential" are understood. What do you think about the concept of a hellish existential catastrophe? It highlights both that (some) s-risks fall under the category of existential risk and that they have an additional important property absent from typical x-risks. The concept isolates a risk the reduction of which should arguably be prioritized by EAs with different moral perspectives. Nitpicking: A property of making directional claims like this is that MacAskill always has 50% confidence in the claim I’m making, since I’m claiming that his best-guess estimate is too high/low. This isn't quite right. Conservation of expected evidence means that MacAskill's current probabilities should match his expectation of the ideal reasoning process. But for probabilities close to 0, this would typically imply that he assigns higher probability to being too high than to being too low. For example: a 3% probability is compatible with 90% probability that the ideal reasoning process would assign probability ~0% and a 10% probability that it would assign 30%. (Related.) This is especially relevant when the ideal reasoning process is something as competent as 100 people for 1000 years. Those people could make a lot of progress on the important questions (including e.g. themselves working on the relevant research agendas just to predict whether they'll succeed), so it would be unsurprising for them to end up much closer to 0% or 100% than is justifiable today. 
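
The 3% example in the comment above can be checked with a few lines of arithmetic. This is a minimal sketch that takes the two-outcome split (90% on ~0%, 10% on 30%) straight from the comment; the variable names are illustrative only.

```python
# Hypothetical belief about what the ideal reasoning process would conclude,
# written as (weight, probability it would assign) pairs from the comment above.
outcomes = [
    (0.90, 0.00),  # 90% chance the ideal process lands on ~0%
    (0.10, 0.30),  # 10% chance the ideal process lands on 30%
]

# Conservation of expected evidence: the current estimate should equal
# the expectation over what the ideal process would say.
expected = sum(weight * p for weight, p in outcomes)

current_estimate = 0.03
# Probability that the current estimate turns out too high or too low.
prob_too_high = sum(w for w, p in outcomes if p < current_estimate)
prob_too_low = sum(w for w, p in outcomes if p > current_estimate)

print(f"expected value of the ideal estimate: {expected:.2%}")
print(f"P(current estimate is too high): {prob_too_high:.0%}")
print(f"P(current estimate is too low): {prob_too_low:.0%}")
```

Under these assumed weights the expectation is still 3%, yet the estimate is far more likely to be too high than too low, which is the asymmetry the comment describes.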
Great point, I'm a bit disappointed in myself for not catching this! I'll strike this out of the post and link to your comment for explanation. On the point about working on the relevant research agendas, I hadn’t thought about that and kind of want to disallow that from the definition. But I feel the line would then get fuzzy as to what things exactly count as object level work on research agendas. Edit: After thinking more, I will edit the definition to clarify that the people doing the reasoning can only deliberate about current evidence rather than acquire new evidence. This might still be a bit vague but it seems better than not including. I'm curating this post. [Disclaimer: writing quickly.] Given the amount of attention that What We Owe the Future is getting, it's important to have high-quality critical reviews of it. Here are some things I particularly liked about this one: • I appreciate the focus on clarity around beliefs. [Related: Epistemic legibility.] • I think the section on "What I like" is great (and I agree that the Significance/Persistence/Contingency framework is a really useful tool), and having that section was important. More broadly, the review is quite generous and collaborative. • I really like that there are concrete forecasts and credences and that the criticisms of or disagreements with the book are specific. Relatedly, the post is careful to link to specific sources and cite relevant passages. • The review doesn't nitpick and look for minor errors; it focuses on serious disagreements. • The post is action-relevant (do you give someone The Precipice or WWOTF?). • I really like the discussion in the comments of this post. (And I like that commenters pointed out errors that the author then edited.) • There are summaries and headings, which makes the post skimmable and easier to navigate for people who want to only read a particular section. 
I should clarify that I disagree with some aspects of the review, but still think it makes lots of true and relevant claims, and appreciate it overall. (E.g. I agree with a commenter that WWOTF does a great job "bringing longtermism into the Overton window," and that this is more important than the review acknowledges it to be.)

I really appreciate seeing your credences and your reasoning. It was also cool to see Samotsvety forecast stagnation. It looks like Samotsvety also forecasted AI timelines and AI takeover risk - are you willing and able to provide those numbers as well?

Glad the forecasts and reasoning were useful! We did do some forecasting of timelines and takeover based on the Carlsmith report, but the data isn't recorded super well (and some people could only make some of the multiple sessions, etc.) which is why I was a bit vague in the post. Also the forecasts were for 2070 rather than 2100. I'll see if I can get some updated forecasts within the next week or so and get back to you.

Just published some up-to-date forecasts here! A few of the headline aggregate forecasts are:

1. 25% chance of misaligned AI takeover by 2100, barring pre-APS-AI catastrophe
2. 81% chance of Transformative AI (TAI) by 2100, barring pre-TAI catastrophe
3. 32% chance of AGI being developed in the next 20 years

I'll edit a link to these into the post soon but am getting an error right now when trying to edit.

Thanks for writing this, Eli. I haven't read WWOTF and was hoping someone would produce an analysis like this (especially comparing The Precipice to WWOTF).

I've seen a lot of people posting enthusiastically about WWOTF (often before reading it) and some of the press that it has been getting (e.g., cover of TIME). I've felt conflicted about this. On one hand, it's great that EA ideas have the opportunity to reach more people.
On the other hand, I had a feeling (mostly based on quotes from newspaper articles summarizing the book) that WWOTF doesn't feature AI safety and doesn't have a sense of "hey, a lot of people think that humanity only has a few more decades [or less] to live."

I hope that EAs concerned about AIS champion resources that accurately reflect their sense of concern, feature AI safety more prominently, and capture the emotion/tone felt by many in the AIS community. (List of Lethalities is a good example here, though it has its own flaws and certainly isn't optimizing for widespread appeal in the same way that WWOTF seems to be).

Thanks for the post. Off-the-cuff reflection on communication (mainly triggered by the appendix):

1. It is very difficult to figure out what is most useful to say to an individual person, let alone a given group of people. We should expect to face great uncertainty and huge room for reasonable disagreement about many, perhaps most of the big decisions here.
2. While strict adherence to “say what you believe is true” is nearly always the right approach in spirit, you can’t avoid big decisions about emphasis, sequencing, premises you make explicit or implicit, ideas you argue for or take as given, the terminology you use, the tone, use of metaphor or example or logical notation, and so on.
3. There are surely some things that most would agree on as a big mistake (e.g. tasteless or unnecessarily upsetting examples), and lots of general rules of thumb people agree on too. But for something like how to talk about AI risk and X-risk estimates and timelines… that’s always going to be hard.
4. My impression is that Will did an unusually large amount of work to test arguments, examples and messages with different audiences while writing the book (think: lots of seminars and talks, not just asking contractors to message test using Amazon Turk).
5. Even if authors are occasionally willing to put serious effort into this kind of “user research”, my guess is that we need to heavily rely upon a portfolio approach.
6. On a portfolio approach, the stuff that we expect to get the biggest audience should probably be the most risk-averse, i.e. the bigger the reach, the more concerned it should be with not hard-bouncing people, and leaving most of the readers with at least a somewhat positive impression of the ideas and the people working on them.

Oh cool I just saw that Will MacAskill wrote a long comment on this topic in reply to Scott Alexander.

I’m skeptical that we should give much weight to message testing with the “educated general public” or the reaction of people on Twitter, at least when writing for an audience including lots of potential direct work contributors.

Yes, if the purpose of the book is to persuade talented readers to start working on AIS. Yet it could be more valuable to reap the indirect positive effects of bringing longtermism into the Overton window. As a crude example, it's now more likely that Terence Tao will feel fine about working on alignment; an AI-focused MacAskill book might have failed to accomplish that due to lower popularity.

EDIT: You've somewhat addressed this in response to another comment. I'll add that there was a nontrivial chance of WWOTF becoming a NYT #1 bestseller for 30 weeks and giving longtermism a Silent Spring moment. More targeted "let's start working on AI" outreach is good, but I'm not so sure that it's higher EV.

I agree that to the extent WWOTF robustly improves the general public attitude toward longtermism this is an upside. There are of course potential downsides to popularizing the movement too quickly and growing too fast as well (edited to add: for more on potential downsides, see this comment).
As I mention in the piece, I'm significantly more confident in the claim that The Precipice is better for potential longtermist direct workers than about WWOTF vs. The Precipice for the general public. This also goes for claims about WWOTF's effects in general, compared to no new book being released; I didn't intend to make strong claims about that in this post. In your X-risk estimation, you give 6 probability estimates, all of which are in the 75-95% range, despite being about wildly different questions. This seems suspiciously narrow given the wide range of possible estimates (especially if considering a logarithmic scale). Did you arrive at these numbers through any sort of mathematical basis, or are you just quantifying vibes here? Hi titotal, thanks for engaging with the post. For onlookers, I believe titotal is referring to my forecast for the probability of takeover by misaligned power-seeking AI in this section of the post. you give 6 probability estimates, all of which are in the 75-95% range, despite being about wildly different questions. This seems suspiciously narrow given the wide range of possible estimates (especially if considering a logarithmic scale) I agree it's a little suspicious (see this post for what suspicions I've had over time about high probabilities of AI x-risk). That being said, I'm not as concerned about it as you because the questions in the decomposition aren't drawn from a distribution from which I think we should naively expect the probabilities to be distributed uniformly between 0 and 100%. In particular, the decomposition was generated via taking a claim which Carlsmith was concerned enough about that he thought it might warrant substantial credence, then splitting it into several conjunctive steps (and no disjunctive!). If I were very reluctant to consistently give fairly high probabilities to the steps even when I thought high probabilities were warranted, I think this would bias me too much towards a low estimate. 
I'd recommend this section of Soares's review of the report for more on this point. My position on the "multi-stage fallacy" is somewhere in between Soares and Carlsmith: I think doing this decomposition is still very useful, but we should be very careful about it biasing us and should potentially try to reconcile disjunctive breakdowns with conjunctive. I might write some more thoughts on this sometime: I think a really cool exercise would be doing a bunch of different decompositions of AI x-risk, making forecasts on them, then trying to reconcile the differences in results.

Did you arrive at these numbers through any sort of mathematical basis, or are you just quantifying vibes here?

I try to be as transparent in my reasoning as possible, though I think open-ended forecasting of this sort will always involve some level of "quantifying vibes". I strongly disagree that this is a reason to avoid it, but agree this is a reason not to take it too seriously. If it's at all reassuring, I've written about my track record on quantifying vibes and I think it's decent overall.

Hey, thank you for taking the time to explain your position, I appreciate it. I'm not trying to take a dig at your estimates in particular, this is merely part of my suspicion that EA as a whole has widespread flaws in estimation of unbounded probabilities.

Let's start from your six questions again:

1. Timelines: By 2070, it will be possible and financially feasible to build APS-AI: systems with advanced capabilities (outperforms humans at tasks important for gaining power), agentic planning (makes plans then acts on them), and strategic awareness (its plans are based on models of the world good enough to overpower humans).
2. Incentives: There will be strong incentives to build and deploy APS-AI.
3. Alignment difficulty: It will be much harder to build APS-AI systems that don’t seek power in unintended ways, than ones that would seek power but are superficially attractive to deploy.
4. High-impact failures: Some deployed APS-AI systems will seek power in unintended and high-impact ways, collectively causing >$1 trillion in damage.
5. Disempowerment: Some of the power-seeking will in aggregate permanently disempower all of humanity.
6. Catastrophe: The disempowerment will constitute an existential catastrophe.

So starting with the problems of decomposition: You're right that the errors in answers to the six questions will be correlated, so treating them like independent events can lead to compounded error in the final estimate. But they seem to imply that this necessarily causes an underestimation, when to my mind it's just as likely to cause an overestimation. If there were a systematic bias that caused you to double the estimate for each step, then the actual value would be about two orders of magnitude lower than your estimate.
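
To make the "two orders of magnitude" figure concrete, here is a minimal sketch of the compounding. The factor of two per step is the hypothetical bias from the paragraph above, and the 35% headline is the corrected estimate given earlier in the thread; nothing here is a claim about the actual size of any bias.

```python
import math

num_steps = 6             # six conjunctive steps in the decomposition
inflation_per_step = 2.0  # hypothetical: each step's estimate is doubled

# A constant multiplicative bias compounds across conjunctive steps.
total_inflation = inflation_per_step ** num_steps
orders_of_magnitude = math.log10(total_inflation)

headline_estimate = 0.35  # corrected figure from earlier in the thread
debiased_estimate = headline_estimate / total_inflation

print(f"total inflation factor: {total_inflation:.0f}x")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
print(f"estimate after removing the assumed bias: {debiased_estimate:.2%}")
```

Six doublings compound to a factor of 64, a little under two orders of magnitude, so a 35% headline would correspond to roughly half a percent under this assumed bias. The same compounding runs the other way if each step is instead systematically underestimated.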

For example, one source of correlation would be in how powerful you expect an AI to actually be. Hypothetically, if there turned out to be serious obstacles in power-seeking AI development that limited it to being "very good" instead of "overpowering", it would simultaneously lower the odds for steps 1, 2, 4, and 5 together. (Of course, the opposite would apply if it turned out to be even stronger than expected).

Anyway, this partially explains why the estimates are similar, but I don't think it lets you off the hook entirely, as they aren't perfectly correlated. For example, question 3 is fundamentally about the difficulty of coding goal functions into software, while question 5 is fundamentally about the potential capabilities of AI and the ability of society to resist an AI takeover. It still seems weird that you arrived at pretty much the same probability for both of them.

I think a really cool exercise would be doing a bunch of different decompositions of AI x-risk, making forecasts on them, then trying to reconcile the differences in results.

I would be quite interested in this! I think the way you split it can have a significant effect on what seems intuitive. For example, I believe that it will be very hard to program an AI that is not misaligned in some damaging way, but very easy to program one that is not an x-risk or s-risk threat. This objection doesn't really jibe well with the 6 questions above: it might look like high estimates for questions 1-4, then a sudden massively low probability for question 5. It seems like the choice of decomposition will depend on what you think the "key questions" are.

I try to be as transparent in my reasoning as possible, though I think open-ended forecasting of this sort will always involve some level of "quantifying vibes". I strongly disagree that this is a reason to avoid it, but agree this is a reason not to take it too seriously. If it's at all reassuring, I've written about my track record on quantifying vibes and I think it's decent overall.

Well done on the impressive forecasting performance! I'm certainly not against forecasting in general, but I do have concerns about forecasting for low-probability and unbounded probability events. I'm not convinced that expertise at forecasting things with lots of data and evidence surrounding them, such as the inflation rate next quarter, will transfer to a question like "what is the probability that the universe is a simulation and will shut down within the next century". The former seems mostly evidence based, with a little vibes thrown in, while the latter is almost entirely vibes based, with only the barest hint of evidence thrown in. I view AI safety as somewhere in between the two.

Anyway, I'll probably be spinning this off into a whole post, so discussion is highly welcome!

tl;dr I agree with a decent amount of this. I'd guess our disagreements are mainly on the object-level arguments for and against AI risk.

So starting with the problems of decomposition: You're right that the errors in answers to the six questions will be correlated, so treating them like independent events can lead to compounded error in the final estimate. But they seem to imply that this necessarily causes an underestimation, when to my mind it's just as likely to cause an overestimation. If there were a systematic bias that caused you to double the estimate for each step, then the actual value would be about two orders of magnitude lower than your estimate.

I agree that correlations in particular could cause overestimation rather than underestimation, and didn't mean to imply otherwise.

My primary point above was not about correlations between premises though; it was about the process of taking a claim we think might warrant substantial credence, then splitting it into several steps that are all conjunctive (then assigning probabilities, on which people are often reluctant to seem overconfident).

Anyway, this partially explains why the estimates are similar, but I don't think it lets you off the hook entirely, as they aren't perfectly correlated. For example, question 3 is fundamentally about the difficulty of coding goal functions into software, while question 5 is fundamentally about the potential capabilities of AI and the ability of society to resist an AI takeover. It still seems weird that you arrived at pretty much the same probability for both of them.

I'm a little confused here; it seems like since I gave 6 probabilities, it would on the contrary be surprising if 2 of them weren't pretty close to each other, even the more uncorrelated ones?

I would be quite interested in this! I think the way you split it can have a significant effect on what seems intuitive. For example, I believe that it will be very hard to program an AI that is not misaligned in some damaging way, but very easy to program one that is not an x-risk or s-risk threat. This objection doesn't really jibe well with the 6 questions above: it might look like high estimates for questions 1-4, then a sudden massively low probability for question 5. It seems like the choice of decomposition will depend on what you think the "key questions" are.

Thanks for linking your reasoning for thinking it might be easy to create an AI that isn't an x-risk or s-risk threat. I skimmed it and agree with the top comment: I think the strategy you described would likely strongly limit the capabilities of the AI system even if (and I think it's a big if) it succeeded in alignment. I think we need an alignment solution which doesn't impose as big of a capabilities penalty.

I think it's fine if you have a much lower probability for one of the questions in the decomposition than others! I don't see what's inherently wrong with that.

I'm not convinced that expertise at forecasting things with lots of data and evidence surrounding them, such as the inflation rate next quarter, will transfer to a question like "what is the probability that the universe is a simulation and will shut down within the next century". The former seems mostly evidence based, with a little vibes thrown in, while the latter is almost entirely vibes based, with only the barest hint of evidence thrown in. I view AI safety as somewhere in between the two.

I agree with you directionally, but I'd argue that my track record has many questions that don't have much data and evidence behind them, especially in comparison to your example about the inflation rate. As a quick example, I did well in the Salk Tournament for SARS-CoV-2 Vaccine R&D. I think for many of the questions, we did have some past data to go on, but not much, and the right choice of reference class is very unclear compared to your inflation example.

Anyway, I'll probably be spinning this off into a whole post, so discussion is highly welcome!

Looking forward to it, let me know if feedback would be useful :)

Yeah, no worries! I think this is helping me to figure out what my issue is, which I think is related to what probability ranges are "reasonable".

I'm a little confused here; it seems like since I gave 6 probabilities, it would on the contrary be surprising if 2 of them weren't pretty close to each other, even the more uncorrelated ones?

That's the thing. I do think it's surprising. When we are talking about speculative events, the range of probabilities should be enormous. If I estimate the odds that Vladimir Putin is killed by a freak meteor strike, the answer is not in the 1-99% range, it's in the 1 in a billion range. What are the odds that Paraguay becomes a world superpower in the next 50 years? 1 in a million? 1 in a trillion? Conversely, what are the odds that the sun will rise on the earth in 2070? About as close to 1 as it's possible for an estimate to get.

When we consider question 5, we are asking about the winner of a speculative war between a society that we are very uncertain about and an AI that we know next to nothing about. In question 3, you are asking for an estimate of the competence of future AI engineers at constraining an as yet unknown AI design. In both cases, I see the "reasonable range" of estimates as being extremely broad. I would not be surprised if the "true estimate" (if that's even a coherent concept) for Q5 was 1 in a billion, or 1 in a thousand, or nearly 1 in 1. This is what strikes me as off. Out of all the possible answers for these highly speculative questions in logarithmic space, why would both of them end up in the 75-85% range?

Consider, by contrast, your COVID-19 predictions. These seem to be bounded in a way that the examples above aren't. There was uncertainty about whether the vaccine would be implemented in 2021 or 2023; perhaps you could make a reasonable case for predicting it would take until 2025. But if I gave an answer of "2112 AD", you would look at me like a crazy person. It seems like the AI estimates are unbounded in their ranges in a way the Metaculus questions aren't.

This is where the object level and the meta level get kinda hard to untangle. I think if you accept my meta level reasoning, it also necessitates lowering your object level estimates. If you try and make the estimates for the individual steps vary more (by putting a 0.1% in there or something), the total probability will then end up being at least as low as that step. But I'm not sure if this is necessarily wrong? If your case relies on a chain of at least somewhat independent unbounded speculative events, placing odds as high as 40% seems like it's an error on its face.
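The point about one very low step capping the whole chain can be sketched numerically. The step probabilities below are hypothetical placeholders chosen only for illustration (they are not the forecasts from the post), and the steps are treated as independent.

```python
import math

# Hypothetical step probabilities, for illustration only.
uniform_steps = [0.8] * 6            # every step in a similar range
skewed_steps = [0.8] * 5 + [0.001]   # one step set to 0.1%


def chain_probability(steps):
    """Probability that every step holds, treating the steps as independent."""
    return math.prod(steps)


uniform_total = chain_probability(uniform_steps)
skewed_total = chain_probability(skewed_steps)

print(f"{uniform_total:.4f}")
print(f"{skewed_total:.6f}")

# A product of probabilities can never exceed its smallest factor.
print(skewed_total <= min(skewed_steps))  # True
```

With these placeholder numbers the uniform chain comes out around 26%, while the chain containing a single 0.1% step drops to roughly 0.03%, below the 0.1% step itself.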

The way I think about what range of probabilities is reasonable is mostly by considering reference classes for (a) the object-level prediction being made and (b) the success rate of relatively similar predictions in the past. I agree that a priori most claims that feel very speculative we'd expect to have little confidence in, but I think we can get a lot of evidence from considering more specific reference classes.

Let's take the example of determining whether AI would disempower humanity:

For (a), I think looking at the reference class of "do more intelligent entities disempower less intelligent entities? (past a certain level of intelligence)" is reasonable and would give a high baseline (one could then adjust down from the reference class forecast based on how strong the considerations are that we will potentially be able to see it coming to some extent, prepare in advance, etc.).

For (b), I think a reasonable reference class would be previous long-term speculative forecasts made by futurists. My read is that these were right about 30-50% of the time.

Also:

In both cases, I see the "reasonable range" of estimates as being extremely broad. I would not be surprised if the "true estimate" (if that's even a coherent concept) for Q5 was 1 in a billion, or 1 in a thousand, or nearly 1 in 1.

I agree that we shouldn't be shocked if it's the case that the "true" estimate for at least one of the questions is very confident, but I don't think we should be shocked if the "best realistically achievable" estimates for all of them aren't that confident, where "best realistically achievable" estimates are subject to our very limited time and reasoning capacities.

I think the choice of reference class is itself a major part of the object level argument. For example, instead of asking "do more intelligent entities disempower less intelligent entities", why not ask "does the side of a war starting off with vastly more weapons, manpower and resources usually win?". Or "do test subjects usually escape and overpower their captors?" Or "Has any intelligent entity existed without sufficient flaws to prevent them from executing world domination?". These reference classes suggest a much lower estimate.

Now, all of these reference classes are flawed in that none of them correspond 1 to 1 with the actual situation at hand. But neither does yours! For example, in none of the previous cases of higher intelligence overpowering lower intelligences has the lower intelligence had the ability to write the brain of the higher intelligence. Is this a big factor or a small factor? Who knows?

As for (b), I just don't agree that predictions about the outcome of future AI wars are in a similar class to questions like "will there be manned missions to Mars" or "predicting the smartphone".

Anyway, I'm not too interested in going in depth on the object level right now. Ultimately I've only barely scratched the surface of the flaws leading to overestimation of AI risk, and it will take time to break through, so I thank you for your illuminating discussion!

I agree that the choice of reference class matters a lot and is non-obvious (and hope I didn’t imply otherwise!).

I think impact is heavy-tailed and we should target talented people with a scout mindset who are willing to take weird ideas seriously. [As opposed to general public opinion]

I think even if the goal is to directly increase the number of talented people directly engaged in AI x-risk, general public opinion still matters. Public opinion is the default Overton window all these talented people have, and if "dedicate your career to AI risk" is closer to that window you get more talented people making the leap. Imagine if talented people didn't need a great scout mindset to want to work on AI risk! You'd get so many! (And merely partially-scout-y people can still be highly effective.)

(And that's not to mention how public opinion affects the environment these talented people have to navigate in)

Edit: I didn't read all the comments; jskatt made the same point above. Seems like you mildly disagree. 🤷

As for the kind of bait-and-switch you're concerned about: misrepresenting the relative degree of prioritization of longtermism or AI x-risk was a major problem in EA for years.

There are some who might complain that leading organizations in EA are over-rating how much long-termist causes should be prioritized relative to near-termist ones, though that's not the same problem as leading organizations in EA majorly misrepresenting their own priorities (or those of the movement at large). (For what it's worth, in my opinion, the former problems of misleading marketing and easily avoidable communication errors have mostly been resolved.)

Given how much of a bait-and-switch there was for AI x-risk in the past in general, there is significant reason to suspect it will re-occur. However preferable it might be that people are introduced to longtermism by The Precipice instead of WWOTF, the latter is a bestseller that will be read by more people. The Centre for Effective Altruism could start giving out way more free copies of The Precipice and encouraging everyone to read it first instead of WWOTF tomorrow. Even so, it could take years before more people are introduced to longtermism by The Precipice than by a bestseller like WWOTF.

Assuming it's a major enough problem Will needs to immediately change his mind about it or set the record straight, a better solution would be for Will or another scholar to publish a paper rectifying what was published in WWOTF, or even a post on the EA Forum as a reference. Another patch to the problem could be for Will to re-write the relevant sections of WWOTF for a 2nd edition (I don't know much about when 2nd editions are published like that for technical subject matter written for a popular readership, though I presume it can happen in as short a time as a couple years or less if the 1st edition sells out).

Thanks a lot for this post; it seems really valuable for you to share in general, and I really appreciate it for my own knowledge about AI risk because I know how good of a forecaster you are and respect your judgement.

I'm commenting to say that I really like your operationalization of what you mean when you say "I think a 3% chance of misaligned AI takeover this century is too low, with 90% confidence." It seems quite precise in the ways that matter to make an operationalization good.

I did have one uncertainty about what you meant by this part:

The people doing the reasoning can only deliberate about current evidence rather than acquire new evidence (by e.g. doing object-level work on AI to better understand AI timelines).

Does the "deliberate about current evidence" part include thinking a lot about AI alignment to identify new arguments or considerations that other people on Earth may not have thought of, or would that count as new evidence?

It seems like if that would not count as new evidence, the team you described might be able to come up with much better forecasts than we have today, and I'd think their final forecast would be more likely to end up much lower or much higher than e.g. your forecast. One consequence of this might then be that your 90% confidence about MacAskill's misaligned AI takeover credence is too high, even if your 35% point estimate is reasonable.

Does the "deliberate about current evidence" part include thinking a lot about AI alignment to identify new arguments or considerations that other people on Earth may not have thought of, or would that count as new evidence?

It seems like if that would not count as new evidence, the team you described might be able to come up with much better forecasts than we have today, and I'd think their final forecast would be more likely to end up much lower or much higher than e.g. your forecast. One consequence of this might then be that your 90% confidence about MacAskill's misaligned AI takeover credence is too high, even if your 35% point estimate is reasonable.

I added this line in response to Lukas pointing out that the researchers could just work on the agendas to get information. As I mentioned in a comment, the line between deliberating and identifying new evidence is fuzzy but I think it's better to add a rough clarification than nothing.

I intend identifying new arguments or considerations based on current evidence to be allowed, but I'm more skeptical than you that this would converge that much closer to 0% or 100%. I think there's a ton of effectively irreducible uncertainty in forecasting something as complex as whether misaligned AI will take over this century.

Thanks for the response! This clarifies what I was wondering well:

I intend identifying new arguments or considerations based on current evidence to be allowed

I have some more thoughts regarding the following, but want to note up front that no response is necessary--I'm just sharing my thoughts out loud:

I'm more skeptical than you that this would converge that much closer to 0% or 100%. I think there's a ton of effectively irreducible uncertainty in forecasting something as complex as whether misaligned AI will take over this century.

I agree there's a ton of irreducible uncertainty here, but... what's a way of putting it... I think there are lots of other strong forecasters who think this too, but might look at the evidence that humanity has today and come to a significantly different forecast than you.

Like who is to say that Nate Soares and Daniel Kokotajlo's forecasts are wrong? (Though actually it takes a smaller likelihood ratio for you to update to reach their forecasts than it does for you to reach MacAskill's forecast.) Presumably they've thought of some arguments and considerations that you haven't read or thought of before. I think it wouldn't surprise me if this team deliberating on humanity's current evidence for a thousand years would come across those arguments or considerations (or some other ones) in their process of logical induction (to use a term I learned from MIRI that roughly means updating without new evidence) and ultimately decide on a final forecast very different than yours as a result.
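The likelihood-ratio comparison in the parenthetical can be sketched with the odds form of Bayes' rule. The 35% and 3% figures come from the discussion above; the 80% target is a hypothetical stand-in for a much higher forecast and is not a number attributed to anyone here. The function names are also illustrative.

```python
def odds(p: float) -> float:
    """Convert a probability to odds in favour."""
    return p / (1.0 - p)


def required_likelihood_ratio(prior: float, posterior: float) -> float:
    """Strength of evidence needed to move from prior to posterior,
    expressed as a ratio >= 1 regardless of the direction of the update."""
    ratio = odds(posterior) / odds(prior)
    return max(ratio, 1.0 / ratio)


# Moving from a 35% forecast down to 3%.
down_to_3 = required_likelihood_ratio(0.35, 0.03)

# Moving from 35% up to a hypothetical 80% (illustrative value only).
up_to_80 = required_likelihood_ratio(0.35, 0.80)

print(round(down_to_3, 1))  # 17.4
print(round(up_to_80, 1))   # 7.4
```

Under these assumptions, updating from 35% to 3% needs evidence roughly 17 times more likely under one hypothesis than the other, while updating to the illustrative 80% needs a ratio of about 7.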

Perhaps another way of saying this is that your current forecast may be 35% not because that's the best forecast that can be made with humanity's current evidence, given the irreducible uncertainty in the world, but rather because you don't currently have all of humanity's current evidence. Perhaps your 35% is more reflective of your own ignorance than the actual amount of irreducible uncertainty in the world.

Reflecting a bit more, I'm realizing I should ask myself what I think is the appropriate level of confidence that 3% is too low. Thinking about it a bit more, 90% actually doesn't seem that high, even given what I just wrote above. I think my main reason for thinking it may be too high is that 1000 years is a long time for a team of 100 reasonable people to think about the evidence humanity currently has and I'd expect such a team to get a much better understanding of what the actual risk of misaligned AI takeover is than anyone alive today has, even without new evidence. And because I feel like we're in a state of relative ignorance about the risk still, it wouldn't surprise me if after the 1000 years they justifiably believed they could be much more confident one way or the other about the amount of risk.

On your second point on overestimating stagnation I also had a few issues:

Understating effect of AI (not AGI):

This section does not acknowledge that AI (narrow, not AGI or superintelligence) is likely going to be a significant productivity booster for innovation.

• AlphaFold is not AGI, won’t cause any catastrophes, but will likely contribute to the productivity of researchers in a variety of fields.
• AI enables better imaging and control, possibly allowing some breakthroughs in plasma control thus harnessing fusion, as well as breakthroughs in areas where traditional control is just not good enough.
• A language model focussed on formal mathematics might speed up all sorts of mathematical research.

It’s kind of pointless to wait for an “aligned AGI” to speed up scientific progress. We already have very very powerful specialised (thus aligned) tools for that.

economic growth vs development of technological capabilities

To a large extent, economic growth is driven by consumption, not necessarily technological or scientific progress that is useful for humanity. The type of innovation that drives the economy is about new flavours of candies, how to sell more candies, how to advertise more candies, how to build apps where candies can be advertised more.

Thus, when people say we might have to slow down economic growth as part of solving some problems, they typically mostly mean halting the pointless growth in some consumer product sales, not halting scientific progress leading to intellectual stagnation.

Great post, thanks for writing! I especially agree with the worry that the book will leave readers with a mistaken sense of longtermist priorities. I would also recommend The Precipice more readily than WWOTF, unless people specifically ask for a book on longtermism.

I also want to say I appreciate your breakdown of your AI x-risk estimation so much. I've never seen a breakdown that easy to grasp before. I feel like I finally found a tool to make a better prediction of my own. Thanks!

I haven't read WWOTF yet (my copy came yesterday) but I wanted to share a blog post I wrote which is of relevance to your "balance of positive and negative value in current lives" section.

I write about how we can figure out what the zero level of wellbeing is and hypothesise that this level is much "better" than we intuitively think.

I suspect I agree that Will MacAskill is currently overestimating the value of human lives given what I have heard about his analysis.