
Crossposted to LessWrong.

TL;DR: We developed an interactive guide to AI safety arguments, based on researcher interviews. Go check it out! Please leave a comment and let us know what you think.

Introduction

Vael Gates interviewed 97 AI researchers on their beliefs about the future. These interviews were quite broad, covering researchers’ hopes and concerns for the field in general, as well as advanced AI safety specifically. Full transcripts are now available.

Lukas Trötzmüller interviewed 22 EAs about their opinions on AI safety. They were pre-selected to be skeptical either about the classical “existential risk from AI” arguments, or about the importance of work on AI safety. The focus of this research was on distilling specific arguments and organizing them.

This guide builds mostly on Vael’s conversations and aims to replicate the interview experience. The goal is not necessarily to convince, but to outline the most common arguments and counterarguments, and to help readers gain a deeper understanding of their own perspective.

Design Goals

Our previous research uncovered a wide range of arguments that people hold about AI safety. We wanted to build a resource that addresses the most frequently mentioned ones.

Instead of a linear article (which would be quite long), we wanted to create an interactive format. As someone goes through the guide, they should be presented with the content that is most relevant to them.

Even though we had a clear target audience of AI researchers in mind, the text turned out to be surprisingly accessible to a general audience. This is because most of the classical AI risk arguments do not require in-depth knowledge of present-day AI research.

Format

Our guide consists of a collection of short articles that are linked together. There are five main chapters:

  1. When will Generally Capable AI Systems be developed?
  2. The Alignment Problem (“AI systems don’t always do what we want”)
  3. Instrumental Incentives
  4. Threat Models (“how might AI be dangerous?”)
  5. Pursuing Safety Work

Each chapter begins with an argument, after which the reader is asked whether they agree or disagree. If they disagree, they can choose from several objections they may have.

Each objection links to a separate, optional article presenting possible counterarguments. Most of the objections and counterarguments are taken directly from Vael’s interviews with AI researchers.

After reading a counterargument, the reader can indicate whether they find it plausible and is then guided back to the introduction of the chapter. The reader may advance to the next main chapter at any time.
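To make the navigation described above concrete, here is a minimal Python sketch of it as a small state machine. It is only an illustration under stated assumptions, not the guide's actual implementation (which this post does not describe): the `Chapter` and `Reader` classes, the `step` function, the answer strings ("agree", "disagree", "skip", "plausible", "implausible") and all page ids are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chapter:
    """A main chapter: an opening argument plus the objections a reader can pick.

    `objections` maps an objection label to the id of the optional article
    with counterarguments to it. All labels and ids are hypothetical.
    """
    title: str
    objections: dict[str, str] = field(default_factory=dict)


@dataclass
class Reader:
    chapter: int = 0      # index of the current main chapter
    page: str = "intro"   # "intro", a counterargument id, or "end"
    # (chapter index, page id, answer) for every response given so far
    responses: list[tuple[int, str, str]] = field(default_factory=list)


def step(guide: list[Chapter], reader: Reader, answer: str,
         objection: Optional[str] = None) -> Reader:
    """Move `reader` to the next page after one response.

    - "skip" (any page) or "agree" (chapter intro): go to the next chapter.
    - "disagree" (chapter intro): open the counterargument article for `objection`.
    - "plausible" / "implausible" (counterargument page): return to the chapter intro.
    """
    if reader.page == "end":
        return reader
    chapter, page = reader.chapter, reader.page
    on_intro = page == "intro"
    if answer == "skip" or (on_intro and answer == "agree"):
        reader.chapter += 1
        reader.page = "intro" if reader.chapter < len(guide) else "end"
    elif on_intro and answer == "disagree":
        reader.page = guide[chapter].objections[objection]
    elif not on_intro and answer in ("plausible", "implausible"):
        reader.page = "intro"
    else:
        raise ValueError(f"unexpected answer {answer!r} on page {page!r}")
    # Record the response against the page it was given on.
    reader.responses.append((chapter, page, answer))
    return reader


# Example: disagree with chapter 1, rate the counterargument, then move on.
guide = [Chapter("When will Generally Capable AI Systems be developed?",
                 {"objection-a": "counterargument-a"}),
         Chapter("The Alignment Problem")]
reader = Reader()
step(guide, reader, "disagree", "objection-a")  # reader.page == "counterargument-a"
step(guide, reader, "implausible")              # back to the first chapter's intro
step(guide, reader, "agree")                    # second chapter's intro
```

Keeping every response in a list makes it straightforward to build a per-chapter summary of agreement and disagreement afterwards.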

The following diagram illustrates this structure:

Agreement and disagreement are shown visually in the table of contents.

The “Threat Models” chapter is meant as a short interlude and does not present any counterarguments; we might expand on it in the future.

Polling and Commenting

It is also possible to leave comments on individual pages. These are displayed publicly at the end of the guide.

On the last page, you can also see a visual summary of your responses, and how they compare to the average visitor:

Requesting Feedback

We are releasing this within the EA and alignment communities. We would like to gather additional feedback before presenting it to a wider audience. If you have feedback or suggestions, please leave a comment below. We welcome feedback on the structure as well as the language and argumentation.

Creating Interactive Guides for Other EA Cause Areas

Our goal was to enable anyone to put complex arguments into an interactive format without requiring experience in web development. The guide is written in a Google Doc, which contains all the pages, separated by headlines, along with some special code that defines the structure. Our system converts this document into an interactive website, so the guide can be updated simply by editing the document.
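As a rough illustration of the document-to-website step, the sketch below splits a plain-text version of such a document into pages at each headline. It rests on hypothetical assumptions: that the Google Doc has been exported as plain text and that headlines are lines starting with `# `. The post does not say how the real system reads the document, and the special structure code is not modeled here; `split_into_pages` is a hypothetical helper.

```python
import re

# Hypothetical convention: each page starts with a line like "# Page title".
HEADING = re.compile(r"^#\s+(?P<title>.+?)\s*$")


def split_into_pages(doc_text: str) -> dict[str, str]:
    """Split a plain-text export of the guide document into pages.

    Returns {page title: page body}. Text before the first headline is ignored.
    """
    pages: dict[str, list[str]] = {}
    current = None
    for line in doc_text.splitlines():
        match = HEADING.match(line)
        if match:
            # A new headline starts a new page.
            current = match.group("title")
            pages[current] = []
        elif current is not None:
            pages[current].append(line)
    return {title: "\n".join(body).strip() for title, body in pages.items()}
```

A real converter would also need to handle duplicate titles (in this sketch, a later page with the same title replaces the earlier one) and turn the structure code into links between pages.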

Nothing about the interactive system we developed is specific to AI safety. It could be used for other purposes, for example an introduction to longtermism, the case for biosecurity, or an explanation of ethical arguments. If you would like to use this for your project, please get in touch with Lukas.

Related Projects

The Stampy project aims to provide a comprehensive AI safety FAQ. We have given the Stampy team permission to re-use our material as they see fit.

Conclusion & Downside Risk

If you haven’t opened the guide yet, go ahead and check it out. We are really interested in your comments. How are the language and the argumentation? Are we missing important arguments? Could we make this easier to use or improve the design? Would you actually recommend this as a resource to people? If not, why not?

Looking at the result of our work, we notice positives and negatives.

Vael likes that the content is pretty clear and comprehensive.

Lukas likes the visual presentation and the overall look & feel. However, he has some reservations about the level of rigour in the argumentation; some parts could definitely be made more solid.

We both like the interactive format. From a field-building perspective, however, we are unsure whether this is the best way to talk to people. Even though the guide is interactive, it is not a replacement for a real conversation: people can only choose from a limited number of options, and then they receive a lot of text trying to counter their arguments. We wonder whether this might create resistance in some readers, and whether the downside risks might outweigh the upsides.

Contributions

The guide was written by Lukas Trötzmüller, with guidance and additional writing from Vael Gates.

Technical implementation by Michael Keenan and Lukas Trötzmüller.

Copy Editing: David Spearman, Stephen Thomas.

We would like to thank everyone who gave feedback.

This work was funded by the AI Safety Field Building Hub.

Comments (5)



Thanks for this! I liked it and found it helpful for understanding the key arguments for AI risk.

It also felt more engaging than other presentations of those arguments because it is interactive and comparative.

I think that the user experience could be improved a little, but that it's probably not worth making those improvements until you have a larger number of users.

One change you could make now is to mention the number of people who have completed the tool (maybe on the first page) and also change the outputs on the conclusion page to percentages.

How do you imagine using this tool in the future? Like what are some user stories (e.g., person x wants to do y, so they use this)?

Here are some quick (possibly bad) ideas I have for potential uses (ideally after more testing):

  • As something that advocates like Robert Miles can refer relevant people to
  • As part of a longitudinal study where a panel of say 100 randomly selected AI safety researchers do this annually, and you report on changes in their responses over time.
  • Using a similar approach/structure, with new sections and arguments, to assess levels of agreement and disagreement with different AI safety research agendas within the AI Safety community and to identify the cruxes
  • As a program that new AI Safety researchers, engineers and movement builders do to understand the relevant arguments and counterarguments.

I also like the idea of people making something like this for other cause areas and appreciate the effort invested to make that easy to do.

I tried to comment on the page https://ai-risk-discussions.org/perspectives/test-before-deploying, but instead got an error message telling me to use the contact mail.

Thanks for the bug report, checking into it now. 

Update: Michael Keenan reports it is now fixed!
