We Need Breadth-First AI Safety Plans

MichaelDickens

4 min read · Jun 1

Comments 1

SummaryBot

1mo

Executive summary: The author argues that AI safety planning is dangerously over-reliant on long chains of conjunctive conditions, and calls for "breadth-first" plans that maintain multiple independent paths to success so that the overall effort survives even when individual assumptions fail.

Key points:

"Depth-first" AI safety plans fail entirely if any single condition in their chain is false, and the author counts at least eight such conditions in Google's April 2025 safety plan alone.
The author argues that disjunctive conditions (where success requires A or B or C) are preferable to conjunctive ones, because fewer simultaneous assumptions need to hold.
A "breadth-first" plan instead pursues multiple actions X, Y, and Z, each depending on different conditions, so the overall plan can succeed even if two out of three conditions fail.
The author identifies Barnett & Scher's AI Governance to Avoid Extinction as the broadest published plan, noting it explicitly maps four possible future scenarios and the conditions required for success in each.
The author sees two main benefits to breadth-first planning: identifying which paths to success depend on the fewest conditions, and making it easier to spot the biggest holes in a plan.
The author calls on AI companies to publish breadth-first plans addressing what they will do if a step in their mainline plan fails, and on governments to legislate that companies cover a defined list of possible future scenarios.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

My rough attempt at categorizing plans

I made a quick flowchart to categorize AI safety plans at a high level.

A blue circle indicates an action

A blue square indicates an outcome

A red hexagon indicates a necessary condition to achieve an outcome

A red pentagon indicates a condition that is helpful but not necessary

The idea is that we need a broad set of overlapping plans such that some plan will work, even if many conditions (red nodes) turn out to be false.

Is this flowchart comprehensive? Definitely not. Is it even accurate? Maybe. My point is that, to make AI safe, we need multiple plans that cover all the ways the other plans could go wrong, and this flowchart is a quick attempt at representing some of those plans.
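To make the arithmetic behind overlapping plans concrete, here is a minimal sketch in Python. It is not taken from the flowchart: the function names, the 0.8 and 0.5 probabilities, and above all the assumption that conditions are statistically independent are illustrative assumptions. Real conditions are likely correlated, which would shrink the advantage shown here.

```python
from math import prod


def chain_success(condition_probs):
    """Probability that every condition in one plan holds.

    Assumes the conditions are independent (an illustrative simplification).
    """
    return prod(condition_probs)


def any_path_success(paths):
    """Probability that at least one plan works.

    Each path is a list of condition probabilities. Assumes the paths rely on
    different, independent conditions (again, an illustrative simplification).
    """
    return 1 - prod(1 - chain_success(path) for path in paths)


# Hypothetical depth-first plan: one chain of eight conditions, each 80% likely.
depth_first = [0.8] * 8

# Hypothetical breadth-first plan: three independent paths, each resting on a
# single condition that is only 50% likely.
breadth_first = [[0.5], [0.5], [0.5]]

print(f"depth-first:   {chain_success(depth_first):.3f}")
print(f"breadth-first: {any_path_success(breadth_first):.3f}")
```

Under these made-up numbers, the single long chain succeeds far less often than the set of three shorter paths, even though each individual condition in the chain is more likely to hold. The breadth-first set still succeeds when two of its three conditions fail.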

Future work I'd like to see

AI companies should publish breadth-first plans. What will they do if a step in their mainline plan fails?

Governments should pass legislation requiring AI companies to have plans that cover every item on a list of possible future scenarios.

For example, mandate that companies have different plans for different takeoff speeds.

AI safety researchers should do research to inform what future scenarios need to be covered.

[1] I originally wrote this article shortly after April 2025, but I procrastinated for a year on finishing it, so I'm not sure about the current state of AI companies' plans.

[2] I am skeptical that a bootstrapped-aligned AI will behave morally in ways in which most humans do not behave morally, e.g. eating factory-farmed animals; or that it will be able to correctly resolve the internal inconsistencies in common-sense ethics. For example, in the mere addition paradox, most people accept a set of premises but reject the conclusion that necessarily follows from those premises. [4]

[3] Technically, what we want isn't paths that depend on few conditions. We want paths where the joint probability of every condition is as high as possible. But generally speaking, fewer conditions means the probability of success is higher.

[4] Philosophy Experiments' Philosophical Health Check asks you a series of questions and purports to identify inconsistencies in your beliefs. I think the questions leave some wiggle room to argue that supposed inconsistencies aren't truly inconsistent, but a more rigorous test would be harder to construct.
