Slowing AI is many-dimensional. This post presents variables for determining whether a particular kind of slowing improves safety. Then it applies those variables to evaluate some often-discussed scenarios.
1. Time until critical systems are deployed. More time seems good for alignment research, governance, and demonstrating risks of powerful AI.
2. Length of crunch time. In this post, "crunch time" means the time near critical systems before they are deployed. More time until critical systems are deployed is good; more such time near critical systems is especially good. A lab is more likely to (be able to) pay an alignment tax for a critical system if it has more time to pay the tax for that system. Time near critical systems also seems especially good for alignment research and potentially for demonstrating risks of powerful AI and doing governance.
3. Safety level of labs that develop critical systems. This can be improved both by making labs safer and by differentially slowing unsafe labs.
4. Propensity to coordinate or avoid racing. This is associated with many factors, but the factors plausibly relevant to slowing AI seem to be that there are few leading labs, that they like and trust each other, and that they are all in the same country or at least in allied countries (in part because regulation is one possible cause of not racing).
One lab's progress, especially on the frontier, tends to boost other labs. Labs leak their research both intentionally (publishing research and deploying models) and unintentionally.
Some interventions would differentially slow relatively safe labs (relevant to 3). Some interventions (especially policies that put a ceiling on AI capabilities or inputs) would differentially slow leading labs (relevant to 4). Both outcomes are worse than uniform slowing and potentially net-negative.
If something slows progress temporarily, then after it ends, progress may gradually and partially catch up to the pre-slowing trend, such that powerful AI is delayed but crunch time is shortened (relevant to 1 and 2).
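The catch-up consideration can be made concrete with a toy calculation. The sketch below is only an illustration under made-up assumptions: capability is a single number that grows piecewise-linearly, "near critical" and "critical" are arbitrary thresholds (7 and 10), and the pause length and catch-up rate are hypothetical values chosen for clarity. None of these numbers come from this post or from any forecast.

```python
def time_to_reach(level, segments):
    """Return the time at which capability first reaches `level`.

    `segments` is a list of (duration_years, growth_rate) pairs applied in
    order, starting from capability 0 at time 0. The final segment's rate is
    assumed to continue indefinitely, so its duration is ignored.
    """
    t, cap = 0.0, 0.0
    for i, (duration, rate) in enumerate(segments):
        is_last = i == len(segments) - 1
        gain = rate * duration
        if rate > 0 and (is_last or cap + gain >= level):
            return t + (level - cap) / rate
        t += duration
        cap += gain
    raise ValueError("level is never reached")


def summarize(segments, near=7.0, critical=10.0):
    """Return (time of critical systems, length of crunch time).

    Crunch time is modeled, hypothetically, as the interval between reaching
    the `near` threshold and reaching the `critical` threshold.
    """
    t_near = time_to_reach(near, segments)
    t_critical = time_to_reach(critical, segments)
    return t_critical, t_critical - t_near


# Hypothetical baseline: steady progress of 1 unit per year.
baseline = [(0.0, 1.0)]

# Hypothetical slowing: 6 years of normal progress, a 2-year pause, then
# 1 year at double speed (recovering half of the lost progress), then the
# original rate again.
slowed = [(6.0, 1.0), (2.0, 0.0), (1.0, 2.0), (0.0, 1.0)]

print(summarize(baseline))  # (10.0, 3.0)
print(summarize(slowed))    # (11.0, 2.5)
```

In this toy setup the pause delays critical systems by one year, but because part of the catch-up happens after the "near critical" threshold is crossed, crunch time shrinks from 3 years to 2.5. Whether real-world slowing behaves like this depends on how much catch-up actually occurs and when.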
Coordination may facilitate more coordination later (relevant to 4).
Current leading labs (Google DeepMind, OpenAI, and maybe Anthropic) seem luckily safety-conscious (relevant to 3). Current leading labs seem luckily concentrated in America (relevant to 4).
Some endogeneities in AI progress may give rise to considerations about the timing of slowing. For example, the speed at which the supply of (ML training) compute responds to (expected) demand determines the effect of slowing soon on future supply. Or perhaps slowing affects the distribution of talent between dangerous AI paths, safe AI paths, and non-AI stuff. Additionally, some kinds of slowing increase or decrease the probability of similar slowing later.
Magic uniform slowing of all dangerous AI: great. This delays dangerous AI and lengthens crunch time. It has negligible downside.
A leading safety-conscious lab slows now, unilaterally: bad. This delays dangerous AI slightly. But it makes the lab irrelevant, thus making the labs that develop critical systems less safe and making the lab unable to extend crunch time by staying at the frontier for now and slowing later.
All leading labs coordinate to slow during crunch time: great. This delays dangerous AI and lengthens crunch time. Ideally the leading labs slow until risk of inaction is as great as risk of action on the margin, then deploy critical systems.
All leading labs coordinate to slow now: bad. This delays dangerous AI. But it burns leading labs' lead time, making them less able to slow progress later (because further slowing would cause them to fall behind, such that other labs would drive AI progress and the slowed labs' safety practices would be irrelevant).
Strong global treaty: great. A strong global agreement to stop dangerous AI, with good operationalization of 'dangerous AI' and strong verification, would seem to stop labs from acting unsafely and thus eliminate AI risk. The downside is the risk of the treaty collapsing and progress being faster and distributed among more labs and jurisdictions than otherwise.
Strong US regulation: good. Like "strong global treaty," this stops labs from acting unsafely—but not in all jurisdictions. Insofar as this differentially slows US AI progress, it could eventually cause AI progress to be driven by labs outside the regulation's reach. If so, the regulation—and the labs it slowed—would cease to be relevant, and it would likely have been net-negative: it would cause critical systems to be created by labs other than the relatively-safety-conscious currently-leading ones and cause leading labs to be more globally diffuse.
US moratorium now: bad. A short moratorium (unless succeeded by a strong policy regime) would slightly delay dangerous AI on net, but also cause progress to be faster for a while after it ends (when AI is stronger and so time is more important), increase the number of leading labs (especially by adding leading labs outside the US), and result in less-safe leading labs (because current leading labs are relatively safety-conscious). A long moratorium would delay dangerous AI, but as in "strong US regulation," the frontier of AI progress would eventually be surpassed by labs outside the moratorium's reach.
Which scenarios are realistic; what interventions are tractable? These questions are vital for determining optimal actions, but I will not consider them here.
Thanks to Rose Hadshar, Harlan Stewart, and David Manheim for comments on a draft.
This post is part of AI Pause Debate Week. Please see this sequence for other posts in the debate.
That is, slowing progress toward dangerous AI, or AI that would cause an existential catastrophe. Many kinds of AI seem safe, such as vision, robotics, image generation, medical imaging, narrow game-playing, and prosaic data analysis—maybe everything except large language models, some bio/chem stuff, and some reinforcement learning. Note that in this post, I assume that AI safety is sufficiently hard that marginal changes in my variables are very important.
This post is written from the perspective that powerful AI will eventually appear and AI safety is mostly about increasing the probability that it will be aligned. Note that insofar as other threats arise before powerful AI or intermediate AI systems pose threats, it's better for powerful AI to arrive faster—but I ignore this here.
See my Slowing AI: Foundations for more.
In this post, a critical system is one whose deployment would cause an existential catastrophe if misaligned or be able to execute a pivotal act if aligned. This concept is a simplification: capabilities that could cause catastrophe are not identical to capabilities that could execute a pivotal act, 'cause catastrophe' and 'execute a pivotal act' depend on not just the system but also the world, 'catastrophe or not' and 'pivotal act or not' aren't really binary, and deployment is not binary. Nevertheless, it is a useful concept.
This concept is a simplification insofar as "near critical systems" is not binary. Separately, note that some interventions could lengthen total time to critical systems but reduce crunch time or vice versa. For example, slowing now in a way that causes progress to partially catch up to the old trend later would lengthen total time but reduce crunch time.
Separately, I believe we are not currently in crunch time. I expect we will be able to predict crunch time decently well (say) a year in advance by noticing AI systems' near-dangerous capabilities.
This concept is a simplification: non-lab actors may be central to safety, especially the creators of tools/plugins/scaffolding/apps to integrate with ML models.
The other variables implicitly describe the default case, in which there is not much coordination.
See my Cruxes for overhang.
Coordination seems easier if leading labs are concentrated in a single state, in part because it can be caused by regulation. (Additionally, the AI safety community has relatively more influence over government in the US, so US regulatory effectiveness and thus US lead is good, all else equal.)
Observations about current leads are relevant insofar as (1) those leads will be sustained over time and (2) dangerous AI is sufficiently close that current leaders are likely to be leaders in crunch time by default.
On the risk of differentially slowing US labs, see my Cruxes on US lead for some domestic AI regulation.
Or in terms of the above variables, a strong global treaty would delay dangerous AI, cause labs to be safer, and (insofar as it discriminates between safe and unsafe labs) differentially slow unsafe labs.
I imagine "strong global treaty" and "strong US regulation" as including miscellaneous safety standards/regulations but focusing on oversight of large training runs, enforcing a ceiling on training compute and/or doing model evals during large training runs and stopping runs that fail an eval until the lab can ensure the model is safe.
Labs outside US regulation's reach could eventually dominate AI progress due to some combination of the following (overlapping):
- The US fails to get a large coalition to join it
- Labs in coalition states can effectively move to non-coalition states to escape the regulation
- Labs in non-coalition states can quickly catch up to the frontier given slowed progress in the coalition
- Coalition export controls fail to deny compute to labs in non-coalition states
- Other attempted extraterritorialization of the regulation fails
- (Also just there being a substantial tradeoff between speed and (legible) safety, such that the regulation substantially slows the labs it affects)
- (Also just powerful AI being far off, such that outside labs have longer to catch up to the slowed coalition labs)