
I agree with David Manheim's post at a high level. I especially agree that a pause on large training runs is needed, that “We absolutely cannot delay responding”, and that we should be focusing on a pause mediated by “a multilateral agreement centered on countries and international corporations”. I also agree that if we can’t respond to the fire today, we should at least be moving fast to get a “sprinkler system”.

The basic reason we need a (long) pause, from my perspective, is that we are radically unprepared on a technical level for smarter-than-human AI. We have little notion of how to make such systems reliable or safe, and we’ll predictably have very little time to figure this out once smarter-than-human AI is here, before the technology proliferates and causes human extinction.

We need far, far more time to begin building up an alignment field and to develop less opaque approaches to AI, if we’re to have a realistic chance of surviving the transition to smarter-than-human AI systems.

My take on AI risk is similar to Eliezer Yudkowsky’s, as expressed in his piece in TIME and in the policy agenda he outlined. I think we should be placing more focus on the human extinction and disempowerment risks posed by AGI, and should be putting a heavy focus on the arguments for that position and the reasonably widespread extinction fears among ML professionals.

I have disagreements with some of the specific statements in the post, though in many cases I’m unsure of exactly what Manheim’s view is, so the disagreement might turn out to be non-substantive. In the interest of checking my understanding and laying out a few more of my views for discussion, I’ll respond to these below.[1]

 So the question of whether to stop and how to do so depends on the details of the proposal - but these seem absent from most of the discussion.

This is not apparent to me. I think it would take a pretty unusual proposal in order for me to prefer the status quo over it, assuming the proposal actually pauses progress toward smarter-than-human AI.

It’s important to get this right, and the details matter. But if a proposal would actually work then I’m not picky about the additional implementation details, because there’s an awful lot at stake, and “actually working” is already an extremely high bar.

An immediate, temporary pause isn’t currently possible to monitor, much less enforce, even if it were likely that some or most parties would agree.

A voluntary and temporary moratorium still seems like an obviously good idea to me; it just doesn't go far enough, on its own, to macroscopically increase our odds of surviving AGI. But small movements in the right direction are still worthwhile.

Similarly, a single company, or country announcing a unilateral halt to building advanced models is not credible without assurances,


“Not credible” sounds too strong here, though maybe I’m misunderstanding your claim. Scientists have voluntarily imposed restrictions on their own research in the past (e.g., Asilomar), and I don’t think this led to widespread deception. Countries have banned dangerous-but-profitable inventions without pursuing those inventions in secret.

I don't think it would be that hard for many companies or countries to convince me that they're not building advanced models. It might be hard for me to (for example) get to 95% confidence that DeepMind has suspended frontier AI development, merely on DeepMind's say-so; but 75% confidence seems fairly easy to me, if their say-so is concrete and detailed enough.

(Obviously some people will pursue such research in secret, somewhere in the world, given the opportunity. If we rely purely on organizations’ say-so, then eventually this will get us killed. If that’s all you mean by “not credible”, then I agree.)

and is likely both ineffective at addressing the broader race dynamics

If current industry leaders suspended operations, then this would “address the broader race dynamics” in the sense that it would be a very positive step in the right direction. This could potentially buy us years of additional time to develop and implement a global, internationally enforced pause.

It doesn't "address the broader race dynamics" in the sense of instantly saving the world, though. A few years (or even months) of delay could prove decisive, but if so its decisiveness will certainly hinge on whether the world uses that extra time to implement a pause.

and differentially advantages the least responsible actors.

To a first approximation, I don't think this matters. I don't think the future looks any brighter if the most responsible orgs develop AGI first than if the least responsible ones do.

The most responsible orgs might successfully avoid destroying the world themselves — while not being able to safely utilize AGI to address the proliferation of AGI tech.

But in that case they're not helping the world any more than they would have by just shutting down, which is a fool-proof way to not destroy the world yourself.

What Does a Moratorium Include?

There is at least widespread agreement on many things that aren’t and wouldn’t be included. Current systems aren’t going to be withdrawn - any ban would be targeted to systems more dangerous than those that exist.

“Targeted” may suggest more precision than is possible. It’s very hard to predict in advance which systems will be existentially dangerous, and algorithmic progress means that a given compute threshold may be very-likely-safe today while being plausibly unsafe tomorrow.

Regarding rolling back current systems: I think some people at Conjecture have given arguments for rolling back GPT-4, on the basis that we don't yet know what scaffolding we can get out of GPT-4, nor what dangerous insights can be learned by gaining a better grasp of how GPT-4 works internally. This doesn’t seem important enough to me to make it a focus, but a rollback does seem like the kind of policy that would be adopted (or at least be under serious consideration) in a generally well-run world that was seriously grappling with the risk that a GPT-5 or a GPT-8 might get us all killed.

Regardless, if Conjecture staff’s views are relevant then it can’t be said that there’s full consensus here.

The thing I care more about is leaving open that it might be necessary to ban systems at the same scale as GPT-4 at a future date; we can expect algorithms to get more efficient in the future, and it’s hard to predict what will be technologically possible multiple years in the future, which is an argument for some conservatism (with everyone’s lives at risk).

We’re not talking about banning academic research using current models, and no ban would stop research to make future systems safer, assuming that the research itself does not involve building dangerous systems.

If we end up solving the alignment problem at all, then I expect some alignment research to eventually yield significant capabilities insights. (See Nate Soares’ If interpretability research goes well, it may get dangerous.)

On the current margin, I think it's net-positive to pursue the most promising alignment research moonshots; but in the long run we’d definitely need to be asking about how capabilities-synergistic different alignment research directions are, rather than giving a permanent free pass to all research that’s useful for alignment. And I think we should definitely be preparing for that now, rather than treating algorithmic progress as nonexistent or alignment and capabilities research as disjoint categories.

However, I don’t think there’s a concrete proposal to temporarily or permanently pause that I could support - we don’t have clear criteria, we don’t have buy-in from the actors that is needed to make this work, and we don’t have a reasonable way to monitor, much less enforce, any agreement.

As a rule, I don't think it's a good idea to withhold support for policies on the basis that they lack "buy-in" from others. The general policy "only support things once lots of others have publicly supported them" often prevents good ideas from beginning to gain traction, and locks Overton windows in place. I'd instead usually advise people to state their actual beliefs and their rough preference ordering over policy options (including unrealistic-but-great ones). Then we can talk about feasibility and compromise from a standpoint of understanding everyone's actual views.

Part of why I recommend this is that I think any policy that prevents human extinction will need to be pretty extreme and novel. If we limit ourselves to what's obviously politically feasible today, then I think we're preemptively choosing death; we need to take some risks and get more ambitious in order to have any shot at all.

(This is not to say that all small incremental progress is useless, or that everything needs to happen overnight. But a major part of how smaller marginal progress gets parlayed into sufficient progress is via individuals continuously discussing what they think is needed even though it's currently outside the Overton window, throughout the process of iterating and building on past successes.)

Yes, companies could voluntarily pause AI development for 6 months, which could be a valuable signal.

It could also slow progress toward smarter-than-human AI for some number of months, which is useful in its own right. Time is needed to implement effective policy responses, and even more time would be needed to find a solution to the alignment problem.

(Or would be so if we didn’t think it would be a smokescreen for 'keep doing everything and delay releases slightly.')

It sounds like you're much more cynical about this than I am? I'd be very happy to hear concrete commitments from ML organizations to pause development, and I think they should be encouraged to do so, even though it's not sufficient on its own.

Lying happens, but I don't think it's universal, especially when it would require a conspiracy between large numbers of people to cover up a very clear and concrete public lie. (Obviously if the stated commitment is very vague or clearly insufficient, then that’s another story.)

And acting too soon is costly

Acting too late destroys all of the value in the future. Is there a commensurate cost to acting too quickly? (I’ll assume for now that you don't think there is one, and are just acknowledging nonzero cost.)

Just like a fire in the basement won’t yet burn people in the attic, AI that exists today does not pose immediate existential risks[2] to humanity - but it’s doing significant damage already, and if you ignore the growing risks, further damage quickly becomes unavoidable.

This seems like a weak case for acting now, since it's vulnerable to the obvious response "AI today is doing significant damage, but also producing significant benefits, which very likely outweigh the damage."

The real reason to act now is that future systems will likely disempower and kill humanity, we don't know the threshold at which that will happen (but there's a fair bit of probability on e.g. 'the next 5 years', and quite a lot on 'the next 15 years'), and it may take years of work to develop and implement an adequate policy response.


This post is part of AI Pause Debate Week. Please see this sequence for other posts in the debate.

  1. ^ Thanks to Nate Soares for reviewing this post and giving some feedback.


I find myself agreeing with Nora on temporary pauses - and I don't really understand the model by which a 6-month, or a 2-year, pause helps, unless you think we're less than 6 months, or 2-years, from doom. 

First, my perception is that progress in AI so far has been combining various advances into very large models. If the companies are working on other things in ML during those 6 months, this creates algorithmic overhang as soon as they put them together into larger models. This is separate from hardware overhang, which I think is even more concerning.

Second, there are lots of parts to putting these models together. If companies are confident the pause is 6 months long, they continue to build infrastructure and curate datasets. (This is in contrast to the status if we're building comprehensive regulatory oversight, where training the planned future large models may not be approved, and further capital investments might be wasted.)

After writing this, another model occurs to me that someone might think makes a short pause useful - if we are playing a PR game, and think that the sudden advances after the pause will galvanize the public into worrying more. (But I don't think that you're proposing playing clever-sounding but fragile strategic moves, and think this type of pause would obviously be worse than trying for a useful governance regime.)

Edit: If the "(long) pause" you're suggesting is actually an indefinite moratorium on larger models, I think we're agreeing - but I think we need to build a global governance regime to make that happen, as I laid out.

I find myself agreeing with Nora on temporary pauses - and I don't really understand the model by which a 6-month, or a 2-year, pause helps, unless you think we're less than 6 months, or 2-years, from doom. 

This doesn't make a lot of sense to me. If we're 3 years away from doom, I should oppose a 2-year pause because of the risk that (a) it might not work and (b) it will make progress more discontinuous?

In real life, if smarter-than-human AI is coming that soon then we're almost certainly dead. More discontinuity implies more alignment difficulty, but on three-year timelines we have no prospect of figuring out alignment either way; realistically, it doesn't matter whether the curve we're looking at is continuous vs. discontinuous when the absolute amount of calendar time to solve all of alignment for superhuman AI systems is 3 years, starting from today.

I don't think "figure out how to get a god to do exactly what you want, using the standard current ML toolbox, under extreme time pressure" is a workable path to humanity surviving the transition to AGI. "Governments instituting and enforcing a global multi-decade pause, giving us time to get our ducks in order" does strike me as a workable path to surviving, and it seems fine to marginally increase the intractability of unworkable plans in exchange for marginally increasing the viability of plans that might work.

If a "2-year" pause really only buys you six months, then that's still six months more time to try to get governments to step in.

If a "2-year" pause buys you zero time in expectation, and doesn't help establish precedents like "now that at least one pause has occurred, more ambitious pauses are in the Overton window and have some chance of occurring", then sure, 2-year moratoriums are useless; but I don't buy that at all.

(Edit to add: The below is operating entirely in the world where we don't get an indefinite moratorium initially. I strongly agree about the preferability of an indefinite governance regime, though I think that during a multi-year pause with review mechanisms we'll get additional evidence that either safety is possible, and find a path, or conclude that we need a much longer time, or it's not possible at all.)

If you grant that a pause increases danger by reducing the ability of society and safety researchers to respond, and you don't think doom is very, very likely even with extreme effort, then it's reasonable that we would prefer, say, a 50% probability of success controlling AI given 3 years over a 10% probability of success given a 2-year pause then only 18 months. Of course, if you're 99.95% sure that we're doomed given 3 years, it makes sense to me that the extra 6 months of survival moving the probability to 99.99% would seem more worth it. But I don't understand how anyone gets that degree of confidence making predictions. (Superforecasters who have really fantastic predictive accuracy and calibration tend to laugh at claims like that.)

That said, I strongly agree that this isn't an acceptable bet to make. We should not let anyone play Russian roulette with all of humanity, and even if you think it's only a 0.05% probability of doom (again, people seem very obviously overconfident about their guesses about the future,) that seems like a reason to insist that other people get to check your work in saying the system is safe.

Finally, I don't think that you can buy time to get governments to step in quite the way you're suggesting, after a pause. That is, if we get a pause that then expires, we are going to need tons of marginal evidence after that point to get an even stronger response, even once it's in the Overton window. But the evidence we'd need is presumably not showing up, or not showing up as quickly, because there isn't as much progress. So either the pause is extended indefinitely without further evidence, or we'll see a capabilities jump, and that increases risks.

And once we see the capabilities jump after a pause expires, it seems plausible that any stronger response will be far too slow. (It might be OK, they might re-implement the pause, but I don't see a reason for confidence in their ability or willingness to do so.) And in general, unless there are already plans in place that they can just execute, governments react on timescales measured in years.

(Note for everyone reading that all of this assumes, as I do, that the risk is large and will become more obvious as time progresses and we see capabilities continue to outpace reliable safety. If safety gets solved during a pause, I guess it was worth it, or maybe even unnecessary. But I'm incredibly skeptical.)

Acting too late destroys all of the value in the future. Is there a commensurate cost to acting too quickly? (I’ll assume for now that you don't think there is one, and are just acknowledging nonzero cost.)

I don't think there is.

For the sake of having a comprehensive list of arguments (at least ones above some quality threshold), I have seen reasonable people consider (though not necessarily strongly defend) the following view: Alignment research is bottlenecked by our still low level of AI capabilities, so there's value in pausing at the right moment.

Of course, this just shifts the question to "when is the right moment?" I think it's way more plausible that the answer is "a year ago" rather than "not yet."

I recently wrote a document on why I think slowing down now is very unlikely to be too early, where I try to address some of the counterarguments. I'm linking it here, but please note that I didn't spend much time on it, so the points are a bit disorganized.

So I think the error here is disagreements over:

  1. Is AI as useful as a nuclear weapon? If you think AI is so dangerous that it will immediately turn on you and be useless, then it's not like a nuclear weapon, which only detonates when it was designed to.

If you think you can make AI systems using variations on current tech that are as useful as nuclear weapons then:

  1. Has a technology even 1 percent as useful as a nuclear weapon ever been successfully "paused"? You might bring up gunpowder and China, so the follow-up question would be "in the last 150 years?"

If your answers are:

  1. Yes, it's as useful as a nuclear weapon, because an AI can be like adding billions of unpaid new workers to an economy.

  2. You are not guaranteed doom by using myopia to contain each instance in (1). It is a more dangerous world because the tools will be available to build non-myopic systems.

  3. No. The few technologies ever successfully paused barely work or have almost as good alternatives. For instance CFCs have alternatives, nuclear power has alternatives that are cheaper, genetic engineering of humans takes too long for results to be useful, nerve gas is not as good as nukes, bioweapons are uncontrollable and infect everyone or are not as good as nukes.

I personally think, from what I know now, that the above statements are true. What am I ignorant about? How am I wrong? Please tell me.

Because if the above are true, then as morally correct as a pause might be, it wouldn't happen unless the world is also one able to abolish nuclear weapons, which is also morally correct but generally thought to be impossible.

I have read this list. Which element on the list is 1 percent as powerful as a nuclear weapon and not a capability we have a substitute for?

I already responded on nuclear power.

Geoengineering: there has been a substantial recent shift on this, as climate change has proven to be real and emissions cuts have been slow. There is a substantial chance it will be done and "unpaused".

Nanotechnology probably has AGI as a required precursor tech.

Vaccine challenge trials extend the lives of elderly citizens; this is not "disempower your rivals" level of power.

Airships are not restricted in any way; they just don't work. Helium shortages, high loss rate to wind, low payload, uncompetitive vs alternatives like trains, trucks, ships. Why is this on the list?

This list is a very weak argument. Each element is not a temptation because each element either doesn't pencil in so there is no pressure to develop it, or is in fact in use.

By "pencil in" I mean create economic value vs an alternative. Current LLMs can do, in 10 seconds and for 5 cents, 20 minutes of work that you might pay someone $50/hour for.
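
As a rough sanity check on the figures in the comment above, the arithmetic can be sketched in a few lines of Python. The four constants are the comment's own numbers; the function names are hypothetical and chosen only for illustration. The sketch compares price and time only and says nothing about the quality of the output.

```python
# Figures quoted in the comment above.
HUMAN_RATE_PER_HOUR = 50.0  # dollars per hour of human work
TASK_MINUTES = 20           # minutes a human would need for the task
LLM_COST_PER_TASK = 0.05    # dollars (5 cents) per task
LLM_SECONDS_PER_TASK = 10   # seconds the LLM needs for the task


def human_cost(rate_per_hour: float, minutes: float) -> float:
    """Dollar cost of paying a human for `minutes` of work."""
    return rate_per_hour * minutes / 60


def cost_ratio(rate_per_hour: float, minutes: float, llm_cost: float) -> float:
    """How many times cheaper the LLM is than the human for one task."""
    return human_cost(rate_per_hour, minutes) / llm_cost


def speedup(minutes: float, llm_seconds: float) -> float:
    """How many times faster the LLM is than the human for one task."""
    return minutes * 60 / llm_seconds


if __name__ == "__main__":
    print(f"human cost per task: ${human_cost(HUMAN_RATE_PER_HOUR, TASK_MINUTES):.2f}")
    print(f"cost ratio: {cost_ratio(HUMAN_RATE_PER_HOUR, TASK_MINUTES, LLM_COST_PER_TASK):.0f}x")
    print(f"speedup: {speedup(TASK_MINUTES, LLM_SECONDS_PER_TASK):.0f}x")
```

With these inputs the human cost comes to about $16.67 per task, so the quoted LLM price is roughly 333 times cheaper and 120 times faster.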

That page lists the value of vaccine challenge trials as $10^12-10^13, which is substantially more than the market capitalization of the three big AI labs combined. 

(I think there is a decent case that society is undervaluing those companies, but the relevant question seems to be their actual valuation, not what value they theoretically should have. I feel fairly confident that if you asked the average American whether they would prefer to have had a vaccine for COVID one year earlier versus GPT-3 one year earlier, they would prefer the vaccine.)

I don't disagree with your valuations viewed from a god's-eye view.

But you need to look at where the value flow is. An AI company can sell $20 worth of human labor for $1 and keep that $1. They can probably sell for more than that as stronger models are natural monopolies.

A vaccine company doesn't get to charge the real value for the vaccine.

In addition, as I said, there's the arms race mechanic. If country A develops vaccines with challenge trials and country B does not (and assume B can't just buy vaccine access), country A ends up with a slightly older population and more expenditures - that's not a gain to the government.

If country A has AGI, and by AGI I mean a system that does what it's ordered to do at approximately human level across a wide range of tasks, it can depose the government of country B who will be helpless to stop it.

Or just buy all the assets of the country and bribe the government's politicians to force a merger.

Having 10 or 1000 times the resources allows many options. It's not remotely an opportunity someone could forgo.

Ben I'm sorry but your argument is not defensible. Your examples are a joke. Many of them shouldn't even be in the list as they provide zero support for the argument.

This situation looks very much like a nuclear arms race. You win one not by asking your rivals to stop building nukes or to please make them safe but by letting your rival have nuclear accidents and by being ready with your own nukes to attack them.

Same with an AI race. You win by being ready with AI driven weapons to take the offensive on your rivals and any rogue AI they let escape.

I acknowledge that if you had actual empirical evidence that a nuke if used would destroy the planet, a property some people claim an AGI will have, that's a different situation. But you need evidence. Specifically, the reason a nuke won't destroy the planet is that atmospheric gas is not very fusible and is at low pressure. In the AGI case there needs to be similar conditions - there have to be enough insecure computers on the planet for the AGI to occupy, enough insecure financial assets or robotics for the agi to manipulate the world, or intelligence - which itself needs massive amounts of compute - needs to be so useful at high levels that the AGI can substitute for some inputs.

You need evidence for this. Otherwise we can't do anything at all as a species in fear something we do might end us. Anyone could make up any plausible sounding doomsday scenario they like, and convince others, and we would as a species be paralyzed in fear.

In the AGI case there needs to be similar conditions - there have to be enough insecure computers on the planet for the AGI to occupy, enough insecure financial assets or robotics for the agi to manipulate the world

All of these seem true, with the exception that robots aren't needed - there are already plenty of humans (the majority?) that can be manipulated with GPT-4-level generated text.

or intelligence - which itself needs massive amounts of compute - needs to be so useful at high levels that the AGI can substitute for some inputs.

The AI can gain access to the massive amounts of compute via the insecure computers and insecure financial resources.

You need evidence for this.

There are already plenty of sound theoretical arguments and some evidence for things like specification gaming, goal misgeneralisation and deception in AI models. How do you propose we get sufficient empirical evidence for AI takeover short of an actual AI takeover or global catastrophe?

actual empirical evidence that a nuke if used would destroy the planet

How would you get this short of destroying the planet? The Trinity test went ahead based on theoretical calculations showing that it couldn't happen, but arguably nowhere near enough of them, given the stakes! 

But with AGI, half of the top scientists think there's a 10% chance it will destroy the world! I don't think the Trinity test would've gone ahead in similar circumstances.


Ben I'm sorry but your argument is not defensible. Your examples are a joke. Many of them shouldn't even be in the list as they provide zero support for the argument.

Downvoted your comment for its hostility and tone. This isn't X (Twitter).

It's the same reason you couldn't blow up the atmosphere. If you need several trillion weights for human level intelligence and all modalities, or at least 10 percent of the memory in a human brain, and you need to send the partial tensors between cards (I work on accelerator software presently), nobody other than an AI lab has enough hardware. Distributed computers separated by Internet links are useless.
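
To make the memory argument above concrete, here is a back-of-the-envelope sketch. Every number in it is an assumption for illustration, not a figure from the comment: 3 trillion stands in for "several trillion" weights, 2 bytes per weight assumes 16-bit storage, and 80 GB is an assumed per-accelerator memory size. It counts weights only and ignores activations, caches, and the interconnect bandwidth the comment is mainly concerned with.

```python
# All three constants are illustrative assumptions, not figures from the thread.
ASSUMED_WEIGHTS = 3 * 10**12       # stand-in for "several trillion" weights
ASSUMED_BYTES_PER_WEIGHT = 2       # assumes 16-bit storage per weight
ASSUMED_CARD_BYTES = 80 * 10**9    # assumes 80 GB of memory per accelerator


def weight_bytes(num_weights: int, bytes_per_weight: int) -> int:
    """Total bytes needed just to hold the weights."""
    return num_weights * bytes_per_weight


def cards_needed(num_weights: int, bytes_per_weight: int, card_bytes: int) -> int:
    """Minimum number of cards whose combined memory holds the weights."""
    total = weight_bytes(num_weights, bytes_per_weight)
    # Integer ceiling division avoids floating-point rounding.
    return -(-total // card_bytes)


if __name__ == "__main__":
    total = weight_bytes(ASSUMED_WEIGHTS, ASSUMED_BYTES_PER_WEIGHT)
    cards = cards_needed(ASSUMED_WEIGHTS, ASSUMED_BYTES_PER_WEIGHT, ASSUMED_CARD_BYTES)
    print(f"weights alone: {total / 10**12:.0f} TB")
    print(f"cards needed (weights only): {cards}")
```

Under these assumed numbers the weights alone occupy 6 TB, or at least 75 such cards, all of which would need fast links between them. Different assumptions about precision or model size shift the count proportionally.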

It is possible that Moore's law, if it were to continue approximately 30 more years, could lead to the hardware being common, but that has not happened yet.

This may not be X, but I reasoned that the information given as evidence was fraudulent. Ben may be well meaning, but Ben is trying to disprove, with false examples, the basic primate decision making that allowed humans to reach this point. It's an extraordinary claim. (By basic decision making I mean, essentially, primates choosing the best-performing weapon from the multiple clubs available to them.)
