Note: This post was crossposted from Planned Obsolescence by the Forum team, with the author's permission. The author may not see or respond to comments on this post.

The single most important thing we can do is to pause when the next model we train would be powerful enough to obsolete humans entirely. If it were up to me, I would slow down AI development starting now — and then later slow down even more.

Many of the people building powerful AI systems think they’ll stumble on an AI system that forever changes our world fairly soon — three years, five years. I think they’re reasonably likely to be wrong about that, but I’m not sure they’re wrong about that. If we give them fifteen or twenty years, I start to suspect that they are entirely right.

And while I think that the enormous, terrifying challenges of making AI go well are very much solvable, it feels very possible, to me, that we won’t solve them in time.

It’s hard to overstate how much we have to gain from getting this right. It’s also hard to overstate how much we have to lose from getting it wrong. When I’m feeling optimistic about having grandchildren, I imagine that our grandchildren will look back in horror at how recklessly we endangered everyone in the world. And I’m much, much more optimistic that humanity will figure this whole situation out in the end if we have twenty years than I am if we have five.

There’s all kinds of AI research being done — at labs, in academia, at nonprofits, and in a distributed fashion all across the internet — that’s so diffuse and varied that it would be hard to ‘slow down’ by fiat. But there’s one kind of AI research — training much larger, much more powerful language models — that it might make sense to try to slow down. If we could agree to hold off on training ever more powerful new models, we might buy more time to do AI alignment research on the models we have. This extra research could make it less likely that misaligned AI eventually seizes control from humans.

An open letter released on Wednesday, with signatures from Elon Musk[1], Apple co-founder Steve Wozniak, leading AI researcher Yoshua Bengio, and many other prominent figures, called for a six-month moratorium on training bigger, more dangerous ML models:

We call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4. This pause should be public and verifiable, and include all key actors. If such a pause cannot be enacted quickly, governments should step in and institute a moratorium.

I tend to think that we are developing and releasing AI systems much faster and much more carelessly than is in our interests. And from talking to people in Silicon Valley and policymakers in DC, I think efforts to change that are rapidly gaining traction. “We should slow down AI capabilities progress” is a much more mainstream view than it was six months ago, and to me that seems like great news.

In my ideal world, we absolutely would be pausing after the release of GPT-4. People have been speculating about the alignment problem for decades, but this moment is an obvious golden age for alignment work. We finally have models powerful enough to do useful empirical work on understanding them, changing their behavior, evaluating their capabilities, noticing when they’re being deceptive or manipulative, and so on. There are so many open questions in alignment that I expect we can make a lot of progress on in five years, with the benefit of what we’ve learned from existing models. We’d be in a much better position if we could collectively slow down to give ourselves more time to do this work, and I hope we find a way to do that intelligently and effectively. As I’ve said above, I think the stakes are unfathomable, and it’s exciting to see the early stages of coordination to change our current, unwise trajectory.

On the letter itself, though, I have a bunch of uncertainties around whether a six-month pause right now would actually help. (I suspect many of the letter signatories share these uncertainties, and I don’t have strong opinions about the wisdom of signing it.) Here are some of my worries:

  • Is it better to ask for evaluations rather than a pause? Personally, I think labs should sign on to ongoing commitments to subject each new generation of model to a third-party dangerous capabilities audit. I’m much more excited about requiring audits and oversight before training dangerous models than about asking for ‘pauses’, which are hard to enforce.

  • Is the ask too small? I think I and the letter signers would generally agree that the ideal thing for society to do right now is something more continuous and iterative (and ultimately more ambitious) than a one-time six month pause at this stage. That means one big question is whether this opens the door to those larger efforts, or muddies the waters. Do steps that are in the right direction, but not sufficient, help us collectively produce common knowledge of the problem and build towards the right longer-term solutions, or do they mostly leave people misled about what it’s going to take to solve the problem? I’m not sure.

  • What will we use the pause to do? An open letter like this one could be a step towards cooperative agreements on evaluations, standards, and governance, in which case it’s great. It could also go badly, if in six months labs go right back to developing powerful models and people walk away with the impression the pause was performative or meaningless. By itself, taking a few months off doesn’t gain us much (especially if a pause is entirely voluntary, so the least cooperative actors can simply ignore it). If we use that time well, to set up binding standards, good evaluations of whether our models are dangerous, and a much larger national conversation about what’s at stake here, then that could change everything.

  • Does this ask impact companies unevenly? This specific call — to not train models larger than GPT-4 — is inapplicable to almost every AI lab today, because most of them can’t train models larger than GPT-4 in the next six months anyway. OpenAI may well be the only AI lab in a position to act on, or not act on, this demand.

    That doesn’t delight me. Obviously, when regulations are being considered, one of the things companies inevitably do is try to design the regulations to advantage them and disadvantage their competitors. If proposed AI regulations appear to be an obvious grab at commercial rivals, I expect they’ll get less traction.

    Moreover, I’m worried that an unevenly applied moratorium might backfire. If OpenAI can’t train GPT-5 for 6 months, other AI labs may use that time to rush to train GPT-4-sized models. That could mean that when the moratorium is lifted, OpenAI feels more pressure to get ahead again and may push for an even larger training run than they were planning originally. This moratorium could end up accomplishing very little except for making competitive dynamics even fiercer.

    Overall, I’d prefer a policy that creates costs for all players and is careful to avoid creating potential perverse incentives.

Predicting the details of how future AI development will play out isn’t easy. But my best guess is that we’re facing a marathon, not a sprint. The next generation of language models will be even more powerful and scary than GPT-4, and the generation after that will be even scarier still. In my ideal world, we would pause and reflect and do a lot of safety evaluations, make models slightly bigger, and then pause again and do more reflecting and testing. We would do that over and over again as we inch toward transformative AI.

But we’re not living in an ideal world. The single most important thing we can do is to pause when the next model we train would be powerful enough to obsolete humans entirely, and then take as long as we need to work on AI alignment with the help of our existing models. That means that pausing now is mostly valuable insofar as it helps us build towards the harder, more complicated task of identifying when we might be at the brink and pausing for as long as we need to then. I’m not sure what the impact of this letter will be — it might help, or it might hurt.

I don’t want to lose sight of the basic point here in all this analysis. We could be doing so much better, in terms of approaching AI responsibly. The call for a pause comes from a place I empathize with a lot. If it were up to me, I would slow down AI development starting now — and then later slow down even more.

  1. Musk is also reportedly working on a competitor to OpenAI, which invites a cynical interpretation of his call to action here: perhaps he just wants to give his own lab a chance to catch up. I don’t think that’s the whole story, but I do think that many people at large labs are thinking about how to take safety measures that serve their own commercial interests. ↩︎






I appreciated this post. I also found this Twitter thread arguing for caution about slowing down AI progress (from @Matthew_Barnett) really interesting & helpful for learning about different considerations for why a pause might be harmful or not as helpful as one might think.

I should flag that I think it relies on a number of assumptions that people disagree on, or at least makes a number of controversial claims — see e.g. of direct pushback. I'm really interested in seeing more discussion on basically everything here, especially places where I'm misinterpreting or pushback/corroboration on any of the points from the thread. (Related to an older thread that I've summarized with GPT in this footnote.[1])

Some assorted highlights from the thread, for myself (including things that I think I disagree with):

  1. The hardware overhang argument: If a ban on large AI training runs is lifted, it could lead to a sudden jump in AI capabilities, which is particularly dangerous (starts here) — see also LessWrong's Concepts page on Computing Overhang
    1. "To summarize: if a ban on large training runs is ever lifted, then large actors will be able to immediately train larger runs than ever before. The longer a ban is sustained, the larger the jump will be, which would mean that we would see abrupt AI progress. // It seems that our best hope for making AI go well is ensuring that AI follows a smooth, incremental development curve. If we can do that, then we can test AI safety ideas on incrementally more powerful models, which would be safer than needing everything to work on the first try. 
      Some have said to me recently that we can simply continue the ban on large training runs indefinitely, or slowly lift the ban, which would ensure that progress is continuous. I agree that this strategy is possible in theory, but it's a very risky move. // There are many reasons why a ban on large training runs might be lifted suddenly. For example, a global war might break out, and the United States might want to quickly develop powerful AI to win the war. [...] 
      Right now, incremental progress is likely driven by the fact that companies simply can't scale their training runs by e.g. 6 OOMs in a short period of time. If we had a major hardware overhang caused by artificial policy constraints, that may no longer be true. // Even worse, a ban on large training runs would probably cause people to keep doubling down and try to extend the ban to prevent rapid progress via a sudden lifting of restrictions. The more that people double down, the more likely we are to get an overhang."
  2. In a ~section arguing that it's important to track that AI companies are incentivized[2] to make their models aligned, there was this point:
    1. " under [the theory that AI capabilities are vastly outpacing alignment research], you might expect GPT-2 to be very aligned with users, GPT-3 to be misaligned in subtle ways, and GPT-4 to be completely off-the-rails misaligned. However, if anything, we see the opposite, with GPT-4 exhibiting more alignment than GPT-2. // Of course, this is mostly because OpenAI put much more effort into aligning GPT-4 than they did for GPT-2, but that's exactly my point. As AI capabilities advance, we should expect AI companies to care more about safety. It's not clear to me that we're on a bad trajectory."
    2. Bold mine; that last bit was especially interesting for me. 
  3. There are downsides to competition, and a moratorium right now increases competition
    1. "To beat competition, companies have incentives to cut corners in AI safety in a risky gamble to remain profitable. The impacts of these mistakes may be amplified if their AI is very powerful. // 
      A moratorium on model scaling right now would likely *increase* the incentive for OpenAI to cut corners, since it would cut their lead ahead of competitors. // 
      More generally, given that algorithmic progress increasingly makes it cheaper to train powerful AI, scaling regulations plausibly have the effect of reducing the barriers to entry for AI firms, increasing competition -- ironically the opposite of traditional regulation."
    2. I don't know if there's a canonical source for the arguments about downsides to competition (e.g. it's discussed by Scott Alexander here, in the "The Race Argument" section) — I've read about it in bits and pieces in different places, and participated in conversations about it. I'm interested. 
  4. We have a limited budget of delays, and it might be better to use that budget closer to seriously dangerous systems
    1. This is related to the value of earlier vs. later work done on AI safety.
  1. ^

    The older thread

    1) AI progress is driven by two factors: model scaling and algorithmic progress.
    2) The open letter suggests prohibiting large training runs while allowing algorithmic progress, leading to a "hardware overhang."
    3) Discontinuous AI progress is less safe than continuous progress due to its unpredictability and challenges in coping with sudden powerful AI.
    4) Prohibiting large training runs for 6 months would reduce OpenAI's lead in AI development.
    5) A single leading actor in AI is safer than multiple top actors close together, as they can afford to slow down when approaching dangerous AI.
    6) Implementing a moratorium on model scaling now may be premature and would weaken OpenAI's lead.
    7) There is no clear rule for ending the proposed moratorium or a detailed plan for proceeding after the 6-month period.
    8) A general call to slow down AI could be acceptable if actors shift focus from algorithmic progress to safety, but a government-enforced prohibition on model scaling seems inappropriate.

  2. ^

    (Although, not as much as an impartial observer, probably.)

I can see why some people think the publicity effects of the letter might be valuable, but — when it comes to the 6-month pause proposal itself — I think Matthew's reasoning is right.

I've been surprised by how many EA folk are in favour of the actual proposal, especially given that AI governance literature often focuses on the risks of fuelling races. I'd be keen to read people's counterpoints to Matthew's thread(s); I don't think many expect GPT-5 will pose an existential threat, and I'm not yet convinced that 'practice' is a good enough reason to pursue a bad policy.

I wrote a shortform on this thread, inspired by Lizka's sharing of it.

On the letter itself, though, I have a bunch of uncertainties around whether a six month pause right now would actually help

I share many of your concerns, though I think on balance I feel more enthusiastic about the six-month pause. (Note that I'm thinking about a six-month pause on frontier AI development that is enforced across the board, at least in the US, and I'm more confused about a six-month pause that a few specific labs opt in to.)

I wonder if this relates more to an epistemic difference (e.g., the actual credence we put on the six-month pause being net positive [or to be more nuanced, the EV we expect once we account for the entire distribution of outcomes]) or a communication difference (e.g., differences in our willingness to express support for things under conditions of high uncertainty). 

Regarding the worries you list, #4 is the one I'm most concerned/uncertain about. The others seem to boil down to "if we get a pause, we need to make sure we use it well." If we get a pause, we should use it to (a) strengthen AI governance ideas and evals, (b) develop and push for more ambitious asks, and (c) build a larger coalition of people who are concerned about risks from advanced AI.

All of these things are hard. But, all else equal, they seem more likely to happen in a world with a six-month pause than a world without one. 

Whereas I think the fourth worry argues why the pause might be net negative. I'm particularly concerned about scenarios where there are many more actors at the frontier of AI development, and race dynamics are even more concerning. (On the other hand, a six-month pause is also a signal that the world is more likely to regulate frontier AI labs. If people expect that the six-month pause will be followed by additional regulation, this might make it less appealing for new actors to enter the race.)

Anyways, I'm still left wondering why I (a) agree with lots of your points yet (b) feel more enthusiastic about a six-month pause. 

I'm curious about your all-things-considered perspective on the six-month pause idea: Do you currently think it's net positive, net negative, or near-zero-value in expectation?

Kelsey - thanks for a very reasonable take on the pause, and some good questions.

IMO, the main benefit of the open letter isn't so much implementing the pause itself, but simply raising public awareness of AI risks (which it succeeded in doing, remarkably well), and helping ordinary people realize that they have the political, social, and cultural power to stand up to an increasingly reckless, arrogant, & profit-hungry AI industry.

In other words, the open letter is putting 'AI safety' inside the Overton window, as something that reputable citizens, politicians, pundits, and journalists can talk about, without being mocked or dismissed.  That's probably much more valuable than the 6-month pause itself could ever be.

This moratorium could end up accomplishing very little except for making competitive dynamics even fiercer.

This feels important and cruxy to me. I currently feel like this is a pretty big downside, and not worth a six-month pause.

My main counterpoint and cause for optimism is that I believe this will "help us collectively produce common knowledge of the problem and build towards the right longer-term solutions."

In the end I am, like Kelsey, unsure.
