
The model of AI risk I’ve mostly believed for years was that of fast-takeoff and therefore a unipolar world.[1] This model allowed me to have some concrete models of what the EA community should do to make AI go better. Now, I am at least half-persuaded of slow-takeoff, multipolar worlds (1, 2). But I have much less idea what to do in this world. So, what should the top priorities be for EA longtermists who want to make AI go well?

Fast-takeoff, unipolar priorities, as seen by me writing quickly:

  • Get the top AI labs concerned about safety
    • Means that if they feel like they’re close to AGI, they will hopefully be receptive to whatever the state of the art in alignment research is
  • Try to solve the alignment problem in the most rigorous way possible.
    • After all, we only get one shot
  • [Less obvious to me] Try to get governments concerned about safety in case they nationalize AI labs. But also don’t increase the likelihood of them doing that by shouting about how AI is going to be this incredibly powerful thing.

Multipolar, slow-takeoff worlds:

  • Getting top AI labs concerned about safety seems much harder in the long term, as they become increasingly economically incentivized to ignore it.
  • Trying to solve the alignment problem in the most rigorous way possible seems less necessary. Also maybe alignment is easier, and therefore is less likely to be the thing that fails.
  • Governments might be captured by increasingly powerful private interests / there might be AI-powered propaganda that does … something to their ability to function.

Broadly in this world I’m much more worried about race-to-the-bottom dynamics. In Meditations on Moloch terms, instead of AI being the solution to Moloch, Moloch becomes the largest contributor to AI x-risk.

I’m interested in all sorts of comments, including:

  • What should the top priorities be to make AI go well in a slow-takeoff world?
  • Challenging the hypothetical
  • Is there anything wrong with this analysis?

  1. See Nick Bostrom’s Superintelligence ↩︎

I think that "AI alignment research right now" is a top priority in unipolar fast-takeoff worlds, and it's also a top priority in multipolar slow-takeoff worlds. (It's certainly not the only thing to do—e.g. there's multipolar-specific work to do, like the links in Jonas's answer on this page, or here etc.)

(COI note: I myself am doing "AI alignment research right now" :-P )

First of all, in the big picture, right now humanity is simultaneously pursuing many quite different research programs towards AGI (I listed a dozen or so here (see Appendix)). If more than one of them is viable (and I think that's likely), then in a perfect world we would figure out which of them has the best hope of leading to Safe And Beneficial AGI, and differentially accelerate that one (and/or differentially decelerate the others). This isn't happening today—that's not how most researchers are deciding what AI capabilities research to do, and it's not how most funding sources are deciding what AI capabilities research to fund. Could it happen in the future? Yes, I think so! But only if...

  • AI alignment researchers figure out which of these AGI-relevant research programs is more or less promising for safety,
  • …and broadly communicate that information to experts, using legible arguments…
  • …and do it way in advance of any of those research programs getting anywhere close to AGI

The last one is especially important. If some AI research program has already gotten to the point of super-powerful proto-AGI source code published on GitHub, there's no way you're going to stop people from using and improving it. Whereas if the research program is still very early-stage and theoretical, and needs many decades of intense work and dozens more revolutionary insights to really start getting powerful, then we have a shot at this kind of differential technological development strategy being viable.

(By the same token, maybe it will turn out that there's no way to develop safe AGI, and we want to globally ban AGI development. I think if a ban were possible at all, it would only be possible if we got started when we're still very far from being able to build AGI.)

So for example, if it's possible to build a "prosaic" AGI using deep neural networks, nobody knows whether it would be possible to control and use it safely. There are some kinda-illegible intuitive arguments on both sides. Nobody really knows. People are working on clarifying this question, and I think they're making some progress, and I'm saying that it would be really good if they could figure it out one way or the other ASAP.

Second of all, slow takeoff doesn't necessarily mean that we can just wait and solve the alignment problem later. Sometimes you can have software right in front of you, and it's not doing what you want it to do, but you still don't know how to fix it. The alignment problem could be like that.

One way to think about it is: How slow is slow takeoff, versus how long does it take to solve the alignment problem? We don't know.

Also, how much longer would it take, once somebody develops best practices to solve the alignment problem, for all relevant actors to reach a consensus that following those best practices is a good idea and in their self-interest? That step could add on years, or even decades—as they say, "science progresses one funeral at a time", and standards committees work at a glacial pace, to say nothing of government regulation, to say nothing of global treaties.

Anyway, if "slow takeoff" is 100 years, OK fine, that's slow enough. If "slow takeoff" is ten years, maybe that's slow enough if the alignment problem happens to have a straightforward, costless, highly-legible and intuitive, scalable solution that somebody immediately discovers. Much more likely, I think we would need to be thinking about the alignment problem in advance.

For more detailed discussion, I have my own slow-takeoff AGI doom scenario here. :-P

Thanks for your answer. (Just to check, I think you are a different Steve Byrnes than the one I met at Stanford EA in 2016 or so?)

What I do want to emphasize is that I don't doubt that technical AI safety work is one of the top priorities. It does seem like, within technical AI safety research, the best work shifts away from Agent-Foundations-style work and towards neural-net-specific work. It also seems like the technical problem gets easier in expectation if you have more than one shot. By contrast, I claim, many of the Moloch-style problems get harder.

Steven Byrnes
No, I don't think we've met! In 2016 I was a professional physicist living in Boston. I'm not sure I would have even known what "EA" stood for in 2016. :-)

I agree. But maybe I would have said "less hard" rather than "easier", to better convey a certain mood :-P

I'm not sure what your model is here. Maybe a useful framing is "alignment tax": if it's possible to make an AI that can do some task X unsafely with a certain amount of time/money/testing/research/compute/whatever, then how much extra time/money/etc. would it take to make an AI that can do task X safely? That's the alignment tax. The goal is for the alignment tax to be as close as possible to 0%. (It's never going to be exactly 0%.)

In the fast-takeoff unipolar case, we want a low alignment tax because some organizations will be paying the alignment tax and others won't, and we want one of the former to win the race, not one of the latter. In the slow-takeoff multipolar case, we want a low alignment tax because we're asking organizations to make tradeoffs for safety, and if that's a very big ask, we're less likely to succeed. If the alignment tax is 1%, we might actually succeed. Remember that there are many reasons organizations are incentivized to make safe AIs, not least because they want the AIs to stay under their control and do the things they want them to do, not to mention legal risks, reputation risks, employees who care about their children, etc. So if all we're asking is for them to spend 1% more training time, maybe they all will. If instead we're asking them all to spend 100× more compute plus an extra 3 years of pre-deployment test protocols, well, that's much less promising. So either way, we want a low alignment tax.

OK, now let's get back to what you wrote. I think maybe your model is: "If Agent Foundations research pans out at all, it would pan out by discovering a high-alignment-tax method of making AGI." (You can correct me if I'm misunderstanding.) If we accept th
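As a toy illustration of the alignment-tax framing above (my own sketch, with made-up numbers, not anything from the comment): the tax is just the relative cost overhead of building the safe version of a system versus the unsafe one.

```python
def alignment_tax(unsafe_cost: float, safe_cost: float) -> float:
    """Relative overhead of building the safe version of a system,
    expressed as a fraction of the unsafe baseline cost."""
    return (safe_cost - unsafe_cost) / unsafe_cost

# Hypothetical numbers: if the unsafe system costs 100 units and the
# safe version costs 101, the alignment tax is 1% -- a small ask that
# organizations might plausibly absorb.
print(f"{alignment_tax(100, 101):.0%}")       # -> 1%

# If the safe version instead needs 100x the compute, the tax is 9900%,
# a far bigger ask.
print(f"{alignment_tax(100, 10_000):.0%}")    # -> 9900%
```

This is only arithmetic, of course; the hard part the comment points at is estimating those costs and getting every relevant actor to pay the overhead.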

Thanks! I'm broadly sympathetic, including to the point that race-to-the-bottom dynamics seem like much bigger risks assuming slow takeoff (since more actors get reasons to join intense competition if AI looks more useful earlier in its development process).

Getting top AI labs concerned about safety seems much harder in the long term, as they become increasingly economically incentivized to ignore it.

This doesn't yet make sense to me. Why would they be incentivized to ignore safety?

Maybe the concern is that they're incentivized to make customers and regulators ignore safety?

  • But at worst that leaves top AI labs with minimal economic and regulatory incentives to take safety seriously, which would happen anyway under sufficiently fast takeoff. So at worst, slow takeoff would mean that getting top AI labs concerned about safety is just as hard, not harder.
  • Or maybe it's even worse if, in order to keep the public from hearing about accidents, AI companies keep their own employees from learning about safety issues, or filter for non-safety-conscious employees?
      • But companies have incentives to not do that, e.g. wanting talented employees who happen to be safety-conscious, wanting employees to have lots of info about the systems they're working on, and wanting safe products.
  • Also, the Streisand effect might mean that AI companies hiding risks (whether from the public or internally) counterproductively increases pressure toward safety.

I'd guess there's also a factor strongly pushing in the other direction (toward safety concerns being easier): small and medium-scale accidents seem significantly more likely to happen and scare people (with enough time for people to act on that, by changing regulations and consumption) if we assume slow takeoff. Companies' expectation of this, and its occurrence, would incentivize AI companies to pay some attention to safety.

I was assuming that designing safe AI systems is more expensive than designing unsafe ones; suppose 10% more expensive. In a world with only a few top AI labs that are not yet ruthlessly optimized, they could probably be persuaded to sacrifice that 10%. But convincing a trillion-dollar company to sacrifice 10% of its budget requires a whole lot of public pressure. The bosses of those companies didn't get where they are by being careless with 10% of their budgets.

You could challenge that, though. You could say that alignment is instrumentally useful for creating market value. I'm not sure what my position is on that, actually.

Thanks! Is the following a good summary of what you have in mind?

It would be helpful for reducing AI risk if the CEOs of top AI labs were willing to cut profits to invest in safety. That's more likely to happen if top AI labs are relatively small at a crucial time, because [??]. And top AI labs are more likely to be small at this crucial time if takeoff is fast, because fast takeoff leaves them with less time to create and sell applications of near-AGI-level AI. So it would be helpful for reducing AI risk if takeoff were fast.

What fills in the "[??]" in the above? I could imagine a couple of possibilities:

  • Slow takeoff gives shareholders more clear evidence that they should be carefully attending to their big AI companies, which motivates them to hire CEOs who will ruthlessly profit-maximize (or pressure existing CEOs to do that).
  • Slow takeoff somehow leads to more intense AI competition, in which companies that ruthlessly profit-maximize get ahead, and this selects for ruthlessly profit-maximizing CEOs.

Additional ways of challenging those might be:

  • Maybe slow takeoff makes shareholders much more wealthy (both by raising their incomes and by making ~everything cheaper) --> makes them value marginal money gains less --> makes them more willing to invest in safety.
  • Maybe slow takeoff gives shareholders (and CEOs) more clear evidence of risks --> makes them more willing to invest in safety.
  • Maybe slow takeoff involves the economies of scale + time for one AI developer to build a large lead well in advance of AGI, weakening the effects of competition.

This all seems reasonable.
