
Scott Alexander recently wrote a post about OpenAI's Planning for AGI and beyond. I found it thoughtful, and I think others here might want to read or discuss it. 

Some highlights:

ExxonMobil analogy

Imagine ExxonMobil releases a statement on climate change. It’s a great statement! They talk about how preventing climate change is their core value. They say that they’ve talked to all the world’s top environmental activists at length, listened to what they had to say, and plan to follow exactly the path they recommend. So (they promise) in the future, when climate change starts to be a real threat, they’ll do everything environmentalists want, in the most careful and responsible way possible. They even put in firm commitments that people can hold them to.

An environmentalist, reading this statement, might have thoughts like:

  • Wow, this is so nice, they didn’t have to do this.
  • I feel really heard right now!
  • They clearly did their homework, talked to leading environmentalists, and absorbed a lot of what they had to say. What a nice gesture!
  • And they used all the right phrases and hit all the right beats!
  • The commitments seem well thought out, and make this extra trustworthy.
  • But what’s this part about “in the future, when climate change starts to be a real threat”?
  • Is there really a single, easily-noticed point where climate change “becomes a threat”?
  • If so, are we sure that point is still in the future?
  • Even if it is, shouldn’t we start being careful now?
  • Are they just going to keep doing normal oil company stuff until that point?
  • Do they feel bad about having done normal oil company stuff for decades? They don’t seem to be saying anything about that.
  • What possible world-model leads to not feeling bad about doing normal oil company stuff in the past, not planning to stop doing normal oil company stuff in the present, but also planning to do an amazing job getting everything right at some indefinite point in the future?
  • Are they maybe just lying?
  • Even if they’re trying to be honest, will their bottom line bias them towards waiting for some final apocalyptic proof that “now climate change is a crisis”, of a sort that will never happen, so they don’t have to stop pumping oil?

This is how I feel about OpenAI’s new statement, Planning For AGI And Beyond.

Doomer argument: Acceleration burns time

Recent AIs have tried lying to, blackmailing, threatening, and seducing users. AI companies freely admit they can’t really control their AIs, and it seems high-priority to solve that before we get superintelligence. If you think that’s 2043, the people who work on this question (“alignment researchers”) have twenty years to learn to control AI.

Then OpenAI poured money into AI, did ground-breaking research, and advanced the state of the art. That meant that AI progress would speed up, and AI would reach the danger level faster. Now Metaculus expects superintelligence in 2031, not 2043 (although this seems kind of like an over-update), which gives alignment researchers eight years, not twenty.

Response to OpenAI's argument that gradual deployment helps society prepare for dangerous AI systems

You might notice that, as written, this argument doesn’t support full-speed-ahead AI research. If you really wanted this kind of gradual release that lets society adjust to less powerful AI, you would do something like this:

  • Release AI #1
  • Wait until society has fully adapted to it, and alignment researchers have learned everything they can from it.
  • Then release AI #2
  • Wait until society has fully adapted to it, and alignment researchers have learned everything they can from it.
  • And so on . . .

Meanwhile, in real life, OpenAI released ChatGPT in late November, helped Microsoft launch the Bing chatbot in February, and plans to announce GPT-4 in a few months. Nobody thinks society has even partially adapted to any of these, or that alignment researchers have done more than begin to study them.

Response to three other arguments in favor of acceleration ("we want safety-conscious actors to be ahead", compute overhang, and "we want to demonstrate dangers as quickly as possible so the world takes AI safety more seriously")

These three lines of reasoning argue that burning a lot of timeline now might give us a little more timeline later. This is a good deal if:

  • Burning timeline now actually buys us the extra timeline later. For example, it’s only worth burning timeline to establish a lead if you can actually get the lead and keep it.
  • A little bit of timeline later is worth a lot of timeline now.
  • Everybody between now and later plays their part in this complicated timeline-burning dance and doesn’t screw it up at the last second.

I’m skeptical of all of these.

DeepMind thought they were establishing a lead in 2008, but OpenAI has caught up to them. OpenAI thought they were establishing a lead the past two years, but a few months after they came out with GPT, at least Google, Facebook, and Anthropic had comparable large language models; a few months after they came out with DALL-E, random nobody startups came out with StableDiffusion and MidJourney. None of this research has established a commanding lead, it’s just moved everyone forward together and burned timelines for no reason.

The alignment researchers I’ve talked to say they’ve already got their hands full with existing AIs. Probably they could do better work with more advanced models, but it’s not an overwhelming factor, and they would be happiest getting to really understand what’s going on now before the next generation comes out. One researcher I talked to said the arguments for acceleration made sense five years ago, when there was almost nothing worth experimenting on, but that they no longer think this is true.

Finally, all these arguments for burning timelines require that lots of things go right later: the same AI companies burning timelines now must turn into model citizens when the stakes get higher, converting their lead into improved safety instead of capitalizing on it to release lucrative products. And the government must respond to an AI crisis responsibly, rather than ignoring it or making it worse.

OpenAI's impact on timelines thus far

On the other hand - man, they sure have burned a lot of timeline. The big thing all the alignment people were trying to avoid in the early 2010s was an AI race. DeepMind was the first big AI company, so the plan was to just let them do their thing, go slowly, get everything right, and avoid hype. Then Elon Musk founded OpenAI in 2015, murdered that plan, mutilated the corpse, and danced on its grave. Even after Musk left, the remaining team did everything short of shooting a starting gun and waving a checkered flag to challenge everyone else to a race.

OpenAI still hasn’t given a good explanation of why they did this. Absent anything else, I’m forced to wonder if it’s just “they’re just the kind of people who would do that sort of thing” - in which case basically any level of cynicism would be warranted.

I hate this conclusion. I’m trying to resist it. I want to think the best of everyone. Individual people at OpenAI have been very nice to me. I like them. They've done many good things for the world.

FTX fallout and analogy to OpenAI

Scott, suppose a guy named Sam, who you’re predisposed to like because he’s said nice things about your blog, founds a multibillion dollar company. It claims to be saving the world, and everyone in the company is personally very nice and says exactly the right stuff. On the other hand it’s aggressive, seems to cut some ethical corners, and some of your better-emotionally-attuned friends get bad vibes from it. Consider the possibility that either they’re lying and not as nice as they sound, or at the very least that they’re not as smart as they think they are and their master plan will spiral out of control before they’re able to get to the part where they do the good things.

Praise for commitment to independent evals, stop-and-assist clause, and hope for a more safety-conscious OpenAI moving forward

Realistically we’re going to thank them profusely for their extremely good statement, then cross our fingers really hard that they’re telling the truth.

OpenAI has unilaterally offered to destroy the world a bit less than they were doing before. They’ve voluntarily added things that look like commitments - some enforceable in the court of public opinion, others potentially in courts of law. Realistically we’ll say “thank you for doing that”, offer to help them turn those commitments into reality, and do our best to hold them to it. It doesn’t mean we have to like them period, or stop preparing for them to betray us. But on this particular sub-sub-topic we should take the W.

Where OpenAI goes, other labs might follow. The past eight years of OpenAI policy have been far from ideal. But this document represents a commitment to move from safety laggard to safety model, and I look forward to seeing how it works out.

Comments (4)



Akash - thanks for posting this. Scott Alexander, as usual, has good insights, and is well worth reading here. 

I think at some point, EAs might have to bite the bullet, set aside our all-too-close ties to the AI industry, and realize that 'AGI is an X-risk' boils down to 'OpenAI, DeepMind, and other AI companies that aren't actually taking AIXR seriously are the real X-risks' -- and should be viewed and treated accordingly.

100% agree. 

I like the analogy with ExxonMobil; I think it's helpful to keep that comparison in mind.

I mentioned before that I don't think companies that work on AI should have a significant voice in the AI discourse, at least in the EA sphere - we can't control the public discourse.

The primary purpose (maybe 80%+ of the purpose) of a company is to make money, plain and simple. The job of their PR people is to garner public support through whatever means necessary. Often that is by sounding as reasonable as possible. Their press releases, blogs, podcasts, etc. should be treated at worst as dangerous propaganda, at best as biased and compromised arguments.

Why then do we engage with their arguments so seriously? There are already so many contrasting opinions on AI safety, even among neutral researchers, that are hard to understand and important to engage with; why would we throw compromised perspectives into the mix?

I lean towards using these kinds of blogs to understand the plans of AI companies and the arguments we need to counter in the public sphere, not as reasonable, well-thought-out opinions from neutral people.

As a former climate activist who organised a protest outside Exxon offices after my country failed to commit to climate agreements, I can personally confirm Scott's hypothetical.

I also share many of the same concerns about the AGI race dynamics. The current frontrunners in the feared "AGI race" are all AI Safety companies, and ChatGPT has attracted billions into capabilities research from people who otherwise would've never looked into AI.

Just a week ago, Peking University professor Zhu Song-Chun spoke at a CCP conference about how China needs to go all-in to beat the US to AGI. ChatGPT provided a very compelling proof of concept for pouring money into AI.

Counterfactuals and uncertainties aside, the AI Safety community has created the AGI race. I wonder if it's a good idea.

Good analogy. Note that environmental statements made by oil companies cannot be trusted to hold even for a few years once expected profits rise, even when costly actions and investment patterns appear to back them up temporarily. E.g.:
https://www.ft.com/content/b5b21c66-92de-45c0-9621-152aa335d48c

'BP's chief executive Bernard Looney defended its latest reversal, stating that “The conversation three or four years ago was somewhat singular around cleaner energy, lower-carbon energy. Today, there is much more conversation about energy security, energy affordability.”'
