
Scott Alexander recently wrote a post about OpenAI's Planning for AGI and beyond. I found it thoughtful, and I think others here might want to read or discuss it. 

Some highlights:

ExxonMobil analogy

Imagine ExxonMobil releases a statement on climate change. It’s a great statement! They talk about how preventing climate change is their core value. They say that they’ve talked to all the world’s top environmental activists at length, listened to what they had to say, and plan to follow exactly the path they recommend. So (they promise) in the future, when climate change starts to be a real threat, they’ll do everything environmentalists want, in the most careful and responsible way possible. They even put in firm commitments that people can hold them to.

An environmentalist, reading this statement, might have thoughts like:

  • Wow, this is so nice, they didn’t have to do this.
  • I feel really heard right now!
  • They clearly did their homework, talked to leading environmentalists, and absorbed a lot of what they had to say. What a nice gesture!
  • And they used all the right phrases and hit all the right beats!
  • The commitments seem well thought out, and make this extra trustworthy.
  • But what’s this part about “in the future, when climate change starts to be a real threat”?
  • Is there really a single, easily-noticed point where climate change “becomes a threat”?
  • If so, are we sure that point is still in the future?
  • Even if it is, shouldn’t we start being careful now?
  • Are they just going to keep doing normal oil company stuff until that point?
  • Do they feel bad about having done normal oil company stuff for decades? They don’t seem to be saying anything about that.
  • What possible world-model leads to not feeling bad about doing normal oil company stuff in the past, not planning to stop doing normal oil company stuff in the present, but also planning to do an amazing job getting everything right at some indefinite point in the future?
  • Are they maybe just lying?
  • Even if they’re trying to be honest, will their bottom line bias them towards waiting for some final apocalyptic proof that “now climate change is a crisis”, of a sort that will never happen, so they don’t have to stop pumping oil?

This is how I feel about OpenAI’s new statement, Planning For AGI And Beyond.

Doomer argument: Acceleration burns time

Recent AIs have tried lying to, blackmailing, threatening, and seducing users. AI companies freely admit they can’t really control their AIs, and it seems high-priority to solve that before we get superintelligence. If you think that’s 2043, the people who work on this question (“alignment researchers”) have twenty years to learn to control AI.

Then OpenAI poured money into AI, did ground-breaking research, and advanced the state of the art. That meant that AI progress would speed up, and AI would reach the danger level faster. Now Metaculus expects superintelligence in 2031, not 2043 (although this seems kind of like an over-update), which gives alignment researchers eight years, not twenty.

Response to OpenAI's argument that gradual deployment helps society prepare for dangerous AI systems

You might notice that, as written, this argument doesn’t support full-speed-ahead AI research. If you really wanted this kind of gradual release that lets society adjust to less powerful AI, you would do something like this:

  • Release AI #1
  • Wait until society has fully adapted to it, and alignment researchers have learned everything they can from it.
  • Then release AI #2
  • Wait until society has fully adapted to it, and alignment researchers have learned everything they can from it.
  • And so on . . .

Meanwhile, in real life, OpenAI released ChatGPT in late November, helped Microsoft launch the Bing chatbot in February, and plans to announce GPT-4 in a few months. Nobody thinks society has even partially adapted to any of these, or that alignment researchers have done more than begin to study them.

Response to three other arguments in favor of acceleration ("we want safety-conscious actors to be ahead", compute overhang, and "we want to demonstrate dangers as quickly as possible so the world takes AI safety more seriously")

These three lines of reasoning argue that burning a lot of timeline now might give us a little more timeline later. This is a good deal if:

  • Burning timeline now actually buys us the extra timeline later. For example, it’s only worth burning timeline to establish a lead if you can actually get the lead and keep it.
  • A little bit of timeline later is worth a lot of timeline now.
  • Everybody between now and later plays their part in this complicated timeline-burning dance and doesn’t screw it up at the last second.

I’m skeptical of all of these.

DeepMind thought they were establishing a lead in 2008, but OpenAI has caught up to them. OpenAI thought they were establishing a lead the past two years, but a few months after they came out with GPT, at least Google, Facebook, and Anthropic had comparable large language models; a few months after they came out with DALL-E, random nobody startups came out with Stable Diffusion and Midjourney. None of this research has established a commanding lead, it’s just moved everyone forward together and burned timelines for no reason.

The alignment researchers I’ve talked to say they’ve already got their hands full with existing AIs. Probably they could do better work with more advanced models, but it’s not an overwhelming factor, and they would be happiest getting to really understand what’s going on now before the next generation comes out. One researcher I talked to said the arguments for acceleration made sense five years ago, when there was almost nothing worth experimenting on, but that they no longer think this is true.

Finally, all these arguments for burning timelines require that lots of things go right later. The same AI companies burning timelines now turn into model citizens when the stakes get higher, and convert their lead into improved safety instead of capitalizing on it to release lucrative products. The government responds to an AI crisis responsibly, rather than by ignoring it or making it worse.

OpenAI's impact on timelines thus far

On the other hand - man, they sure have burned a lot of timeline. The big thing all the alignment people were trying to avoid in the early 2010s was an AI race. DeepMind was the first big AI company, so we should just let them do their thing, go slowly, get everything right, and avoid hype. Then Elon Musk founded OpenAI in 2015, murdered that plan, mutilated the corpse, and danced on its grave. Even after Musk left, the remaining team did everything to challenge everyone else to a race short of shooting a gun and waving a checkered flag.

OpenAI still hasn’t given a good explanation of why they did this. Absent anything else, I’m forced to wonder if it’s just “they’re just the kind of people who would do that sort of thing” - in which case basically any level of cynicism would be warranted.

I hate this conclusion. I’m trying to resist it. I want to think the best of everyone. Individual people at OpenAI have been very nice to me. I like them. They've done many good things for the world.

FTX fallout and analogy to OpenAI

Scott, suppose a guy named Sam, who you’re predisposed to like because he’s said nice things about your blog, founds a multibillion dollar company. It claims to be saving the world, and everyone in the company is personally very nice and says exactly the right stuff. On the other hand it’s aggressive, seems to cut some ethical corners, and some of your better-emotionally-attuned friends get bad vibes from it. Consider the possibility that either they’re lying and not as nice as they sound, or at the very least that they’re not as smart as they think they are and their master plan will spiral out of control before they’re able to get to the part where they do the good things.

Praise for commitment to independent evals, stop-and-assist clause, and hope for a more safety-conscious OpenAI moving forward

Realistically we’re going to thank them profusely for their extremely good statement, then cross our fingers really hard that they’re telling the truth.

OpenAI has unilaterally offered to destroy the world a bit less than they were doing before. They’ve voluntarily added things that look like commitments - some enforceable in the court of public opinion, others potentially in courts of law. Realistically we’ll say “thank you for doing that”, offer to help them turn those commitments into reality, and do our best to hold them to it. It doesn’t mean we have to like them period, or stop preparing for them to betray us. But on this particular sub-sub-topic we should take the W.

Where OpenAI goes, other labs might follow. The past eight years of OpenAI policy have been far from ideal. But this document represents a commitment to move from safety laggard to safety model, and I look forward to seeing how it works out.

Comments

Akash, thanks for posting this. Scott Alexander, as usual, has good insights, and is well worth reading here.

I think at some point, EAs might have to bite the bullet, set aside our all-too-close ties to the AI industry, and realize that 'AGI is an X-risk' boils down to 'OpenAI, DeepMind, and other AI companies that aren't actually taking AI X-risk seriously are the real X-risks', and should be viewed and treated accordingly.

100% agree. 

I like the analogy with ExxonMobil; I think it's helpful to keep that comparison in mind.

I mentioned before that I don't think companies that work on AI should have a significant voice in the AI discourse, at least in the EA sphere; we can't control the public discourse.

The primary purpose (maybe 80%+ of their purpose) of a company is to make money, plain and simple. The job of their PR people is to garner public support by whatever means necessary. Often that is by sounding as reasonable as possible. Their press releases, blogs, podcasts, etc. should be treated at worst as dangerous propaganda, at best as biased and compromised arguments.

Why then do we engage with their arguments so seriously? There are so many contrasting opinions on AI safety even among neutral researchers that are hard to understand and important to engage with; why would we throw compromised perspectives into the mix?

I lean towards using these kinds of blogs to understand the plans of AI companies and the arguments we need to counter in the public sphere, not as reasonable, well-thought-out opinions from neutral people.

As a former climate activist who organised a protest outside Exxon offices after my country failed to commit to climate agreements, I can personally confirm Scott's hypothetical.

I also share many of the same concerns about AGI race dynamics. The current frontrunners in the feared "AGI race" are all AI Safety companies, and ChatGPT has attracted billions into capabilities research from people who otherwise would never have looked into AI.

Just a week ago, Peking University professor Zhu Song-Chun spoke at a CCP conference about how China needs to go all-in to beat the US to AGI. ChatGPT created a very compelling proof-of-concept for pouring money into AI.

Counterfactuals and uncertainties aside, the AI Safety community has created the AGI race. I wonder if that was a good idea.

Good analogy. Note that environmental statements made by oil companies cannot be trusted even for a few years once expected profits increase, even when costly actions and investment patterns appear to back them up temporarily. For example:
https://www.ft.com/content/b5b21c66-92de-45c0-9621-152aa335d48c

'BP's chief executive Bernard Looney defended its latest reversal, stating that “The conversation three or four years ago was somewhat singular around cleaner energy, lower-carbon energy. Today, there is much more conversation about energy security, energy affordability.”'
