
(Co-written by Connor Leahy and Gabe)

We have talked to a whole bunch of people about pauses and moratoriums. Members of the AI safety community, investors, business peers, politicians, and more.

Too many claimed to pursue the following approach:

  1. It would be great if AGI progress stopped, but that is infeasible.
  2. Therefore, I will advocate for what I think is feasible, even if it is not ideal. 
  3. The Overton window being what it is, if I claim a belief that is too extreme, or endorse an infeasible policy proposal, people will take me less seriously on the feasible stuff.
  4. Given this, I will be tactical in what I say, even though I will avoid stating outright lies.

Consider if this applies to you, or people close to you.

If it does, let us be clear: hiding your beliefs, in ways that predictably lead people to believe false things, is lying. This is the case regardless of your intentions, and regardless of how it feels.

Not only is it morally wrong, it makes for a terrible strategy. As it stands, the AI Safety Community itself cannot coordinate to state that we should stop AGI progress right now!

Not only can it not coordinate, the AI Safety Community is defecting, by making it more costly for people who do say it to say it.

We all feel like we are working on the most important things, and that we are being pragmatic realists.

But remember: If you feel stuck in the Overton window, it is because YOU ARE the Overton window.

1. The AI Safety Community is making our job harder

In a saner world, all AGI progress should have already stopped. If we don’t, there’s more than a 10% chance we all die.

Many people in the AI safety community believe this, but they have not stated it publicly. Worse, they have stated different beliefs more saliently, which misdirect everyone else about what should be done, and what the AI safety community believes.

To date, in our efforts to inform, motivate and coordinate with people: People in the AI Safety Community publicly lying has been one of the biggest direct obstacles we have encountered.

The newest example of this is “Responsible Scaling Policies”, with many AI Safety people being much more vocal about their endorsement of RSPs than their private belief that in a saner world, all AGI progress should stop right now.

Because of them, we have been told many times that we are a minority voice, and that most people in the AI Safety community (understand, Open Philanthropy adjacent) disagree that we should stop all AGI progress right now.

That actually, there is an acceptable way to continue scaling! And given that this makes things easier, if there is indeed an acceptable way to continue scaling, this is what we should do, rather than stop all AGI progress right now!

Recently, Dario Amodei (Anthropic CEO), has used the RSP to frame the moratorium position as the most extreme version of an extreme position, and this is the framing that we have seen used over and over again. ARC mirrors this in their version of the RSP proposal, describing itself as a “pragmatic middle ground” between a moratorium and doing nothing.

Obviously, all AGI Racers use this against us when we talk to people.

There are very few people that we have consistently seen publicly call for a stop to AGI progress. The clearest ones are Eliezer’s “Shut it All Down” and Nate’s “Fucking stop”.

The loudest silence is from Paul Christiano, whose RSPs are being used to safety-wash scaling.

Proving me wrong is very easy. If you do believe that, in a saner world, we would stop all AGI progress right now, you can just write this publicly.

When called out on this, most people we talk to just fumble.

2. Lying for Personal Gain

We talk to many people who publicly lie about their beliefs.

The justifications are always the same: “it doesn’t feel like lying”, “we don’t state things we do not believe”, “we are playing an inside game, so we must be tactical in what we say to gain influence and power”.

Let me call this for what it is: lying for personal gain. If you state things whose main purpose is to get people to think you believe something else, and you do so to gain more influence and power: you are lying for personal gain.

The results of this “influence and power-grabbing” have many times over materialised as the safety-washing of the AGI race. What a coincidence it is that DeepMind, OpenAI and Anthropic are all related to the AI Safety community.

The only benefit we see from this politicking is that the people lying gain more influence, while the time we have left to AGI keeps getting shorter.

Consider what happens when a community rewards the people who gain more influence by lying!

So many people lie, and they screw not only humanity, but one another.

Many AGI corp leaders will privately state that in a saner world, AGI progress should stop, but they will not state it because it would hurt their ability to race against each other!

Safety people will lie so that they can keep ties with labs in order to “pressure them” and seem reasonable to politicians.

Whatever: they just lie to gain more power.

“DO NOT LIE PUBLICLY ABOUT GRAVE MATTERS” is a very strong baseline. If you want to defect, you need a much stronger reason than “it will benefit my personal influence, and I promise I’ll do good things with it”.

And you need to accept the blame when you’re called out. You should not muddy the waters by justifying your lies, covering them, telling people they misunderstood, and trying to maintain more influence within the community.

We have seen so many people be taken in this web of lies: from politicians and journalists, to engineers and intellectuals, all up until the concerned EA or regular citizen who wants to help, but is confused by our message when it looks like the AI safety community is ok with scaling.

Your lies compound and make the world a worse place.

There is an easy way to fix this situation: we can adopt the norm of publicly stating our true beliefs about grave matters.

If you know someone who claims to believe that in a saner world we should stop all AGI progress, tell them to publicly state their beliefs, unequivocally. Very often, you’ll see them fumbling, caught in politicking. And not that rarely, you’ll see that they actually want to keep racing. In these situations, you might want to stop finding excuses for them.

3. The Spirit of Coordination

A very sad thing that we have personally felt is that it looks like many people are so tangled in these politics that they do not understand what the point of honesty even is.

Indeed, from the inside, it is not obvious that honesty is a good choice. If you are honest, publicly honest, or even adversarially honest, you just make more opponents, you have less influence, and you can help less.

This is typical deontology vs consequentialism. Should you be honest, if from your point of view, it increases the chances of doom?

The answer is YES.

a) Politicking has many more unintended consequences than expected.

Whenever you lie, you shoot potential allies at random in the back.
Whenever you lie, you make it more acceptable for people around you to lie.

b) Your behavior, especially if you are a leader, a funder or a major employee (first 10 employees, or responsible for >10% of the headcount of the org), ripples down to everyone around you.

People lower in the respectability/authority/status ranks do defer to your behavior.
People outside of these ranks look at you.
Our work toward stopping AGI progress becomes easier whenever a leader/investor/major employee at OpenAI, DeepMind, Anthropic, ARC, Open Philanthropy, etc. states their beliefs about AGI progress more clearly.

c) Honesty is Great.

Existential Risks from AI are now going mainstream. Academics talk about it. Tech CEOs talk about it. You can now talk about it, not be a weirdo, and gain more allies. Polls show that even non-expert citizens express diverse opinions about superintelligence.

Consider the following timeline:

  • ARC & Open Philanthropy state in a press release “In a sane world, all AGI progress should stop. If we don’t, there’s more than a 10% chance we will all die.”
  • People at AGI labs working in the safety teams echo this message publicly.
  • AGI labs leaders who think this state it publicly.
  • We start coordinating explicitly against orgs (and groups within orgs) that race.
  • We coordinate on a plan whose final publicly stated goal is to get to a world state that most of us agree is not one where humanity’s entire existence is at risk.
  • We publicly, relentlessly optimise for this plan, without compromising on our beliefs.

Whenever you lie for personal gain, you fuck up this timeline.

When you start being publicly honest, you will suffer a personal hit in the short term. But we truly believe that, coordinated and honest, we will have timelines much longer than any Scaling Policy will ever get us.






I feel like this post is doing something I really don't like, which I'd categorize as something like "instead of trying to persuade with arguments, using rhetorical tricks to define terms in such a way that the other side is stuck defending a loaded concept and has an unjustified uphill battle."

For instance:

let us be clear: hiding your beliefs, in ways that predictably lead people to believe false things, is lying. This is the case regardless of your intentions, and regardless of how it feels.

I mean, no, that's just not how the term is usually used. It's misleading to hide your beliefs in that way, and you could argue it's dishonest, but it's not generally what people would call a "lie" (or if they did, they'd use the phrase "lie by omission"). One could argue that lies by omission are no less bad than lies by commission, but I think this is at least nonobvious, and also a view that I'm pretty sure most people don't hold. You could have written this post with words like "mislead" or "act coyly about true beliefs" instead of "lie", and I think that would have made this post substantially better.

I also feel like the piece weirdly implies that it's dishonest to advocate for a policy that you think is second best. Like, this just doesn't follow – someone could, for instance, want a $20/hr minimum wage, and advocate for a $15/hr minimum wage based on the idea that it's more politically feasible, and this isn't remotely dishonest unless they're being dishonest about their preference for $20/hr in other communications. You say:

many AI Safety people being much more vocal about their endorsement of RSPs than their private belief that in a saner world, all AGI progress should stop right now.

but this simply isn't contradictory – you could think a perfect society would pause but that RSPs are still good and make more sense to advocate for given the political reality of our society.

Isn’t your point a little bit pedantic here in the sense that you seem to be perfectly able to understand the key point the post was trying to make, find that point somewhat objectionable or controversial, and thus point to some issues regarding “framing” rather than really engage deeply with the key points?

Of course, every post could be better written, more thoughtful, etc. but let’s be honest, we are here to make progress on important issues and not to win “argument style points.” In particular, I find it disturbing that this technique of criticizing style of argument seems to be used quite often to discredit or not engage with “niche” viewpoints that criticize prevailing “mainstream” opinions in the EA community. Happened to me as well, when I was suggesting we should look more into whether there are maybe alternatives to purely for-profit/closed-source driven business models for AI ventures. Some people were bending over backwards to argue some concerns that were only tangentially related to the proposal I made (e.g., government can't be trusted and is incompetent so anything involving regulation could never ever work, etc.). Another case was a post on engagement with "post growth" concepts. There I witnessed something like a wholesale character assassination of the post growth community for whatever reasons. Not saying this happened here but I am simply trying to show a pattern of dismissal of niche viewpoints for spurious, tangential reasons without really engaging with them.

Altogether, wouldn’t it be more productive to have more open minded discussions and practice more of what we preach to the normies out there ourselves (e.g., steel-manning instead of straw-manning)? Critiquing style is fine and has its place but maybe let’s do substance first and style second?

My problem with the post wasn't that it used subpar prose or "could be written better", it's that it uses rhetorical techniques that make actual exchange of ideas and truth-seeking harder. This isn't about "argument style points", it's about cultivating norms in the community that make it easier for us to converge on truth, even on hard topics.

The reason I didn't personally engage with the object level is I didn't feel like I had anything particularly valuable to say on the topic. I didn't avoid saying my object-level views (if he had written a similar post with a style I didn't take issue with, I wouldn't have responded at all), and I don't want other people in the community to avoid engaging with the ideas either.

Hey Daniel,

as I also stated in another reply to Nick, I didn’t really mean to diminish the point you raised but to highlight that this is really more of a “meta point” that’s only tangential to the matter of the issue outlined. My critical reaction was not meant to be against you or the point you raised but the more general community practice / trend of focusing on those points at the expense of engaging the subject matter itself, in particular, when the topic is against mainstream thinking. This I think is somewhat demonstrated by the fact that your comment is by far the most upvoted on an issue that would have far reaching implications if accepted as having some merit.

Hope this makes it clearer. Don’t mean to criticize the object level of your argument, it’s just coincidental that I picked out your comment to illustrate a problematic development that I see.

P.S.: There is also some irony in me posting a meta critique of a meta critique to argue for more object level engagement but that’s life I guess.

Thanks Alex. In general I agree with you, if viewpoints are expressed that are outside of what most EAs think, they do sometimes get strawmanned and voted down without good reason (like you say ideas like handing more power to governments and post-growth concepts). In this case though I think the original poster was fairly aggressive with rhetorical tricks, as a pretty large part of making their argument - so I think Daniel's criticism was reasonable.

Hey Nick,

thanks for your reply. I didn’t mean to say that Daniel didn’t have a point. It’s a reasonable argument to make. I just wanted to highlight that this shouldn’t be the only angle to look at such posts. If you look, his comment is by far the most upvoted and it only addresses a point tangential to the problem at hand. Of course, getting upvoted is not his “fault”. I just felt compelled to highlight that overly focusing on this kind of angle only brings us so far.

Hope that makes it clearer :)

I think this is tantamount to saying that we shouldn't engage within the political system, compromise, or meet people where they are coming from in our advocacy. I don't think other social movements would have got anywhere with this kind of attitude, and this seems especially tricky with something very detail orientated like AI safety.

Inside game approaches (versus outside game approaches like this is describing) are going to require engaging in things this post says that no one should do. Boldly stating exactly the ideal situation you are after could have its role, but I'd need to see a much more detailed argument about why that should be the only game in town when it comes to AI.

I think that as AI safety turns more into an advocacy project it needs to engage more with the existing literature on the subject, including what has worked for past social movements.

Also, importantly, this isn't lying (as Daniel's comment explains). 

what has worked for past social movements

This is fundamentally different imo, because we aren't asking for people to right injustices, stick up for marginalised groups, care about future generations, or do good of any kind; we're asking people not to kill literally everyone, including ourselves, and for those who would do (however unintentionally) to be stopped by governments. It's a matter of survival above all else.

I don't think the scale or expected value affects this strategy question directly. You still just use a strategy that is going to be most likely to achieve the goal.

If the goal is something you have really widespread agreement on, that probably leans you towards an uncompromising, radical ask approach. Seems like things might be going pretty well for AI safety in that respect, though I don't know if it's been established that people are buying into the high probability of doom arguments that much. I suspect that we are much less far along than the climate change movement in that respect, for example. And even if support were much greater, I wouldn't agree with a lot of this post.

Oh, my expertise is in animal advocacy, not AI safety FYI

I think there is something to be said for the radical flank effect, and Connor and Gabe are providing a somewhat radical flank (even though I actually think the "fucking stop!" position is the most reasonable, moderate one, given the urgency and the stakes!).

note - everything i say here is based on my own interpretation of the AI scene. I have no special knowledge, so this may be inaccurate, out-of-date, or both

So this post has provoked a lot of reaction, and I'm not going to defend all of it. I do think, however, that debating what is lying/white lying/lying by omission/being misleading is not as important as the other issues. (I think the tone is overly combative and hurts its cause, for instance). But from the perspective of someone like Connor, I can see why they may think that the AI Safety Community (especially those at the larger labs) are deliberately falsifying their beliefs.

At the May 16th Senate Judiciary Committee hearing, where the witnesses are under oath[1], Blumenthal asks Sam Altman about his quote that "development of superhuman machine intelligence is probably the greatest threat to the continued existence of humanity" and leads on to talk about the effect of AI on jobs and employment. Sam answers this by also talking about jobs, but Gary Marcus then calls him out by saying "Sam's worst fear I do not think is employment and he never told us what his worst fear actually is". When asked again, Sam says "my worst fears are that...we cause significant harm to the world"[2]

I think that's a lie of omission - Sam's worst fear is the extinction of humanity or perhaps some S-risk scenario, and he did not say that. Does this violate Section 1001 of the United States Code? I don't know. I think it'd be good to get more concrete estimates from Sam of what he thinks the risks are, and I think as the CEO of the world's leading AI company, it'd be something the US Senate should be interested in.

In a recent podcast, Dario Amodei implied his p(doom) was between 10-25%, though it's not a clear prediction with a timeframe, resolution criteria, or whether it's conditional or unconditional on Anthropic's actions (listen and judge for yourself). But Dario's been aware of xRisk for a long time, so this isn't a new belief, but it didn't make it into his senate testimony either. If my p(doom) was that high, and I was leading one of the world's top AI labs, I'd be making a lot more noise. But, instead, they've recently announced partnerships with and taken money from Google and Amazon with the intent to develop frontier AI and, presumably, commercialise it.

Now, many people see this and go "that doesn't add up". It's what leads many critics of AI x-risk to go "this is just a galaxy brained marketing strategy", and I hope this comment has given some clarification as to why they think so.

Connor, on the other hand, has been consistently against scaling up AI systems and has argued for this in public appearances on the news, on podcasts, and in public hearings. I don't agree with his pessimism or framing of the positions, but he has authenticity regarding his position. It's what people expect people to act like if they think the industry they work in has a significant chance of destroying the world, whereas I can't make sense of Amodei's position, for example, without guessing additional beliefs or information which he may or may not actually have.

So, tying it back: the intense scepticism of RSPs from the Connor/ctrl.ai/Conjecture side of the AI Safety space is because, given a track record like the above, they don't trust that RSPs are actually a mechanism that will reduce the risk from developing unaligned AGI. And I can see why they'd be sceptical - if I'd just accepted the promise of $4 billion from Amazon, that's a big incentive to put your thumb on the scale and say "we're not at ASL-3 yet", for example.

The alternative might be titotal's explanation that Sam/Dario have much lower p(doom) than Connor, and I think that's probably true. But that might not be true for every employee at these organisations, and even then I think 10% is a high enough p(doom) that one ought to report that estimate fully and faithfully to your democratically elected government if they ask you about it.

I hope this makes the 'RSP sceptical' position more clear to people, though I don't hold it myself (at least to the same extreme as Connor and Gabe do here), and I'm open to corrections from either Connor or Gabe if you think I've interpreted what you're trying to say incorrectly.

  1. ^

    I can't find a recording or transcript of what the oath was, but I assume it included a promise to tell the whole truth

  2. ^

    The hearing is here, this exchange kicks off around 35:38

  • ARC & Open Philanthropy state in a press release “In a sane world, all AGI progress should stop. If we don’t, there’s more than a 10% chance we will all die.”
  • People at AGI labs working in the safety teams echo this message publicly.

Genuine question: Do the majority of people in Open Phil, or at AGI safety labs, actually believe the statement above?

I'm all for honesty, but it seems like an alternate explanation for a lot of people not saying the statement above is that they don't believe that the statement is true. I worry that setting this kind of standard will just lead to accusations that genuine disagreement is secret dishonesty.  

Yes, would be good to hear more from them directly. I'm disappointed that OpenPhil have not given any public update on their credences on the two main questions their AI Worldviews Contest sought to address.

Your question reminded me of the following quote:

It Is Difficult to Get a Man to Understand Something When His Salary Depends Upon His Not Understanding It

Maybe here we are talking about an alternative version of this:

It Is Difficult to Get a Man to Say Something When His Salary (or Relevance, Power, Influence, Status) Depends Upon Him Not Saying It

'There are very few people that we have consistently seen publicly call for a stop to AGI progress. The clearest ones are Eliezer’s “Shut it All Down” and Nate’s “Fucking stop”.

The loudest silence is from Paul Christiano, whose RSPs are being used to safety-wash scaling.'

I'm not blaming the authors for this, as they couldn't know, but literally today, on this forum, Paul Christiano has publicly expressed clear beliefs about whether a pause would be a good idea, and why he's not advocating for one directly: https://forum.effectivealtruism.org/posts/cKW4db8u2uFEAHewg/thoughts-on-responsible-scaling-policies-and-regulation

Christiano: "If the world were unified around the priority of minimizing global catastrophic risk, I think that we could reduce risk significantly further by implementing a global, long-lasting, and effectively enforced pause on frontier AI development—including a moratorium on the development and production of some types of computing hardware. The world is not unified around this goal; this policy would come with other significant costs and currently seems unlikely to be implemented without much clearer evidence of serious risk. 

A unilateral pause on large AI training runs in the West, without a pause on new computing hardware, would have more ambiguous impacts on global catastrophic risk. The primary negative effects on risk are leading to faster catch-up growth in a later period with more hardware and driving AI development into laxer jurisdictions.

However, if governments shared my perspective on risk then I think they should already be implementing domestic policies that will often lead to temporary pauses or slowdowns in practice. For example, they might require frontier AI developers to implement additional protective measures before training larger models than those that exist today, and some of those protective measures may take a fairly long time (such as major improvements in risk evaluations or information security). Or governments might aim to limit the rate at which effective training compute of frontier models grows, in order to provide a smoother ramp for society to adapt to AI and to limit the risk of surprises."

