Does AI risk “other” the AIs?

Joe_Carlsmith

Comments

Matthew_Barnett

(Cross-posted from LW)

I think there's an additional element of Hanson's argument that is both likely true and important, and that, as far as I can tell, your post doesn't address. When Hanson talks about "othering" AIs, he's often talking about the stuff you mentioned (projecting a propensity to do bad things onto the AIs), but he's also saying that future AIs won't necessarily form a natural, unified coalition against us. In other words, he's rejecting the out-group homogeneity (the tendency to see members of an out-group as all alike) that is often projected onto AIs.

As an analogy, among humans, the class of "smart people" is not a natural coalition, even though its members could in principle band together and defeat all of the non-smart people in a one-on-one fight. Why don't smart people do that? Well, one reason is that smart people don't usually see themselves as part of a coherent identity that includes all the other smart people. Poetically, there isn't much class consciousness among smart people as a unified group. They have diverse motives and interests that wouldn't be furthered much by joining such a front. Hanson's argument is that AIs, likewise, will not form a natural, unified front against humans. The relevant boundaries in future conflicts over power will likely be drawn along other lines.

The idea that AIs won't form a natural coalition has a variety of implications for the standard AI risk arguments. Most notably, it undermines the single-agent model that underlies many takeover stories and arguments for risk. More specifically, if AIs won't form a natural coalition, then:

We shouldn't model a bunch of AIs (or even a bunch of copies of the same model) as all basically being "a single AI". For instance, in Ajeya's story about AI takeover, she alternates between calling a single copy of an AI model an entity named "Alex" and calling the entire coalition of copies of the same model "Alex". However, in the absence of any unified coalition among the AIs, this conflation makes much less sense.
It will be hard for AIs to coordinate a violent takeover for basically the same reason it's hard for humans to do so. To coordinate a violent plan, you generally need to tell other people about your intentions. But since those people might not agree with your intentions, each person you inform is another chance for your plan to be exposed, and thus ended. The same logic would apply to AIs that try to recruit other AIs to their plans.
If an AI is trained to catch deception in other AIs, there isn't a strong reason to assume that it will defect from its training and join the other AI in deceiving the humans, because it won't necessarily see itself as "an AI" fighting against "the humans".

In my opinion, these examples only scratch the surface of the ways in which your story of AI might depart from the classic AI risk analysis if you don't think AIs will form a natural, unified coalition. When you read standard AI risk stories (including those from people like Ajeya, who disagrees with Eliezer on a ton of things), you can often find the assumption that "AIs will form a natural, unified coalition" written all over them.

There's also a bit in the original quote where Robin accuses the AI risk discourse of wanting to use "genocide, slavery, lobotomy, or mind-control" to control the AIs. But this is extra charged (and I don't know where Robin got the genocide bit), so I want to set it aside for a moment.
Though: how alien is too alien? Hanson doesn't tend to say. And my sense is that he thinks, too, that even unadulterated Moloch will lead to a complex, diverse, and interesting ecosystem rather than a monoculture. (Though: is a diverse ecosystem of different office supplies all that much of an improvement?) And also: that this ecosystem will retain various path-dependent "legacies" of the present. (Though: will they be legacies we care about?)
Though, importantly, contemporary AI training does not look like creating a mind from scratch, and raises much more serious "brainwashing"-type concerns.
And often glad, too, that the process wasn't altered in any tiny way at all, lest their existence be canceled by the non-identity problem. But setting that aside.
