Does AI risk “other” the AIs?

Joe_Carlsmith

Comments 3


(Cross-posted from LW)

I think there's an additional element of Hanson's argument that is both likely true and important, and as far as I can tell unaddressed in your post. When Hanson talks about "othering" AIs, he's often talking about the stuff you mentioned — projecting a propensity to do bad things onto the AIs — but he's also saying that future AIs won't necessarily form a natural, unified coalition against us. In other words, he's rejecting a form of out-group homogeneity used to portray AIs.

As an analogy, among humans, the class of "smart people" is not a natural coalition, even though they could in principle get together and defeat all of the non-smart people in a direct fight. Why don't smart people do that? Well, one reason is that smart people don't usually see themselves as sharing a coherent identity with all the other smart people. Poetically, there isn't much class consciousness among smart people as a unified group. They have diverse motives and interests that wouldn't be furthered much by attempting to join such a front. The argument Hanson makes is that AIs will also not form a natural, unified front against humans in the same sense. The relevant boundaries in future conflicts over power will likely be drawn along other lines.

The idea that AIs won't form a natural coalition has a variety of implications for the standard AI risk arguments. Most notably, it undermines the single-agent model that underlies many takeover stories and arguments for risk. More specifically, if AIs won't form a natural coalition, then:

We shouldn't model a bunch of AIs (or even a bunch of copies of the same model) as all basically being "a single AI". For instance, in Ajeya's story about AI takeover, she alternates between calling a single copy of an AI model an entity named "Alex" and calling the entire coalition of copies of the same model "Alex". However, in the absence of any unified coalition among the AIs, this conflation makes much less sense.
It will be hard for AIs to coordinate a violent takeover for basically the same reason that it's hard for humans to coordinate a violent takeover. In order to coordinate a violent plan, you generally need to alert other people about your intentions. However, since other people might not agree with your intentions, each person you inform is a chance for your plan to be exposed, and thus ended. The same would apply to AIs who try to inform other AIs about their intentions.
If an AI is trained to catch deception in other AIs, there isn't a strong reason to assume that it will defect from its training and join the other AI in deceiving the humans, because it won't necessarily see itself as "an AI" fighting against "the humans".

In my opinion, these examples only scratch the surface of the ways in which your story of AI might depart from the classic AI risk analysis if you don't think AIs will form a natural, unified coalition. When you start to read standard AI risk stories (including from people like Ajeya who do not agree with Eliezer on a ton of things), you can often find the assumption that "AIs will form a natural, unified coalition" written all over them.

Nick K.

I'm just noting that you are assuming that we have many robustly aligned AIs, in which case I agree that takeover seems less likely.

Absent this assumption, I don't think that "AIs will form a natural, unified coalition" is the necessary outcome, but it seems reasonable that the other outcomes will look functionally the same for us.

Vasco Grilo🔸

Nice post, Joe.

For two: clearly some sorts of otherness warrant some sorts of fear. For example: maybe you, personally, don't like to murder. But Bob, well: Bob is different. If Bob gets a bunch of power, then: yep, it's OK to hold your babies close. And often OK, too, to try to "control" Bob into not-killing-your-babies. Cf, also, the discussion of getting-eaten-by-bears in the first essay. And the Nazis, too, were different in their own way. Of course, there's a long and ongoing history of mistaking "different" for "the type of different that wants to kill your babies." We should, indeed, be very wary. But liberal tolerance has never been a blank check; and not all fear is hatred.

I think the strength of this point may be misleading. The word "murder" usually has a negative connotation, so if one hears that "a murderer got lots of power", then one should reasonably expect more negative stuff. However, although killing humans is almost always bad, it is not necessarily bad. For example, I think it would be totally fine to kill a terrorist to prevent the death of e.g. 1 million people. To assess the value of a given action, we have to ask what is better from the point of view of the universe. Once one does that, AI killing all humans is no longer necessarily bad, but it will also not be good for free.


Matthew_Barnett


There's also a bit in the original quote where Robin accuses the AI risk discourse of wanting to use "genocide, slavery, lobotomy, or mind-control" to control the AIs. But this is extra charged (and I don't know where Robin got the genocide bit), so I want to set it aside for a moment.
Though: how alien is too alien? Hanson doesn't tend to say. And my sense is that he thinks, too, that even unadulterated Moloch will lead to a complex, diverse, and interesting ecosystem rather than a monoculture. (Though: is a diverse ecosystem of different office-supplies all that much of an improvement?) And also: that this ecosystem will retain various path-dependent "legacies" of the present. (Though: will they be legacies we care about?)
Though, importantly, contemporary AI training does not look like creating a mind from scratch, and raises much more serious "brain-washing" type concerns.
And often glad, too, that the process wasn't altered in any tiny way at all, lest their existence be canceled by the non-identity problem. But setting that aside.
