(Cross-posted from my website. Podcast version here, or search for "Joe Carlsmith Audio" on your podcast app.
This essay is part of a series I'm calling "Otherness and control in
the age of AGI." I'm hoping that the individual essays can be read
fairly well on their own, but see here for a brief summary of the essays
that have been released thus far.)
In my last essay, I discussed the way in which what I've called "deep
atheism" (that is, a fundamental mistrust towards both "Nature" and
"bare intelligence") can prompt an aspiration to exert extreme levels of
control over the universe; I highlighted the sense in which both humans
and AIs, on Yudkowsky's AI risk narrative, are animated by this sort
of aspiration; and I discussed some ways in which our civilization has
built up wariness around control-seeking of this kind. I think we should
be taking this sort of wariness quite seriously.
In this spirit, I want to look, in this essay, at Robin Hanson's
critique of the AI risk discourse – a critique especially attuned to the
way in which this discourse risks control-gone-wrong. In particular,
I'm interested in Hanson's accusation that AI risk "others" the AIs (see
e.g. here, here, and here).
Hearing the claim that AIs may eventually differ greatly from us, and
become very capable, and that this could possibly happen fast, tends
to invoke our general fear-of-difference heuristic. Making us afraid
of these "others" and wanting to control them somehow ... "Hate" and
"intolerance" aren't overly strong terms for this attitude.[1]
Hanson sees this vice as core to the disagreement ("my best one-factor
model to explain opinion variance here is this: some of us 'other' the
AIs more").
And he invokes a deep lineage of liberal ideals in opposition.
I think he's right to notice a tension in this vicinity. AI risk is,
indeed, about fearing some sort of uncontrolled other. But is that
always the bad sort of "othering?"
Some basic points up front
Well, let's at least avoid basic mistakes/misunderstandings. For one:
hardcore AI risk folks like Yudkowsky are generally happy to care about
AI welfare – at least if welfare means something like "happy
sentience." And pace some of Hanson's accusations of bio-chauvinism,
these folks are extremely not fussed about the fact that AI minds
are made of silicon (indeed: come now). Of course, this isn't to say
that AI welfare (and AI rights) issues don't get complicated (see e.g.
here and here for a glimpse of some of the complications), or that
humanity as a whole will get the
"digital minds matter" stuff right. Indeed, I worry that we will get it
horribly wrong – and I do think that the AI risk discourse
under-attends to some of the tensions. But species-ism 101 (201?) – e.g., "I don't care about digital suffering" – isn't AI risk's vice.
For two: clearly some sorts of otherness warrant some sorts of fear.
For example: maybe you, personally, don't like to murder. But Bob, well:
Bob is different. If Bob gets a bunch of power, then: yep, it's OK to
hold your babies close. And often OK, too, to try to "control" Bob into
not-killing-your-babies. Cf, also, the discussion of
getting-eaten-by-bears in the first essay. And the Nazis, too, were
different in their own way. Of course, there's a long and ongoing
history of mistaking "different" for "the type of different that wants
to kill your babies." We should, indeed, be very wary. But liberal
tolerance has never been a blank check; and not all fear is hatred.
Indeed, many attempts to diagnose the ethical mistake behind various
canonical difference-related vices (racism, sexism, species-ism, etc)
reveal a certain shallowness of commitment to difference-per-se. In
particular: such vices are often understood as missing some underlying
sameness – for example, "common humanity," "persons," "sentient
beings," "children of the universe," and so forth. And calls for social
harmony often recapitulate this structure: we might be different in X
ways, but (watch for the but) we have blah in common. This isn't to
say that ethical commitment to a less adulterated difference-per-se is
impossible. But one wants, generally, a story about why it's OK to eat
apples but not babies; why Furbies programmed to say "Biden" shouldn't
get the vote; and why you can own a laptop but not a slave. And such a
story requires differences. The apple, the Furby, the laptop must be
importantly "Other" relative to e.g. human adults. They must be
outside some circle. Ethics is always drawing lines.

ChatGPT wouldn’t let the furby be voting for Biden in particular…
What exactly is Hanson's critique?
With these basics in mind, then, what exactly is Hanson's "other-ing the
AIs" critique? It has many facets, but here's one attempt at
reconstruction:
1. People worried about AI risk are much more scared of future AIs than
   future humans, because they think that:
   a. AIs are more likely to do stuff like murder all the humans,
      overthrow the government, and violate property rights, and
   b. AIs are more likely to have values pursuit of which will result
      in a ~zero-value future more generally.
2. But in fact, neither of these things is true.
3. So greater fear of future AIs relative to future humans is best
   understood as a kind of arbitrary, in-group partiality – i.e.,
   "othering the AIs."
Clearly, (2) is where the action is, here. Whence such a departure from
Yudkowsky's nightmare? We can divide Hanson's justification into two
components. The first argues that future AIs will be more similar to us
than the AI risk story suggests. The second argues that future humans,
by default, will be more different.
Will the AIs be more similar to us than AI risk expects?
Let's start with "AIs will be more similar to us than AI risk expects."
Above I mentioned propensity-to-murder as a classic form of otherness
that it's OK to fear/control. And we often put "violating property
rights" and "overthrowing the government" in a similar bucket.
Presumably Hanson is not OK with AIs doing this stuff? But he doesn't
think they will – or at least, not more than humans will. And why not?
It's some combination of (i) "AIs would be designed and evolved to think
and act roughly like humans, in order to fit smoothly into our many
roughly-human-shaped social roles," and (ii) like humans, they'll be
constrained by legal and social incentives. And even setting aside
violence, Hanson generally appeals to (i) in response to objections like
"so ... are you actually fine with future agents tiling the universe
with paperclips"? The AI values, says Hanson, won't be that
alien.[2]
Big if true. But is it true? I won't dive in much here, except to say
that this aspect of Hanson's story generally strikes me as
under-argued. In particular, I think Hanson moves too quickly from "the
AIs will be trained to fit into the human economy" to "the AIs will have
values relevantly similar to human values," and that he takes too much
for granted that legal and social incentives protecting humans from
being murdered/violently-disempowered will continue to bind adequately
if the AIs have most of the hard power. In this, I think, his argument
for (2) misses a lot of the core doom concern.
Will future humans be more different from us than AI risk expects?
But I think the other aspect of his argument for (2) – namely, "future
humans will be more different from us than AI risk expects" – is more
interesting. Here, Hanson's basic move is to question the "alignment" of
the default human future, even absent AI. That is: human values have
changed dramatically over time – and not, argues Hanson, centrally in
response to a process of rational reflection, but rather in response to
other sorts of competition, contingency, and
economic/social/technological change. And even absent AIs, we should
expect this process to only continue and intensify, such that humans ten
generations from now (or: after ten doublings of GDP, or whatever) would
have values very different from our own – and not from having
done-more-philosophy.
Now, we can debate the empirics of past and future, here (though what
processes of values-change we endorse as "rational" may not be entirely
empirical).
Indeed, I think Hanson may be over-estimating how horrified the ancient
Greeks, or the hunter-gatherers, would be on reflection by the values of
the present-day world – and this even setting aside our material
abundance. And I might disagree, too, about exactly how different the
values of future humans would be, given various possible "futures
without AI" (though it's not an especially clear-cut category).

How pissed would they be, on reflection, about present-day values? (Image source here.)
Still, I think Hanson is poking at something important and
uncomfortable. In particular: suppose we grant him the empirics.
Suppose, indeed, that even without AI, the default values of future
humans would "drift" until they were as paperclippers relative to us, such that
the world they create would be utterly valueless from our perspective.
What follows? Well, umm, if you care about the future having value ...
then what follows is a need to exert more control. More yang. It is,
indeed, the "good future" part of the alignment problem all over again
(though not the "notkilleveryone" part).
Of course, trying to make sure that future humans aren't paperclippers
doesn't mean locking in your specific, object-level values right now
(you still want to leave room for moral progress you'd
endorse-on-reflection). Nor, pace some of Hanson's language, does it
mean "brainwashing" or "lobotomizing" the future people. If a boulder is
rolling towards a button that will create Sally, a paperclipper, and you
divert it towards a button that will create Bill, a deontologist, you're
not brainwashing or lobotomizing Sally.[3] (Confusions in this vein
are a classic issue for reasoning about your impact on future people – and
Hanson's analysis is not immune.)
Still, though: are you playing too much God, or too-Stalin? Who are you
to divert Nature's boulder – that oh-so-defined "default"? And Sally,
at least, is pissed. Indeed, Hanson reminds us: aren't we glad that the
ancient Greeks didn't try to divert the future to replace us with
people more like them? (Well, who knows how much they tried. But good
thing they didn't succeed! Though, wait: how much did they succeed?).
But the question – or at least, the first-pass question – isn't
whether we're glad that the Greeks didn't control our
values-on-reflection to be more Greek. Indeed, basically everyone who
gets created with some set of values-on-reflection is glad that the
process that created them didn't push towards agents with different
values instead.[4] If, in some horrible mistake, we set in motion a
future filled with suffering-maximizers, they, too, will be glad we
didn't "control" the values of the future more (because this would’ve led to a future-with-less-suffering). But from our perspective, it's not a good test.
Rather, the first-pass test, re:
lessons-from-the-ancient-greeks-about-controlling-future-values, is
whether the Greeks would be glad, on reflection, that they didn't
make our values more Greek. And one traditional answer, here, is yes. If
we could sit down with Aristotle, and explain to him why actually,
slavery is wrong, and that no one is by nature someone else's property,
then our hearts and his would sing in harmony. That is, on this story,
if Aristotle had somehow prevented future people from abolishing
slavery, then he would've been making a mistake by his own lights – preventing the flower-he-loves from blooming, via the march of Reason,
in history's hand.

“A master (right) and his slave (left) in a phlyax play, Sicilian red-figured calyx-krater, c. 350 BC–340 BC.” (Image source here.)
But this isn't the central story Hanson wants to tell. Rather, when
Hanson talks about values changing over time, he specifically wants to
deny that Reason has much to do with it. That is, it sounds a lot like
Hanson wants to say both that the ancient Greeks would be horrified even on reflection by our values, and that we should take our cues from
the ancient Greeks in deciding how much control to try to exert over the
values of future people. And at a high level, that sounds like a recipe
for, well, being horrified even on reflection by the values of future
people. Remind me why that's good again? Indeed, on any meta-ethics
where the normative truth would be revealed to our reflection, we just
stipulated that it's horrifying.
Now, we might try to construct Hanson's story in other, more complicated
ways (see e.g. here for one attempt). But I want to stay, for now, with the dialectic that
this version of his view creates, which I think is plenty interesting.
In particular: on the one hand, we just stipulated that absent control,
the values of future humans would be horrifying/meaningless to us, even
on reflection and full understanding. On the other hand, some sort of
discomfort in trying to control the values of future humans persists (at
least for me). I think Hanson is right to notice it – and to notice,
too, its connection to trying to control the values of the AIs. I think
the AI alignment discourse should, in fact, prompt this discomfort – and that we should be serious about understanding, and avoiding, the
sort of yang-gone-wrong that it's trying to track.
Indeed, I think when we bring certain other Yudkowskian vibes into view – and in particular, vibes related to the "fragility of value,"
"extremal Goodhart," and "the tails come apart" – this discomfort
should deepen yet further. I'll turn to this in the next essay.
(Cross-posted from LW)
I think there's an additional element of Hanson's argument that is both likely true and important, and as far as I can tell unaddressed in your post. When Hanson talks about "othering" AIs, he's often talking about the stuff you mentioned — projecting a propensity to do bad things onto the AIs — but he's also saying that future AIs won't necessarily form a natural, unified coalition against us. In other words, he's rejecting a form of out-group homogeneity used to portray AIs.
As an analogy, among humans, the class of "smart people" is not a natural coalition, even though they could in principle get together and defeat all of the non-smart people in a one-on-one fight. Why don't smart people do that? Well, one reason is that smart people don't usually see themselves as being part of a coherent identity that includes all the other smart people. Poetically, there isn't much class consciousness among smart people as a unified group. They have diverse motives and interests that wouldn't be furthered much by attempting to join such a front. The argument Hanson makes is that AIs will also not form a natural, unified front against humans in the same sense. The relevant boundaries in future conflicts over power will likely be drawn across other lines.
The idea that AIs won't form a natural coalition has a variety of implications for the standard AI risk arguments. Most notably, it undermines the single-agent model that underlies many takeover stories and arguments for risk. More specifically, if AIs won't form a natural coalition, then,
In my opinion, these examples only scratch the surface of the ways in which your story of AI might depart from the classic AI risk analysis if you don't think AIs will form a natural, unified coalition. When you start to read standard AI risk stories (including from people like Ajeya who do not agree with Eliezer on a ton of things), you can often find the assumption that "AIs will form a natural, unified coalition" written all over it.
I'm just noting that you are assuming that we have many robustly aligned AIs, in which case I agree that takeover seems less likely.
Absent this assumption, I don't think that "AIs will form a natural, unified coalition" is the necessary outcome, but it seems reasonable that the other outcomes will look functionally the same for us.