(Cross-posted from my website.)
“With malice toward none; with charity for all; with firmness in the right, as God gives us to see the right…”
- Abraham Lincoln
Lincoln’s Second Inaugural
I’ve written a series of essays that I’m calling “Otherness and control in the age of AGI.” The series examines a set of interconnected questions about how agents with different values should relate to one another, and about the ethics of seeking and sharing power. They’re old questions – but I think that we will have to grapple with them in new ways as increasingly powerful AI systems come online. And I think they’re core to some parts of the discourse about existential risk from misaligned AI (hereafter, “AI risk”).
The series covers a lot of ground, but I’m hoping the individual essays can be read fairly well on their own. Here’s a brief summary of the essays that have been released thus far (I’ll update it as I release more):
- The first essay, “Gentleness and the artificial Other,” discusses the possibility of “gentleness” towards various non-human Others – for example, animals, aliens, and AI systems. And it also highlights the possibility of “getting eaten,” in the way that Timothy Treadwell gets eaten by a bear in Werner Herzog’s Grizzly Man: that is, eaten in the midst of an attempt at gentleness.
- The second essay, “Deep atheism and AI risk,” discusses what I call “deep atheism” – a fundamental mistrust both towards Nature, and towards “bare intelligence.” I take Eliezer Yudkowsky as a paradigmatic deep atheist, and I highlight the connection between his deep atheism and his concern about misaligned AI. I also connect deep atheism to the duality of “yang” (active, controlling) vs “yin” (receptive, letting-go). A lot of my concern, in the series, is about ways in which certain strands of the AI risk discourse can propel themselves, philosophically, towards ever-greater yang.
- The third essay, “When 'yang' goes wrong,” expands on this concern. In particular: it discusses the sense in which deep atheism can prompt an aspiration to exert extreme levels of control over the universe; it highlights the sense in which both humans and AIs, on Yudkowsky’s narrative, are animated by this sort of aspiration; and it discusses some ways in which our civilization has built up wariness around control-seeking of this kind. I think we should be taking this sort of wariness quite seriously.
- Pursuant to this goal, the fourth essay, “Does AI risk ‘other’ the AIs?”, examines Robin Hanson’s critique of the AI risk discourse – and in particular, his accusation that this discourse “others” the AIs, and seeks too much control over the values that steer the future. I find some aspects of Hanson’s critique uncompelling and implausible, but I do think he’s pointing at a real discomfort.
- The fifth essay, “An even deeper atheism,” argues that this discomfort should deepen yet further when we bring some other Yudkowskian philosophical vibes into view – in particular, vibes related to the “fragility of value,” “extremal Goodhart,” and “the tails come apart.” These vibes, I suggest, create a certain momentum towards deeming more and more agents – including: human agents – “misaligned” in the sense of: not-to-be-trusted to optimize the universe very intensely according to their values-on-reflection. And even if we do not follow this momentum, I think it can remind us of the sense in which AI risk is substantially (though, not entirely) a generalization and intensification of the sort of “balance of power between agents with different values” problem we already deal with in the purely human world – a problem about which our existing ethical and political traditions already offer lots of guidance.
- The sixth essay, “Being nicer than Clippy,” tries to draw on this guidance. In particular, it tries to point at the distinction between a paradigmatically “paperclip-y” way of being, and some broad and hazily-defined set of alternatives that I group under the label “niceness/liberalism/boundaries.” Too often, I think, a simplistic interpretation of the alignment discourse imagines that humans and AIs-with-different-values are both paperclippy at heart – except, only, with a different favored sort of “stuff.” I think this picture neglects core aspects of human ethics that are, themselves, about navigating precisely the sorts of differences-in-values that the possibility of misaligned AI forces us to grapple with. I think that attention to this part of human ethics can help us be better than the paperclippers we fear – not just in what we do with spare resources, but in how we relate to the distribution of power amongst a plurality of value systems more broadly. And I think it may have practical benefits as well, in navigating possible conflicts both between different humans, and between humans and AIs. That said, I don’t think that “niceness/liberalism/boundaries” is enough, on its own, to ensure a good future, or to allay all concern about trying to control that future over-much.
- The seventh essay, “On the abolition of man,” examines another version of that concern: namely, C.S. Lewis’s argument (in his book The Abolition of Man) that attempts by moral anti-realists to influence the values of future people must necessarily be “tyrannical.” I mostly disagree with Lewis — and in particular, I think he makes a number of fairly basic philosophical mistakes related to e.g. compatibilism about freedom, to the difference between creating-Bob-instead-of-Alice vs. brainwashing-Alice-to-make-her-like-Bob, and to the sense in which moral anti-realists can retain their grip on morality. But I do think his discussion points at some difficult questions about the ethics of influencing the values of others, including AIs – questions the essay takes an initial stab at grappling with.
- (More later.)
I’ll also note two caveats about the series as a whole. First, the series is centrally an exercise in philosophy, but it also touches on some issues relevant to the technical challenge of ensuring that the AI systems we build do not kill all humans, and to the empirical question of whether our efforts in this respect will fail. And I confess to some worry about bringing the philosophical stuff too near to the technical/empirical stuff. In particular: my sense is that people are often eager, in discussions about AI risk, to argue at the level of grand ideological abstraction rather than brass-tacks empirics – and I worry that these essays will feed such temptations. This isn’t to say that philosophy is irrelevant to AI risk – to the contrary, part of my hope, in these essays, is to help us see more clearly the abstractions that move and shift underneath certain discussions of the issue. But we should be very clear about the distinction between affiliating with some philosophical vibe and making concrete predictions about the future. Ultimately, it’s the concrete-prediction thing that matters most; and if the right concrete prediction is “advanced AIs have a substantive chance of killing all the humans,” you don’t need to do much philosophy to get upset, or to get to work. Indeed, particularly in AI, it’s easy to argue about philosophical questions over-much. Doing so can be distracting candy, especially if it lets you bounce off more technical problems. And if we fail on certain technical problems, we may well end up dead.
Second: even as the series focuses on philosophical stuff rather than technical/empirical stuff, it also focuses on a very particular strand of philosophical stuff – namely, a cluster of related philosophical assumptions and frames that I associate most centrally with Eliezer Yudkowsky, whose writings have done a lot to frame and popularize AI risk as an issue. And here, too, I worry about pushing the conversation in the wrong direction. That is: I think that Yudkowsky’s philosophical views are sufficiently influential, interesting, and fleshed-out that it’s worth interrogating them in depth. But I don’t want people to confuse their takes on Yudkowsky’s philosophical views (or his more technical/empirical views, or his vibe more broadly) for their takes on the severity of existential risk from AI more generally – and I worry these essays might prompt such a conflation. So please, remember: there is a very wide variety of ways to care about making sure that advanced AIs don’t kill everyone. Fundamentalist Christians can care about this; deep ecologists can care about this; solipsists can care about this; people who have no interest in philosophy at all can care about this. Indeed, in many respects, these essays aren’t centrally about AI risk in the sense of “let’s make sure that the AIs don’t kill everyone” (i.e., “AInotkilleveryoneism”) – rather, they’re about a set of broader questions about otherness and control that arise in the context of trying to ensure that the future goes well more generally. And what’s more, as I note in the series in various places, much of my interrogation of Yudkowsky’s views has to do with the sort of philosophical momentum they create in various directions, rather than with whether Yudkowsky in particular takes them there.
In this sense, my concern is not ultimately with Yudkowsky’s views per se, but rather with a sort of abstracted existential narrative that I think Yudkowsky’s writings often channel and express – one that I think different conversations about advanced AI live within to different degrees, and which I hope to help us see more whole.
Thanks to Katja Grace, Ketan Ramakrishnan, Carl Shulman, Anna Salamon, Will MacAskill, and many others over the years for conversation about these topics; and thanks to Carl Shulman for written comments. Some of my thinking and writing on these topics occurred in the context of my work for Open Philanthropy, but I am here speaking only for myself and not for my employer.
There are lots of other risks from AI, too; but I want to focus on existential risk from misalignment, here, and I want the short phrase “AI risk” for the thing I’m going to be referring to repeatedly.