

Research Fellow at the Center for AI Safety



Lucius Caviola's post mentioned the "happy servants" problem:

If AIs have moral patienthood but don’t desire autonomy, certain interpretations of utilitarian theories would consider it morally justified to keep them captive. After all, they would be happy to be our servants. However, according to various non-utilitarian moral views, it would be immoral to create “happy servant” AIs that lack a desire for autonomy and self-respect (Bales, 2024; Schwitzgebel & Garza, 2015). As an intuition pump, imagine we genetically engineered a group of humans with the desire to be our servants. Even if they were happy, it would feel wrong.

This issue is also mentioned as a key research question in Digital Minds: Importance and Key Research Questions by Mogensen, Saad, and Butlin.

This is just a note to flag that there's also some discussion of this issue in Carl Shulman's recent 80,000 Hours podcast episode. (cf. also my post about that episode.)

Rob Wiblin: Yeah. The idea of training a thinking machine to just want to take care of you and to serve your every whim, on the one hand, that sounds a lot better than the alternative. On the other hand, it does feel a little bit uncomfortable. There’s that famous example, the famous story of the pig that wants to be eaten, where they’ve bred a pig that really wants to be farmed and consumed by human beings. This is not quite the same, but I think raises some of the same discomfort that I imagine people might have at the prospect of creating beings that enjoy subservience to them, basically. To what extent do you think that discomfort is justified?

Carl Shulman: So the philosopher Eric Schwitzgebel has a few papers on this subject with various coauthors, and covers that kind of case. He has a vignette, “Passion of the Sun Probe,” where there’s an AI placed in a probe designed to descend into the sun and send back telemetry data, and then there has to be an AI present in order to do some of the local scientific optimisation. And it’s made such that, as it comes into existence, it absolutely loves achieving this mission and thinks this is an incredibly valuable thing that is well worth sacrificing its existence.

And Schwitzgebel finds that his intuitions are sort of torn in that case, because we might well think it sort of heroic if you had some human astronaut who was willing to sacrifice their life for science, and think this is achieving a goal that is objectively worthy and good. And then if it was instead the same sort of thing, say, in a robot soldier or a personal robot that sacrifices its life with certainty to divert some danger that maybe had a 1-in-1,000 chance of killing some human that it was protecting. Now, that actually might not be so bad if the AI was backed up, and valued its backup equally, and didn’t have qualms about personal identity: to what extent does your backup carry on the things you care about in survival, and those sorts of things.

There’s this aspect of, do the AIs pursue certain kinds of selfish interests that humans have as much as we would? And then there’s a separate issue about relationships of domination, where you could be concerned that, maybe if it was legitimate to have Sun Probe, and maybe legitimate to, say, create minds that then try and earn money and do good with it, and then some of the jobs that they take are risky and whatnot. But you could think that having some of these sapient beings being the property of other beings, which is the current legal setup for AI — which is a scary default to have — that’s a relationship of domination. And even if it is consensual, if it is consensual by way of manufactured consent, then it may not be wrong to have some sorts of consensual interaction, but can be wrong to set up the mind in the first place so that it has those desires.

And Schwitzgebel has this intuition that if you’re making a sapient creature, it’s important that it wants to survive individually and not sacrifice its life easily, that it has maybe a certain kind of dignity. So humans, because of our evolutionary history, we value status to differing degrees: some people are really status hungry, others not as much. And we value our lives very much: if we die, there’s no replacing that reproductive capacity very easily.

There are other animal species that are pretty different from that. So there are solitary species that would not be interested in social status in the same kind of way. There are social insects where you have sterile drones that eagerly enough sacrifice themselves to advance the interests of their extended family.

Because of our evolutionary history, we have these concerns ourselves, and then we generalise them into moral principles. So we would therefore want any other creatures to share our same interest in status and dignity, and then to have that status and dignity. And being one among thousands of AI minions of an individual human sort offends that too much, or it’s too inegalitarian. And then maybe it could be OK to be a more autonomous, independent agent that does some of those same functions. But yeah, this is the kind of issue that would have to be addressed.

Rob Wiblin: What does Schwitzgebel think of pet dogs, and our breeding of loyal, friendly dogs?

Carl Shulman: Actually, in his engagement with another philosopher, Steve Petersen — who takes the contrary position that it can be OK to create AIs that wish to serve the interests or objectives that their creators intended — does raise the example of a sheepdog really loves herding. It’s quite happy herding. It’s wrong to prevent the sheepdog from getting a chance to herd. I think that’s animal abuse, to always keep them inside or not give them anything that they can run circles around and collect into clumps. And so if you’re objecting with the sheepdog, it’s got to be not that it’s wrong for the sheepdog to herd, but it’s wrong to make the sheepdog so that it needs and wants to herd.

And I think this kind of case does make me suspect that Schwitzgebel’s position is maybe too parochial. A lot of our deep desires exist for particular biological reasons. So we have our desires about food and external temperature that are pretty intrinsic. Our nervous systems are adjusted until our behaviours are such that it keeps our predicted skin temperature within a certain range; it keeps predicted food in the stomach within a certain range.

And we could probably get along OK without those innate desires, and then do them instrumentally in service to some other things, if we had enough knowledge and sophistication. The attachment to those in particular seems not so clear. Status, again: some people are sort of power hungry and love status; others are very humble. It’s not obvious that’s such a terrible state. And then on the front of survival that’s addressed in the Sun Probe case and some of Schwitzgebel’s other cases: if minds that are backed up, the position that having all of my memories and emotions and whatnot preserved less a few moments of recent experience, that’s pretty good to carry on, that seems like a fairly substantial point. And the point that the loss of a life that is quickly physically replaced, that it’s pretty essential to the badness there, that the person in question wanted to live, right?

Rob Wiblin: Right. Yeah.

Carl Shulman: These are fraught issues, and I think that there are reasons for us to want to be paternalistic in the sense of pushing that AIs have certain desires, and that some desires we can instil that might be convenient could be wrong. An example of that, I think, would be you could imagine creating an AI such that it willingly seeks out painful experiences. This is actually similar to a Derek Parfit case. So where parts of the mind, maybe short-term processes, are strongly opposed to the experience that it’s undergoing, while other processes that are overall steering the show keep it committed to that.

And this is the reason why just consent, or even just political and legal rights, are not enough. Because you could give an AI self-ownership, you could give it the vote, you could give it government entitlements — but if it’s programmed such that any dollar that it receives, it sends back to the company that created it; and if it’s given the vote, it just votes however the company that created it would prefer, then these rights are just empty shells. And they also have the pernicious effect of empowering the creators to reshape society in whatever way that they wish. So you have to have additional requirements beyond just, is there consent?, when consent can be so easily manufactured for whatever.

True, I should have been more precise—by consciousness I meant phenomenal consciousness. On your (correct) point about Kammerer being open to consciousness more generally, here's Kammerer (I'm sure he's made this point elsewhere too):

Illusionists are not committed to the view that our introspective states (such as the phenomenal judgment “I am in pain”) do not reliably track any real and important psychological property. They simply deny that such properties are phenomenal, and that there is something it is like to instantiate them. Frankish suggests calling such properties “quasi-phenomenal properties” (Frankish 2016, p. 15)—purely physico-functional and non-phenomenal properties which are reliably tracked (but mischaracterized as phenomenal) by our introspective mechanisms. For the same reason (Frankish 2016, p. 21), illusionists are not committed to the view that a mature psychological science will not mention any form of consciousness beyond, for example, access-consciousness. After all, quasi-phenomenal consciousness may very well happen to have interesting distinctive features from the point of view of a psychologist.

But on your last sentence

He could still believe moral status should be about consciousness, just not phenomenal consciousness.

While that position is possible, Kammerer does make it clear that he does not hold it, and thinks it is untenable for similar reasons that he thinks moral status is not about phenomenal consciousness. (cf. p. 8)

It's an excellent question! There are two ways to go here:

  1. keep the liberal notion of preferences/desires, one that seems like it would apply to plants and bacteria, and conclude that moral patienthood is very widespread indeed. As you note, few people go for this view (I don't either). But you can find people bumping up against this view:

Korsgaard: "The difference between the plant's tropic responses and the animal's action might even, ultimately, be a matter of degree. In that case, plants would be, in a very elementary sense, agents, and so might be said to have a final good." (quoted in this lecture on moral patienthood by Peter Godfrey-Smith.)

  2. think that for patienthood what's required is a more demanding notion of "preference", such that plants don't satisfy it but dogs and people do. And there are ways of making "preference" more demanding besides "conscious preference". You might think that morally-relevant preferences/desires have to have some kind of complexity, or some kind of rational structure, or something like that. That's of course quite hand-wavy; I don't think anyone has a really satisfying account.

Here's a remark from Francois Kammerer, who thinks that moral status cannot be about consciousness (which he thinks does not exist), argues that it should be about desire, and who lays out nicely the 'scale' of desires of various levels of demandingness:

On the one extreme, we can think of the most basic way of desiring: a creature can value negatively or positively certain state of affairs, grasped in the roughest way through some basic sensing system. On some views, entities as simple as bacteria can do that (Lyon & Kuchling, 2021). On the other hand, we can think of the most sophisticated ways of desiring. Creatures such as, at least, humans, can desire for a thing to thrive in what they take to be its own proper way to thrive and at the same time desire their own desire for this thing to thrive to persist – an attitude close to what Harry Frankfurt called “caring” (Frankfurt, 1988). Between the two, we intuitively admit that there is some kind of progressive and multidimensional scale of desires, which is normatively relevant – states of caring matter more than the most basic desires. When moving towards an ethic without sentience, we would be wise to ground our ethical system on concepts that we will treat as complex and degreed, and even more as “complexifiable” as the study of human, animal and artificial minds progresses.


Small correction: Jonathan Birch is at LSE, not QMUL. Lars Chittka, the co-lead of the project, is at QMUL.

You’re correct, Fai - Jeff is not a co-author on the paper. The other participants - Patrick Butlin, Yoshua Bengio, and Grace Lindsay - are.

What's something about you that might surprise people who only know your public, "professional EA" persona?

I suggest that “why I don’t trust pseudonymous forecasters” would be a more appropriate title. When I saw the title I expected an argument that would apply to all/most forecasting, but this worry is only about a particular subset.


Unsurprisingly, I agree with a lot of this! It's nice to see these principles laid out clearly and concisely.

You write

AI welfare is potentially an extremely large-scale issue. In the same way that the invertebrate population is much larger than the vertebrate population at present, the digital population has the potential to be much larger than the biological population in the future.

Do you know of any work that estimates these sizes? There are various places that people have estimated the 'size of the future' including potential digital moral patients in the long run, but do you know of anything that estimates how many AI moral patients there could be by (say) 2030?

Hi Timothy! I agree with your main claim that "assumptions [about sentience] are often dubious as they are based on intuitions that might not necessarily ‘track’ sentience", shaped as they are by potentially unreliable evolutionary and cultural factors. I also think it's a very important point! I commend you for laying it out in a detailed way.

I'd like to offer a piece of constructive criticism if I may. I'd add more to the piece that answers, for the reader:

  1. what kind of piece am I reading? What is going to happen in it?
  2. why should I care about the central points? (as indicated, I think there are many reasons to care, and could name quite a few myself)
  3. how does this piece relate to what other people say about this topic?

While getting 'right to the point' is a virtue, I feel like more framing and intro would make this piece more readable, and help prospective readers decide if it's for them.

[meta-note: if other readers disagree, please do of course vote 'disagree' on this comment!]

Hi Brian! Thanks for your reply. I think you're quite right to distinguish between your flavor of panpsychism and the flavor I was saying doesn't entail much about LLMs. I'm going to update my comment above to make that clearer, and sorry for running together your view with those others.
