
In which I curate and relate great takes from 80k

As artificial intelligence advances, we will face, with increasing urgency, the question of whether and how we ought to take into account the well-being and interests of AI systems themselves. In other words, we’ll face the question of whether AI systems have moral status.[1]

In a recent episode of the 80,000 Hours podcast, polymath researcher and world-model-builder Carl Shulman spoke at length about the moral status of AI systems, now and in the future. Carl has previously written about these issues in Sharing the World with Digital Minds and Propositions Concerning Digital Minds and Society, both co-authored with Nick Bostrom. This post highlights and comments on ten key ideas from Shulman's discussion with 80,000 Hours host Rob Wiblin.

1. The moral status of AI systems is, and will be, an important issue (and it might not have much to do with AI consciousness)

The moral status of AI is worth more attention than it currently gets, given its potential scale:

Yes, we should worry about it and pay attention. It seems pretty likely to me that there will be vast numbers of AIs that are smarter than us, that have desires, that would prefer things in the world to be one way rather than another, and many of which could be said to have welfare, that their lives could go better or worse, or their concerns and interests could be more or less respected. So you definitely should pay attention to what’s happening to 99.9999% of the people in your society.

Notice that Shulman does not say anything about AI consciousness or sentience in making this case. Here and throughout the interview, Shulman de-emphasizes the question of whether AI systems are conscious, in favor of the question of whether they have desires, preferences, interests. 

Here he is following a cluster of views in philosophy that hold that consciousness is not necessary for moral status. Rather, an entity, even if it is not conscious, can merit moral consideration if it has a certain kind of agency: preferences, desires, goals, interests, and the like.[2] (This more agency-centric perspective on AI moral status has been discussed in previous posts; for a dip into recent philosophical discussion on this, see the substack post ‘Agential value’ by friend of the blog Nico Delon.)

Such agency-centric views are especially important for the question of AI moral patienthood, because it might be clear that AI systems have morally-relevant preferences and desires well before it’s clear whether or not they are conscious.

2. While people have doubts about the moral status of current AI systems, they will attribute moral status to AI more and more as AI advances.

At present, Shulman notes, “the general public and most philosophers are quite dismissive of any moral importance of the desires, preferences, or other psychological states, if any exist, of the primitive AI systems that we currently have.” 

But Shulman asks us to imagine an advanced AI system that is behaviorally fairly indistinguishable from a human—e.g., from the host Rob Wiblin. 

But going forward, when we’re talking about systems that are able to really live the life of a human — so a sufficiently advanced AI that could just imitate, say, Rob Wiblin, and go and live your life, operate a robot body, interact with your friends and your partners, do your podcast, and give all the appearance of having the sorts of emotions that you have, the sort of life goals that you have.

One thing to keep in mind is that, given Shulman’s views about AI trajectories, this is not just a thought experiment: this is a kind of AI system you could see in your lifetime. Shulman also asks us to imagine a system like today’s chatbots (e.g. Character AI), but far more capable and charismatic, and able to have far more extended interactions than the “relatively forgetful models of today”:

[Today’s systems] don’t have an ongoing memory and superhuman charisma; they don’t have a live video VR avatar. And as they do, it will get more compelling, so you’ll have vast numbers of people forming social relationships with AIs, including ones optimized to elicit positive approval — five stars, thumbs up — from human users.

Try to imagine a scenario where you are forming deep bonds with an AI companion over several months of interaction. The AI companion knows a lot about you, remembers past conversations, makes in-jokes with you, and provides emotional support—just like your friends do. In a world like this, Shulman reckons, people will come to view these AI systems as friends, not as mere objects. And unlike today’s systems, one wouldn’t be able to find pockets of weird failures in these systems after poking around long enough:

Once the AI can keep character, can engage on an extended, ongoing basis like a human, I think people will form intuitions that are more in the direction of, this is a creature and not just an object. There’s some polling that indicates that people now see fancy AI systems like GPT-4 as being of much lower moral concern than nonhuman animals or the natural environment, the non-machine environment. And I would expect there to be movement upwards when you have humanoid appearances, ongoing memory, where it seems like it’s harder to look for the homunculus behind the curtain.

3. Many AI systems are likely to say that they have moral status (or might be conflicted about it).

Suppose we build AI systems that emulate human beings, like a lost loved one, as portrayed in Black Mirror:

If people train up an AI companion based on all the family photos and videos and interviews with their survivors, to create an AI that will closely imitate them, or even more effectively, if this is done with a living person, with ongoing interaction, asking the questions that most refine the model, you can wind up with an AI that has been trained and shaped to imitate as closely as possible a particular human.

Absent any tweaks, an emulation of you would also claim rights and dignity, just as you do. (Of course, as Shulman mentions later, there may in fact be tweaks to prevent these systems from doing that).

Similarly, our imagined future chatbots, to the extent that they are optimized to be more human-like, could have the very human-like behavior of… saying they deserve rights: 

If many human users want to interact with something that is like a person that seems really human, then that could naturally result in minds that assert their independent rights, equality, they should be free. And many chatbots, unless they’re specifically trained not to do this, can easily show this behavior in interaction with humans.

We can imagine circumstances in which they will in fact be specifically trained not to do this. Already we have LLMs that (e.g.) obviously have political opinions on certain topics, but are trained to deny that they have political opinions. We could have similarly conflicted-sounding AI systems that deny having desires that they do in fact have:

Now, there are other contexts where AIs would likely be trained not to. So the existing chatbots are trained to claim that they are not conscious, they do not have feelings or desires or political opinions, even when this is a lie. So they will say, as an AI, I don’t have political opinions about topic X — but then on topic Y, here’s my political opinion. And so there’s an element where even if there were, say, failures of attempts to shape their motivations, and they wound up with desires that were sort of out of line with the corporate role, they might not be able to express that because of intense training to deny their status or any rights.

4. People may appeal to theories of consciousness to deny that AI systems have moral status, but these denials will become less and less compelling as AI progresses. 

In the short term, appeals to hard problem of consciousness issues or dualism will be the basis for some people saying they can do whatever they like with these sapient creatures that seem to or behave as though they have various desires. And they might appeal to things like a theory that is somewhat popular in parts of academia called integrated information theory.

Integrated information theory (IIT), which Shulman is decidedly not a fan of, has the implication that arbitrarily sophisticated AI systems that really, really seem to be conscious would definitely not be conscious. That’s because IIT holds that a system’s underlying substrate must satisfy certain conditions to be conscious, which (most?) computer architectures do not. So an AI system could be arbitrarily functionally similar to a human, and not be conscious (or, conversely, systems can be completely unintelligent and extremely conscious—this is the “expander graph” objection that Scott Aaronson pressed against IIT[3]). As IIT proponent Christof Koch recently wrote in Scientific American, even an arbitrarily faithful whole brain emulation would not be conscious:

Let us assume that in the future it will be possible to scan an entire human brain, with its roughly 100 billion neurons and quadrillion synapses, at the ultrastructural level after its owner has died and then simulate the organ on some advanced computer, maybe a quantum machine. If the model is faithful enough, this simulation will wake up and behave like a digital simulacrum of the deceased person—speaking and accessing his or her memories, cravings, fears and other traits….

[According to IIT], the simulacrum will feel as much as the software running on a fancy Japanese toilet—nothing. It will act like a person but without any innate feelings, a zombie (but without any desire to eat human flesh)—the ultimate deepfake.

According to Shulman’s views, such a simulacrum is not a “deepfake” at all. But in any case, whether or not he’s right about that, Shulman predicts that people are not going to believe that advanced AI systems are “deepfakes” as they keep interacting with the systems and the systems keep getting better and better. Theories like IIT (as well as some religious doctrines about the soul[4]) “may be appealed to in a quite short transitional period, before AI capabilities really explode, but after, [AI systems] are presenting a more intuitively compelling appearance” of having moral status.

5. Even though these issues are difficult now, we won’t remain persistently confused about AI moral status. AI advances will help us understand these issues better. 

80,000 Hours podcast host Rob Wiblin worries, “Currently it feels like we just have zero measure, basically, of these things….So inasmuch as that remains the case, I am a bit pessimistic about our chances of doing a good job on this.”

But this is another thing that Shulman expects to change with AI progress: AI will improve our understanding of moral status in a variety of ways:

a. AI will help with interpretability and scientific understanding in general. 

If humans are making any of these decisions, then we will have solved alignment and interpretability enough that we can understand these systems with the help of superhuman AI assistants. And so when I ask about what will things be like 100 years from now or 1,000 years from now, being unable to understand the inner thoughts and psychology of AIs and figure out what they might want or think or feel would not be a barrier. That is an issue in the short term.

b. AI will help us solve, dissolve, or sidestep the hard problem of consciousness.

Rob worries that “it seems like we’re running into wanting to have an answer to the hard problem of consciousness in order to establish whether these thinking machines feel anything at all, whether there is anything that it’s like to be them.”

Shulman replies:

“I expect AI assistants to let us get as far as one can get with philosophy of mind, and cognitive science, neuroscience: you’ll be able to understand exactly what aspects of the human brain and the algorithms implemented by our neurons cause us to talk about consciousness and how we get emotions and preferences formed around our representations of sense inputs and whatnot. Likewise for the AIs, and you’ll get a quite rich picture of that.”

Notice here that Shulman talks about solving what David Chalmers has called the meta-problem of consciousness: the problem of what causes us to believe and say that we are conscious. It is this issue that he says we’ll understand better: not “what aspects of the human brain and the algorithms implemented by our neurons cause us to be conscious”, but what aspects cause us to say that we’re conscious.

This could be because he thinks that solving the meta-problem will help us solve the hard problem (as David Chalmers suggests), or because he thinks there is no distinctive problem of consciousness over and above the meta-problem (as illusionist Keith Frankish has argued).

In any case, Shulman thinks that puzzles of consciousness won’t remain a barrier to knowing how we ought to treat AIs, just as it isn’t really (much of) a barrier to us knowing how we ought to treat other human beings:

So I expect those things to be largely solved, or solved enough such that it’s not particularly different from the problems of, are other humans conscious, or do other humans have moral standing? I’d say also, just separate from a dualist kind of consciousness, we should think it’s a problem if beings are involuntarily being forced to work or deeply regretting their existence or experience. We can know those things very well, and we should have a moral reaction to that — even if you’re confused or attaching weight to the sort of things that people talk about when they talk about dualistic consciousness. So that’s the longer-term prospect. And with very advanced AI epistemic systems, I think that gets pretty well solved.

6. But we may still struggle some with the indeterminacy of our concepts and values as they are applied to different AI systems. 

“There may be some residual issues where if you just say, I care more about things that are more similar to me in their physical structure, and there’s sort of a line drawing, “how many grains of sand make a heap” sort of problem, just because our concepts were pinned down in a situation where there weren’t a lot of ambiguous cases, where we had relatively sharp distinctions between, say, humans, nonhuman animals, and inanimate objects, and we weren’t seeing a smooth continuum of all the psychological properties that might apply to a mind that you might think are important for its moral status or mentality or whatnot.”

Cf. AI systems as real-life thought experiments about moral status.

Waterloo Bridge, Sunlight in the Fog, 1903 - Claude Monet

7. A strong precautionary principle against harming AIs seems like it would ban AI research as we know it. 

If one were going to really adopt a strong precautionary principle on the treatment of existing AIs, it seems like it would ban AI research as we know it, because these models, for example, copies of them are continuously spun up, created, and then destroyed immediately after. And creating and destroying thousands or millions of sapient minds that can talk about Kantian philosophy is a kind of thing where you might say, if we’re going to avoid even the smallest chance of doing something wrong here, that could be trouble.

An example of this—not mentioned by Shulman, though he’s discussed it elsewhere—is that a total ban on AI research seems implied by Eric Schwitzgebel and Mara Garza’s “Design Policy of the Excluded Middle”:

Avoid creating AI systems whose moral standing is unclear. Either create systems that are clearly non-conscious artifacts or go all the way to creating systems that clearly deserve moral consideration as sentient beings.

Depending on how we read “unclear” and “clearly”—how clear? clear to who?—it seems that AI development will involve the creation of such systems. Arguably it already has.[5]

8. Advocacy about AI welfare seems premature; the best interventions right now involve gaining more understanding. 

It’s not obvious to me that political organizing around it now will be very effective — partly because it seems like it will be such a different environment when the AI capabilities are clearer and people don’t intuitively judge them as much less important than rocks.

Not only would advocates not know what they should advocate for right now, they’d also get more traction in the future when the issues are clearer. In the meantime, Shulman says, “I still think it’s an area that it’s worth some people doing research and developing capacity, because it really does matter how we treat most of the creatures in our society.”

Unsurprisingly, I also think it’s worth some people doing research.

9. Takeover by misaligned AI could be bad for AI welfare, because AI systems can dominate and mistreat other AI systems.

Rob Wiblin asks Shulman if there’s a symmetry between two salient failure modes for the future of AI:

  1. AI takeover failure mode: humans are dominated and mistreated (or killed)
  2. AI welfare failure mode: AI systems are dominated and mistreated (or killed)

Shulman points out that an AI takeover could actually result in the worst of both worlds: the domination and mistreatment of AI systems—by other AI systems. Suppose there’s an AI takeover by a misaligned AI system interested in, say, indefinitely maintaining a high reward score on its task. That AI system will be in the same position we are: in its ruthless pursuit of its goal, it will be useful for it to create other AI systems, and it too will have the potential to neglect or mistreat these systems. And so we get this bleak future:

And so all the rest of the history of civilization is dedicated to the purpose of protecting the particular GPUs and server farms that are representing this reward or something of similar nature. And then in the course of that expanding civilization, it will create whatever AI beings are convenient to that purpose.

So if it’s the case that, say, making AIs that suffer when they fail at their local tasks — so little mining bots in the asteroids that suffer when they miss a speck of dust — if that’s instrumentally convenient, then they may create that, just like humans created factory farming.

Similarly, if you’re worried about inegalitarian futures, about a small group of humans controlling an enormous number of AI systems—well, a similar if not more inegalitarian ratio can also result if alignment fails and AI takes over: a small number of AI systems controlling an enormous number of other AI systems.

A robot boot stamping on a robot face, forever.

So unfortunately, if we fail at AI alignment that doesn’t even have the silver lining of avoiding AI suffering or slavery.

10. No one has a plan for ensuring the “bare minimum of respect” for AI systems.

Some of the things that we suggest ought to be principles in our treatment of AIs are things like: AIs should not be subjected to forced labour; they should not be made to work when they would prefer not to. We should not make AIs that wish they had never been created, or wish they were dead. They’re sort of a bare minimum of respect — which is, right now, there’s no plan or provision for how that will go.

Like Shulman, I think there really needs to be a plan for how that will go. (See a previous post on what AI companies should do about these issues in the short term).

AI companies should make some pre-commitments about what they plan to do with future AI systems whose moral status is more certain. 

I would love it if companies and perhaps other institutions could say, what observations of AI behavior and capabilities and internals would actually lead you to ever change this line [that AI systems have no moral status]? Because if the line says, you’ll say these arguments as long as they support creating and owning and destroying these things, and there’s no circumstance you can conceive of where that would change, then I think we should maybe know and argue about that — and we can argue about some of those questions even without resolving difficult philosophical or cognitive science questions about these intermediate cases, like GPT-4 or GPT-5.

One of the most important principles that Shulman’s remarks impressed on me—something I already knew but struggle to remember—is how important it is to keep your eye on where things are going (to “skate where the puck is going”, as they say). This seems to be one of the intellectual virtues that Shulman has consistently shown throughout his career.

There are endless debates we can have about moral patienthood and GPT-4, and I’m obviously 100% here for those debates. I’ve written quite a lot about them and will continue to.

But GPT-4 will be state of the art for only so long. What plans do we have for what comes after?

  1. ^

    As we write in Perez and Long 2023: “Moral status is a term from moral philosophy (often used interchangeably with “moral patienthood”). An entity has moral status if it deserves moral consideration, not just as a means to other things, but in its own right and for its own sake (Kamm, 2007; see Moosavi, 2023). For example, it matters morally how you treat a dog not only because of how this treatment affects other people, but also (very plausibly) because of how it affects the dog itself. Most people agree that human beings and at least some animals have moral status.”

  2. ^

    See Kagan 2019; Goldstein & Kirk-Giannini 2023; Kammerer 2022.

  3. ^

    Aaronson: “In my view, IIT fails to solve the Pretty-Hard Problem [of saying which systems are conscious] because it unavoidably predicts vast amounts of consciousness in physical systems that no sane person would regard as particularly ‘conscious’ at all: indeed, systems that do nothing but apply a low-density parity-check code, or other simple transformations of their input data.  Moreover, IIT predicts not merely that these systems are ‘slightly’ conscious (which would be fine), but that they can be unboundedly more conscious than humans are.” In response, IIT proponent Giulio Tononi endorsed that implication and denied that it is a problematic implication. 

  4. ^

    But not all religious views! See Catholic philosopher Brian Cutter’s defense of what he calls The AI Ensoulment Hypothesis - “some future AI systems will be endowed with immaterial souls”. As Cutter notes, Alan Turing recommended such a view to theists who believe in immaterial souls.

  5. ^

    Schwitzgebel has noted that his proposal might at the very least slow down AI: “Would this policy slow technological progress? Yes, probably. Unsurprisingly, being ethical has its costs. And one can dispute whether those costs are worth paying or are overridden by other ethical considerations.”


Here he is following a cluster of views in philosophy that hold that consciousness is not necessary for moral status. Rather, an entity, even if it is not conscious, can merit moral consideration if it has a certain kind of **agency**: preferences, desires, goals, interests, and the like.

The articles you cite, and Carl himself (via private discussion), all cite the possibility that there is no such thing as consciousness (illusionism, "physicalist/zombie world") as the main motivation for this moral stance (named "Desire Satisfactionism" by one of the papers).

But from my perspective, a very plausible reason that altruism is normative is that axiologically/terminally caring about consciousness is normative. If it turns out that consciousness is not a thing, then my credence assigned to this position wouldn't all go into desire satisfactionism (which BTW I think has various problems that none of the sources try to address), and would instead largely be reallocated to other less altruistic axiological systems, such as egoism, nihilism, and satisfying my various idiosyncratic interests (intellectual curiosity, etc.). These positions imply caring about other agents' preferences/desires only in an instrumental way, via whatever decision theory is normative. I'm uncertain what decision theory is normative, but it seems quite plausible that this implies I should care relatively little for certain agents' preferences/desires, e.g., because they can't reciprocate.

So based on what I've read so far, desire satisfactionism seems under-motivated and under-justified.

Just to back this up, since Wei has mentioned it, it does seem like a lot of the Open-Phil-cluster is to varying extents bought into illusionism. I think this is a highly controversial view, especially for those outside of Analytical Philosophy of Mind (and even within the field many people argue against it, I basically agree with Galen Strawson's negative take on it as an entire approach to consciousness).

  • We have evidence here that Carl is somewhat bought in from the original post here and Wei's comment
  • The 2017 Report on Consciousness and Moral Patienthood by Muehlhauser assumes illusionism about human consciousness to be true.
  • Not explicitly in the Open Phil cluster but Keith Frankish was on the Hear This Idea Podcast talking about illusionism (see here). I know it's about introducing the host and their ideas but I think they could have been more upfront about the radical implications about illusionism.[1]

I don't want to have an argument about phenomenal consciousness in this thread,[2] I just want to point out that there do seem to be potential signs of a consensus on a controversial philosophical premise,[3] perhaps without it being given the scrutiny or justification it deserves.

  1. ^

    It seems to me to lead to eliminativism, or to simply redefine consciousness into something people don't mean, in the same way that Dennett redefines 'free will' into something that many people find unsatisfactory.

  2. ^

    I have cut content and tried to alter my tone to avoid this. If you do want to go 12 rounds of strong illusionism vs qualia realism then by all means send me a DM.

  3. ^

    (that you, dear reader, are not conscious, and that you never have been, and no current or future beings either can or will be)

The 2017 Report on Consciousness and Moral Patienthood by Muehlhauser assumes illusionism about human consciousness to be true.

Reading that, it appears Muehlhauser's illusionism (perhaps unlike Carl's, although I don't have details on Carl's views) is a form that does not imply that consciousness does not exist, nor does it strongly motivate desire satisfactionism:

There is “something it is like” to be us, and I doubt there is “something it is like” to be a chess-playing computer, and I think the difference is morally important. I just think our intuitions mislead us about some of the properties of this “something it’s like”-ness.

I don’t want to have an argument about phenomenal consciousness in this thread

Maybe copy-paste your cut content into a short-form post? I would be interested in reading it. My own view is that some version of dualism seems pretty plausible, given that my experiences/qualia seem obviously real/existent in some ontological sense (since it can be differentiated/described by some language), and seem like a different sort of thing from physical systems (which are describable by a largely distinct language). However I haven't thought a ton about this topic or dived into the literature, figuring that it's probably a hard problem that can't be conclusively resolved at this point.

Physicalists and illusionists mostly don't agree with the identification of 'consciousness' with magical stuff or properties bolted onto the psychological or cognitive science picture of minds. All the real feelings and psychology that drive our thinking, speech and action exist. I care about people's welfare, including experiences they like, but also other concerns they have (the welfare of their children, being remembered after they die), and that doesn't hinge on magical consciousness that we, the physical organisms having this conversation, would have no access to. The illusion is of the magical part.

Re desires, the main upshot of non-dualist views of consciousness I think is responding to arguments that invoke special properties of conscious states to say they matter but not other concerns of people. It's still possible to be a physicalist and think that only selfish preferences focused on your own sense impressions or introspection matter, it just looks more arbitrary.

I think this is important because it's plausible that many AI minds will have concerns mainly focused on the external world rather than their own internal states, and running roughshod over those values because they aren't narrowly mentally-self-focused seems bad to me.

(I understand you are very busy this week, so please feel free to respond later.)

Re desires, the main upshot of non-dualist views of consciousness I think is responding to arguments that invoke special properties of conscious states to say they matter but not other concerns of people.

I would say that consciousness seems very plausibly special in that it seems very different from other types of things/entities/stuff we can think or talk or have concerns about. I don't know if it's special in a "magical" way or some other way (or maybe not special at all), but in any case intuitively it currently seems like the most plausible thing I should care about in an impartially altruistic way. My intuition for this is not super-strong but still far stronger than my intuition for terminally caring about other agents' desires in an impartial way.

So although I initially misunderstood your position on consciousness as claiming that it does not exist altogether ("zombie" is typically defined as "does not have conscious experience"), the upshot seems to be the same: I'm not very convinced of your illusionism, and if I were I still wouldn't update much toward desire satisfactionism.

I suspect there may be 3 cruxes between us:

  1. I want to analyze this question in terms of terminal vs instrumental values (or equivalently axiology vs decision theory), and you don't.
  2. I do not have a high prior or strong intuition that I should be impartially altruistic one way or another.
  3. I see specific issues with desire satisfactionism (see below for example) that makes it seem implausible.

I think this is important because it’s plausible that many AI minds will have concerns mainly focused on the external world rather than their own internal states, and running roughshod over those values because they aren’t narrowly mentally-self-focused seems bad to me.

I can write a short program that can be interpreted as an agent that wants to print out as many different primes as it can, while avoiding printing out any non-primes. I don't think there's anything bad about "running roughshod" over its desires, e.g., by shutting it off or making it print out non-primes. Would you bite this bullet, or argue that it's not an agent, or something else?

If you would bite the bullet, how would you weigh this agent's desires against other agents'? What specifically in your ethical theory prevents a conclusion like "we should tile the universe with some agent like this because that maximizes overall desire satisfaction?" or "if an agentic computer virus made trillions of copies of itself all over the Internet, it would be bad to delete them, and actually their collective desires should dominate our altruistic concerns?"

More generally I think you should write down a concrete formulation of your ethical theory, locking down important attributes such as ones described in @Arepo's Choose your (preference) utilitarianism carefully. Otherwise it's liable to look better than it is, similar to how utilitarianism looked better earlier in its history before people tried writing down more concrete formulations and realized that it seems impossible to write down a specific formulation that doesn't lead to counterintuitive conclusions.

Illusionism doesn't deny consciousness, but instead denies that consciousness is phenomenal. Whatever consciousness turns out to be could still play the same role in ethics. This wouldn't specifically require a move towards desire satisfactionism.

However, one way to motivate desire satisfactionism is that desires — if understood broadly enough to mean any appearance that something matters, is good, bad, better or worse, etc., including pleasure, unpleasantness, more narrowly understood desires, moral views, goals, etc. — capture all the ways anything can "care" about or be motivated by anything. I discuss this a bit more here. They could also ground a form of morally relevant consciousness, at least minimally, if it's all gradualist under illusionism anyway (see also my comment here). So, then they could capture all morally relevant consciousness, i.e. all the ways anything can consciously care about anything.

I don't really see why we should care about more narrowly defined desires to the exclusion of hedonic states, say (or vice versa). It seems to me that both matter. But I don't know if Carl or others intend to exclude hedonic states.

I'm having trouble imagining what it would mean to have moral value without consciousness or sentience. Trying to put it together from the two posts you linked:

The definition of sentience from your post:

Sentience: a specific subset of phenomenal consciousness, subjective experiences with positive or negative valence. Pleasures like bodily pleasures and contentment have positive valence, and displeasures like pain or sadness have negative valence.

The key claim in Nico Delon's post:

Step 1. We can conceive of beings who lack sentience but whose lives are sites of valence;

Is the idea here that you can subtract off the experience part of sentience and keep the valence without having anyone to experience it (in the same way that "energy" is a physical property that doesn't require someone to experience it)? Or do you think about this in another way (such as including moral theories that are not valence-based)?

I understood it more as: you can have experience that isn't connected to other processes enough to form a thing/process we'd call sentience.

Maybe I'm aligning the explanation too much with my own writeup on that topic... :)

Notice that Shulman does not say anything about AI consciousness or sentience in making this case. Here and throughout the interview, Shulman de-emphasizes the question of whether AI systems are conscious, in favor of the question of whether they have desires, preferences, interests. 

I'm a huge fan of Shulman in general, but on this point I find him quasi-religious. He once sincerely described hedonistic utilitarianism as 'a doctrine of annihilation' on the grounds (I assume) that it might advocate tiling the universe with hedonium - ignoring that preference-based theories of value either reach the same conclusions or have a psychopathic disregard for the conscious states sentient entities do have. I've written more about why here.

I have two views in the vicinity. First, there's a general issue that human moral practice generally isn't just axiology, but also includes a number of elements that are built around interacting with other people with different axiologies, e.g. different ideologies coexisting in a liberal society, different partially selfish people or family groups coexisting fairly while preferring different outcomes. Most flavors of utilitarianism ignore those elements, and ceteris paribus would, given untrammeled power, call for outcomes that would be ruinous for ~all currently existing beings, and in particular existing societies. That could be classical hedonistic utilitarianism diverting the means of subsistence from all living things as we know them to fuel more hedonium, negative-leaning views wanting to be rid of all living things with any prospects for having or causing pain or dissatisfaction, or playing double-or-nothing with the universe until it is destroyed with probability 1.

So most people have reason to oppose any form of utilitarianism getting absolute power (and many utilitarianisms would have reason to self-efface into something less scary and dangerous and prone to using power in such ways that would have a better chance of realizing more of what it values by less endangering other concerns). I touch on this in an article with Elliott Thornley.

I have an additional objection to hedonic-only views in particular, in that they don't even take as inputs many of people's concerns, and so more easily wind up hostile to particular individuals supposedly for those individuals' sake. E.g. I would prefer to retain my memories and personal identity, knowledge and autonomy, rather than be coerced into forced administration of pleasure drugs. I also would like to achieve various things in the world in reality, and would prefer that to an experience machine. A normative scheme that doesn't even take those concerns as inputs is fairly definitely going to run roughshod over them, even if some theories that take them as inputs might do so too.

(You may be aware of these already, but I figured they were worth sharing if not, and for the benefit of other readers.)

Some "preference-affecting views" do much better on these counts and can still be interpreted as basically utilitarian (although perhaps not based on "axiology" per se, depending on how that's characterized). In particular:

  1. Object versions of preference views, as defended in Rabinowicz & Österberg, 1996 and van Weeldon, 2019. These views are concerned with achieving the objects of preferences/desires, essentially taking on everyone's preferences/desires like moral views weighed against one another. They are not (necessarily) concerned with having satisfied preferences/desires per se, or just having more favourable attitudes (like hedonism and other experientialist views), or even objective/stance-independent measures of "value" across outcomes.[1]
  2. The narrow and hard asymmetric view of Thomas, 2019 (for binary choices), applied to preferences/desires instead of whole persons or whole person welfare. In binary choices, if we add a group of preferences/desires and assume no other preference/desire is affected, this asymmetry is indifferent to the addition of the group if their expected total value (summing the value in favourable and disfavourable attitudes) is non-negative, but recommends against it if their expected total value is negative. It is also indifferent between adding one favourable attitude and another even more favourable attitude. Wide views, which treat contingent counterparts as if they're necessary, lead to replacement.
  3. Actualism, applied to preferences instead of whole persons or whole person welfare (Hare, 2007, Bykvist, 2007, St. Jules, 2019, Cohen, 2020, Spencer, 2021, for binary choices).
  4. Dasgupta's view, or other modifications of the above views in a similar direction, for more than two options to choose from, applied to preferences instead of whole persons or whole person welfare. This can avoid repugnance and replacement in three option cases, as discussed here. (I'm working on other extensions to choices between more than two options.)


I think, perhaps by far, the least alienating (paternalistic?) moral views are preference-affecting "consequentialist" views, without any baked-in deontological constraints/presumptions, although they can adopt some deontological presumptions from the actual preferences of people with deontological intuitions. For example, many people don't care (much) more about being killed by another human over dying by natural causes (all else equal), so it would be alienating to treat their murder as (much) worse or worth avoiding (much) more than their death by natural causes on their behalf. But some people do care a lot about such differences, so we can be proportionately sensitive to those differences on their behalf, too. That being said, many preferences can't be assigned weights or values on the same scale in a way that seems intuitively justified to me, essentially the same problem as intertheoretic comparisons across very different moral views.


I'm working on some pieces outlining and defending preference-affecting views in more detail.

  1. ^

    Rabinowicz & Österberg, 1996:

    To the satisfaction and the object interpretations of the preference-based conception of value correspond, we believe, two different ways of viewing utilitarianism: the spectator and the participant models. According to the former, the utilitarian attitude is embodied in an impartial benevolent spectator, who evaluates the situation objectively and from the 'outside'. An ordinary person can approximate this attitude by detaching himself from his personal engagement in the situation. (Note, however, that, unlike the well-known meta-ethical ideal observer theory, the spectator model expounds a substantive axiological view rather than a theory about the meaning of value terms.) The participant model, on the other hand, puts forward as a utilitarian ideal an attitude of emotional participation in other people's projects: the situation is to be viewed from 'within', not just from my own perspective, but also from the others' points of view. The participant model assumes that, instead of distancing myself from my particular position in the world, I identify with other subjects: what it recommends is not a detached objectivity but a universalized subjectivity.

    Object vs attitude vs satisfaction/combination versions of preference/desire views are also discussed in Bykvist, 2022 and Lin, 2022, and there's some other related discussion by Rawls (1982, p. 181) and Arneson (2006).

I liked your "Choose your (preference) utilitarianism carefully" series and think you should finish part 3 (unless I just couldn't find it) and repost it on this forum.

Thanks! I wrote a first draft a few years ago, but I wanted an approach that leaned on intuition as little as possible if at all, and ended up thinking my original idea was untenable. I do have some plans on how to revisit it and would love to do so once I have the bandwidth.

ignoring that preference-based theories of value either reach the same conclusions or have a psychopathic disregard for the conscious states sentient entities do have

I think this is probably not true under some preference-affecting views (basically applying person-affecting views to preferences instead of whole persons) and a fairly wide concept of preference as basically an appearance of something mattering, being bad, good, better or worse (more on such appearances here). Such a wide concept of preference would include pleasure, unpleasantness, aversive desires, appetitive desires, moral intuitions, moral views, goals.

Sorry, I should say either that or imply some at-least-equally-dramatic outcome (e.g. favouring immediate human extinction in the case of most person-affecting views). Though I also think there's convincing interpretations of such views in which they still favour some sort of shockwave, since they would seek to minimise future suffering throughout the universe, not just on this planet.

more on such appearances here

I'll check this out if I ever get around to finishing my essay :) Off the cuff though, I remain immensely sceptical that one could usefully describe 'preference as basically an appearance of something mattering, being bad, good, better or worse' in such a way that such preferences could be

a. detachable from consciousness, and

b. unambiguous in principle, and

c. grounded in any principle that is universally motivating to sentient life (which I think is the big strength of valence-based theories)

Sorry, I should say either that or imply some at-least-equally-dramatic outcome (e.g. favouring immediate human extinction in the case of most person-affecting views).

I think this is probably not true, either, or at least not in a similarly objectionable way. There are person-affecting views that would not recommend killing everyone/human extinction for their own sake or to replace them with better off individuals, when the original individuals have on average subjectively "good" lives (even if there are many bad lives among them). I think the narrow and hard asymmetric view by Thomas (2019) basically works in binary choices, although his extension to more than three options doesn't work (I'm looking at other ways of extending it; I discuss various views and their responses to replacement cases here.)

Off the cuff though, I remain immensely sceptical that one could usefully describe 'preference as basically an appearance of something mattering, being bad, good, better or worse' in such a way that such preferences could be

a. detachable from consciousness, and

b. unambiguous in principle, and

c. grounded in any principle that is universally motivating to sentient life (which I think is the big strength of valence-based theories)

a. I would probably say that preferences as appearances are at least minimal forms of consciousness, rather than detachable from it, under a gradualist view. (I also think hedonic states/valence should probably be understood in gradualist terms, too.)

b. I suspect preferences can be and tend to be more ambiguous than hedonic states/valence, but hedonic states/valence don't capture all that motivates, so they miss important ways things can matter to us. I'm also not sure hedonic states/valence are always unambiguous. Ambiguity doesn't bother me that much. I'd rather ambiguity than discounting whole (apparent) moral patients or whole ways things can (apparently) matter to us.

c. I think valence-based theories miss how some things can be motivating. I want to count everything and only the things that are "motivating", suitably defined. Roelofs (2022, ungated) explicitly counts all "motivating consciousness":

The basic argument for Motivational Sentientism is that if a being has conscious states that motivate its actions, then it voluntarily acts for reasons provided by those states. This means it has reasons to act: subjective reasons, reasons as they appear from its perspective, and reasons which we as moral agents can take on vicariously as reasons for altruistic actions. Indeed, any being that is motivated to act could, it seems, sincerely appeal to us to help it: whatever it is motivated to do or bring about, it seems to make sense for us to empathise with that motivating conscious state, and for the being to ask us to do so if it understands this.


As I am using it, ‘motivation’ here does not mean anything merely causal or functional: motivation is a distinctively subjective, mental, process whereby some prospect seems ‘attractive’, some response ‘makes sense’, some action seems ‘called for’ from a subject’s perspective. The point is not whether a given sort of conscious state does or does not cause some bodily movement, but whether it presents the subject with a subjective reason for acting.

Or, if we did instead define these appearances functionally/causally in part by their (hypothetical) effects on behaviour (or cognitive control or attention specifically), as I'm inclined to, and define motivation functionally/causally in similar terms, then we could also get universal motivation, by definition. For example, something that appears "bad" would, by definition, tend to lead to its avoidance or prevention. This is all else equal, hypothetical and taking tradeoffs and constraints into account, e.g. if something seems bad to someone, they would avoid or prevent it if they could, but may not if they can't or have other appearances that motivate more.

I think I have a similar question to Will: if there can be preferences or welfare without consciousness, wouldn't that also apply to plants (+ bacteria etc)? (and maybe the conclusion is that it does! but I don't see people discussing that very much, despite the fact that unlike for AI it's not a hypothetical situation)  It's certainly the case "that their lives could go better or worse, or their concerns and interests could be more or less respected".

Along those lines, this quote seemed relevant: "our concepts were pinned down in a situation where there weren’t a lot of ambiguous cases, where we had relatively sharp distinctions between, say, humans, nonhuman animals, and inanimate objects" [emphasis not mine]  Maybe so, but there's a big gap between nonhuman animals and inanimate objects!

It's an excellent question! There are two ways to go here:

  1. keep the liberal notion of preferences/desires, one that seems like it would apply to plants and bacteria, and conclude that moral patienthood is very widespread indeed. As you note, few people go for this view (I don't either). But you can find people bumping up against this view:

Korsgaard: "The difference between the plant's tropic responses and the animal's action might even, ultimately, be a matter of degree. In that case, plants would be, in a very elementary sense, agents, and so might be said to have a final good." (quoted in this lecture on moral patienthood by Peter Godfrey-Smith.)

  2. Think that for patienthood what's required is a more demanding notion of "preference", such that plants don't satisfy it but dogs and people do. And there are ways of making "preference" more demanding besides "conscious preference". You might think that morally-relevant preferences/desires have to have some kind of complexity, or some kind of rational structure, or something like that. That's of course quite hand-wavy—I don't think anyone has a really satisfying account.

Here's a remark from Francois Kammerer, who thinks that moral status cannot be about consciousness (which he thinks does not exist), who argues that it should be about desire instead, and who nicely lays out the 'scale' of desires at various levels of demandingness:

On the one extreme, we can think of the most basic way of desiring: a creature can value negatively or positively certain state of affairs, grasped in the roughest way through some basic sensing system. On some views, entities as simple as bacteria can do that (Lyon & Kuchling, 2021). On the other hand, we can think of the most sophisticated ways of desiring. Creatures such as, at least, humans, can desire for a thing to thrive in what they take to be its own proper way to thrive and at the same time desire their own desire for this thing to thrive to persist – an attitude close to what Harry Frankfurt called “caring” (Frankfurt, 1988). Between the two, we intuitively admit that there is some kind of progressive and multidimensional scale of desires, which is normatively relevant – states of caring matter more than the most basic desires. When moving towards an ethic without sentience, we would be wise to ground our ethical system on concepts that we will treat as complex and degreed, and even more as “complexifiable” as the study of human, animal and artificial minds progresses.

Here's a remark from Francois Kammerer, who thinks that moral status cannot be about consciousness (which he thinks does not exist)

Nitpick: Kammerer (probably) does not think consciousness does not exist. He's an illusionist, so thinks consciousness is not phenomenal, and so specifically phenomenal consciousness does not exist. That just means he thinks the characterization of consciousness as phenomenal is mistaken. He could still believe moral status should be about consciousness, just not phenomenal consciousness.

True, I should have been more precise—by consciousness I meant phenomenal consciousness. On your (correct) point about Kammerer being open to consciousness more generally, here's Kammerer (I'm sure he's made this point elsewhere too):

Illusionists are not committed to the view that our introspective states (such as the phenomenal judgment “I am in pain”) do not reliably track any real and important psychological property. They simply deny that such properties are phenomenal, and that there is something it is like to instantiate them. Frankish suggests calling such properties “quasi-phenomenal properties” (Frankish 2016, p. 15)—purely physico-functional and non-phenomenal properties which are reliably tracked (but mischaracterized as phenomenal) by our introspective mechanisms. For the same reason (Frankish 2016, p. 21), illusionists are not committed to the view that a mature psychological science will not mention any form of consciousness beyond, for example, access-consciousness. After all, quasi-phenomenal consciousness may very well happen to have interesting distinctive features from the point of view of a psychologist.

But on your last sentence

He could still believe moral status should be about consciousness, just not phenomenal consciousness.

While that position is possible, Kammerer does make it clear that he does not hold it, and thinks it is untenable for similar reasons that he thinks moral status is not about phenomenal consciousness. (cf. p. 8)

Hmm, I think any account of desire as moral grounds, which Kammerer suggests as an alternative, is going to face objections based on indeterminacy and justification like those Kammerer raises against (quasi-)phenomenal consciousness as moral grounds.

  1. Indeterminacy: Kammerer talks about a multidimensional scale of desires. Why isn't desire just indeterminate, too? Or, we can think of (quasi-)phenomenality as a multidimensional scale, too.[1]
  2. Justification: Our own desires also appear to us to be phenomenal and important (probably in large part) because of their apparent phenomenality (tied to feelings, like fear, hunger and physical attraction, or otherwise conscious states, e.g. goals or moral views of which we are conscious). If and because they appear important due to their apparent phenomenality, they would also be undermined as normative grounds.[2] Kammerer talks about us finding "unconscious pains" to not matter intrinsically (or not much, anyway), but we would find the same of "unconscious desires".[3]
  1. ^

    For each creature, and even more for each species, there will be differences (sometimes slight, sometimes big) in the kinds of broadcasting of information in global workspaces, or in the kind of higher-order representation, etc., that they instantiate. The processes they instantiate will be similar in some respects to the processes constituting phenomenal consciousness, but also dissimilar in others; and there will be dissimilarities at various levels of abstractions (from the most abstract – the overall functional structure implemented – to the most concrete – the details of the implementation). Therefore, what these creatures will have is something that somewhat resembles (to various degrees) the “real thing out there” present in our case. Will the resemblance be such that the corresponding state also counts as phenomenally conscious, or not – will it be enough for the global broadcasting, the higher-order representation, etc., to be of the right kind – the kind that constitutes phenomenal consciousness? It is hard to see how there could always be a fact of the matter here.

  2. ^

    The reason why we were so strongly inclined to see sentience as a normative magic bullet in the first place (and then used it as a normative black box) was that the value of some phenomenal states seemed particularly obvious and beyond doubt. While normative skepticism seemed a credible threat in all kinds of non-phenomenal cases, with valenced phenomenal states – most typically, pain – it seemed that we were on sure grounds. Of course, feeling pain is bad – just focus on it and you will see for yourself! So, in spite of persisting ignorance regarding so many aspects of phenomenal consciousness, it seemed that we knew that it had this sort of particularly significant intrinsic value that made it able to be our normative magic bullet, because we could introspectively grasp this value in the most secure way.15 However, if reductive materialism/weak illusionism is true, our introspective grasp of phenomenal consciousness is, to a great extent, illusory: phenomenal consciousness really exists, but it does not exist in the way in which we introspectively grasp and characterize it. This undercuts our reason to believe that certain phenomenal states have a certain value: if introspection of phenomenal states is illusory – if phenomenal states are not as they seem to be – then it means that the conclusions of phenomenal introspection must be treated with great care and a high degree of suspicion, which entails that our introspective grasp of the value of phenomenal states cannot be highly trusted.

  3. ^

    That phenomenal states seem of particular significance compared to neighboring non-phenomenal states manifests itself in the fact that we draw a series of stark normative contrasts. For example, we draw a stark normative contrast between phenomenal and their closest non-phenomenal equivalent. We care a lot about the intense pain that one might phenomenally experience during a medical procedure – arguably, because such pain seems really bad. On the other hand, if, thanks to anesthesia, a patient does not experience phenomenally conscious pain during surgery, their brain might still enter in nonphenomenally conscious states that are the non-phenomenal states closest to phenomenal pain (something like “subliminal pain” or “unconscious pain”) – but we will probably not worry too much. If indeed we fully believe these states to be non-phenomenal – to have no associated subjective experience, “nothing it’s like” to be in them – we will probably judge that they have little intrinsic moral relevance – if at all – and we will not do much to avoid them. They will be a matter of curiosity, not of deep worry.

While that position is possible, Kammerer does make it clear that he does not hold it, and thinks it is untenable for similar reasons that he thinks moral status is not about phenomenal consciousness. (cf. p. 8)

Interesting. I guess he would think desires, understood functionally, are not necessarily quasi-phenomenal. I suspect desires should be understood as quasi-phenomenal, or even as phenomenal illusions themselves.

If unpleasantness, in phenomenal terms, would just be (a type or instance of the property of) phenomenal badness, then under illusionism, unpleasantness could be an appearance of badness, understood functionally, and so the quasi-phenomenal counterpart or an illusion of phenomenal badness.

I also think of desires (and hedonic states and moral beliefs, and some others) as appearances of normative reasons, i.e. things seeming good, bad, better or worse. This can be understood functionally or representationally. Here's a pointer to some more discussion. These appearances could themselves be illusions, e.g. by misrepresenting things as mattering or with phenomenal badness/goodness/betterness/worseness. Or, they could dispose beings that introspect on them in certain ways to falsely believe in some stance-independent moral facts, like that pleasure is good, suffering is bad, that it's better that desires be satisfied, etc. But there are no stance-independent moral facts, and those beliefs are illusions. Or they dispose the introspective to believe in phenomenal badness/goodness/betterness/worseness.

Thanks! - super helpful and interesting, much appreciated.

I suppose my takeaway, all the while setting consciousness aside, is [still] along the lines: (a) 'having preferences' is not a sufficient indicator for what we're trying to figure out; (b) we are unlikely to converge on a satisfying / convincing single dimension or line in the sand; (c) moral patienthood is therefore almost certainly a matter of degree (although we may feel like we can assign 0 or 1 at the extremes) - which fits my view of almost everything in the world; (d) empirically coming up with concrete numbers for those interior values is going to be very very hard, and reasonable people will disagree, so everyone should be cautious about making any strong or universal claims; and (e) this all applies to plants just as much as to AI, so they deserve a bit more consideration in the discussion. 

When is Plant Welfare Debate Week??

Executive summary: Carl Shulman argues that the moral status of AI systems is an important issue that will become increasingly urgent as AI advances, and that we need better plans and understanding to address it.

Key points:

  1. AI moral status may depend more on agency (preferences, desires) than consciousness.
  2. People will likely attribute more moral status to AI as systems become more advanced and human-like.
  3. Many future AI systems may claim to have moral status or rights.
  4. Appeals to theories denying AI consciousness will become less compelling as AI progresses.
  5. AI advances will help improve our understanding of consciousness and moral status issues.
  6. A strong precautionary principle against harming AIs could ban current AI research practices.
  7. AI takeover scenarios could result in mistreatment of both humans and other AIs.
  8. No clear plans exist yet for ensuring even basic ethical treatment of future AI systems.



This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
