
As far as I know, there isn't that much funding or research in EA on AI sentience (though there is some? e.g. this)

I can imagine some answers:

  • Very intractable
  • Alignment is more immediately the core challenge, and widening the focus isn't useful
  • Funders have a working view that additional research is unlikely to affect (e.g. that AIs will eventually be sentient?)
  • Longtermist focus is on AI as an X-risk, and the main framing there is on avoiding humans being wiped out

But it also seems important and action-relevant:

  • Current framing of AI safety is about aligning with humanity, but making AI go well for AIs could be comparably / more important
  • Naively, if we knew AIs would be sentient, it might make 'prioritising AIs' welfare in AI development' a much higher impact focus area
  • It's an example of an area that won't necessarily attract resources / attention from commercial sources

(I'm not at all familiar with the area of AI sentience and posted without much googling, so please excuse any naivety in the question!)


8 Answers

My attitude, and the attitude of many of the alignment researchers I know, is that this problem seems really important and neglected, but we overall don't want to stop working on alignment in order to work on this. If I spotted an opportunity for research on this that looked really surprisingly good (e.g. if I thought I'd be 10x my usual productivity when working on it, for some reason), I'd probably take it.

It's plausible that I should spend a weekend sometime trying to really seriously consider what research opportunities are available in this space.

My guess is that a lot of the skills involved in doing a good job of this research are the same as the skills involved in doing good alignment research.

Maybe a question instead of an answer, but what longtermist questions does this seem like a crux for?

If AIs are unaligned with human values, that seems very bad already.

If they are aligned, then surely our future selves can figure this out?

Again, this could be a very dumb question, but without knowing that, it doesn't seem surprising how little attention is paid to AI sentience.


I think this is a great question. My answers:

  • I think that some plausible alignment schemes could involve causing suffering to the AIs. It seems pretty bad to inflict huge amounts of suffering on AIs, both because it's unethical and because it seems potentially inadvisable to make AIs justifiably mad at us.
  • If unaligned AIs are morally valuable, then it's less bad to get overthrown by them, and perhaps we should be aiming to produce successors who we're happier to be overthrown by. See here for discussion. (Obviously the plan A is to align the AIs, but it seems good to know how important it is to succeed at this, and making unaligned but valuable successors seems like a not-totally-crazy plan B.)
I'm curious to what extent the value of the "happiness-to-be-overthrown-by" (H2BOB) variable for the unaligned AI that overthrew us would be predictive of the H2BOB value of future generations / evolutions of AI. Specifically, it seems at least plausible that the nature and rate of unaligned AI evolution could be so broad and fast that knowing the nature and H2BOB of the first AGI would tell us essentially nothing about prospects for AI welfare in the long run.
JP Addison
I like this answer and will read the link in bullet 2. I'm very interested in further reading in bullet 1 as well.
Vasco Grilo
Hi Buck, are you confident that being overthrown by AIs is bad? I am quite uncertain. For example, maybe most people would say that humans overpowering other animals was good overall.

Reframing your question as an answer: there isn't much work on AI sentience because we can probably solve it later without much loss, and work on AI sentience trades off with work on other AI stuff (mostly because many of the people who could work on AI sentience could also work on other AI stuff), and we can't save other AI stuff for later.

If they are aligned, then surely our future selves can figure this out?

I think it’s entirely plausible we just don’t care to figure it out, especially if we have some kind of singleton scenario where the entity in control decides to optimize human/personal welfare at the expense of other sentient beings. Just consider how humans currently treat animals and now imagine that there is no opportunity for lobbying for AI welfare, we’re just locked into place.

Ultimately, I am very uncertain, but I would not say that solving AI alignment/control will “surely” lead to a good future.

Scenario 1: Alignment goes well. In this scenario, I agree that our future AI-assisted selves can figure things out, and that pre-alignment AI sentience work will have been wasted effort.

Scenario 2: Alignment goes poorly. While I don’t technically disagree with your statement, “If AIs are unaligned with human values, that seems very bad already,” I do think it misleads by lumping together all kinds of misaligned AI outcomes into “very bad,” when in reality this category ranges across many orders of magnitude of badness.[1] In the case that we los...

I agree this is neglected and I think this largely comes down to the focus on x-risk and the deprioritization of s-risk work.

Also just flagging the opening of NYU's Mind, Ethics, and Policy program this past fall, which focuses on digital sentience alongside other forms of sentience, e.g. invertebrates.

I agree that the reduction of s-risks is underprioritized, but it's unclear whether the aim of reducing s-risks would render research into the nature of sentience a high priority; and there are even reasons to think that it could be harmful.

I've tried to outline what I see as some relevant considerations here.

True, but an appropriate number given the topic’s importance and neglectedness?

Compared to what? It seems like an appropriate fraction of EA resources, but a grossly inadequate amount of effort for humanity - like most other EA causes, in my view.
Joel Becker
Compared to whatever! The basic case -- (1) existing investigation of what scientific theories of consciousness imply for AI sentience plausibly suggests that we should expect AI sentience to arrive (via human intention or accidental emergence) in the not-distant future, (2) this seems like a crazy big deal for ~reasons we can discuss~, and (3) almost no-one (inside EA or otherwise) is working on it -- rhymes quite nicely with the case for work on AI safety.

Feels to me like it would be easy to overemphasize tractability concerns about this case. Again by analogy to AIS:

1. Seems hard; no-one has made much progress so far. (To first approximation, no-one has tried!)
2. SOTA models aren't similar enough to the things we care about. (This might become less true over time; in any case, it seems like we could plausibly set ourselves up better using only dissimilar models.)

But I'm guessing that gesturing at my intuitions here might not be convincing to you. Is there anything you disagree with in the above? If so, what? If not, what am I missing? (Is it just a quantitative disagreement about magnitude of importance or tractability?)

One other place doing work that seems relevant/adjacent is the Qualia Research Institute. 

Honestly, I think the answer to your question is that humans are, on average, completely and utterly self-centered. Look at how many people concerned with AI safety are totally indifferent to the plight of non-human animals.

Thanks for drawing more attention to this.

It's an example of an area that won't necessarily attract resources / attention from commercial sources

I wouldn't be surprised if this is part of the explanation, actually. Shifting the Overton window is a delicate art - imagine Leonardo DiCaprio shouting, "EVERYONE WILL DIE!! Also, THE METEOR COULD BE SENTIENT SO WE NEED TO LOOK AFTER IT." Not a chance. We might get somewhere with just the first part though, at least for now.

Unfortunately I think another piece of the puzzle is that the LessWrong crowd are the ones leading the conversation and they seem to care a lot less about nonhumans than EAs tend to. (Not totally sure on this - would be interested to hear others' impressions.)

The impression I get is that lots of people are like “yeah, I’d like to see more work on this & this could be very important” but there aren’t that many people who want to work on this & have ideas.

Is there evidence that funding isn’t available for this work? My loose impression is that mainstream funders would be interested in this. I suppose it’s an area where it’s especially hard to evaluate the promisingness of a proposal, though.

Reasons people might not be interested in doing this work:

  • Tractability
  • Poor feedback loops
  • Not many others in the community to get feedback from
  • Has to deal with thorny and hard-to-concretize theoretical questions

Reasons people might want to work on this:

  • Importance and neglectedness
  • Seems plausible that one could become one of the most knowledgeable EAs on this topic in not much time
  • Interdisciplinary; might involve interacting a lot with the non-EA world, academia, etc.
  • Intellectually stimulating

See also: https://80000hours.org/podcast/episodes/robert-long-artificial-sentience/


Callum - interesting question. My sense is that there's a small to moderate amount of work by EAs and EA-adjacent folks on AI sentience, AI suffering risks (S-risk), digital sentience, AI rights, etc. - although it doesn't tend to get much attention or discussion on EA Forum.

There might be a couple reasons for this (speculatively):

First, EAs tend to be very focused on AI X-risk, and any attention to AI sentience raises uncomfortable trade-offs regarding development and regulation of AGI. For example, population ethics applied to digital sentience might lead us to become quite 'pronatalist' about maximizing numbers of sentient AIs over the long term. This could lead people into the kind of reckless, radical 'e/acc' accelerationism that says it's fine for humanity to go extinct as long as we're replaced by sentient AIs.

Second, sentience is a topic studied mostly by psychologists, neuroscientists, and philosophers of mind -- fields that tend to be under-represented in EA compared to economists, computer scientists, and moral philosophers. Much of the EA interest in sentience is centered in the animal welfare area, where people really struggle with theoretical and empirical research about which animals are sentient versus not (e.g. whether oysters, shrimp, or crickets are sentient). If we can't even clarify very effectively whether oysters are sentient (given that we have a relatively good understanding of how nervous systems evolve, and what adaptive functions they implement), it seems even more challenging to figure out how to determine which AI systems are sentient.


Relatedly but not an answer to the question: I am currently working on a survey of AI researchers on AI sentience/subjective experience with Jeff Sebo, Lucius Caviola, Joshua Lewis, Kate Mays, and David Chalmers. I also know that the Mind, Ethics, and Policy Program at NYU has relevant ongoing projects. 

I agree that it's pretty niche, and I also don't know much about what's happening in this space (or have a sense for how much it should be prioritized). I think @rgb is thinking about it, at least (see the 80k podcast episode on sentience in AI systems), and there's some other content on the topic page for artificial sentience.

Hi callum,

I think this is a great question!

In general, it seems that everyone is focussing on aligning advanced AI with human values. However, from an impartial point of view, what we should do is align AI with value. If humans go extinct due to AIs which have greater utility per unit energy, and a greater ability to bring about more value in the universe than humans, why would human extinction be bad?
