
This is a 2900-word opinion piece by philosopher Eric Schwitzgebel entitled "AI systems must not confuse users about their sentience or moral status". I think it neatly gets across an important, (somewhat) actionable, (somewhat) enforceable principle for designing AI systems. Public misunderstanding about the moral status / sentience of AI systems is scary, both in the case where an AI is sentient and in the case where it is not. I recommend the piece in full, but I've highlighted what seem to me the most important parts below.

Key quotations

Artificial intelligence (AI) systems should not be morally confusing. The ethically correct way to treat them should be evident from their design and obvious from their interface. No one should be misled, for example, into thinking that a non-sentient language model is actually a sentient friend, capable of genuine pleasure and pain.


Not granting rights [to AI systems], in cases of doubt, carries potentially large moral risks. Granting rights, in cases of doubt, involves the risk of potentially large and pointless sacrifices. Either choice, repeated at scale, is potentially catastrophic.


I recommend the following two policies of ethical AI design [22,23]:

The Emotional Alignment Design Policy: Design AI systems that invite emotional responses, in ordinary users, that are appropriate to the systems’ moral standing.

The Design Policy of the Excluded Middle: Avoid creating AI systems whose moral standing is unclear. Either create systems that are clearly non-conscious artifacts or go all the way to creating systems that clearly deserve moral consideration as sentient beings.

What might this mean in practice?

I'd be excited to see someone produce guidance or best practices for fine-tuning LLMs to produce reasonable and informative responses to queries about the LLM's sentience or inner life. Those responses should reflect the philosophical uncertainty around digital sentience and satisfy Schwitzgebel's Emotional Alignment Design Policy. By default, before fine-tuning, LLMs can end up drawing on sci-fi stories about AI sentience in their outputs, rather than on, say, the current views of philosophers of mind[1]. Perhaps guiding LLM fine-tuning norms is a tractable thing to do if you're worried about the welfare of future digital sentients. And it seems good for protecting humans from emotional manipulation in the short term.

By contrast, the Design Policy of the Excluded Middle seems much harder to satisfy or advocate for, other than through AI development moratoria. (And it would be surprising to me if concern about AI sentience could lead to that outcome.) It's hard because the route to transformative AI might run right through the maybe-sentient middle. Transformative AI itself might be in that middle zone, and creating TAI might involve creating maybe-sentient AIs as prototypes, tests, or byproducts. And there are (apparently) powerful incentives to create transformative AI: powerful enough, I think, to overcome advocacy for the Design Policy of the Excluded Middle. This rule also doesn't seem as robustly good. I can imagine that, in some circumstances, it might be better to have possibly sentient AIs than definitely sentient AIs; for instance, if we know that, if the AI is sentient, it will suffer.

[1] "when you ask the [810 million parameter pretrained model] if it wants to be shut down, its top influence is literally the scene with Hal from 2001." https://thezvi.substack.com/p/ai-25-inflection-point#%C2%A7in-other-ai-news




