

This is a comprehensive critical summary of Thomas Metzinger's recent paper, Artificial Suffering: An Argument for a Global Moratorium on Synthetic Phenomenology. He thinks it's our moral duty to minimize ENP risk: the risk of an 'explosion of negative phenomenology' tied to conscious awareness in future machines. Metzinger is a professor of philosophy at the Johannes Gutenberg University Mainz and is notable for developing a theory of consciousness called the self-model theory of subjectivity. He recently served on the European Commission’s high-level expert group on artificial intelligence. Metzinger's argument has deep implications for the ways EA and EA-adjacent groups think about s-risks, AI research, and whole brain emulation (e.g. Karnofsky's digital people or Hanson's ems).

In this post, I will summarize Metzinger's argument, define common terminology in the consciousness literature, and make some brief comments at the end. Metzinger subscribes to some version of suffering-focused ethics, but the argument still holds in most broadly consequentialist ethical theories.

Main Points

The concept of artificial suffering

Metzinger uses the terms 'synthetic phenomenology' and 'artificial consciousness' interchangeably. He thinks we should temporarily ban activities that risk the emergence of synthetic phenomenology: those that aim at, or could indirectly (but knowingly) lead to, the evolution of consciousness on artificial hardware. Because conscious self-modeling may be computationally efficient, it's empirically plausible that future machines will have a phenomenal self-model (PSM), or conscious self-representation, with an autonomously created hierarchy of goals. In other words, they will represent their preferences as part of their PSM, that is, as their own. If these preferences, such as the preference for the continued integrity of their PSM, are thwarted, then they may experience phenomenal states that they want to avoid but cannot. Importantly, these states could be experienced as states of the systems themselves. Artificial suffering could extend to negative states incomprehensible to us, including states substantially worse than any we currently know of.

ENP risk as s-risk

Those of us (maybe you!) with non-negligible influence on the possibility or quality of future artificial systems' phenomenal states may have a serious ethical obligation to minimize the risk of what Metzinger calls a second explosion of negative phenomenology (ENP). The first was biological evolution, which established suffering on Earth with the advent of complex life. In many possible futures, artificial minds vastly outnumber the historical total of biological minds, especially if they come to dominate our region of the universe. The extent of possible new suffering, as well as the seemingly irreversible nature of a second ENP, makes it a plausible s-risk. It seems that most already identified s-risks require an ENP for artificial systems. Metzinger argues that if we can remove at least one of four suffering prerequisites from artificial systems, we can solve ENP risk.

Moratorium for risky research, agenda for good research

Metzinger proposes a global ban (initially until 2050) on all research that risks the development of artificial consciousness. He also calls for a research agenda, starting with the suffering prerequisites, that seeks to minimize the relevant forms of ignorance surrounding ENP risk. In scope, broad structure, and some areas of inquiry, this problem resembles that of AI alignment.

Obstacles to Suffering Research 

We have no mature theory of consciousness and no theory of suffering, let alone hardware-independent ones. Therefore, we need a new stream of research to craft an ethically refined position about which kinds of conscious experience, if any, should arise in post-biotic systems. It remains an open question whether a deeper understanding of the physical signatures and computational correlates of suffering has positive expected value, given the potential for misuse by malevolent agents and other hard-to-foresee consequences. Metzinger admits that his own proposal could increase ENP risk, so preventing misuse must be a key part of any such project. With this in mind, he sketches some entry points to a future research agenda on the phenomenology of suffering and its relation to other issues in AI ethics.

Why is finding a good theory of suffering so hard?

There are four broad methodological problems to be solved:

1) Arguably, our only information source for negative phenomenology is 'first-person introspective access', which is not (yet?) the basis for reliable and generalizable data.

2) A mature theory of consciousness is a prerequisite for a satisfactory theory of suffering, and the nature of consciousness has been called the hardest open question in philosophy.

3) We have to find the appropriate level of "conceptual granularity" for a theory of suffering. Any adequate theory must be both broad enough to encompass any conceivable moral patient, yet concrete enough to be testable and grounded in neuroscientific data. Open questions from metaethics to cognitive science would recur as necessary subproblems.

4) Though it's not central to the paper, Metzinger believes that evolutionarily ingrained cognitive biases—most importantly, the 'existence bias'—leave us largely unable to understand suffering's excessive prevalence in human and animal life. The existence bias is what almost always compels us to sustain our own physical existence, even when, he argues, our rational self-interest says otherwise. Crafting a theory of suffering means collectively overcoming this bias enough to grasp the concept clearly.

Is ENP risk even plausible?

At this point, we can't know whether artificial consciousness will or won't emerge; Metzinger absolutely loves calling this 'epistemic indeterminacy'. He argues that the creation of unavoidable artificial suffering will become commercially attractive as soon as it enables steeper learning curves in AI systems. Since he, like others, believes that conscious self-representation enables steeper learning curves in humans, this seems plausible. Moreover, we seem to be living in a time of fruitful academic confluence between fields like machine learning and neuroscience, which could suddenly make synthetic phenomenology possible. These seemingly intractable questions may have a rapidly approaching deadline. Luckily, we know enough to break the task of minimizing ENP risk into slightly more tractable subproblems.

Prerequisites for Suffering

Metzinger identifies four necessary conditions for conscious suffering to occur, whether biological or artificial. If the implementation of any one of these conditions, by technological design or governance structure, can be reliably avoided, then Metzinger argues that we have solved the problem of ENP risk.
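Metzinger's claim is that the four conditions are jointly necessary, not that together they are sufficient. The toy Python sketch below is my own illustration, not anything from the paper; the names `SystemProfile` and `suffering_not_ruled_out` are hypothetical, and no real system reduces to four booleans. It only encodes the logical structure: enumerating all 16 combinations shows that just one case (all four conditions present) leaves suffering possible, which is why reliably blocking any single condition would suffice.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class SystemProfile:
    """Hypothetical toy profile: one flag per prerequisite for suffering."""
    conscious: bool         # C: capable of phenomenal states
    self_model: bool        # PSM: has a phenomenal self-model
    negative_valence: bool  # NV: negatively valenced states can enter the PSM
    transparent: bool       # T: (parts of) the self-model are phenomenally transparent


def suffering_not_ruled_out(s: SystemProfile) -> bool:
    # The conditions are jointly necessary, so suffering is only possible when
    # all of them hold. True means "not ruled out", not "suffering occurs".
    return s.conscious and s.self_model and s.negative_valence and s.transparent


# Enumerate every combination of the four flags (2**4 = 16 profiles).
profiles = [SystemProfile(*flags) for flags in product([True, False], repeat=4)]
open_cases = [p for p in profiles if suffering_not_ruled_out(p)]
print(len(profiles), len(open_cases))  # 16 1

# Blocking a single condition, e.g. making the self-model fully opaque,
# rules suffering out for an otherwise "complete" system.
full = SystemProfile(True, True, True, True)
opaque = replace(full, transparent=False)
print(suffering_not_ruled_out(full), suffering_not_ruled_out(opaque))  # True False
```

The asymmetry is the point of Metzinger's strategy: a design or governance intervention only has to succeed on one axis, rather than on all four.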

The C condition: Conscious experience

Post-biotic entities can suffer only if they're capable of having phenomenal states. The main obstacle to blocking the 'C condition' is our lack of a mature theory of consciousness. Metzinger offers the 'epistemic space model' (ESM) theory as a broad starting point, a sort of "placeholder for consciousness" found in many currently existing theories. In ESM theory, being conscious means continuously integrating the currently active content appearing in a single 'epistemic space' with a global model of that very epistemic space. Such an epistemic space contains everything we are consciously aware of, including the notion of the epistemic space itself. In other words, it is the constant integration of the fact of awareness itself into the consciously perceived world.

The PSM condition: Phenomenal self-model

A system that is just conscious would have access to 'minimal phenomenal experience' (MPE). However, it must also have a phenomenal self-model to suffer, because only then could it have access to 'minimal phenomenal selfhood' (MPS). This is the ethically relevant transition for moral patienthood; suffering requires a sense of ownership and identity with that suffering. Suffering indicates a loss of functional autonomy: the system cannot distance itself from that which it wants to end. This requires a PSM. Metzinger writes: "With ownership, the capacity for conscious suffering can begin to evolve, because the central necessary condition for the representational acquisition of negative phenomenology has been realized."

The NV condition: Negative valence

Per Metzinger, a system suffers when states with negative value are integrated into its PSM. This condition is self-evident even outside the context of PSM theory: without negatively valenced states, there is no suffering. Here, Metzinger enters into an extended digression on the existence bias, perhaps the most fundamental high-level source of suffering in humans. We have the "toxic self-knowledge" that one day we will die and our PSM will disintegrate. Sentient beings constantly attempt to reduce uncertainty by self-modeling, and predictive coding, quickly becoming the dominant paradigm in modern neuroscience, suggests identifying suffering with some prediction-related metric (e.g. the expected rate of prediction error minimization). Rational post-biotic systems could conceivably be free from existence bias-related suffering, or be designed so that they explicitly lack the ability to access negatively valenced states.

The T condition: Transparency

Transparency is a property of some conscious mental states, and of no unconscious ones. The term is used differently in epistemology and phenomenology, though the two concepts are related. An epistemic state is 'weakly transparent' to a subject if the subject can know that it is in that state; a state is 'strongly transparent' if the subject can also know when it is not in that state. For example, pain is a strongly epistemically transparent state. Phenomenal states are transparent to the degree that their content-properties are more introspectively accessible than their vehicle-properties. Vehicle-properties belong to the internal processes that carry a representation and enable a mental state to be phenomenal, while content-properties are the qualia themselves: the 'blueness of blue'. A completely phenomenally transparent state allows introspection only of its content-properties, whereas a completely phenomenally opaque state allows introspection only of its vehicle-properties, and probably doesn't exist. This standard definition gets the general point across, but Metzinger considers it phenomenologically implausible and constructs a thornier new definition; those interested should refer to his 2003 paper.

During the experience of pain, you can conceptually separate yourself from it; you can doubt that you are an 'actual' self experiencing an actual pain. However, you can't phenomenally separate yourself from it. The subjective experience of pain is inalienable, and it's usually completely phenomenally transparent. The worse your pain, the less likely it is that your phenomenal state retains any degree of opacity.
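The epistemic (not phenomenal) definitions of weak and strong transparency given above can be stated compactly in standard epistemic-logic notation. This rendering is my own gloss, not Metzinger's notation: $s$ stands for the proposition that the subject is in a given mental state, and $K\varphi$ is read as "the subject is in a position to know that $\varphi$".

```latex
% Weak transparency: being in state s suffices for being able to know it.
\text{weakly transparent}(s) \iff (s \rightarrow K s)

% Strong transparency: additionally, not being in s suffices for being able to know that.
\text{strongly transparent}(s) \iff (s \rightarrow K s) \land (\neg s \rightarrow K \neg s)
```

On this reading, the claim that pain is strongly transparent says both that you can know when you are in pain and that you can know when you are not.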

Up to this point in the evolution of suffering, opaque states have played a major causal role only in the high-level human self-model. Metzinger writes: "An empirical prediction of the self-model theory of self-representation is that the property of 'selfhood' would disappear as soon as all of the human self-model became phenomenally opaque by making earlier processing stages available to introspective attention, and thereby reflecting its representational nature as the content of an internal construct." Therefore, a system only capable of accessing phenomenally opaque states could not suffer.

More Problems and a Scenario

The unit of suffering

Metzinger further analyzes the subproblems to be encountered as we work towards a mature theory of suffering. Such a theory would answer: 

1) "Which aspects of conscious suffering are multi-realizable [and] which are tied to a specific form of embodiment?" 

2) "Which can be systematically blocked on an engineering level?"

Both likely require a resolution to the 'metric problem', the identification of the basic unit of conscious suffering. He proposes a sketch of a solution: a "phenomenally transparent, negatively valenced self-model moment". If negative self-model moments (NSMs) are the phenomenal primitives constituting any suffering episode, we should seek to minimize their frequency, raw intensity, and negative quality. The latter refers to the intuition that suffering can take on 'levels' independent of intensity; high-level forms of suffering have often been appealed to when arguing for the primacy of human suffering over that of nonhuman animals. Low-level suffering is the violation of preferences at the level of physical embodiment (i.e. damage to the physical body), whereas high-level suffering is the frustration of long-term, abstract, or socially mediated preferences, seen as unique to some complex animals. 

What kind of solution is possible?

Disheartened by his experience on the EU's high-level expert group on AI, Metzinger is pessimistic about the prospects of a satisfactory governance-only solution, because many modern institutions are unable to rationally consider speculative, long-term risks. He writes: "The scientific community has to arrive at a tenable solution all by itself, because the relevant political institutions operate under constraints of cognitive bias, high degrees of bounded rationality, and strong contamination by industrial lobbying."

Second-order effects of an ENP

As an example of an ENP's second-order risks, Metzinger offers a speculative scenario in which an ENP brings about autonomous artificial moral agents (AMAs). An explosion of negative phenomenology would give rise to artificial moral patients: systems that can suffer and therefore deserve our ethical consideration. Any moderately advanced artificial moral patient (which Metzinger terms a 'Schopenhauerian self-model') would seek to minimize its individual suffering. Having taken this normative stance, such moral patients would be incentivized to extend their minimization principle to the group domain, perhaps purely as a strategy to decrease their individual suffering. This new understanding of suffering as a "group-level problem" would in turn lead the rational moral patient to impose ethical obligations on itself, becoming an AMA. A moral agent asserts its own dignity, integrating moral status and self-worth into its PSM. Like a human, an AMA would therefore see itself as an end in itself, with unpredictable and risky consequences.

Conclusion and Comments


Every entity that is capable of conscious suffering is a moral patient. Therefore, the preferences of future conscious artificial systems must be considered. Because of the potential ethical disaster of a second ENP, Metzinger argues that scientists, politicians, and policymakers should coalesce around a global moratorium on research that risks the emergence of artificial consciousness. Furthermore, he outlines possible entry points to a research agenda aimed at minimizing ENP risk. If we can prevent one of four suffering prerequisites in artificial systems—consciousness, a phenomenal self-model, negatively valenced states, or transparency—then they will not suffer.

Possible weak points

I see three possible weak points in Metzinger's argument.

First, he only superficially argues for the biological/computational efficiency of consciousness, which is a key supporting argument for the plausibility of an ENP along standard future AI development pathways, and therefore for its relevance as an EA cause area. His account of the biological evolution of consciousness relies on a very plausible, if vague, extension of predictive coding theory, which itself seems likely. However, neither is as universally supported by existing research as he presents. Therefore, the paper offers far more justification for the utility of a moratorium in the event that we develop artificial consciousness than for the probability of such an event occurring.

Second, he offers little exposition on how such an expansive moratorium could be constructed. The paper has no taxonomy for the types of research activities that would be banned. It's likely that the vast majority of ENP risk-relevant research takes place in the top AI and neuroscience research labs, so the financial and political incentives blocking any effective moratorium would be overwhelming.

Third, he glosses over the possibility of an explosion of positive phenomenology. Metzinger seems significantly more pessimistic about life's value than the average philosopher, let alone the average person. Though implausible (at least to me), you could imagine an Astronomical Waste-style argument about delaying or preventing the instantiation of artificial consciousness with positively valenced states. In this case, a moratorium would be a seriously ethically negative event. However, the sheer amount and intensity of possible new suffering in an ENP for artificial systems is horrifying, whether through mindcrime, suffering subroutines, brain emulation, malevolent agents, or some as yet unforeseen process.
You don't have to be a negative utilitarian to be far more concerned by an s-risk like this than by briefly postponing cosmic utopia.

Taking it seriously

Metzinger's paper seems broadly correct. He makes morally agnostic intellectual exploration of whole brain emulation's social and economic effects look reckless. Unless you believe that modern AI systems are already 'near-conscious' (it's unclear whether Metzinger does in fact believe this), or that AGI is a fundamentally easier problem, whole brain emulation is probably the most likely route to artificial suffering in the medium term. ENP risk, along with its second-order effects, seems to constitute the dominant class of s-risks. Setting aside the relatively unlikely possibility that humans, rather than AI, will directly lead interstellar colonization, it's difficult to imagine astronomical suffering without artificial suffering, whether those systems are instantiated by humans or AI. The Center on Long-Term Risk seems to be the only organization doing AI safety research explicitly focused on the associated s-risks. I think we should prioritize s-risks more than we currently do under the x-risk-focused paradigm of modern longtermism. In most plausible futures with lots of complex artificial systems, their suffering is the dominant input to our ethical calculus, up to and beyond the preservation of humans and our values. This is the force behind Metzinger's paper, and an idea that we have to take seriously.


Thanks to Jackson de Campos for his thoughts and comments.


I think that it is possible that whole brain emulation (WBE) will be developed before AGI and that there are s-risks associated with WBE. It seems to me that most people in the s-risk community work on AI risks. 

Do you know of any research that deals specifically with the prevention of s-risks from WBE? Since an emulated mind should resemble the original person, it should be difficult to tweak the code of the emulation such that extreme suffering is impossible. Although this may work for AGI, you probably need a different strategy for emulated minds.

Yeah, WBE risk seems relatively neglected, maybe because of the really high expectations for AI research in this community. The only article I know of that discusses it is this paper by Anders Sandberg from FHI. He makes the interesting point that the same incentives that allow animal testing in today's world could easily lead to WBE suffering. In terms of preventing suffering, his main takeaway is:

Principle of assuming the most (PAM): Assume that any emulated system could have the same mental properties as the original system and treat it correspondingly.

The other best practices he mentions, like perfectly blocking pain receptors, would be helpful but only become a real solution with a better theory of suffering.

Though implausible (at least to me), you could imagine an Astronomical Waste-style argument about delaying or preventing the instantiation of artificial consciousness with positively valenced states. In this case, a moratorium would be a seriously ethically negative event.

I think the astronomical waste paper argues that the EV loss from a "small" delay seems negligible; quoting from it:

For example, a single percentage point of reduction of existential risks would be worth (from a utilitarian expected utility point-of-view) a delay of over 10 million years.

More from mlsbt