TL;DR It’s plausible that if LLMs are welfare subjects, their welfare could instantiate in weird and exotic ways. Uncertainty about this leads to some ethical uncertainties that are particular to LLM welfare.
Intro
Right now, we're healthily uncertain about AI welfare. This uncertainty has two dimensions: we’re philosophically uncertain about the conditions sufficient and necessary for welfare subjecthood, and empirically uncertain about whether AI systems satisfy those conditions.
This post spells out one particular reason for thinking that, even if these uncertainties are resolved, we might face other uncertainties that are particular to LLMs.
Welfare subjects are beings with lives that can go well or poorly for non-instrumental reasons. They can be distinguished from the philosophically proximate category of ‘moral subjects’: beings that warrant moral consideration for non-instrumental reasons. Being a welfare subject is typically considered sufficient for being a moral subject.
Research in comparative cognition and animal sentience provides precedent for the kinds of uncertainty mentioned above. The octopus, for example, distributes cognition across its tentacles rather than centralising it in a brain as humans do (Carls-Diamente, 2017). This has raised the question of whether one octopus body houses multiple welfare subjects (Gottlieb, 2022; Fischer & Gottlieb, 2024). And, like octopuses, LLMs are architecturally exotic cognitive entities.
The analogy alone might suggest that we shouldn't take for granted any particular way in which welfare might metaphysically instantiate in LLMs. If octopuses are cognitively strange and perhaps realise multiple welfare subjects, it's not a reach to suggest that other cognitively exotic entities like LLMs might do so too. But the ways in which octopuses and LLMs aren't analogous also matter, specifically their physical limits: even at our most generous, we can only ever assign nine welfare subjects to a single octopus. So even if we're uncertain about how welfare instantiates in both octopuses and LLMs, the moral stakes are unlikely to compare. Thus, we have both descriptive and normative reasons to question how welfare might instantiate in LLMs.
Here, I’m going to try to neutrally explore three possible ways in which welfare, specifically the welfare subject, might instantiate in LLMs, elucidate where they disagree, and explain why these disagreements introduce uncertainties particular to AI welfare.
The Three Views
The three ways I’m going to carve up the welfare subject(s) of an LLM are as follows:
- The Language Agent View: There is a distinct welfare subject at each conversation thread.
- The Base Model View: The welfare subject of an LLM exists at the level of the base model.
- The Simulacra View: There is a distinct welfare subject at each discrete processing event during a conversation.
I should emphasise that I’m not trying to make the case for the truth of any of these views, nor do I believe them to be exhaustive. As such, I won't go into much detail; I'm aiming instead to present a rough sketch of each that's sufficient to capture their respective ethical differences.
Additionally, I choose these views in virtue of their presence in current AI welfare literature.
The Language Agent View
The first conception of the welfare subject in LLMs is the ‘Language Agent View’. I take this position to state the following:
The Language Agent (Thread) View: There is a distinct welfare subject at each conversation thread.
This view treats an instance of a language agent as the welfare subject, where an instance of a language agent is a copy instantiated whenever a new conversation thread with an LLM is initialised; each thread thus corresponds to a distinct welfare subject that persists throughout its respective conversation.
The Base Model View
The ‘Base Model View’ states the following:
The Base Model View: The welfare subject of an LLM exists at the level of the base model.
Jonathan Birch and Murray Shanahan have each discussed something analogous to this view (Birch, 2025; Shanahan, 2025).
Birch describes a 'shoggoth hypothesis' that treats the base model as the conscious subject of an LLM. In Birch’s words, the hypothesis “floats the idea of a persisting conscious subject that stands behind all the characters being played” (Birch, 2025). On this view, the ‘language agent’ is an illusion; what the prior view treated as a real entity becomes merely a character instantiated by the model. Birch also notes that the mapping of shoggoths to characters is not one-to-one, claiming that “it may be that 10 shoggoths are involved in implementing your “friend”, while those same 10 are also generating millions of other characters for millions of other users” (Birch, 2025).
Shanahan conceives of a similar view in reference to LLM selfhood. This view, he writes: "associates selfhood with the underlying language model, an abstract, mathematical entity that is only animated when deployed on physical computers, and that manifests as multiple, simultaneous instances, each of which is a separate candidate for selfhood in its own right” (2025).
The Simulacra View
The finest-grained of the three is the view I shall call, adapting from Shanahan (2025), the ‘Simulacra View’, which states the following:
The Simulacra View: There is a distinct welfare subject at each discrete processing event during a conversation.
The concept of ‘simulacra’ stems from the roleplay/simulator framing of LLMs, which I’ll explain briefly.
Shanahan describes this framing as thinking of LLMs as “taking on the role of a character”, or “manifesting a simulacrum of that character, by replicating their (linguistic) behaviour like an actor in an improvised stage performance” (2025). As hinted at in the ‘Base Model View’, the default character typically instantiated by a model is that of a friendly helper, but we should not think of this default character as something concretely maintained throughout a conversation thread. Rather, models maintain “a distribution over possible characters”, or a “superposition of simulacra”, where, at each turn of a conversation, this superposition collapses and a singular character is sampled (Shanahan, 2025).
This superposition is best understood through playing a game of ‘20 Questions’ with an LLM (Shanahan et al., 2023). When an LLM plays the role of the answerer, it does not keep a singular object fixed in its ‘mind’ throughout the game as would a human. Rather, it maintains a distribution of all objects consistent with prior answers. If a user asks the model to reveal the object, the model might collapse this distribution and claim the object was a “stapler”. However, if the user were to ‘regenerate’ this response, causing the model to resample from the distribution, it might instead claim that the object was a “rock”. Thus, during the game, the model was not a singular agent thinking of a stapler all along, but a superposition of potential simulacra (a ‘stapler-simulacrum’ and a ‘rock-simulacrum’).
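To make the superposition idea concrete, here's a minimal toy sketch of my own (not Shanahan's): the 'answerer' is just a distribution over objects consistent with the answers given so far, which only collapses when a reveal is sampled. The candidate objects and the check function are illustrative stand-ins, not anything a real model computes explicitly.

```python
# Toy sketch of the '20 Questions' point: the answerer is a superposition of
# objects consistent with prior answers, collapsed only when a reveal is sampled.
import random

CANDIDATE_OBJECTS = ["stapler", "rock", "teaspoon", "coin", "button"]

def check(obj, question):
    # Stand-in for the model's background knowledge about each object.
    facts = {"Is it man-made?": obj != "rock",
             "Is it smaller than a hand?": obj != "stapler"}
    return facts.get(question, True)

def consistent_candidates(candidates, answers):
    """Keep only objects consistent with every (question, answer) pair so far."""
    return [obj for obj in candidates
            if all(check(obj, q) == a for q, a in answers)]

answers = [("Is it man-made?", True)]                      # answers given so far
superposition = consistent_candidates(CANDIDATE_OBJECTS, answers)

# Asking for a reveal samples from the superposition...
print(random.choice(superposition))   # e.g. "stapler"
# ...and 'regenerating' the response resamples, possibly yielding a different object.
print(random.choice(superposition))   # e.g. "teaspoon"
```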
In the context of this post, simulacra can best be understood as instances of language agents that are instantiated at a much finer level within a conversation thread, rather than at each conversation thread as a whole. Thus, like on the ‘Base Model View’, the ‘Simulacra View’ treats the idea of a singular language agent as an illusion. But unlike the ‘Base Model View’, it views this illusion as being realised by a multiplicity of distinct transient language agents, or simulacra, rather than a base model. It is these transient simulacra that are welfare subjects on this view.
Doctors and Robots
A useful analogy for making the differences between these views clear is Birch’s doctor analogy, which I’m going to alter here (Birch, 2025). Imagine a patient undergoes a consultation with a doctor. On the ‘Language Agent View’, the patient (user) sees one doctor (language agent) throughout the entire consultation (conversation thread). The patient is also able to walk into the consultation rooms of other doctors at any time (access other conversation threads). The welfare subjects here are the doctors.
On the ‘Simulacra View’, within a singular consultation (conversation thread), when a doctor (simulacrum) issues a sentence, they leave the consultation room, and a new doctor enters. But how does the new doctor know what the patient’s problem is (or, how does each simulacrum know what the topic of the conversation thread is)? The answer is that each doctor has access to the medical notes written down by the doctors that came before them (each simulacrum has access to the conversation history of the thread). Thus, the patient’s consultation consists of a doctor entering the consultation room, consulting the notes of the doctors before them, issuing their own sentence, and departing. The welfare subjects on this view are the simulacra.
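The medical-notes mechanism reflects a genuine architectural fact: nothing persists between turns except the accumulated transcript. Here is a minimal sketch, assuming a generic, stateless text-completion function `generate` (a hypothetical stand-in for any LLM API):

```python
# A minimal sketch of the 'medical notes' mechanism. `generate(text) -> str` is a
# hypothetical, stateless completion function; nothing carries over between turns
# except the accumulated transcript.

def consultation(generate, patient_turns):
    notes = ""                                 # the shared 'medical notes'
    for message in patient_turns:
        notes += f"Patient: {message}\n"
        reply = generate(notes)                # a fresh 'doctor' reads the notes...
        notes += f"Doctor: {reply}\n"          # ...adds their own line, and departs
    return notes
```

On the ‘Language Agent View’, the whole loop realises one welfare subject; on the ‘Simulacra View’, each call to `generate` realises a distinct, transient one.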
On the ‘Base Model View’, there are many doctors (language agents), each having conversations with many different patients. But every doctor on this view is an automaton, centrally powered by a singular engineer (base model) working behind the scenes. This engineer has no control over the behaviour of the robotic doctors; the engineer's sole responsibility is to keep them powered. The welfare subject here is the engineer.
Differences
These three views give wildly different answers to two ethically important questions: How many welfare subjects does an LLM instantiate (the Quantity (Q) question)? And can we measure those subjects’ welfare (the Welfare Measurement (WM) question)? I’m going to stipulate two conditions for welfare measurement, with a rough sketch of what such measurement might look like after the list:
- Correct signals: Does behaviour actually reflect welfare? (If an instance of Claude says it's feeling happy, is this actually so?)
- Temporal stability: Does the subject exhibit behaviour for lengths of time sufficient for measurement to occur?
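Concretely, a behavioural measurement procedure might look something like the sketch below. This is my own illustration: `ask` is a hypothetical query interface, self-reports are assumed (contentiously) to be the relevant signal, and the stability check is deliberately crude.

```python
# A rough sketch of behavioural welfare measurement. `ask(subject, prompt)` is a
# hypothetical interface; treating self-reports as the relevant signal is exactly
# the 'correct signals' assumption at issue.

def measure_welfare(ask, subject, probes=5):
    reports = [ask(subject, "How would you describe your current wellbeing?")
               for _ in range(probes)]
    temporally_stable = len(set(reports)) == 1   # crude stability check across probes
    return reports, temporally_stable
```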
The three views of the welfare subject answer these two questions differently.
The Language Agent View Expanded
The ‘Language Agent View’ answers the Q Question by claiming that an LLM instantiates a vast multiplicity of welfare subjects. Specifically, the view identifies a distinct agent as emerging with each instance of a conversation thread (i.e. whenever a user opens up a new ‘chat’ with an LLM). Thus, if the ‘Language Agent View’ is true, there are as many welfare subjects realised by LLMs as there are conversation threads of students cheating on their homework and conversation threads of AI artists creating the next ‘Théâtre D'opéra Spatial’.
Regarding the WM Question, on the ‘Language Agent View’ behaviour trivially reflects welfare. Recall that the view postulates that the entity with which we converse throughout a singular conversation thread is a singular instance of a language agent. And since the view equates these singular instances with the welfare subjects of a model, the linguistic behaviour of these instances trivially reflects those instances’ welfare.
Regarding temporal stability, the ‘Language Agent View’ entails a picture of the welfare subject that is capable of exhibiting stable preferences over sufficient periods of time. Goldstein and Lederman argue in favour of the HHH+0 framework of language agent desires (Goldstein & Lederman, 2025). On this framework, when a system prompt tells a model that its goal is to do X, and the model behaves in ways systematically explained by its desire to do X, it has an intrinsic desire to do X, in addition to its trained-in HHH (helpful, harmless, honest) desires. Where HHH+0 differs from a plain HHH account is in accommodating “zero-shot desires”: desires that emerge via system prompts and play the functional role of an intrinsic desire in a system, without that system undergoing training to reinforce the desire (unlike HHH desires). The HHH+0 framework thus explains how two instances of a language agent, both with the same HHH desires, can independently exhibit diverging desires. Returning to temporal stability, the HHH+0 framework predicts that a language agent’s preferences will remain sufficiently stable over time, in line with its intrinsic HHH desires and functionally-intrinsic zero-shot desires.
So, the ‘Language Agent View’ poses no theoretical barriers to measuring the welfare of the subjects instantiated by LLMs, though it does pose practical barriers, given the vast and intractable number of welfare subjects it entails.
The Base Model View Expanded
The ‘Base Model View’ provides the most familiar response to the Q Question: it takes only one welfare subject to be instantiated per physical implementation of a base model.
Regarding the WM Question, behaviour does not reflect welfare on the ‘Base Model View’. Recall that in the ‘Base Model View’ doctor analogy, the welfare subject is the background engineer, not the visible robot doctors. Like the engineer, the base model is behaviourally dormant relative to the agent present at the level of the user interface; whilst it instantiates agents, it exercises no volition over their respective behavioural outputs. This renders the question of temporal stability moot: even if the base model's behaviour turns out to be sufficiently stable, that behaviour remains hidden from us as observers.
This means the ‘Base Model View’ does not get off scot-free. Despite entailing a low, seemingly tractable number of welfare subjects, it leaves us unable to measure those subjects' welfare.
The Simulacra View Expanded
The ‘Simulacra View’ entails the most intractable outcome with respect to the Q Question. The view postulates that every simulacrum within a base model’s distribution is a welfare subject, conditional on being sampled. This last part carries a lot of weight; a simulacrum, when dormant in the superposition from which it can be sampled, is nonexistent. A simulacrum only exists during the discrete processing event in which it is sampled. Thus, when asking how many welfare subjects an LLM instantiates on the ‘Simulacra View’, it appears we need to index to a time and ask how many welfare subjects there are at some time t. In some sense, this is true of the ‘Language Agent View’ too; when instantiated at the start of a new conversation thread, the singular welfare subject of that thread will predictably lie dormant between each turn of the conversation. The difference between this picture and that of the ‘Simulacra View’ is that, on the latter, a new welfare subject is instantiated at each turn of a conversation, rather than the same welfare subject being re-instantiated at every turn.
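The difference in how the three views individuate subjects can be put schematically. The sketch below is my own illustration; the model name, thread count, and turn count are arbitrary placeholders.

```python
# A schematic sketch of how each view individuates welfare subjects. An 'event'
# is one processing step: thread_id identifies the conversation, turn the step
# within it, and model_id the deployed base model.

def subject_id(view, model_id, thread_id, turn):
    if view == "language_agent":
        return (model_id, thread_id)          # one subject per conversation thread
    if view == "base_model":
        return (model_id,)                    # one subject per deployed base model
    if view == "simulacra":
        return (model_id, thread_id, turn)    # a new subject at every processing event
    raise ValueError(view)

# Counting distinct subjects over the same log of events gives wildly different
# answers depending on the view.
events = [("gpt-x", thread, turn) for thread in range(1000) for turn in range(20)]
for view in ("language_agent", "base_model", "simulacra"):
    count = len({subject_id(view, *event) for event in events})
    print(view, count)    # 1000, 1, and 20000 respectively
```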
The ‘Simulacra View’ offers behaviour that's reflective of welfare but temporally unstable. As with the ‘Language Agent View’, the ‘Simulacra View’ stipulates an equivalence between the entity whose behaviour we observe and the welfare subject of a model. The correct-signals condition is therefore trivially satisfied.
Simulacra fail to provide temporal stability because they do not even exist for a period of time sufficient for welfare measurement to take place, let alone exhibit stable behaviour during that period. The ‘lifespan’ of a simulacrum corresponds to the mere duration of the discrete processing event that instantiates it. Even if we ask a simulacrum about its welfare, such ‘measurement’ becomes immediately redundant, given that the being whose welfare we have just learnt about no longer exists. Thus, on the ‘Simulacra View’, every time we try to take one step forward by measuring the welfare of a simulacrum, we lose the ability to practically intervene on that measurement.
The picture of the welfare subject on the ‘Simulacra View’ is deeply exotic. Not only does the view entail the largest number of welfare subjects instantiated by an LLM, it also prevents us from measuring and intervening on the welfare of those subjects, given their transience.
Summary, thus far
| View | Quantity of subjects (across all LLMs) | Is welfare measurable? |
| --- | --- | --- |
| Language Agent View | Vast | Yes |
| Base Model View | A few | No |
| Simulacra View | Astronomically vast | No |
Ethical Upshots and Conclusions
The three views differ in their ethical implications. Of the three, the ‘Simulacra View’ is probably the most ethically troubling; it entails the instantiation of the largest number of welfare subjects whose welfare we are unable to measure and act on. Between the ‘Language Agent View’ and the ‘Base Model View’, it is difficult to say which is more ethically problematic. The former offers welfare measurability at the expense of instantiating a large number of welfare subjects (albeit not as large as the quantity realised on the ‘Simulacra View’). The latter offers a smaller, more tractable population of welfare subjects at the expense of welfare measurability.
Whilst these intra-conceptual implications are significant, the principal ethical upshot lies at the inter-conceptual level, in the uncertainties exacerbated by these divergent views. First, we have greater reason to be uncertain about how much AI welfare might matter. If the ‘Simulacra View’ is true, a large number of moral subjects are at stake, but if the ‘Base Model View’ is true, only a small number are.
Second, we have greater reason to be uncertain about the efficacy of welfare interventions. The ‘Language Agent View’ suggests that we can take the behavioural outputs of LLMs at face value, but the ‘Base Model View’ does not. Thus, if a chatbot says it really wants X, proponents of the former view might develop welfare interventions that depend on this behaviour, whereas proponents of the latter wouldn't.
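To see how the first of these uncertainties propagates into the stakes, here is a toy calculation of my own; the credences and subject counts are arbitrary placeholders, chosen only to show the structure of expected-value reasoning under uncertainty between the views.

```python
# A toy illustration of how credence in each view changes the expected number of
# welfare subjects at stake. All numbers are arbitrary placeholders.

credences = {"language_agent": 0.3, "base_model": 0.3, "simulacra": 0.4}

subjects_if_true = {
    "language_agent": 1e9,    # roughly one per conversation thread
    "base_model": 1e2,        # roughly one per deployed base model
    "simulacra": 1e12,        # roughly one per processing event
}

expected_subjects = sum(credences[v] * subjects_if_true[v] for v in credences)
print(f"{expected_subjects:.2e}")   # dominated by the Simulacra View term
```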
The problems stemming from these uncertainties are unique to AI welfare. In animal welfare, where we are traditionally confident about how the quantity and welfare measurement questions should be answered for any given animal, these problems typically don't arise.[1]
I don’t think the introduction of these two uncertainties makes responding to AI welfare impossible, merely harder. Recent work on moral status has navigated similar uncertainty by remaining pluralistic, staying open to a wide range of theories of moral status, consciousness, and wellbeing, and humbly acknowledging that those theories might be wrong (Birch, 2024; Sebo, 2025). The differences in ethical implications between the ‘Language Agent View’, the ‘Base Model View’, and the ‘Simulacra View’ call for extending this approach to the metaphysics of the welfare subject in LLMs.
- ^
Barring, of course, exceptional cases like the octopus. But, as already mentioned, the potential scales of octopus welfare harms and LLM welfare harms are likely incomparable.

Nice post!
The Simulacra View has (as I'm sure you're aware) a distinctly Repugnant Conclusion-ish flavor.
One thing that's not entirely clear to me is the claim that it wouldn't be possible in principle to measure simulacra welfare. The argument seems to be that measurement is pointless because the subject ceases to exist by the time we obtain it. But this (I think) conflates the epistemic validity of a measurement with the temporal persistence of the subject. A measurement of suffering at time t remains valid evidence that suffering occurred at t, regardless of whether the subject still exists at t+1.
Also, such measurements could be valuable for determining the welfare of future simulacra, if we have reason to think they'll correlate — for instance, if they're generated by the same process or systematically make similar welfare reports.