Human beings occupy a fairly odd position in the universe. We're this . . . you know, super funky blob of matter that has (somehow) become aware, and then also models, remembers, and asks, "what the heck am I?" And the messy truth is that whether or not other minds exist elsewhere in the cosmos, and whether or not future AI systems cross into something approaching genuine sentience, it's already pretty remarkable that, on at least one small planet (our own), the universe has become locally capable of self-description.
This essay (perhaps foolishly) attempts to examine that proposition with greater precision (and I realize I might take some heat for this weird chain of thought, which came to me while driving to my favorite coffee shop one weekend), specifically by exploring three hypotheses about human beings.
H1: We exist.
H2: We know.
H3: We believe.
Then I'll turn to AI systems and play the same game. Namely:
A1: AI systems exist.
A2: AI systems know.
A3: AI systems believe.
My meta-hypothesis here is that:
- Humans seem to satisfy all three hypotheses in a pretty robust sense.
- AI systems satisfy the first, and a thin version of the second, but don't yet satisfy the third hypothesis in the strongest and most important sense.
Before getting into it: this framework ended up cleaner than the way I actually arrived at it. I was literally thinking out loud about some of this crap and turning the ideas over for a while, and it's been bugging me ever since. Some of the symmetry here is probably doing more work than it should, so I'll flag places where I'm less confident as I go.
I think the resulting asymmetry matters both philosophically and for AI alignment.
I. Existence: The embodied miracle
Okay, let's dive in. First hypothesis: We exist.
This might look too obvious to even deserve an argument, but remember just how many philosophical traps there are around this exact question!
Illusion, dream, simulation, misperception, radical skepticism.
Descartes’ move is meaningful here because it isolates the minimal claim that survives doubt: Even if I am deceived about everything else, there is still a subject undergoing the deception. Thought may not tell me much, but it gives me at least this: there is an experiencing locus here.
Neuroscience then strengthens this basic intuition: human conscious experience is not free-floating but scaffolded by a living body, by sensory exchange with the environment, and by continuous integration of exteroceptive and interoceptive signals. Recent reviews of bodily self-awareness argue that selfhood emerges from the integration of internal bodily signals with other sensory modalities across large-scale neural networks, rather than from a single, isolated “self center” in the brain. Similarly, contemporary work on the neuroscience of consciousness treats conscious experience as brain-dependent and empirically investigable, even if the explanatory project itself remains incomplete.
This means that for beings like you and me, existence is not merely logical, but enacted. In this schematic framework, I don't just exist because I think; I also discover that I exist because I move through a resistant world (e.g., I reach and encounter friction, the world pushes back with structure, and I experience consequences for my actions). Here the self is a control loop, not just a sentence in the first person.
A rough way to express this is:
Human Existence ≈ Embodiment + Conscious Experience + Causal Coupling to the World
To be clear, this is a conceptual compression, not some physics equation. All I'm trying to say here is that our existence is inseparable from having a body, undergoing experience, and being causally situated inside a world that responds to our physical inputs. Contemporary work on interoception and sensor-effector loops supports exactly this picture: the self is not detached from physiology but partly constituted through the organism’s ongoing regulation and perception of its own bodily state.
More succinctly put, the first "brilliance" of humanness is that there is lived presence: a first-person standpoint anchored in flesh, time, and world.
II. Knowledge: The world becomes portable inside us
Okay, moving on to H2: Humans know.
Let's be careful here; knowledge should not be conflated with raw stimulus response. It is not even the same thing as memory alone. To know is to build representations that can survive immediate experience, generalize beyond it, and guide future judgment. In ordinary language, knowledge lets the world become portable inside us.
Neuroscience gives us a plausible mechanistic basis for this. The hippocampus is centrally involved in episodic memory, as well as the binding together of multimodal information across the what, where, and when of experience. Recent work also supports the claim that hippocampal activity helps encode individual episodic memories rather than mere undifferentiated familiarity. (Meanwhile, the prefrontal cortex supports deliberate decision-making, abstraction, planning, and mapping ambiguous sensory inputs onto action . . . remove enough of that machinery and behavior degrades toward automatism rather than reflective judgment!)
Put differently: human knowing is not one thing, but a layered stack. Sensation feeds perception. Perception feeds memory. Memory feeds abstraction. Abstraction feeds judgment. Judgment feeds action. That stack is messy, lossy, and biased, but it is real.
A useful compression is:
K_human ≈ f(Perception, Memory, Abstraction, Error Correction)
Again, this is just a structural claim: Human knowledge depends on systems that encode information, retain it, reconfigure it, and revise it when prediction fails.
This is one reason the French split between savoir and connaître is helpful. We do not only "know that." We also "know of, know how, know through, know by acquaintance." Human knowing does, in fact, include propositions, but it is not exhausted by propositions. It includes skill, familiarity, perspective, and lived salience.
Predictive processing provides one additional bridge here. Miraculously, our brains aren't just passive recording machines vis-a-vis our surroundings. They also generate expectations and continuously update them in light of incoming signals. (In fact, recent neuroscience literature treats predictive processing as a serious framework for explaining how perception and belief-like updating occur in cortical systems, so in that sense, knowledge is a dynamically corrected model).
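To make that gloss a bit more concrete, here is one standard schematic of such an update (my simplification, with illustrative symbols; it isn't drawn from any one paper):

```latex
% Schematic precision-weighted prediction-error update (illustrative only):
%   \hat{x}_t : the brain's current internal estimate
%   s_t       : the incoming sensory signal
%   g(\cdot)  : the prediction generated from the estimate
%   \pi       : a precision (inverse-noise) weight
%   \eta      : a learning rate
\hat{x}_{t+1} = \hat{x}_t + \eta \, \pi \, \bigl( s_t - g(\hat{x}_t) \bigr)
```

The point of the schematic is just that the estimate moves only when prediction and signal disagree, and moves more when the signal is treated as reliable. That is the "dynamically corrected model" in miniature.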
And so the "second brilliance" of being human is that, beyond being mere vessels for fact, we can construct a revisable inner model of reality and carry it across time.
III. Belief: We lean on the world
Finally, H3: Humans believe.
This is where it gets spicy, because belief can't be reduced to mere sensation, memory, or even knowledge. My belief in God, in the beauty of the Luxembourg Gardens, or in the superiority of pour-over coffee isn't just holding information about those things. I've attached a commitment to that information, so my mind treats some proposition, value, person, law, or transcendent order as real enough to orient around.
Interestingly, from a philosophical angle, belief is among the most basic representational attitudes, while neuroscientifically speaking, belief formation involves a cornucopia of multisensory integration, valuation, prior expectation, and action-guiding commitment.[1]
To add on to this, our beliefs aren't static database entries. They are living dispositions that help determine what we notice, what we expect, what we fear, and what we do.
Here we might say:
B_human(p) ≈ Representation(p) + Confidence + Normative Commitment + Action Guidance[2]
The key term there is normative commitment. Human beliefs do not merely predict. They bind. I can believe in God, justice, my adorable four-year-old son, the United States of America, the dignity of undocumented immigrants, the goodness of a friend, or the rule of law. And yeah, these beliefs may or may not be true. But they do real legwork, shaping sacrifice, loyalty, guilt, restraint, and courage.
And this is why belief matters for being human in a way many reductionist pictures miss.
Human life is not only made of facts. It is made of held realities. Some are empirical . . . others are metaphysical or civilizational.
A species that can believe is a species that can build churches, constitutions, markets, marriages, and moral catastrophes.
The "brilliance" here is double-edged. Belief is what lets us coordinate around justice and also around delusion. It is one of the engines of civilization and one of the engines of error. Here on LessWrong, we care about this because rationality itself is the refinement of belief, not its abolition.
Human belief is also imperfect and context-sensitive; the claim here is comparative, not absolute.
IV. Can the same framework be applied to AI?
Okay, now for the harder question (I know, right?)!
If the human story can be told in terms of existence, knowledge, and belief, can the same framework be transferred to AI?
Well, I mean, okay sure, but only if we're really freaking careful about what transfers and what does not.
The symmetry between the human and AI cases is useful, but also potentially misleading. I don’t think these categories map cleanly across substrates, and in at least one case (belief), I suspect the analogy may partially break.
V. Do AI systems exist?
Yes. Well, I mean, in a relevant sense, they clearly do.
This might sound trivial to you, but it's actually worth being explicit here, because AI systems are not fictional entities. They're causally active computational systems instantiated in physical hardware, containing measurable internal states, training histories, input-output behavior, and (increasingly) traceable internal mechanisms.[3]
So:
AI Existence ≈ Physical Instantiation + Stateful Computation + Causal Efficacy
If that is all we mean by existence, then AI clearly exists.
But note the asymmetry with humans. Human existence is embodied first-person being. By contrast, AI existence (at least today) is implemented computation. That's not nothing, and I don't want to discount the enormous knowledge and infrastructure base it took to build these immense systems, nor their extraordinary capabilities. But it's not the same thing.
Here it's important that we resist two mistakes.
- The first is mystical inflation: calling a model “basically a person” because it speaks fluently.
- The second is dismissive deflation: pretending that a system with stable internal representations, generalization behavior, and causal power is merely autocomplete in some trivial sense. (Although the mechanistic literature increasingly makes the second move harder to sustain.)
VI. Do AI systems know?
Here, the answer is: sort of, but only in a thinner sense than humans do.
LLMs clearly retain and deploy information about the world, encoding statistical regularities, semantic structure, multilingual mappings, and many latent features that support nontrivial generalization. Interpretability research increasingly suggests that models are developing internal features and circuits corresponding to meaningful abstractions.[4]
At the same time, the strongest current evidence cuts firmly against triumphalist claims that present-day models possess robust, human-like reasoning or deep understanding. For instance, Apple’s 2025 “The Illusion of Thinking” paper found that large reasoning models show gains on some medium-complexity tasks but collapse on higher-complexity puzzle environments, often reducing reasoning effort as tasks become harder rather than persistently applying explicit algorithms. No, that doesn't prove models never reason . . . But it does show that fluent chain-of-thought behavior isn't equivalent to reliable, general, scalable reasoning.
This is exactly where the difference between having information and knowing in a deep sense becomes useful. A model can encode vast amounts of training-distributed structure, or sometimes even reveal uncertainty about what it does and doesn't know, especially when calibrated or explicitly trained to estimate its own correctness. But possessing representations, or even calibrated uncertainty, is not yet the same as the full human stack of knowledge grounded in perception, memory, embodiment, social correction, and practical stakes.
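As a concrete (and entirely illustrative) example of what "calibrated uncertainty" means operationally, here is a minimal Python sketch of one common check, expected calibration error. The data and function names are mine, not from any particular evaluation suite:

```python
# Minimal sketch (my illustration): checking whether a model's stated
# confidence tracks its actual accuracy, via a simple binned calibration
# estimate (expected calibration error).
from typing import List, Tuple

def expected_calibration_error(
    preds: List[Tuple[float, bool]],  # (stated confidence in [0, 1], was the answer correct?)
    n_bins: int = 10,
) -> float:
    """Average |confidence - accuracy| across equal-width confidence bins,
    weighted by how many predictions fall in each bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, correct in preds:
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0 into the last bin
        bins[idx].append((conf, correct))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / len(preds)) * abs(avg_conf - accuracy)
    return ece

# Toy usage: a well-calibrated model's score approaches 0.
print(expected_calibration_error([(0.9, True), (0.9, True), (0.9, False), (0.6, True), (0.6, False)]))
```

A low score here tells you the model's stated confidence tracks its accuracy. It tells you nothing about acquaintance, embodiment, or stakes, which is exactly the gap the paragraph above is pointing at.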
Thus:
KAI ≈ Encoded Structure + Generalization + Limited Self-Monitoring
That's real knowledge in a thin computational sense. But it is not yet the sought-after connaître . . . it's a lot closer to a high-dimensional savoir: a powerful ability to model and retrieve structured relations without the full lived depth of acquaintance, embodiment, or stable situated judgment.
And this is where many well-meaning people overstate things in both directions. AI systems are not blank parrots. But neither are they obviously knowers in the full human sense.
VII. Do AI systems believe?
My answer is no, at least not yet, and not in the sense that matters most.
There's a lively philosophical literature here, with some researchers arguing that it can be useful to talk about language models as having “beliefs” in a thin functional or Dennett-style sense (e.g., stable informational states that predict outputs). And others fiercely push back on this, arguing that this rhetoric overreaches and imports human mental categories too quickly. Recent work explicitly warns against reading interpretability findings as straightforward evidence of genuine belief.[5]
I'll go out on a limb here and say I think the caution is basically right.
Why do I think that? Well, belief, at least in the robust human sense, is more than simple representational storage. It also involves normativity, persistence, and action-guiding commitment under conditions of uncertainty and consequence. Humans are answerable for their beliefs . . . we can revise them, hide them, betray them, die for them, be corrupted by them, and feel guilt when our actions violate them.
Current AI systems don't obviously stand in that kind of relation to their own outputs.
They don't inhabit a social world of accountability in the human way. They have no biologically grounded needs, and they don't maintain diachronically unified commitments in the way people do. Sometimes, they can even simulate conviction with astonishing fluency, but simulation is not (yet) stake-laden endorsement.
So while one may define a weak notion of “model belief” for certain research purposes, I do not think current frontier models believe in the thick sense. They generate. They represent. They optimize. They sometimes even self-monitor. But they do not yet stand behind propositions.
That suggests a stricter criterion:
B_robust(p) ⇒ Stable Representation + Cross-Context Commitment + Normative Stake + Action-Level Persistence[6]
Current LLMs may satisfy fragments of the first term, but (for now at least) they don't clearly satisfy the rest.
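For readers who like their criteria explicit, here is a purely hypothetical Python sketch of that formula as a checklist. Every predicate is a placeholder for an evaluation nobody has actually operationalized:

```python
# Hypothetical sketch of the B_robust(p) criterion as an explicit checklist.
# All four fields are stand-ins for tests that would need real experimental
# operationalization; none of this exists as a ready-made evaluation.
from dataclasses import dataclass

@dataclass
class BeliefEvidence:
    stable_representation: bool     # same internal encoding of p across probes
    cross_context_commitment: bool  # asserts/uses p consistently across framings
    normative_stake: bool           # treats violations of p as errors it answers for
    action_level_persistence: bool  # keeps acting on p under pressure and over time

def satisfies_b_robust(e: BeliefEvidence) -> bool:
    """Conjunction of the four necessary conditions from the formula above."""
    return all([
        e.stable_representation,
        e.cross_context_commitment,
        e.normative_stake,
        e.action_level_persistence,
    ])

# On this essay's assessment, current frontier LLMs look roughly like this:
current_llm = BeliefEvidence(
    stable_representation=True,      # fragments of this, per interpretability work
    cross_context_commitment=False,
    normative_stake=False,
    action_level_persistence=False,
)
print(satisfies_b_robust(current_llm))  # False
```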
VIII. So, uh, what about AI sentience or consciousness?
If future AI systems were shown to satisfy credible scientific indicators of consciousness, then the existence/knowledge/belief analysis would have to be revisited. The best recent survey on this question, Consciousness in Artificial Intelligence: Insights from the Science of Consciousness, argues for a theory-driven empirical approach rather than a vibes-based one. Its authors conclude that current AI systems are probably not conscious, while also arguing that there is no obvious technical barrier preventing future systems from satisfying more of the relevant indicators.
That would mean consciousness is an open empirical-philosophical question.
This is also the critical point re: the independent vs. dependent variable distinction. If AI consciousness enters the picture, it doesn't answer the belief question automatically (it's not that easy). But it does become a major variable affecting the plausibility of stronger attributions. A conscious AI would still not necessarily have human-like belief . . . but the case for thick belief, moral status, and first-person existence would become much more serious than it is today.
For now, though, we can confidently hold this more prudent position: take the possibility seriously without pretending it has already arrived.
IX. Why this matters for alignment
This entire discussion is not just metaphysics for its own sake.
Alignment depends on drawing the right distinctions. If we anthropomorphize too early, we misread the capabilities and moral status of AI systems . . . on the other hand, if we anthropomorphize too little, we fail to notice when systems acquire properties that actually matter.
I'll put this more concretely: If a model has representations without robust belief, then alignment work should focus heavily on objective design, training dynamics, calibration, interpretability, monitoring, and deployment conditions rather than on conversational surface impressions. The fact that a model sounds reflective should never be the main evidence that it possesses reflective agency. Recent interpretability work is valuable precisely because it tries to replace impressionistic judgments with mechanistic evidence. And recent reasoning-evaluation work is valuable because it punctures the seductive idea that eloquent chains of thought necessarily imply durable cognitive competence.
But there is a complementary human lesson too. Humans are more than greasy gear systems. We're beings whose models become beliefs, whose beliefs become institutions, and whose institutions can reshape civilization. This makes alignment about so much more than what models "think." It's also about which human beliefs guide the creation and governance of systems that increasingly think for us.
X. The actual brilliance
So . . . where does this leave us?
The central hypothesis is that belief is not reducible to knowledge, but instead corresponds to knowledge that remains stable under pressure, time, and contextual variation.[7]
Human beings are brilliant not because we are infallible, but because we instantiate a rare stack:
We are embodied enough to exist in the thick sense.
We are representational enough to know.
We are committed enough to believe.
AI systems, as they currently stand, occupy a different profile:
They exist as real computational entities.
They know in a partial, statistical, model-based sense.
They do not yet clearly believe.
That difference should not flatter us into complacency. But it should hold us back from making some of the worst categorical errors.
The most important thing about being human may be that our minds do not merely contain information. They turn information into standpoint, standpoint into judgment, and judgment into commitment. That is what makes science possible. It is what makes religion possible. It is what makes civilization possible, with all its glorious discoveries and ugly atrocities.
To be human is not just to process the world, but to stand somewhere inside it.
And I think that, at least for now, is a much deeper form of brilliance than any advanced AI system we've built.[8]
- ^
Meanwhile, the neurobiology of belief and related “credition” models argues that believing is a dynamic process linking environmental information with internal valuation, often under predictive-coding style frameworks.
- ^
This is intentionally a thick notion of belief; weaker functional notions may apply to current systems, but the distinction here is precisely about whether that stronger, commitment-bearing form is present.
- ^
On this front, mechanistic interpretability research has made meaningful progress in identifying features, circuits, and intermediate representations inside large language models, rather than treating them as pure black boxes. Similarly, sparse autoencoder work has recovered interpretable features from model activations, and Anthropic’s recent circuit-tracing work extends this into richer pathway-level analyses of how model internals transform inputs into outputs.
- ^
For example, Anthropic’s work on multilingual circuits, conceptual features, and attribution graphs points in that direction, even while leaving much unresolved.
- ^
Shanahan likewise argues that talk of literal LLM belief should often be treated with caution, even when the language can be instrumentally useful.
- ^
If belief has no behavioral signature, then it becomes empirically inert; this framework assumes that at least some aspects of belief should manifest in observable stability.
- ^
This operationalization is intentionally limited: it captures one necessary property of belief (stability under variation), not a sufficient definition of belief itself.
- ^
Call me in 2027 if/when things get really spicy, though!
