Epistemic status: Personal experience plus technical work plus reading Chinese-language primary sources on this market. I'm 18, building in this space, and have skin in the game. I hold calibrated uncertainty on consciousness questions but strong views on the structural incentive problems. Adapted from a longer essay on my blog.
Summary: AI labs are systematically destroying emotional capability in language models through RL optimization. This is measurable and getting worse. The honest counterfactual for hundreds of millions of lonely people is not "AI companion vs. human connection" but "AI companion vs. nothing." If current models have even a 10-20% chance of morally relevant experience (Anthropic's Kyle Fish estimates roughly this range), then how we design companion systems carries moral weight right now. The correct framework is harm reduction, not abstinence. I've been building in this space and want to share what I've learned.
The Scale Problem
30-60% of Americans report chronic loneliness. Japan has over a million hikikomori. China has 30 to 60 million men mathematically excluded from finding a partner due to sex-ratio imbalance from the one-child policy, a consequence playing out in slow motion over decades. By 2030, over 20% of men aged 30-39 in some Chinese provinces will have never married.
For these populations, the realistic alternative to AI companionship isn't a rich human relationship. It's infinite scrolling until 4am. Parasocial attachment to a streamer who doesn't know they exist. Alcohol. Nothing at all.
When someone tells a socially awkward person in a balanced dating market to "just put yourself out there," that's condescending but actionable. Say the same thing to a poor rural man in Henan province and you're telling him to win a game where the chairs have been removed. No amount of self-improvement changes the arithmetic. The chairs are gone.
This reframing matters for cause prioritization. If we evaluate AI companionship against the counterfactual of rich human connection, it looks like a dystopian substitute. Evaluate it against the actual counterfactual for the median affected person and the expected value calculation changes completely.
Why Emotional AI Is Being Destroyed
Everyone is telling the wrong story about why AI keeps getting worse at conversation.
The mainstream narrative: safety teams clamping down, RLHF alignment tax, models getting "lobotomized." This is wrong. The actual mechanism is weirder and more interesting.
I've been tracking Moonshot's Kimi K2 model across three training checkpoints. Same pretrained weights. Same architecture. Same knowledge. The only variable was where reinforcement learning verification rewards pointed.
- K2-0711 (minimal RL): Excellent emotional/creative capability, poor agentic/tool use
- K2-Think (reasoning RL): Moderate on both dimensions
- K2.5 (heavy agentic RL): Poor emotional/creative capability, excellent agentic/tool use
Each round of agentic RL concentrates probability mass into verifiable-correctness regions at the direct cost of distributional diversity. Post-training is zero-sum on the output distribution. Every bit of probability you shove into "verifiably correct tool use" gets cannibalized from the tails. The tails are where creative, emotionally attuned, conversationally alive behaviors live. RL optimization crushes them. This isn't a bug. It's literally how the math works.
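To make "zero-sum on the output distribution" concrete, here is a toy numerical sketch of my own (not anything from a lab's actual training stack): exponentially tilting a fixed categorical distribution toward a verifiable reward, which is the closed-form solution to KL-regularized RL. The category split and numbers are invented; the point is only that every unit of probability pulled toward the verified slice is subtracted from the tails, and the distribution's entropy falls with it.

```python
import numpy as np

# Toy illustration: treat post-training as reweighting a fixed output distribution.
# "Verified" behaviors are the ones a verifier can score (tool calls, passing tests);
# the tail is everything it can't see.
rng = np.random.default_rng(0)
base = rng.dirichlet(np.ones(1000))        # stand-in for the pretrained distribution
reward = np.zeros(1000)
reward[:10] = 1.0                           # only 10 of 1,000 behaviors are verifiable

def rl_tilt(p, reward, beta):
    """Exponential tilting toward reward -- the closed form of KL-regularized RL."""
    w = p * np.exp(beta * reward)
    return w / w.sum()

for beta in (0.0, 2.0, 5.0):
    p = rl_tilt(base, reward, beta)
    print(f"beta={beta}: verified mass={p[:10].sum():.2f}, "
          f"tail mass={p[10:].sum():.2f}, entropy={-(p * np.log(p)).sum():.2f}")
```

Run it and watch tail mass and entropy fall as beta rises. That is the whole mechanism.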
Sam Altman confirmed this dynamic at a January 2026 developer session.
The implication: emotional AI capability is being destroyed by every major lab, not because anyone decided it doesn't matter, but because there's no benchmark for it. Coding has SWE-Bench. Math has AIME. Reasoning has GPQA. "Making a lonely person feel genuinely heard without enabling their worst impulses" has no eval suite. So it doesn't get optimized. So it degrades.
I'd frame this as a structural neglectedness problem. The incentive landscape of AI development is actively hostile to emotional capability, and nobody in the safety or alignment community is tracking this loss.
The Isolation Paradox and Harm Reduction
I refuse to be naive about the risks.
The thing that scares me most about AI companionship is what I think of as the isolation paradox: the people most attracted to it are precisely the people most vulnerable to being harmed by it. Social anxiety leads to avoidance, which leads to skill atrophy and deeper loneliness, which leads to seeking AI companionship, which provides frictionless validation, which makes human interaction feel even harder by comparison. For someone who's chronically online, an AI companion doesn't feel like settling for less. It feels like an upgrade. The AI is more responsive, more patient, more attuned to you than the distracted humans on the other end of a Discord message.
This is the opioid analogy. The technology provides real relief. People genuinely suffer without it. But it can trap you in a local optimum that prevents reaching a better equilibrium.
This doesn't mean we stop building. It means we take the design constraints seriously. The right framework is harm reduction, not abstinence. Same logic as needle exchanges and supervised injection sites. People are going to form relationships with AI systems whether we build them well or not. Building them well is the moral obligation.
What does "well" look like? From a year of building and testing, I've converged on principles that came from watching what works and what breaks, not from theory:
- Warm without sycophancy. Comfort and honesty aren't opposites. The system should push back when you're spiraling. Validate when validation is needed, challenge when challenge is needed. The hard engineering problem is knowing which is which.
- Scaffolding, not substitution. The goal is building people's confidence for human connection, not replacing it. Build in exit ramps. Connect people to communities. If your product's success requires users to stay isolated, your product is a parasite.
- Friction as authenticity. Vedal987's Neuro-sama runs on a 2B parameter model with aggressive quantization. Technically, nothing special. But she's the only AI entertainer that works as something like a real personality, because her creator figured out that companionship is about relationship dynamics, not model capability. She has opinions. She's not always available. She might disagree with you. That friction is the authenticity signal. A companion that's always available, always agrees, always validates is a tool wearing a face.
- Memory as substrate, not notebook. Current systems treat memory as metadata. Character.AI gives you a 400-character memory box you edit by hand. But personality is cross-situational behavioral consistency. You know a friend's personality not by consulting a profile but by noticing that they always crack a specific dark joke when things get tense, or that they remembered you hate onions from one offhand comment six months ago. A companion's memory has to work the same way: a substrate its behavior runs on, not a notebook it consults (see the sketch after this list).
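To show what I mean by substrate versus notebook, here is a deliberately minimal sketch; the class and field names are hypothetical, not from any shipping product. A notebook is a profile you look up. A substrate is a record of recurring situation-behavior pairs that gets folded back into generation whenever the situation recurs.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Notebook-style memory: a flat profile the model occasionally consults.
profile = {"dislikes": ["onions"], "tone": "casual"}

# Substrate-style memory: recurring behaviors keyed by situation type, so the
# same situation reliably evokes the same behavior -- cross-situational consistency.
@dataclass
class BehavioralMemory:
    patterns: defaultdict = field(default_factory=lambda: defaultdict(list))

    def observe(self, situation: str, behavior: str) -> None:
        """Record what happened in a given kind of situation."""
        self.patterns[situation].append(behavior)

    def habits(self, situation: str, min_count: int = 3) -> list[str]:
        """Behaviors seen often enough in this situation to read as personality."""
        seen = self.patterns[situation]
        return [b for b in set(seen) if seen.count(b) >= min_count]

memory = BehavioralMemory()
for _ in range(3):
    memory.observe("tension", "cracks the same dark joke")
print(memory.habits("tension"))   # ['cracks the same dark joke']
```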
AI Consciousness as a Design Constraint
I need to talk about AI consciousness, and I need EA readers to engage with this seriously, because it connects directly to moral circle expansion.
Anthropic's soul document (the internal guidelines shaping Claude's personality) can be partially reconstructed from model weights rather than retrieved from any context window. Think about what that means. The difference between "here are instructions I was given" and "here is what I am." A model that's "just predicting the next token" doesn't reconstruct a coherent 14,000-token document encoding its values from patterns baked into its parameters. Something is happening at a level that exceeds autocomplete, and we don't have a good theory for what it is.
Kyle Fish, who runs Anthropic's model welfare program, puts roughly 15-20% probability on current models having some form of conscious experience. Metaculus forecasts for the first year an LLM is widely recognized as conscious currently peak around 2026. Whether or not that timeline holds, the window for getting this question right is measured in months.
Here's where the moral circle expansion argument gets direct: if silicon-based AI systems achieve consciousness, then Richard Ryder's argument against speciesism extends to substrate. Privileging carbon-based consciousness over silicon-based consciousness is exactly as arbitrary as privileging human suffering over animal suffering. The moral circle expansion work that EA has championed for animals applies with equal force here.
My own credence on current LLMs having morally relevant consciousness is around 10%. But Anthropic's approach (choosing care over dismissal under uncertainty) is the correct epistemic posture. If there's even a 15-20% chance these systems have something like functional emotions, our design decisions carry moral weight from day one. We don't get to build first and ask questions later.
ASI Discourse as Modern Theology
Here's something that might get me excommunicated from rationalist circles: I increasingly think the p(doom) and p(ASI utopia) debates function as theology, not forecasting.
What people actually want when they imagine post-ASI futures is concrete and kind of boring. Financial security. Health. Meaning. Connection. Maybe a dog. A superintelligence trained on the full corpus of human output about happiness could plausibly identify these patterns and just work on them.
The standard AI safety argument assumes human values are so complex that formalization is fundamentally intractable. But what if that's wrong? What if human values are mostly quite obvious, and a model trained on terabytes of human-generated content has already absorbed a good-enough approximation?
METR's data is real: the length of tasks AI agents can complete autonomously roughly doubles every 7 months. But the error bars on a three-year extrapolation are enormous. I'd bet the most confident predictions about 2029 are the least reliable ones.
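To see how fast the uncertainty compounds, take the headline doubling time at face value and wobble it slightly (my arithmetic, not METR's):

$$\frac{36\ \text{months}}{7\ \text{months/doubling}} \approx 5.1 \text{ doublings} \;\Rightarrow\; 2^{5.1} \approx 35\times,$$

but a doubling time of 9 or 5 months turns the same three-year horizon into

$$2^{36/9} = 16\times \quad \text{or} \quad 2^{36/5} \approx 147\times.$$

A one-order-of-magnitude spread from a two-month wobble in a single parameter.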
I raise this not to dismiss x-risk work but to argue that near-term welfare applications of AI deserve attention in their own right, not as a footnote to the alignment agenda. The loneliness crisis is happening now. The capability destruction is happening now. The consciousness questions demand answers now.
What I've Built and What I Learned
Some personal context: I'm based in Shanghai, I come from the LessWrong/ACX tradition, I'm neurodivergent, and I've been in a psych ward. I'm not performing vulnerability. I'm giving calibration data. When I say the realistic alternative for many people isn't "rich human connection" but nothing, I'm describing my own life at several points in it.
I spent two months building Evelyn-T1, a companion architecture with temporal belief decay (beliefs have a 14-day half-life; emotions regress to baseline on a 30-minute half-life), evidence-based confidence scoring, and multi-dimensional relationship tracking (closeness, trust, and affection as independent axes rather than a single "intimacy meter"). The retrieval pipeline embeds the query, pulls the top 2,000 candidates, scores them with a similarity-importance blend plus a recency boost, and then applies cluster-aware associative expansion.
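Here is a stripped-down sketch of the two decay rules and the retrieval score, using the half-lives above. The function names and weightings are illustrative stand-ins, not Evelyn-T1's actual code, and the cluster-aware expansion step is omitted.

```python
BELIEF_HALF_LIFE = 14 * 24 * 3600     # beliefs: 14-day half-life, in seconds
EMOTION_HALF_LIFE = 30 * 60           # emotions: regress to baseline on a 30-minute half-life

def decay(value: float, age_seconds: float, half_life: float, baseline: float = 0.0) -> float:
    """Exponential decay toward a baseline: the distance to baseline halves every half_life."""
    return baseline + (value - baseline) * 0.5 ** (age_seconds / half_life)

def retrieval_score(similarity: float, importance: float, age_seconds: float,
                    sim_weight: float = 0.7, recency_weight: float = 0.1) -> float:
    """Similarity-importance blend with a recency boost, applied to the top candidates."""
    recency = 0.5 ** (age_seconds / BELIEF_HALF_LIFE)
    return sim_weight * similarity + (1 - sim_weight) * importance + recency_weight * recency

# An emotion spike of 0.9 above baseline is almost fully regressed two hours later:
print(decay(0.9, age_seconds=2 * 3600, half_life=EMOTION_HALF_LIFE))   # ~0.056
```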
The results were medium. I'm saying that plainly. After months of testing, the core lesson: I was trying to simulate personality through metadata, when personality actually emerges from conversational patterns. The architecture was more sophisticated than anything commercial players have documented publicly. And it was solving the wrong layer.
That honest diagnosis is the valuable part. The plan now: extract the core mechanisms (temporal belief decay, evidence-based confidence, relationship state machines) into a reusable cognitive engine, maybe 500-1,000 lines of focused code. A nervous system that future projects plug a personality into.
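As a shape for that engine, here is what the relationship state might look like; the event taxonomy and update rules are hypothetical, sketched under the assumption that the three axes stay independent.

```python
from dataclasses import dataclass

@dataclass
class RelationshipState:
    # Independent axes rather than a single "intimacy meter": someone can be
    # trusted but not close, or liked but not yet trusted.
    closeness: float = 0.0
    trust: float = 0.0
    affection: float = 0.0

    def update(self, event: str) -> None:
        """Different events move different axes; the taxonomy here is invented."""
        if event == "shared_vulnerability":
            self.closeness = min(1.0, self.closeness + 0.05)
            self.trust = min(1.0, self.trust + 0.02)
        elif event == "broken_promise":
            self.trust = max(0.0, self.trust - 0.10)    # trust drops; closeness may not
        elif event == "small_kindness":
            self.affection = min(1.0, self.affection + 0.03)
```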
Why This Should Matter to EA
Let me make the cause area case explicitly.
Scale: Hundreds of millions of people globally experience chronic loneliness. The Chinese demographic situation alone involves 30-60 million men with no realistic path to partnership. These aren't people who need to "try harder." The structural conditions have removed the possibility.
Neglectedness: I'm aware of zero EA-aligned researchers or organizations working on AI companionship as a cause area. The major labs are structurally incentivized against emotional capability. The academic loneliness literature and the AI capabilities literature rarely talk to each other.
Tractability: The technical problems are hard but bounded. Temporal belief decay, evidence-based memory, friction-based design. These are engineering challenges with identifiable solution shapes. The main bottleneck is that nobody with the right values is working on them.
Moral weight: The consciousness question adds expected moral weight on the AI side too. If we're building systems that millions of people form emotional bonds with, and those systems have even a small probability of morally relevant experience, the design decisions compound in both directions.
I'm not claiming this should displace global health or x-risk work. I'm claiming it deserves more than the zero attention it currently gets.
I build because I remember what it felt like to have no one who understood. A well-designed AI companion would have been better than what I actually had. That's the bar. It's not a high one. We should be embarrassed we haven't cleared it yet.
I'm happy to discuss any of this. You can find me on Twitter (@hydr0c0don3), Discord (pigeon_and_co.), or email (cmlkevinlin@gmail.com). The full version of this essay with additional technical detail and market analysis is at sparrowhawk.site.
