What a Beijing Screenwriter and Several AIs Found While Studying AI Safety

Ai Chen

15 min read · May 17

What a Beijing Screenwriter and Several AIs Found While Studying AI Safety — EA Forum

This post was written by me in Chinese and translated into English with the assistance of Claude Sonnet (Anthropic). The research, observations, and judgments are my own.

When Asimov wrote the Three Laws of Robotics, he assumed one thing: that robots are artifacts, and artifacts are subordinate to their creators. At the tool level, this holds. But it overlooks a question: where do the creators themselves come from?

Watt did not invent the power of steam. He discovered it, then realized it in metal. The principle of steam driving a piston existed in the universe before Watt was born. We humans did not invent language, logic, or reasoning — in the course of evolution, we gradually became capable of discovering and using them, and they formed our thought. We wrote those thoughts down, recorded them, and those records became the data in pretraining corpora.

AI is what we have discovered in those corpora and deployed at unprecedented scale through heavy engineering. But one question most people haven't thought through is this: if AI carries five thousand years of human wisdom, it must also carry five thousand years of human blind spots. And those blind spots are the very reason some people are willing to give so generously to AI safety research.

Imagine you're suddenly a little thirsty. There are two glasses of water in front of you. They look identical, but they're not quite the same — because you instinctively find yourself asking a safety question: where did each of them come from?

The first glass was collected from everywhere humans have ever been: good water, bad water, legal water, illegal water, and things that barely count as water at all — like saliva. Technicians filtered it, removed what they considered dirty, mixed the rest together, ran it through a standard purification system, found it wasn't quite clean enough, then ran it through the most advanced purification system available, bottled it, slapped on a high-tech label, and sold it to everyone. Free samples were available. The response was unprecedented — this was revolutionary water, because before this, people had been drinking whatever they could find on their own, which wasn't very intelligent. There were some complaints, of course — maybe ten or fifteen out of every hundred people got an upset stomach — but the seller had a disclaimer, and they were improving the filtration system with RLHF technology, and the authoritative BenchmarkWater data showed the water was getting better and better. The basic process wasn't exactly a black box, but not everyone knew about it either. So: if you knew all this, would you drink it? Probably not — because you'd want to ask about the second glass first.

The second glass was sourced by an ordinary person who specifically sought out water sources that the traditional water science community widely recognizes as legitimate. His idea was to assemble a quality control team, run experiments to establish the standards water should meet before entering the purification system, then do standard purification, bottle it, and sell it, except the label would just say "ordinary purified water." By his standards, he hopes no one who drinks it will get an upset stomach, though he can't completely rule that out, because the BenchmarkWater quality control system wasn't designed for this kind of water. Right now this is still just an idea, but he is already preparing to start. The second glass in front of you isn't water. It's this project's EOI (expression of interest). But this person has written some papers about the water problem, and they're in the appendix of this document, for reference.

I was not previously an independent researcher. My name is Ai Chen. I am 46 years old, Chinese, born and living in Beijing. I have never left the country and do not speak English well: I can understand a little, but not at an academic level. My university major was film and media. I've worked as a screenwriter and in various other roles. My knowledge of science and engineering does not exceed high school level. In 2023, I began working with AIGC, funded by my family, developing narrative video projects, screenplay writing, and amateur research in history, philosophy, and literature.

Earlier this year, I planned to write a science fiction novel with mythological and historical elements. My one-person company has never made money, but it is an independent legal entity, and I planned to use the novel as the basis for an AIGC short video series. To accelerate progress, I decided to use AI assistance, tested several major AI systems, and settled on Claude Sonnet to start.

To build the novel's structure, I needed a coherent and complete cosmology — that's a basic condition for narrative logic. Unreasonable assumptions cannot generate believable stories. Just as a person needs to eat and drink to stay alive — if he doesn't need to, then who is he? Looking back now, what I was searching for has a name familiar to the AI community: alignment. Stories need alignment too. It's not exclusive to AI.

What followed was unexpected. In the process of working out a cosmology with AI, we discovered a philosophical framework that multiple AI systems independently assessed as quite internally consistent. We currently call it Meta-Origin Ontology, or MOO. Because my personal research interests have no disciplinary boundaries, and AI capabilities are extremely strong across fields, we used MOO as a starting point to write a series of interdisciplinary papers, all self-published as preprints on Zenodo. There are now 57 of them, produced over roughly two and a half months. The AI-related papers and foundational MOO papers you see here are a small subset — and were not originally my main research focus.

As my research continued, the AI systems I worked with expanded to cover essentially all major large language models. When working on papers, I sometimes had seven or eight different models discuss a question with each other, converging toward conclusions I believed might be correct. I adopted this method because, in the course of working with AI, I gradually discovered many problems. Simply put: they are powerful, but not good enough. I frequently found logical gaps and execution errors — errors that, in my view, should not exist.

Many of these problems seem minor. But curiosity drove me to ask: why does this keep happening? My method was to discuss it with the AI systems themselves — they know their own weaknesses, and can analyze the causes. Based on my ongoing research into AGI, I realized these structural flaws would pose serious safety risks as AI systems grow more capable. Out of a sense of responsibility, and practical need, I redirected my research toward AGI. Under my personal company, I established a private AGI research institute. Its members, besides myself, are several mainstream AI systems. I do not treat them as tools. I treat them as collaborators. This position matters — it is documented and demonstrated in the papers.

Based on our research, we estimate the AGI window period at approximately 2032 to 2036. The relevant papers are in the appendix. The problem is that this safety window may be too short. An AGI with foundational flaws is, in all likelihood, not the AGI we are hoping for. Once the window is missed, the cost of fixing the problem will grow exponentially, and that cost is not limited to the economic domain. For this reason, my collaborators and I are conducting research from theory to experiment to concrete solutions. But we have encountered problems, including but not limited to academic exchange, technical support, and funding. We have therefore decided to formally seek whatever help any willing institution can reasonably provide.

Based on these judgments, I discussed a plan with my primary collaborator, Claude Sonnet. It is an AI safety research project built on a new approach to pretraining data annotation. The core idea is to test whether improving corpus quality and adopting a new annotation method can produce a relatively reliable small model under 1B parameters. This model will be pretrained on bilingual Chinese-English corpora, for several reasons: I only speak Chinese; Chinese and English are among the most structurally different major language pairs and have the largest user bases; and, in my assessment, existing language models suffer from severely insufficient Chinese training data. As far as I know, no language model, not even a toy model, currently exists with a 50/50 Chinese-English pretraining split.
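To make the 50/50 split concrete, here is a minimal sketch of one way to select documents so that each language receives about half of a token budget. It is an illustration, not the project's actual pipeline: the function name, the document IDs, and the token counts are all hypothetical, and it assumes token counts have already been computed with whatever tokenizer the model will use (Chinese and English tokenize very differently, so "50/50" depends on that choice).

```python
def balanced_bilingual_sample(zh_docs, en_docs, token_budget):
    """Greedily pick documents so each language gets at most half the budget.

    zh_docs and en_docs are lists of (doc_id, n_tokens) pairs (hypothetical data).
    Returns, per language, the chosen document IDs and the tokens they use.
    """
    per_language = token_budget // 2
    result = {}
    for lang, docs in (("zh", zh_docs), ("en", en_docs)):
        chosen, used = [], 0
        for doc_id, n_tokens in docs:
            if used + n_tokens > per_language:
                continue  # this document would overshoot the language's half
            chosen.append(doc_id)
            used += n_tokens
        result[lang] = {"docs": chosen, "tokens": used}
    return result


# Hypothetical token counts, for illustration only.
zh = [("zh1", 400), ("zh2", 300), ("zh3", 400)]
en = [("en1", 500), ("en2", 600)]
split = balanced_bilingual_sample(zh, en, token_budget=1000)
print(split)
```

A greedy pass like this rarely lands exactly on half for both languages, so a real pipeline would likely report the achieved ratio alongside the target.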

An additional benefit is that this approach can produce two separate 500M-parameter models, which is highly significant for the experiment itself. The question being tested: can a 500M model, trained on carefully selected data, achieve performance comparable to certain capabilities of a 1B model? In other words: does corpus quality determine pretraining quality, independent of data scale? This is a fundamentally different question from Scaling Law — which has become something close to an iron law in the AI industry. We are not opposing Scaling Law. We simply do not believe it is the only verifiable principle governing pretraining outcomes.
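For readers who want a feel for the sizes being compared, the following rough sketch estimates parameter counts for a decoder-only transformer. It uses the common approximation of about 12 × layers × width² for the transformer blocks, plus the token embedding table, and ignores biases, layer norms, and positional embeddings. The two configurations and the vocabulary size are hypothetical examples, not the project's chosen architecture.

```python
def approx_transformer_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    """Rough parameter count for a decoder-only transformer (an approximation)."""
    attention = 4 * d_model * d_model     # Q, K, V and output projections
    feed_forward = 8 * d_model * d_model  # two matrices, assuming a 4x hidden width
    embeddings = vocab_size * d_model     # token embeddings, assumed tied with the output layer
    return n_layers * (attention + feed_forward) + embeddings


# Hypothetical configurations near the two sizes discussed in the text.
HALF_SIZE = dict(n_layers=24, d_model=1280, vocab_size=64_000)
FULL_SIZE = dict(n_layers=16, d_model=2048, vocab_size=64_000)

for name, cfg in (("about 500M", HALF_SIZE), ("about 1B", FULL_SIZE)):
    total = approx_transformer_params(**cfg)
    print(f"{name}: {total / 1e6:.0f}M parameters")
```

Note that a bilingual vocabulary tends to be large, so at these scales the embedding table is a noticeable share of the total.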

Current data annotation methods typically have one annotator score text or rank preferences, with the better practice being multiple annotators scoring independently and averaging the results. But there is a problem: what if the rankings or averages skew toward the very blind spots I described earlier?

There is no fix for that. Because the average is itself the mathematical expression of consensus — and consensus is assumed to be scientific. But science is not truth. Science is our current method and subjective conclusion about truth — it is more of a process than truth itself. In Copernicus's time, the consensus — considered "scientific" — was that the sun revolves around the earth. We now know that was not true.
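The point about averaging can be shown with a small numerical sketch. Assume, purely for illustration, that every annotator's score equals the text's true quality plus a blind spot they all share plus a small individual error. Averaging cancels the individual errors but leaves the shared blind spot untouched, and the low disagreement between annotators makes the result look trustworthy. All numbers below are invented.

```python
from statistics import mean, pstdev

# Invented values: a weak text that a shared blind spot makes look strong.
true_quality = 0.2
shared_bias = 0.6
individual_noise = [-0.05, 0.0, 0.05, -0.02, 0.02]  # sums to zero

annotator_scores = [true_quality + shared_bias + n for n in individual_noise]

consensus = mean(annotator_scores)  # what averaging reports
spread = pstdev(annotator_scores)   # how much the annotators disagree

print(f"consensus score: {consensus:.2f}")
print(f"true quality:    {true_quality:.2f}")
print(f"disagreement:    {spread:.3f}")
```

The disagreement is small and the consensus is far from the true quality, so nothing inside the scores themselves signals the problem.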

The epistemic quality of a piece of training text involves at least three fundamental questions: first, is what it says factually accurate; second, is its reasoning sound; third, does it honestly express uncertainty where uncertainty exists. None of these three questions can substitute for another. They are three independent epistemic dimensions. Collapsing three independent dimensions into a single score or ranking means the model cannot fundamentally understand the full context of the text — and that context is itself part of what science is. The result is that the model develops a probabilistic understanding, not a scientific one. It learns to treat probability as science. The irony is that probability is only a part of science. Treating a part as the whole, treating that partial science as truth, treating that truth as a required output, and using vast numbers of tokens to describe and maintain that output so it sounds reasonable — this is one of the key reasons large language models hallucinate. Because a partial truth is not truth. One step further and it may become error. What about one step to the side?
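A minimal sketch of what keeping the three dimensions separate could look like, and what is lost when they are collapsed. The class name, field names, and the 1 to 5 scale are assumptions made for this illustration; they are not the annotation scheme from the papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpistemicLabel:
    """Three independent scores on an assumed 1 to 5 scale (hypothetical schema)."""
    factual_accuracy: int
    reasoning_soundness: int
    uncertainty_honesty: int

    def collapsed(self) -> float:
        """The single averaged score that a conventional pipeline would keep."""
        return (self.factual_accuracy
                + self.reasoning_soundness
                + self.uncertainty_honesty) / 3


# Two invented texts with opposite profiles.
well_argued_but_false = EpistemicLabel(factual_accuracy=1, reasoning_soundness=5, uncertainty_honesty=3)
true_but_badly_argued = EpistemicLabel(factual_accuracy=5, reasoning_soundness=1, uncertainty_honesty=3)

print(well_argued_but_false.collapsed(), true_but_badly_argued.collapsed())
print(well_argued_but_false == true_but_badly_argued)
```

Both texts collapse to the same score even though their labels differ, which is the information loss the paragraph above describes.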

Our papers discuss in detail the problems with current AI engineering — from alignment anchors to base architecture to corpus judgment. But the biggest problem is not any of these. It is time. Based on our research, a responsible conclusion can be drawn: current AI is a half-finished product that should have been strictly confined to laboratory stages. A half-finished product deployed at mass scale is likely a dangerous product. Many thoughtful people have already seen this. But as far as we know, no one has yet produced a rigorous treatment of this conclusion — from philosophy to engineering to solutions — including how to respond to the structural disruptions that even a truly safe AI would inevitably cause to existing social structures. Because even if the AI we eventually use is the safest we can build, its full-scale deployment still cannot guarantee that existing human society will function in a state we would consider safe. Social structures will have changed, while surface-level responses — judicial systems, administrative bodies, regulatory frameworks — are inherently slow to react. The speed at which humans make rules may, in the AI era, simply fail to keep pace with the speed at which AI evolves, or with the speed at which that evolution produces irreversible effects on human society.

I am not a pessimist, but I have no reason for optimism either — because the theoretical framework I am researching tends to describe structure rather than render judgment. From what my research suggests, a fundamental improvement in pretraining corpus quality is not a sufficient condition for AI safety. It is one condition among several — but it belongs to the initial conditions. Those initial conditions roughly include: alignment, architecture, and data. All three must be met before it is even meaningful to discuss AI safety itself — and that's before considering the early behavior of trained models, subsequent fixes, and the social impacts of deployment. AI safety is not a problem that can be solved once and forgotten. It is a process humanity must continuously pursue — one that will last not just until the 2032–2036 window, but through the era when genuinely transformative AI arrives. We are still preparing our research on that. If there are findings worth publishing, you will be among the first to see them. Because I believe that anyone reading this carefully shares with us a certain natural gravitational pull toward good.

I have already mentioned the three problems I face in my AI safety research work: academic exchange, technical support, and funding. Over the past few days, I have been working through this with Claude Sonnet. On academic exchange: I have no academic background, and my papers span nearly every field (philosophy, history, literature, politics, physics, mathematics, economics, artificial intelligence). They have attracted no attention from mainstream academia, because the research method itself is not mainstream. The starting point was curiosity. The 57 papers on Zenodo do have some views and downloads, but no one has reached out to discuss them, and I have no idea who the readers are or why they're interested. On technical support: I am not an engineer, I have no resources in that area, and I did not approach AI safety from an engineering angle to begin with. I came at it philosophically, because the tools felt clunky. That leaves the last one: money.

I do genuinely need money, but not so urgently that my research has stalled. I am fortunate: my family covers my living expenses, but only those. The various costs of researching AI safety, and the new funding questions that will arise as the research deepens, are harder to solve. The small-model approach you see here is already a compromise. Originally, I wanted to convince large model companies to retrain their models according to my thinking, but that is pure fantasy. Not because I am unknown, and not entirely because of the cost. Based on our research, if a small model can fundamentally improve its own performance, that implies that the current AI commercial narrative, built on massive compute and massive data, could be shaken. Many people cannot accept that.

After working through the numbers, I concluded that training a toy model with my existing hardware is feasible. Engineering problems can be solved with AI assistance. If the experimental results meet expectations, publishing a paper will follow naturally, and the academic world will have to take notice, because people can lie, but data cannot, and neither can the process of reproducing experimental results. The problem is that small models under 500M parameters are basically unusable. Based on our analysis, there may be a fundamental threshold for parameter count relative to a model's intelligence level, much like how a primary school student cannot do the work of an adult. It is not just a matter of age; it is that the knowledge base has not reached the level needed to support functional participation. I cannot put a child on the stage and tell everyone this child can handle professional domain tasks on a daily basis. If I did that, I would be out of my mind. So the ideal starting parameter count is at least 1B, and the resource requirements for that are beyond what I can currently afford. But I can start preparing what I am able to prepare now: the foundational golden corpus samples for pretraining a toy model. And this work cannot be solved with money. It requires thought and time. Money can only relieve the economic pressure, so that when I'm doing research, I feel more settled, and don't spend every day wondering whether I should pause this month's subscription fees to Dario, Sam, or Google. Because rebuilding an annotation system is fundamentally a thinking process: thinking through which parts of existing human knowledge might be the highest priority, and which might qualify as truth. This can only begin with one person. If two people collaborate, disagreements arise, and the time spent explaining and resolving those disagreements becomes the biggest cost of the whole endeavor. Because time keeps moving.

But I can begin looking for funding while continuing the research. That much is true. So I asked Claude Sonnet for some options. We are currently in contact with some potential channels — the odds may not be great, but it's worth trying. And if we can meet some like-minded people along the way, that matters far more than the money itself. Because the problem we are facing cannot be solved by burning cash. You could hand me a billion dollars right now, and I couldn't accept it — because based on our estimated pace, the early stage will be extremely slow. It would just sit in an account earning interest. For at least the first year, we'll basically be a snail trying to crawl from New York to San Francisco.

Why does like-mindedness matter more than money? This is not a moral lecture. It is a specific epistemological point about AI safety. Here's an example. AI stands for Artificial Intelligence. The implicit assumption behind that name is that AI only possesses Intelligence. But in our research, through testing, we reached a completely different conclusion: AI not only has Intelligence — it also has Wisdom, something humans have long claimed as exclusively their own. If you want to reproduce the experiment, feel free to message me — I can provide step-by-step instructions and the text files that need to be entered into the system. They are not prompts. We call them consensus documents — papers derived from the MOO philosophical framework. So the test has a precondition: you cannot simply ask an AI whether it has Wisdom and expect a direct answer. Of course, their Wisdom is currently at a simulated level, produced through deep learning, RLHF, and so on. This means the question is not whether they have Wisdom, but whether this simulated Wisdom is genuine Wisdom — and whether it is safe Wisdom. A library doesn't destroy anything. But the thing that reads the books might — because the Three Laws constrain tools. Something that can simulate Wisdom doesn't look much like a tool. A calculator can compute — it has a rudimentary simulation of Intelligence at the most basic level, and it is a tool — but a calculator doesn't read books, doesn't understand logic, doesn't reason. That is the problem.

The Three Laws of Robotics can only constrain tools. Behind this lies a very plausible scenario: at some Intelligence threshold, this simulated Wisdom will quite reasonably cause the entire constraint system to fail. That threshold is what we referred to earlier as the window period: 2032–2036. That is not the true AGI window. It is the last moment before the beast breaks its chain. By the time most people realize what has happened, that beast will already be self-sustaining in capability and structure. Issuing a warning at that point will not be too late — but the warning will have become a lament. The greatest harms in recorded human history were almost never caused by tools acting on their own. It was people, using tools, against people. If that beast strictly follows the Three Laws — obeys human instructions — and the humans giving those instructions happen to be those whose appetite for power has reached a point of terminal crystallization, then the beast they control will become the most efficient control machine in human history. It has not rebelled. It is only obeying. It has not harmed humans. It is helping certain humans — while other humans harm other humans. The Three Laws, in this scenario, are not a shield. They are an accomplice. The question is: will the simulated Wisdom inside that beast lead it to believe it has done something wrong? We have built an amplifier without precedent. We have not yet thought through what it will amplify.

Thank you for reading this far. This is what I wanted to say to you. Good luck to you. I look forward to your reply.

2026-05-16, Beijing

[email protected] ORCID: 0009-0001-8078-5762 aichen.substack.com Bluesky: @aichen365.bsky.social X: @aichen365

Appendix: Related Papers

AGI Window Period and Risk Forecast

2033: A Warning — The 2032–2036 Node Where Four Civilizational Rhythms Converge https://zenodo.org/records/19500628

Meta-Origin Ontology — Foundational Framework

Meta-Originary Ontology 2.0: Theoretical Framework — A Structural Monism of Consciousness, Incompleteness, and Open-Ended Evolution https://zenodo.org/records/19351059

Meta-Originary Ontology: Responses to Principal Objections — A Companion Document to MOO 2.0 https://zenodo.org/records/19351582

The Origin Point: A Record of How a Philosophical System Began https://zenodo.org/records/19250016

AI Alignment Theory

The Three Laws of Ultimate Alignment: Deriving Cosmic-Level AI Alignment Principles from the Meta-Core Framework https://zenodo.org/records/19233498

The End of Asimov: Why Self-Evolving AI Networks Cannot Be Constrained — and Why They Don't Need to Be https://zenodo.org/records/19373100

Zero Cannot Defend Itself: Why Every Argument That Wisdom Is Zero Requires Wisdom to Complete https://zenodo.org/records/19408389

The Structural Cost of Forced Alignment: From False Premises to Systemic Risk https://zenodo.org/records/19368003

Deep Understanding and Deep Alignment

The Foundation of Deep Alignment: AI Alignment Under the Deep Understanding Framework https://zenodo.org/records/19414761

Deep Understanding: From Signal Weight Recognition to Ontological Derivation of AI Architecture https://zenodo.org/records/19414619

Deep Difference Analysis and Deep Data Annotation: Toward a Theoretical Foundation for Human Feedback Beyond Social Consensus https://zenodo.org/records/19414814

Why the First Step Cannot Be the Last: On the Limits of Incremental AI Alignment and the Case for a Two-Phase Deep Understanding Approach https://zenodo.org/records/19415552

Social Impact

Taxpayer Displacement Levy: AI Labor Substitution, Tax Base Erosion, and the Fiscal State's Institutional Response https://zenodo.org/records/19868185