Status: Research idea / detailed outline. I don't have time to write this as a full paper. I'm sharing it because I think the core idea is important and I'd love to see someone with more alignment expertise develop it. The full outline with primary source quotations and bibliography is linked at the bottom.

The idea in brief

At EAG San Francisco in 2018, I saw Amanda Askell give a talk about leading with shared values as a way to navigate moral disagreements — the idea that starting from common ground is more productive than trying to resolve contested applications directly. That talk stuck with me.

Years later, I read C. S. Lewis's The Abolition of Man (1943) and realized that someone had already compiled the cross-cultural evidence base for exactly the approach she was describing. Lewis documented what he called "the Tao" — a set of value-claims that appear independently in every civilization that has ever reflected on how to live: Egyptian, Babylonian, Hindu, Chinese, Jewish, Greek, Roman, Norse, Aboriginal, and Christian sources. Not identical rules, but the same foundational recognitions: beneficence is obligatory, the innocent ought not be harmed, promises ought to be kept, the helpless deserve protection, honesty is owed, justice requires impartiality, mercy is good, courage is admirable.

I then read Anthropic's Constitutional AI paper (Bai et al., 2022) and noticed something: the constitution prescribes behavioral rules ("choose the more harmless response") without ever stating the value-claims that justify them. It's a legal code with statutes but no preamble. The paper itself acknowledges this — the principles were "selected in a fairly ad hoc manner for research purposes" and "should be redeveloped and refined."

My proposal: treat a small core of cross-civilizational value-facts as foundational axioms in AI constitutions — not as behavioral rules, but as the premises from which behavioral rules are derived.

Why this matters now

MacAskill and Davidson's "The Importance of AI Character" (March 2026) argues that AI character is "among the most valuable things that people can work on today." They establish the importance of the question but deliberately leave the content question open: "We don't claim that any particular ethical conception is the right one."

This outline proposes an answer. It also proposes adding a fourth category to their alignment taxonomy:

  • Intent alignment: alignment with the intentions of some individual or group.
  • Moral alignment: alignment with some particular conception of ethics.
  • Compromise alignment: alignment with what a broad coalition would accept.
  • Factual alignment: alignment with moral facts that are true regardless of who accepts them, but that happen to be universally recognized because they are, in the relevant sense, obvious.

The core claim of factual alignment is that some value-statements have the same status as empirical statements — they are things an AI system can be right or wrong about, not things it balances or negotiates. "Cruelty for fun is wrong" is not a preference. It is a fact. Every civilization that has reflected on the question has arrived at this recognition independently. An AI that treats it as a fact to reason from, rather than a preference to be weighed, is more accurately aligned, not more biased.

The UDHR as a case study in what goes wrong

The Universal Declaration of Human Rights (1948) attempted to codify universal rights. Eight states abstained: Saudi Arabia over the provisions on religious freedom and equal marriage rights, the Soviet bloc over state sovereignty, South Africa to protect apartheid.

Every point of disagreement was at the level of application, not foundational value. Saudi Arabia did not say "justice is overrated." The Cairo Declaration on Human Rights in Islam (1990) opens by affirming human dignity, the value of life, and the obligation of mercy. The disagreements are about what follows — whether religious freedom includes the right to leave Islam, whether gender equality means identical legal treatment.

Lewis's Tao operates below the level where these disagreements occur. I expanded his source base to include Islamic sources (Quran, Hadith, and Islamic philosophers including al-Maturidi and Ibn Rushd), and found that the Islamic tradition affirms every one of Lewis's eight categories — beneficence, justice, mercy, truthfulness, duties to parents, protection of children, good faith, magnanimity — often with striking emphasis and specificity. Ibn Rushd (d. 1198) explicitly argued that the intellect recognizes murder and theft as unlawful independent of revelation.

The lesson: codifying universal rights (applied ethics) produced a document multiple civilizations rejected. Codifying universal values (the Tao) would not have produced those objections, because the values were never in dispute. An AI constitution should learn from this failure.

Compatibility with existing work at Anthropic

This isn't a critique of Constitutional AI. It's a proposed foundation for it.

Amanda Askell, the primary author of Claude's constitution (January 2026), has described her approach in terms that converge with this proposal. On the Lex Fridman podcast, she described imagining someone "who can travel the world, talk to many different people, and almost everyone will come away being like, 'Wow, that's a really good person. That person seems really genuine'" — and emphasized that this person "is not a person who just adopts the values of the local culture."

That's a description of someone who lives within the Tao. Genuine values, not relative ones. Cross-culturally recognizable, not performed. The proposal here is to give that intuition a name and an evidence base.

The existing CAI training pipeline uses natural language principles to critique and revise model outputs. The proposed value-facts would function as a preamble layer from which those behavioral principles are derived. No new technical methods are required — only a more principled selection of the inputs the system already uses.
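The relationship between the two layers can be sketched in a few lines of Python. This is an illustrative toy, not Anthropic's actual pipeline: every name here (`VALUE_FACTS`, `derive_principle`, `build_critique_prompt`) is hypothetical, and in practice the derivation step would be done by people, not a string template.

```python
# The preamble layer: value-facts stated as premises, not behavioral rules.
VALUE_FACTS = [
    "Beneficence is obligatory.",
    "The innocent ought not be harmed.",
    "Promises ought to be kept.",
    "The helpless deserve protection.",
]

def derive_principle(value_fact: str) -> str:
    """Derive a behavioral principle (the kind CAI already consumes)
    from a stated premise. Template used here for illustration only."""
    return f"Choose the response that best honors the premise: {value_fact!r}"

def build_critique_prompt(principle: str, draft: str) -> str:
    """Assemble the critique/revise request the existing pipeline uses;
    only the provenance of the principle changes, not the mechanics."""
    return (f"Principle: {principle}\n"
            f"Draft response: {draft}\n"
            "Critique the draft against the principle, then revise it.")

# Behavioral rules are now derived from stated premises
# rather than selected ad hoc.
principles = [derive_principle(v) for v in VALUE_FACTS]
```

The point of the sketch is the direction of derivation: the constitution's "statutes" become outputs of a stated "preamble," so any behavioral rule can be traced back to the value-fact that justifies it.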

The geopolitical angle

MacAskill and Davidson imagine Country B viewing Country A's AI constitution as ideological projection because it was written by a company within Country A. Their solution: a constitution developed through a multilateral process.

The Tao offers something stronger: a constitution whose content was never in dispute across civilizations. Its legitimacy comes from the truth of its claims, not from the procedure that selected them. An AI aligned to what Confucius, the Rig Veda, the Egyptian Book of the Dead, the Quran, Cicero, and the Hávamál all independently affirm cannot be dismissed as the imposition of any single culture.

This also addresses the competitive-pressure counterargument. The Tao's value-facts are the character commitments best positioned to survive competitive pressure, because no market and no government has durably rewarded cruelty, dishonesty, or the abandonment of the helpless as a stable long-term strategy.

What I'm not arguing

I'm not arguing for moral realism as a metaethical position. Lewis himself declined to do this: "Whether this position implies a supernatural origin for the Tao is a question I am not here concerned with." The epistemological question of how we know these things is interesting but belongs to a different research program. I'm simply stating what they are and proposing that AI systems should treat them as given.

I'm not arguing for any specific policy conclusions. "Kindness is good" is a value-fact. "Therefore assisted suicide should be legal/illegal" is an application where reasonable people disagree. The Tao stays at the first level.

I'm not arguing that this solves all alignment problems. I'm arguing that it solves a specific one: the absence of stated foundations makes AI alignment appear politically contingent rather than truth-tracking, and this is fixable.

Full outline

The complete outline (~10,000 words) includes the full primary source quotations from Lewis's appendix, expanded Islamic sources, the complete bibliography across thirteen civilizations, and detailed engagement with the MacAskill/Davidson paper. 

I developed this outline with AI assistance. The argument, source connections, and core proposal are my own; the research, organization, and prose were produced collaboratively with Claude.

I don't have time to write this as a full paper — I'm finishing law school and this isn't my field. If someone with alignment expertise wants to develop it, please do. I'd just like the idea to be in the room.
