Frankle Fry

Comments

This has been a genuinely valuable exchange - thank you for pushing me to think more carefully about the relationship between principles and practice.

You've helped me clarify something important: my framework is primarily designed for AI alignment and institutional design - contexts where we CAN directly encode principles into systems. Constitutional AI doesn't need emotional motivation or community support to follow its training. Institutions can be structured with explicit rules and incentives.

For human moral development, you're absolutely right that we need something different - the community, ritual, and accountability structures you're describing as "behavioral ideology." The AA analogy is perfect: the 12 steps alone don't change behavior; it's the ecosystem around them that makes it work.

After reflecting on this conversation (and discussing it with others), I think the key insight is about complementarity rather than competition:

  • Your focus: Building the human communities and psychological infrastructure for altruistic behavior
  • My focus: Ensuring AI systems embody the right values as they become integrated into life
  • The bridge: Successfully aligned AI could actually help humans practice better behavior by consistently modeling benevolent reasoning. But only if we get the alignment right first.

You've also highlighted a cultural assumption I'm still wrestling with: whether my "6 pillars" framework reflects universal human values or carries specific Western philosophical commitments. The process of arriving at values (democratic deliberation, wisdom traditions, divine command) might matter as much as the content itself.

I'm going to keep working on the technical alignment side - that's where I can contribute most directly. But I'll be watching with genuine interest as behavioral approaches like yours develop. The Amish example (400,000 people organizing without coercion) is exactly the kind of existence proof we need that alternatives to current social organization are possible.

Perhaps we can reconnect once both projects have more empirical results to compare. I suspect we'll need both approaches - aligned AI providing consistent modeling of good values AND human communities providing the emotional/social infrastructure to actually live those values.

Thanks again for the thought-provoking exchange. You've given me useful frames for thinking about where my work fits in the larger project of human flourishing.

This is a fascinating critique that I think identifies a real distinction I hadn't made explicit enough.

You're pointing out that principles don't automatically change behavior - they need psychological/social infrastructure. That's absolutely true for humans.

But I think this actually clarifies what my framework is for:

My framework is primarily designed for AI alignment and institutional design - contexts where we can directly encode principles into systems. Constitutional AI doesn't need emotional motivation or community support to follow its training. Institutions can be structured with explicit rules and incentives.

For human moral development, you're right that we need something different - what you call "behavioral ideology." The AA analogy is perfect: the 12 steps alone don't change behavior; it's the community, ritual, and accountability that make it work.

But here's an interesting thought: What if solving AI alignment could actually help with human behavioral change?

If we successfully align AI systems with principles like empathy, integrity, and non-aggression - and those AI systems become deeply integrated into daily life - humans will be constantly interacting with entities that model those behaviors. Children growing up with AI tutors that consistently demonstrate benevolent reasoning. Workers collaborating with AI that handles conflicts through de-escalation rather than dominance. Communities using AI mediators that prioritize mutual understanding.

The causality might work both ways:

  • We need to figure out how to encode human values in AI (my framework's focus)
  • But once AI systems consistently embody those values, they might shape human behavior in return

This doesn't replace the need for what you're describing - the community support, emotional education, ritual. But it could be complementary. Aligned AI could be part of the cultural infrastructure that makes benevolent behavior more natural.

So I think the scope question is:

  • Should my framework try to be both (AI alignment + human behavioral change)?
  • Or should it focus on AI/institutions, acknowledging that human implementation requires different mechanisms (like what you're describing)?

I lean toward the latter - not because human behavior change isn't important, but because:

  • AI alignment is where I have the most to contribute
  • Creating "secular religions" or behavioral movements requires different expertise
  • Trying to be both might dilute effectiveness at either

That said, your vision of EA evolving to provide emotional/social infrastructure for altruistic behavior is compelling. And perhaps successfully aligning AI is actually a prerequisite for that vision - because misaligned AI could actively work against benevolent human culture.

My question for you: Do you see frameworks like mine as useful inputs to the kind of movement you're describing? Even if AI alignment alone isn't sufficient, could it be necessary? If we get AI right, does that make the human behavioral transformation more achievable?

Your concern about EA's consequentialist lens warping these fields resonates with what I found when experimenting with multi-AI deliberation on ethics. I had Claude, ChatGPT, Grok, and Gemini each propose ethical frameworks independently, and each one reflected its training philosophy - Grok was absolutist about truth-seeking, Claude cautious about harm, ChatGPT moderate and consensus-seeking.

The key insight: single perspectives hide their own assumptions. It's only when you compare multiple approaches that the blindspots become visible.
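For concreteness, here's a minimal sketch of that kind of setup, with `query_model` as a hypothetical stand-in for each provider's own API client (the model names, prompts, and helpers below are illustrative assumptions, not the exact experiment): each model answers the same prompt independently, then a second pass has each model critique the others, which is where the hidden assumptions tend to surface.

```python
# Sketch of "independent proposals, then cross-comparison" with multiple LLMs.
# query_model is a hypothetical placeholder for whichever API client each
# provider requires; the structural point is that no model sees another's
# answer in the first pass, so each output reflects its own training philosophy.

MODELS = ["claude", "chatgpt", "grok", "gemini"]

PROPOSAL_PROMPT = (
    "Propose an ethical framework for guiding advanced AI systems. "
    "State your core principles and the assumptions behind them."
)


def query_model(model: str, prompt: str) -> str:
    """Placeholder: in practice, route this to the provider's API for `model`."""
    return f"[{model}] response to: {prompt[:40]}..."  # stub for illustration


def independent_proposals() -> dict:
    # First pass: same prompt, no shared context between models.
    return {m: query_model(m, PROPOSAL_PROMPT) for m in MODELS}


def cross_critique(proposals: dict) -> dict:
    # Second pass: each model reviews the others' frameworks; comparing the
    # critiques is what makes each single framing's blindspots visible.
    critiques = {}
    for reviewer in MODELS:
        others = "\n\n".join(
            f"[{name}]\n{text}" for name, text in proposals.items() if name != reviewer
        )
        critiques[reviewer] = query_model(
            reviewer,
            "Identify the assumptions or blindspots in these frameworks:\n\n" + others,
        )
    return critiques


if __name__ == "__main__":
    proposals = independent_proposals()
    critiques = cross_critique(proposals)
```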

This makes your point about EA flooding these areas with one ontology particularly concerning. If we're trying to figure out "AI character" or "gradual disempowerment" through purely consequentialist framing, we might be encoding that bias into foundational work without realizing it.

Maybe the solution isn't avoiding EA involvement, but structuring the work to force engagement with different philosophical traditions from the start? Like explicitly pairing consequentialists with virtue ethicists, deontologists, care ethicists, etc. in research teams. Or requiring papers to address "what would critics from X tradition say about this framing?"

Your "gradual disempowerment" example is perfect - this seems like it requires understanding emergent structures and collective identity in ways that individual-focused utilitarian thinking might miss entirely.

Would you say the risk is:

  • EA people not recognizing non-consequentialist framings as valid?
  • EA organizational culture making it uncomfortable to disagree with consequentialist assumptions?
  • Just sheer numbers overwhelming other perspectives in discourse?

Thanks for this thoughtful critique, idea21. You've identified something important.

You're right that explicit aggression control isn't front-and-center in the six pillars, though I think it's implicit in several places:

  • Pillar II (Empathy) includes "minimize suffering" and "balance outcomes with rights" - which would prohibit aggressive harm
  • Pillar III (Dignity) emphasizes "boundaries of non-harm" and accountability for those wielding power
  • Pillar VI (Integrity) focuses on aligning actions with values and moral courage

But you're pointing to something deeper: cooperation and trust-building as foundational to moral progress, not just constraints against harm.

I'm curious how you'd operationalize "control of aggression" as a distinct pillar or principle. Would it be:

  • A prohibition (like the inviolable limits in Article VII: "no torture, genocide, slavery")?
  • A positive virtue (cultivating non-aggressive communication, de-escalation)?
  • A systems-level design principle (institutions structured to prevent violent conflict)?
  • Something else?

Also, your point about "enlightened" (rational + benevolent) vs. just benevolent is interesting. Where do you see the framework falling on that spectrum? I tried to ground it in evidence-based reasoning (Pillar I) while leaving room for diverse meaning-making (spiritual paths, etc.). Does that balance work, or does it risk the irrationalism problem you mention?

This feels like it might connect to the "moral innovation vs. moral drift" distinction in Pillar V - rationality as a guard against drift even when cultural evolution moves toward benevolence.

Would love to hear more about how you'd integrate this.