I have spent most of my life studying how humans make moral choices. After decades of work on human moral dilemmas, I eventually brought that experience into AI alignment, where I have been building a framework for moral verification grounded in years of thinking about how people actually reason about right and wrong.
I test the framework constantly, try to break it with edge cases, and run it through every LLM I can access to see where it fails. So far it has held up far better than I expected, which is why I am working to take it to the next stage. My goal is to get this work into the world so alignment research can benefit from a more structured, testable approach to machine ethics.
I am here to learn and to connect with people who are trying to solve the same hard problems in AI safety. If anyone wants to talk about moral structure, decision verification, or alignment failure modes, feel free to reach out.
Maybe it's just me, but this looks like a win for Anthropic. Bad actors will do bad things, but I wonder why they would choose Anthropic over their own Chinese AI, where I would assume the security is less rigorous, at least toward their own state actors, no? I had Claude quickly dig this up for me, and from what he said, the activity occurred as far back as mid-September 2025, which suggests the timing of this release was intentional. Anthropic chose to announce during peak AI governance discussion, framing the story to emphasize both the threat and the defensive value of their systems. The delay between the September detection and the November announcement gave them time to craft a narrative that positions Claude as both the problem and the solution, which is classic positioning for regulatory influence. Nothing wrong with that, I suppose...?