Hi all,
I recently published a preprint outlining a framework I’ve been developing called The Rawlsian Architecture, and I’m looking for people who might be interested in helping build it. I thought I’d share a brief overview here and see if it resonates with anyone.
The core problem
Current LLMs don’t really distinguish between reasoning about facts and making moral judgements. Under the hood, both are handled as probabilistic prediction. There’s no real structural boundary between “what’s true?” and “what’s right?” That blur is a big part of why we see issues like sycophancy (models adjusting truth claims to fit what the user seems to want) and hallucination (outputs that are internally coherent and confident, but factually wrong).
The basic idea
Instead of trying to solve this purely through retraining or fine-tuning a single model, the framework tackles it at the system level. Think of it as a separation of powers.
There are three stages.
First, an Intake Agent takes the user’s request and reduces it to a clean abstract specification. The goal is to separate the substance of the request from contextual elements like tone, emotional cues, or conversational history.
Second, multiple Delegate Agents run in parallel. Each has a distinct role — safety, legal, ethical, and so on. They don’t see each other’s reasoning, and they don’t see the original user context. All they receive is the abstract specification. Their job is simply to generate valid options from their assigned perspective.
Third, a Decision Engine formally votes on those options. The voting mechanism is MILO, a system designed to minimise harm. A simple majority vote can easily pick an option that benefits most people while still seriously harming a minority. MILO instead selects the option that attracts the least overall objection, rather than the most overall support. Once a decision is made, a Delivery Agent composes the final response.
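To make the three stages concrete, here is a minimal sketch of the flow in Python. Everything in it is illustrative, not the preprint's actual implementation: the agents are stubbed functions rather than LLM calls, the objection scores are invented toy numbers, and "least overall objection" is read as "least total objection across delegate roles", which is one plausible interpretation of MILO.

```python
# Hypothetical sketch of the Intake -> Delegates -> Decision Engine flow.
# All names and scores are illustrative; real agents would wrap LLM calls.

def intake(user_request: str) -> str:
    """Stage 1 (Intake Agent): reduce the request to an abstract spec,
    stripping tone and conversational context. Stubbed here."""
    return user_request.strip().lower()

def delegate_propose(spec: str, role: str) -> str:
    """Stage 2 (Delegate Agent): blind to other delegates and to the
    original user context, propose an option from one perspective."""
    return f"option-from-{role}"

def milo_select(objections: dict) -> str:
    """Stage 3 (Decision Engine): one plausible reading of MILO --
    pick the option attracting the least *total* objection,
    rather than the most support."""
    return min(objections, key=lambda opt: sum(objections[opt].values()))

roles = ["safety", "legal", "ethical"]
spec = intake("  Please help me with X.  ")
options = [delegate_propose(spec, r) for r in roles]

# Toy objection scores (0 = no objection, 1 = maximal objection).
objections = {
    "A": {"safety": 0.0, "legal": 0.1, "ethical": 0.9},  # fine for most, bad for one
    "B": {"safety": 0.2, "legal": 0.2, "ethical": 0.2},  # mildly disliked by all
}

# A support-style vote could favour A (two near-zero objectors), but A
# seriously harms one perspective. Least total objection favours B
# (0.6 < 1.0), which is the minority-protecting behaviour described above.
print(milo_select(objections))  # prints "B"
```

The key design point is that the delegates never see each other or the raw request; only the abstract spec flows in, and only scored options flow out to the decision step.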
By keeping agents blind to one another, the architecture reduces the risk of collusion, anchoring, and groupthink — problems that are hard to avoid in a single monolithic model. The individual models may still be opaque, but the overall process becomes structured and auditable.
Related pieces
The preprint connects to two other frameworks I’ve developed.
The first is a Claim-Admissibility Framework, a set of formal axioms for filtering out paradoxical constructions before they reach the decision stage. While models can usually handle simple paradoxes, they often struggle with more complex, multi-step antinomies. This framework is meant to catch those early.
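To show where such a gate would sit, here is a tiny sketch, again purely illustrative: the predicate below is a placeholder, not one of the framework's actual axioms. It only flags directly self-referential truth claims; catching multi-step antinomies would require far more machinery.

```python
# Illustrative admissibility gate, applied before claims reach the
# decision stage. The toy predicate is a stand-in for the framework's
# formal axioms, which are not reproduced here.

def is_admissible(claim: str) -> bool:
    lowered = claim.lower()
    # Toy check: reject claims that assert their own falsity
    # (the classic liar-style construction).
    return not ("this statement" in lowered and "false" in lowered)

claims = ["water boils at 100 C at sea level", "this statement is false"]
admissible = [c for c in claims if is_admissible(c)]
print(admissible)  # only the first claim survives
```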
The second is the MILO Voting System itself. It was originally designed for high-stakes human electoral settings, but its axiomatic structure translates naturally to multi-agent AI decision processes.
If you’re interested
The implementation side is intentionally open-ended. The research is still early, and there are real open questions worth exploring collaboratively.
If this sounds interesting and you’d like to read the papers or discuss the idea further, feel free to reply here or send me a message.
Also, I apologise for the slightly AI-written tone. I passed the post through Claude and ChatGPT, and I think the result is more readable than my original text, so I've kept their output.
Thanks a lot.
