Governance has an inner alignment problem. Here's an institutional spec to address it.

eliask

Epistemic status: The institutional design is informed by mechanism design theory (Hurwicz, Maskin, Myerson) and draws on existing precedents (Netherlands CPB, Wales Future Generations Commissioner, Singapore CSF) while going substantially beyond all of them. I've published 9 example audits demonstrating what the institution would produce (one in English, rest in Finnish). The design is jurisdiction-agnostic; the examples use Finnish legislation because that's where I started.

The gap

John and MacAskill's 2020 paper on longtermist institutional reform correctly diagnosed political short-termism and proposed four reforms: research institutions, futures assemblies, posterity impact assessments, and legislative houses for future generations. Tyler John later catalogued 33 institutional designs exploring this space, then concluded it was "less valuable than I thought" — that most proposals were "technocratic, moderate," and marginal resources were better invested in AI policy.

Their work explored one solution axis thoroughly: give future generations a voice. Ombudspersons, assemblies, legislative houses — all create bodies that represent, deliberate, or advocate on behalf of future people.

I want to explore a complementary axis that I believe their search missed: audit whether existing mechanisms produce their stated outcomes. This isn't about time horizons or representation. It's about whether the incentive structure a law creates generates the behavioral equilibrium the law's own proponents predict.

No existing institution performs this function. This is sometimes confused with Regulatory Impact Analysis, but they're different things. RIA asks "how much does this cost?" — static cost-benefit accounting. Mechanism auditing asks "given this incentive structure, what behavioral equilibrium emerges?" — game-theoretic analysis of how rational actors will actually respond. The Netherlands CPB does economic forecasting and election platform costing — useful but not mechanism auditing. Germany's NKR reviews compliance costs — different function entirely. Finland's regulatory evaluation council has ~280K EUR/year, 2.5 FTE, and 32% of its recommendations are ignored — a token advisory body without procedural teeth.

The closest category in John's taxonomy — "Policy Auditor/Target-setting" — monitors whether public bodies follow through on stated targets ex post. The missing function is ex ante: does the incentive structure permit the target to be reached at all?

Why the optimization target isn't "whose values"

The immediate objection: "Audit mechanisms against what target? Whose values?"

This dissolves when you think about it in terms the alignment community already accepts. Instrumental convergence applies to any system with sufficient internal coherence to maintain boundaries and pursue goals — not just AI agents. Civilizations are not perfectly coherent agents, but neither are individual minds (akrasia, competing drives, subagent models). A polity with shared institutions, a common legal framework, and collective decision-making apparatus is agent-like enough for instrumental convergence to apply. Below a coherence threshold, it stops being a civilization — it's a "bag of humans" with no entity to optimize for.

Within a coherent unit, the optimization target is determined by instrumental convergence: maximize capacity across all dimensions (human, institutional, fiscal, natural, social capital) to persist across deep time. This is what any goal-directed system must do to keep existing. Between coherent units, it's competition and cooperation dynamics.

A critical point for the EA audience: this does NOT mean "maximize persistence at the expense of flourishing" — the Spartan failure mode. Sparta was optimized for one thing. When the environment changed, it couldn't adapt. It was efficient and dead. This is the universal cost of tight optimization: it purchases present performance by eliminating the slack that enables future adaptation.

Under Knightian uncertainty — where you cannot parameterize the distribution of future shocks — bare survival is dominated. You don't know which dimensions will be attacked. A civilization with a massive army and no agriculture has a fatal vulnerability masking as strength (Liebig's Law: the system fails at its weakest dimension, not its average). Survival therefore requires broad-spectrum adaptive capacity: diverse problem-solving, surplus energy, redundant institutions, exploratory slack — the capacity to generate novel responses in dimensions you haven't encountered yet.

This IS flourishing. Not luxury — the safety margin a complex system maintains against unknown shocks. Art, philosophy, and freedom are not consumption; they are the civilizational equivalent of high-temperature search in optimization — exploring vectors orthogonal to the current utility function, which pure gradient descent cannot reach. Their presence is the most reliable indicator of high safety margin: a civilization that produces great art has energy to spare; one that can't afford art is already dying.

Survival and flourishing aren't independent variables to be traded off — they're the same variable measured from different angles: engineering (safety margin), physics (sustained syntropy), phenomenology (eudaimonia). The strategy maximizing P(survival, T→∞) under Knightian uncertainty is identical to the strategy maximizing sustained broad-spectrum adaptive syntropy — which is what flourishing looks like from inside the system.

The optimization horizon matches the coherence boundary. You don't need to resolve universal values. You need to identify the unit and optimize for its persistence — which, under the constraint above, means optimizing for its flourishing. (Full derivation.)

The institutional design

Mechanism Authority is a full institutional specification. The design is jurisdiction-agnostic — any parliamentary democracy could implement it. What distinguishes it from existing advisory bodies:

Constitutional independence with procedural teeth. Established by special law, analogous to a central bank. Funded by capital endowment, not annual budget. Board appointed with supermajority parliamentary approval; 7-10 year non-renewable terms; removal only by judicial panel. International supervisory council (majority non-domestic) prevents local network capture. This is not an advisory body that publishes reports and hopes someone reads them.

Mandatory pre-legislative review with procedural consequences. All significant government bills (>100M EUR impact, structural reforms, system-level incentive changes) submitted before parliamentary consideration. A negative opinion — finding the mechanism likely to produce outcomes contrary to its stated goals — triggers two tracks:

- Executive track: Bill blocked from cabinet plenary until the ministry fixes the identified mechanism failures or the PM explicitly overrides. This is internal executive procedure.
- Parliamentary track: Negative opinion attached as mandatory annex to the bill. Responsible minister must deliver a public response explaining why the bill advances despite findings. The response becomes a permanent part of the legislative record.

Parliament retains final authority at all times. The PM can override. But the override is politically costly and permanently documented. This is stronger than "enforcement through information" — it's mandatory procedural friction that makes proceeding with broken mechanisms an active, attributable choice rather than a passive default.

Proactive design capacity. The Authority doesn't just reject — it designs alternatives. "You want to incentivize X? Here are three mechanisms that achieve it. We recommend B because..." This makes it a resource for ministries, not just an obstacle. The Netherlands CPB's election platform costing is a precedent for this collaborative mode, though CPB doesn't do mechanism design.

Continuous monitoring with automatic escalation. Post-deployment measurement against predicted behavioral effects. Significant divergence at 3-year review triggers automatic referral to relevant parliamentary committee — no ministerial discretion. Citizens can report "broken mechanisms"; the authority investigates and publishes verified findings.

"Doesn't sound very democratic." Parliament retains final authority. The authority cannot repeal or enact law. This is no less democratic than a constitutional court or a central bank — institutions that constrain democratic decision-making in ways most democracies already accept. The difference: those institutions constrain on legality or monetary stability. This one constrains on mechanism integrity.

The idea that governance should be treated as engineering is not new — it's a 2,300-year tradition from Kautilya through Hobbes through Wiener through Hurwicz. Each attempt failed at a specific, identifiable wall:

- Hobbes (1651) stripped governance to mechanism but threw out purpose with mysticism — the state as machine, but a machine with no objective function.
- Montesquieu (1748) designed checks and balances as mechanism — but mechanism against tyranny is not mechanism for anything. Prevents one failure mode while leaving the system without a target.
- Bentham (1789) built the first full-stack institutional engineering: ontology → incentive design → specific reform. But his target function was utility — the felicific calculus. This is the alignment community's paperclip maximizer realized in governance: GDP rises while infrastructure rots, QALYs improve while fertility collapses. Correct method, wrong target.
- Hurwicz (Nobel 2007) formalized mechanism design — reverse game theory. But he couldn't close the enforcement loop for constitutions: in an auction, the auctioneer enforces the rules; in governance, the officials are simultaneously players and enforcers. The sovereignty paradox.

The Mechanism Authority is designed against each of these failure modes: physics-derived target (not utility — avoids Bentham's paperclip), enforcement through procedural friction and transparency (not external enforcer — addresses Hurwicz's sovereignty paradox), explicit objective function (not purposelessness — fills the gap Hobbes and Montesquieu left), and continuous redesign loop (not static rules).

The full specification covers purpose, design capacity, pre-legislative review, monitoring, system-level alerts, independence, and powers.

What the output looks like

I've published 9 example audits showing what this institution would produce in practice. One is in English: a bill that modifies healthcare funding with the stated goal of incentivizing efficiency. The formula structurally punishes efficiency — regions that achieve surplus get funding cut proportionally, while regions running moderate deficits receive additional support and extended deadlines. The behavioral equilibrium is straightforward: spend to budget, never produce surplus. This follows from the incentive structure itself, regardless of the actors' political orientation.

These are examples of the function, not the argument for it. The argument is that no government has a permanent institution that performs incentive-compatibility analysis on legislation before it passes, with procedural consequences for proceeding despite identified mechanism failures.

Anticipated objections

"Governments produce bad mechanisms intentionally, not from blindness." Sometimes true. The procedural architecture works on both: if the mechanism failure is intentional rent-seeking, the PM must personally override and the minister must publicly explain proceeding despite findings. Making intentional failures attributable and documented raises the political cost regardless of intent.

"Who audits the auditors?" International supervisory council (majority non-domestic), capital funding (no budget dependence), non-renewable terms (no reappointment incentive), public methodology (anyone can challenge the analysis), constitutional independence (no ministry can give instructions). The deeper answer: the authority's analyses are reproducible. If it's wrong, anyone can demonstrate that using the same public methodology and data. This isn't a guarantee against capture — no institutional design is. It raises the cost of capture substantially compared to existing advisory bodies. The design must also be corrigible: if the objective function turns out to be subtly misspecified, there needs to be an explicit protocol for updating it safely — one that doesn't simultaneously expose the institution to the political capture it was built to prevent. This is the institutional analog of the AI corrigibility problem, and the spec doesn't yet fully solve it.

"Won't the override just become routine?" If most legislation has mechanism failures — because laws are stitched together via political compromise — the PM override might normalize. This is a real risk. The mitigation is selectivity: mandatory review applies only to significant bills (>100M EUR impact, structural reforms). The authority can also issue conditional opinions ("fix X and this passes") rather than binary negatives, making the default path correction rather than confrontation. The Dutch CPB survives politically because it's seen as useful, not adversarial. The design aims for the same equilibrium.

"This is too ambitious / constitutional-level reform is infeasible." The core argument is simpler than the spec: if nobody's job is to check whether mechanisms produce their stated outcomes, nobody will check. Currently zero people on earth have this job description. Going from zero to one is the intervention. The constitutional architecture matters because without independence and procedural teeth, the function gets captured or ignored — the fate of every existing advisory body. But the spec is implementable in stages: public mechanism audits first, procedural teeth later, constitutional entrenchment when the function has demonstrated value. Architecture matters at every scale — a partially specified objective function produces the same dysfunctions whether the institution is large or small.

Why post this here

IIDM is a stated EA priority. John and MacAskill made the theoretical case for longtermist institutional reform. The Legal Priorities Project, the closest EA-adjacent legal institutional effort, is no longer active. I've produced a concrete institutional specification and published example outputs. I'm looking for people working on institutional reform, governance design, or mechanism design who might find either the specification or the approach useful — or who see weaknesses I haven't addressed.

The full institutional spec. The example audits (mostly Finnish; one in English). The theoretical foundation.

Effective Altruism Forum
EA Forum