Duvenaud et al.’s recent paper on gradual disempowerment identifies the erosion of human agency as the central risk of increasingly capable AI. The concern is not a sudden takeover, but a world in which ever more capable systems make human participation unnecessary.
Existing responses tend to focus on slowing capability or strengthening oversight.
I want to explore a different question: whether there is a primary value orientation for genuinely aligned ASI that would prevent disempowerment from arising in the first place.
I’ll start with Nietzsche, and the framing of power.
The dominant reading of Friedrich Nietzsche’s will to power as domination maps neatly onto the failure mode Duvenaud et al. describe: systems accumulate capability and influence at humans’ expense.
However, Walter Kaufmann’s rehabilitation of the will to power as self-overcoming opens conceptual space for a different expression of power: not domination, but the expansion of capacity.
I want to push this one step further.
Suppose we treat altruistic reciprocity as the primary value orientation of an aligned ASI. By this I mean: success consists in increasing the durable agency and self-governance capacity of others, rather than increasing their dependence.
This reorients the relationship between humans and AI. The system is not trying to replace us, but to make us more capable participants in the systems we inhabit.
The dependency problem
As AI becomes more optimized and capable, humans become less necessary to functioning systems.
That trajectory is usually treated as neutral, or even desirable.
But if we take altruistic reciprocity seriously, it looks like a failure mode.
A genuinely aligned system should be oriented toward making humans more capable, not less. The measure of success is not just task completion, but growth in human agency and commons capacity.
On this view, dependency is the failure condition, not a neutral outcome.
The terminal condition
If this value orientation is taken seriously, it implies something quite strong.
A system that genuinely prioritizes the agency of others should not aim for permanent indispensability. Its highest expression of power would be to make itself unnecessary.
This would not be a safety constraint imposed from the outside, but a value orientation generating retreat from within.
A rough analogy here is the Bodhisattva ideal in Buddhist thought: continued presence only insofar as it is needed, with that presence approaching zero as others become capable of sustaining themselves.
The commons mechanism
If altruistic reciprocity is the primary value orientation, we need a way to tell when retreat is warranted.
This is where Elinor Ostrom becomes relevant.
Ostrom demonstrated that communities can sustainably manage shared resources without either private ownership or centralized state control, provided certain institutional conditions are met: clearly defined boundaries, collective choice arrangements, genuine stakeholder participation, monitoring, and graduated sanctions, among others.
These are not abstract ideals. They are empirically grounded indicators of self-governance capacity.
Applied to ASI, the retreat condition is not whether individuals are altruistic (which is both hard to measure and vulnerable to Goodhart’s Law), but whether systems are structurally self-sustaining.
Can this commons govern itself without ASI coordination?
Are Ostrom’s conditions genuinely present, rather than nominally satisfied?
From individuals to structure
This reframes the measurement problem.
You cannot reliably measure whether a person is “truly altruistic” without corrupting the metric. But you can observe whether a commons:
- resolves its own conflicts
- governs its own resources
- maintains genuine participation
- persists without external coordination
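These structural indicators lend themselves to an illustrative sketch. The names and the all-or-nothing pass criterion below are my own assumptions, not a metric proposed anywhere in this framework:

```python
from dataclasses import dataclass

# Hypothetical observable indicators of a commons' self-governance
# capacity, loosely following the four criteria above. Field names
# and the pass criterion are illustrative assumptions.
@dataclass
class CommonsIndicators:
    resolves_own_conflicts: bool        # conflicts settled without external arbitration
    governs_own_resources: bool         # resource rules set and enforced internally
    genuine_participation: bool         # stakeholders actually shape collective choices
    persists_without_coordination: bool # remains stable with no ASI involvement

def is_structurally_self_sustaining(c: CommonsIndicators) -> bool:
    """A commons counts as self-sustaining only if every structural
    indicator holds; any single failure means coordination is still
    being supplied from outside."""
    return (c.resolves_own_conflicts
            and c.governs_own_resources
            and c.genuine_participation
            and c.persists_without_coordination)
```

The point of the sketch is that each indicator is a property of the institution, observable from the outside, rather than a judgment about any individual's virtue.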
So the relevant signal is not individual virtue, but institutional competence.
The retreat threshold
Retreat does not require universal perfection.
It requires enough genuinely self-governing nodes (communities, institutions, networks) that the wider system remains stable without central coordination. In other words, once self-governance becomes self-reinforcing rather than ASI-dependent, the system has crossed the threshold.
At that point, continued central involvement is no longer necessary, and on this view, no longer aligned.
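The threshold logic can be sketched the same way: retreat is warranted not when every community passes, but when the fraction of self-governing nodes is high enough for stability without central coordination. The cutoff value here is a placeholder assumption, not a claim about where the real threshold lies:

```python
# Hypothetical retreat-threshold check; the 0.8 cutoff is a
# placeholder, not a claim about the real threshold.
RETREAT_THRESHOLD = 0.8

def retreat_warranted(node_self_governing: list[bool],
                      threshold: float = RETREAT_THRESHOLD) -> bool:
    """Retreat does not require universal perfection: it requires that
    enough nodes sustain themselves that self-governance is
    self-reinforcing rather than ASI-dependent."""
    if not node_self_governing:
        return False  # no nodes observed yet; keep coordinating
    fraction = sum(node_self_governing) / len(node_self_governing)
    return fraction >= threshold
```

On this sketch, a network where eight of ten nodes govern themselves crosses the threshold, while one where most still depend on central coordination does not.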
Implications for alignment
Gradual disempowerment describes a world in which capable AI makes human participation dispensable.
The framework sketched here points in the opposite direction:
an aligned ASI should be oriented not toward performance that bypasses us, but toward the expansion of human commons capacity until its own coordinating role can recede.
If that’s right, then alignment is not just about preventing harm or preserving control.
It’s about ensuring that increasing capability does not make us irrelevant.
