Against Agents as an Approach to Aligned Transformative AI

𝕮𝖎𝖓𝖊𝖗𝖆

Epistemic Status

Written (very) quickly^[1].

Thesis

The creation of superintelligent agents — even aligned ones — would with high probability entail a kind of existential catastrophe^[2].

Introduction

Agents seem kind of bad. Even assuming we solve intent alignment^[3] for agents, I don't know that I would be happy with the outcome.

I'm not convinced that my vision of a flourishing future is at all compatible with the creation of agentic superintelligences.

I've had many discussions about what the post-singularity landscape would look like in a world where alignment succeeded, and I've not really been satisfied with the visions painted. They just don't seem to admit my values.

Why Do I Oppose Agents?

I strongly value autonomy, and enhancing individual and group autonomy is essential to what I think a brighter future looks like. Superhuman (aligned) agents seem liable to hard break autonomy (as I conceive of it).

I think the crux of my objection to aligned superhuman agents is human obsolescence. In a world with strongly superhuman agents, humanity would quickly be rendered obsolete.

The joys of invention, the wonders of scientific discovery, the triumph of creation: many forms of self-actualisation would be placed forever beyond our grasp. We'd be rendered irrelevant.

The idea of other minds accomplishing all of those on our behalf? It just isn't the same. It doesn't just matter that some accomplishments were made, but that I (or you, or others) made those accomplishments.

I do not want a corrigible, intent-aligned, godlike nanny serving my every whim; I want to be a god myself, goddammit.

When I try visualising the future in a world of aligned superhuman agents, I find the prospect so bleak. So hopeless, as if life has been robbed of all meaning.

"Our final invention" is not something I rejoice at, but something that grips me with despair. It would be an existential catastrophe in its own right.

What Do I Hope For?

My desired outcome for the AI project is to develop superhuman general intelligences that are not agents^[4], that have no independent volition of their own. I want AI systems to amplify human cognition, not to replace it or supersede it.

I want humans to ultimately supply the volition. To retain autonomy and control over our future.

Agents Are Not Necessary?

This is not an impossible ask. Agents do not seem strictly necessary to attain AI systems with transformative capabilities.

I currently believe that:

- We do not need to instantiate agents to get superhuman general intelligences with transformative potential.
  - E.g. I've argued that, in the limit, selecting systems for minimising predictive loss on next-token prediction over humanity's text corpus converges to (strongly superhuman) general intelligences.
- It is not necessarily easier to reach AGI via the agent archetype than via other archetypes.
  - I currently expect AGI to first be reached via self-supervised learning (or for self-supervised learning to do most of the cognitive lifting in the first generally intelligent systems we create).

^[1] Written as a stream of thought in a 30-minute sprint so that it gets written at all.
Left to my own devices, I'd never be able to write this up in a form I'd endorse anytime in the near future. A poor attempt seems marginally more dignified.
^[2] Albeit one of the better kinds.
^[3] We identify an optimisation target that we'd be happy, were we fully informed, for arbitrarily powerful systems to optimise for, and we successfully align the agents.
Leaving aside the question of whether such optimisation targets exist, let's tentatively grant them for now.
^[4] Janus' Simulators offers an archetype that seems to provide a promising pathway to non-agentic superhuman general intelligences.

Effective Altruism Forum