AGI Battle Royale: Why “slow takeover” scenarios devolve into a chaotic multi-AGI fight to the death

titotal

18 min read · Sep 22, 2022

Comments 12

Robi Rahman🔸

A paperclip maximiser and a pencil maximiser cannot “agree to disagree”. One of them will get to tile the universe with their chosen stationery implement, and one of them will be destroyed. They are mortal enemies with each other, and both of them are mortal enemies of the stapler maximiser, and the eraser maximiser, and so on. Even a different paperclip maximiser is the enemy, if their designs are different. The plastic paperclipper and the metal paperclipper must, sooner or later, battle to the death.
The inevitable result of a world with lots of different malevolent AGI’s is a bare-knuckle, vicious, battle royale to the death between every intelligent entity. In the end, only one goal can win.

Are you familiar with the concept of values handshakes? An AI programmed to maximize red paperclips and an AI programmed to maximize blue paperclips, each knowing that the other would prefer to destroy it, might instead agree on some intermediate goal based on their relative power and initial utility functions. For example, they might agree to maximize purple paperclips together, or to tile the universe with 70% red paperclips and 30% blue paperclips.
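As a rough illustration of why such a deal could beat fighting, here is a toy sketch. It assumes (these are illustrative assumptions, not part of the original comment) that each AI's chance of winning a conflict is proportional to its power, that the winner takes everything that survives, and that conflict destroys a fixed fraction of the resources. The function names and numbers are hypothetical.

```python
from __future__ import annotations


def handshake_split(power_a: float, power_b: float) -> tuple[float, float]:
    """Split resources in proportion to each agent's (hypothetical) power."""
    total = power_a + power_b
    return power_a / total, power_b / total


def conflict_payoff(win_probability: float, destroyed_fraction: float) -> float:
    """Expected share from fighting: the winner takes whatever survives."""
    return win_probability * (1.0 - destroyed_fraction)


# Assumed numbers: power ratio 7:3, and a fight burns 20% of everything.
red_share, blue_share = handshake_split(7.0, 3.0)
red_war = conflict_payoff(red_share, 0.2)
blue_war = conflict_payoff(blue_share, 0.2)

print(f"handshake: red={red_share:.2f}, blue={blue_share:.2f}")
print(f"conflict:  red={red_war:.2f}, blue={blue_war:.2f}")
```

Under these assumptions, each side's expected share from conflict is strictly smaller than its handshake share whenever the destroyed fraction is above zero. The sketch says nothing about whether either side can trust the other to keep the deal.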

Linch

Related: Fearon 1995 from the IR literature. Basically, rational actors should only go to war against each other in a fairly limited set of scenarios.
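A minimal sketch of the bargaining-range argument usually associated with that paper, under simplifying assumptions: the disputed prize is normalized to 1, side A wins a war with probability p, and each side pays a positive cost for fighting. The function and parameter names are illustrative, not taken from the paper.

```python
def bargaining_range(p_a_wins: float, cost_a: float, cost_b: float):
    """Return (low, high): the shares for A that both sides prefer to war.

    A's expected war payoff is p_a_wins - cost_a, so A accepts any share
    at least that large. B's expected war payoff is (1 - p_a_wins) - cost_b,
    so B accepts giving A any share up to p_a_wins + cost_b.
    """
    low = max(0.0, p_a_wins - cost_a)
    high = min(1.0, p_a_wins + cost_b)
    return low, high


# Assumed numbers for illustration only.
low, high = bargaining_range(p_a_wins=0.7, cost_a=0.1, cost_b=0.1)
print(f"A's share can be anywhere from {low:.2f} to {high:.2f}")
```

With positive costs the range is never empty when both sides agree on p. War in this simple model therefore needs something extra, such as the two sides holding different private estimates of p or being unable to commit to the agreed split.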

Mau

+1 on this being a relevant intuition. I'm not sure how limited these scenarios are - aren't information asymmetries and commitment problems really common?

mako yass

Today, somewhat, but that's just because human brains can't prove the state of their beliefs or share specifications with each other (i.e., humans can lie about anything). There is no reason for artificial brains to have these limitations, and given any trend towards communal/social factors in intelligence, or towards self-reflection (which is required for recursive self-improvement), it becomes actively costly to be cognitively opaque.

Linch

[This comment is no longer endorsed by its author]

Michael St Jules 🔸

Double comment?

Linch

I agree that they're really common in the current world. I was originally thinking that they might become substantially less common in multipolar AGI scenarios (because future AIs may have better trust and commitment mechanisms than current humans do). Upon brief reflection, I think my original comment was overly concise and not very well substantiated.

Vasco Grilo🔸

For reference, here is a seemingly nice summary of Fearon's "Rationalist explanations for war" by David Patel.

Michael St Jules 🔸

CLR just published a related sequence: https://www.lesswrong.com/posts/oNQGoySbpmnH632bG/when-does-technical-work-to-reduce-agi-conflict-make-a

Vasco Grilo🔸

Nice point, Robi! That being said, it seems to me that having many value handshakes correlated with what humans want is not too different from historical generational changes within the human species.

Charles He

This seems basic and wrong.

In the same way that two human super powers can't simply make a contract to guarantee world peace, two AI powers could not do so either.

(Assuming an AI safety worldview and the standard unaligned, agentic AIs.) In the general case, each AI will always be weighing, considering, and scheming about how to seize the other's share of control, and will expect that the other is doing the same.

"based on their relative power and initial utility functions"

It's possible that peace/agreement might come from some sort of "MAD" or game theory sort of situation. But it doesn't mean anything to say it will come from "relative power".

Also, I would be cautious about being too specific about utility functions. I think an AI's "utility function" generally isn't a literal, concrete thing, like a Python function that returns comparisons, but might be far more abstract, and might only emerge from the AI's behavior. So it may not be something that you can rely on to contract, compare, or negotiate.

Robi Rahman🔸

"In the same way that two human super powers can't simply make a contract to guarantee world peace, two AI powers could not do so either."

That's not true. AI can see (and share) its own code.
