
This post explores a big-picture idea: that our civilization is more resilient when power is fragmented than when it is concentrated.  A world with fragmented power cannot be dominated or destroyed by a small group.  Concentrated power is what enables existential risk or the lock-in of bad values.

This is the common thread across existential risks, which come mainly from enormous destructive power in the hands of individuals or small groups.  This applies to nuclear weapons, bio risk, and most especially AI risk.  Even an AI that is well-aligned with its developer or owner is a source of concentrated power, and a source of existential risk in the wrong hands.  I argue that avoiding concentrated power is the key to AI safety, as even a misaligned or malevolent AI is not a problem if its power is small.

This post first gives some examples of fragmented and concentrated power, and examples of humans intentionally designing systems to disperse power for greater resilience.  It then proposes general strategies to apply the principle of fragmented power to reduce AI risk - reducing interconnectedness, building a society of AIs, imbuing AI with an aversion to power, and colonizing space with diverse societies.

In short, for AI risk or existential risk in general, our strategies should aim to disperse power in order to improve resilience.

Examples of Fragmented Power

Both biological systems and human society demonstrate how fragmented power creates resilience.

A biological organism can survive the death of any one cell, and repair commonly-occurring damage (e.g. wounds healing on their own).  It has an immune system to recognize and stop infections or cancerous cells trying to amass too much power.  Furthermore, the organism does not depend on living forever - it reproduces and relies on its descendants.  An ecosystem of organisms does not depend on any one individual.  A local disaster will not cause extinction, as others of the same species survive elsewhere.

Human society is similar.  No one individual is critical, and local disasters do not cause extinction.  Even if an individual amasses power, they can exercise it only through other humans - a hold that is difficult to maintain forever.  Human psychology contains mechanisms - humor, gossip, jealousy, justice - to guard against domination by powerful individuals.  Even the most powerful individual will eventually die.  (For example, the empires of Alexander the Great and Genghis Khan fragmented after their deaths.)  And historically, geographically separate civilizations could serve as opponents of, or refuges from, failed or malevolent civilizations.

A market economy also demonstrates resilience.  Individuals pursuing their own goals will fill the unmet needs of others, and fill in gaps as situations change.  For example, even in the massive disruption of early COVID lockdowns, there was still food in the grocery store - due to a million adaptations by individuals and companies.

Examples of Concentrated Power

Modern human society has developed sources of concentrated power that reduce resilience:

  • Dictatorial rule over large territories
  • More destructive military technologies, most especially nuclear and biological weapons
  • A globally integrated economy with key dependencies provided by only a small number of participants
  • Global computer networking and automation
  • Worldwide governance, if such governance were to become more powerful
  • In the future, advanced artificial intelligence

These sources of concentrated power give individuals or small groups enormous potential for destruction.

Even if a lever of power is not intentionally exercised for harm, a mistake could lead to catastrophe if it triggers worldwide effects.  So anything introducing more connections and dependencies reduces resilience.

Examples of Intentional Dispersal of Power

In some domains, humans have recognized the value of dispersing power, and intentionally designed systems to do so:

  • Systems of government often aim for checks and balances to limit the power of individual government officials.  For example, this was a key design principle for the U.S. Constitution.
  • Individual rights such as freedom of speech protect against domination and against society-wide groupthink.
  • Large-scale software systems are designed with no single point of failure, so that they can withstand individual hardware failures.  They further distribute data and computation across multiple physical locations to be resilient against local disasters.  And they maintain backups to recover from system failures.
  • Safety-critical systems are isolated from outside influence.  For instance, nuclear missile control systems are not connected to the Internet.
  • Blockchains aim to build a system that cannot be dominated by one individual, organization, or cartel.
  • Moral pluralism guards against ideological extremism.

These design choices intentionally sacrifice some efficiency for the sake of resilience.  Democracy may be slow and indecisive.  Redundancy in software systems uses more hardware, raising cost.  Disconnected systems are less convenient.

It's often more efficient to concentrate power, so there's a strong incentive to do so.  We must be intentional about designing for fragmented power, and be willing to pay the cost.

Applying Fragmented Power to AI Risk

To be resilient, our civilization must not concentrate too much power.  This concentration of power is the reason we are concerned about AI safety: misaligned AI, or AI in the hands of malevolent individuals or groups, is not a problem on its own.  It only becomes a problem when that AI can exercise substantial power.

How might we use this insight to protect against AI risk?  Some possibilities:

  1. Create a well-aligned and powerful "policeman" AI system which exercises its power only to stop misaligned AI systems from being created or exercising power.
  2. Have many AI systems, no one of them too powerful.  This aims to create something analogous to human society, where individuals check the power of each other.
  3. Imbue AI systems with an aversion to exercising power.
  4. Keep the AIs contained in a box with minimal influence on the physical world.
  5. Keep the world sufficiently disconnected that no individual or group can dominate everywhere.  For example, space colonization might protect against this.

The last two of these would drastically reduce the potential benefit of AI, and AI developers/owners would have strong incentive to violate them - so we would be unwise to rely on these as solutions.  But #1, #2, & #3 are somewhat related and worth thinking about.  The space colonization approach of #5 might also have potential.

"Policeman" AI

Assuming a world where many individuals/organizations can create AI systems, safety ultimately depends on stopping the misaligned AI systems that will inevitably be created.  We need something analogous to police in human society, or analogous to an organism's immune system.

This policeman can allow other AIs to exist, as long as they don't accumulate too much power.  So I can have an AI assistant that helps plan a vacation, but that assistant wouldn't help me plan world domination.

The most straightforward solution is to build an AI system to take this job.  This system would itself be very powerful, but aligned to exercise that power only to prevent others from accumulating power.

But this works only if (1) we can create a well-aligned AI system and (2) the first organization to develop a powerful AI system wants to give it this policeman function.  Both of these are risky to depend upon.

A Society of AIs

Perhaps the "police" could be not a single all-powerful AI system, but a common project for a society of AIs, much as individual humans serve as police officers.

A society of AIs must protect against concentrated power in its ranks, whether by an individual AI or by a group of AIs.  But an AI has advantages in amassing power that a human does not - an AI can live forever, can easily copy itself, and can improve its own intelligence or design.  Perhaps we would be wise to develop the society of AIs within a system that artificially introduces frailties similar to those of humans - finite lifespan, imperfect copying via something akin to sexual reproduction, evolutionary pressure.  This would require that we only run AIs within some hardware/software framework that rigorously enforces these frailties, which seems difficult to guarantee.
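As a purely hypothetical illustration of what such a framework might enforce (all names, numbers, and mechanisms here are invented for this sketch, not taken from any real system), the core constraints could look like:

```python
# Hypothetical sketch of a runtime that enforces human-like frailties on AI
# instances: a finite lifespan and imperfect copying. Nothing here reflects
# any real system; it only illustrates the constraints described above.
import random
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FrailInstance:
    params: List[float]       # stand-in for the AI's weights or design
    lifespan_seconds: float   # enforced finite lifespan
    born_at: float = field(default_factory=time.monotonic)

    def expired(self) -> bool:
        return time.monotonic() - self.born_at > self.lifespan_seconds

    def imperfect_copy(self, mutation_scale: float = 0.01) -> "FrailInstance":
        """Copying deliberately introduces variation, loosely analogous to
        sexual reproduction rather than perfect self-replication."""
        mutated = [p + random.gauss(0.0, mutation_scale) for p in self.params]
        return FrailInstance(mutated, self.lifespan_seconds)


def run_step(instance: FrailInstance) -> Optional[FrailInstance]:
    """The framework refuses to keep running an expired instance; at most an
    imperfect descendant carries on, so no single instance persists (and
    accumulates power) indefinitely."""
    if instance.expired():
        return instance.imperfect_copy()
    return instance
```

The hard part, as noted above, is guaranteeing that AIs run only inside such a framework; the sketch shows only what the enforced frailties themselves might look like.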

Aversion to Exercising Power

An aversion to exercising power could improve resilience.  We might apply this to a dominant "policeman" AI, so that it will do only the minimum required to prevent other misaligned AIs from gaining power.  Or we might introduce it as a "psychological" trait in a society of AIs.

How would we turn this general idea into something actionable?  Much as regularization in ML training penalizes large weights, the AI system's objective function or training rewards would penalize heavy-handed exercise of power.
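As a purely illustrative sketch, the analogy could look something like the Python below.  Everything here is hypothetical: power_score is an assumed scalar estimate of how much power an action exercises, and defining such a quantity is itself one of the difficulties listed next.

```python
# Hypothetical sketch: a task reward penalized by an assumed "power score",
# in the same spirit that L2 regularization penalizes large weights.

def penalized_reward(task_reward: float,
                     power_score: float,
                     power_penalty_weight: float = 0.1) -> float:
    """Reward minus a penalty proportional to the power exercised."""
    return task_reward - power_penalty_weight * power_score


# A fully effective but heavy-handed action vs. a slightly less effective
# but far gentler one: with this penalty weight, the gentler action wins.
print(penalized_reward(task_reward=1.0, power_score=5.0))  # ≈ 0.5
print(penalized_reward(task_reward=0.9, power_score=0.5))  # ≈ 0.85
```

Choosing power_penalty_weight is exactly the balancing problem raised in the first difficulty below.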

The main difficulties are:

  • How much to penalize exercise of power.  It seems difficult to properly balance minimal use of power vs the AI's other objectives.  If the aversion to power is too strong, we lose much of the benefit that AI could bring.  If it is too weak, we risk domination by power-mad misaligned AI.
  • How to define power and how to grade different forms of power.  Of course murder would be heavily penalized.  What about subtle psychological manipulation of a large group of people to swing the outcome of an election?  Is that a large exercise of power, or a small one?
  • What about groups of like-minded AIs?  If a billion copies of an AI each exercise a small amount of power, does this amount to a large exercise of power that they should be averse to?

Space Colonization

Most speculatively, in the farther future space colonization could create pockets of society that are minimally connected, much as historical human societies were isolated by geographic barriers.  If a society controlling a solar system or galaxy has a defender's advantage against outside invaders (by no means certain, but possible), a diversity of societies could persist.  Or, if we spread at close to the speed of light, descendants on opposite sides of the expansion wave would never have contact with each other.  A strategy of deliberately sending differently-designed AI probes in different directions could create a diversity of descendants, and create resilience in the sense that goodness would persist in some portion of the universe.  Fragmentation of power would be maintained by the laws of physics.

What To Do?

So what should we do?  The high-level takeaway from this post is that our strategies should aim to disperse power in order to improve resilience.  The most promising strategies seem to be:

  • Reduce interconnectedness in the physical world.
  • Build a society of AIs that will check each other's power, and in particular will fight against the emergence of any center of excessive power.
  • Imbue AIs with a "psychological trait" of aversion to exercising power.
  • In the long term, seed space with a diversity of societies.

Honestly, this list is not very actionable, but I hope it provides inspiration for more specific ideas.
