Update: As of May 2022 I have basically given up on this line of thinking and am happy to delete the post.

(Made minor edits on 3 Jan 2022)


Here are my thoughts on what I see as a potential research area for those studying exponential technology and risks. This piece may be most useful to strong long-termists, but should be useful to others too. I make no claims about whether it should lie on the current margin of funding - this might depend on factors like your AI timeline or how long-termist you are - but I felt it was important enough to be worth sharing. Even of that I'm not entirely sure. I haven't really gotten feedback, so I'd be super keen on any.

Here's the singular question that this potential research area will attempt to answer:

Problem statement

How do you design an institution that can responsibly wield exponential technology?

(With a focus on technology that is capable of causing mass suffering or death)

So this includes questions like: how do you design an institution that can responsibly wield nuclear weapons? How do you design an institution that can responsibly wield mind-control technology? And so on.

Potential approach

I also have some thoughts on a particular way in which this problem could be approached.

 - Design should be attempted from first principles, rather than merely inheriting all the designs of existing institutions and suggesting incremental improvements.

 - Design could be attempted independent of the technology being controlled. This is more contentious, because a lot of research today focuses on very specific technology (nukes, biohazards, AI), rather than designing for unspecified technology that will only exist in the future.

I will try to justify this particular approach later in the post, but it's far from set in stone. Before that, I'll talk about the problem itself.

Exponential technology available grows by default (political VWH)

I find the framing of a "default case" or attractor state particularly helpful when thinking long-term. The long-termist endeavour often assumes that such default cases are useful - that given enough time, some things are inevitably going to happen. I've seen AI alignment theorists go so far as to say that all-out nuclear war wouldn't save us from misaligned AI, because we would inevitably develop misaligned AI again given enough time for civilisation to recover and restart the innovation process. That would be an example of the unusual yet not entirely useless insights that can come out of thinking in such a frame.

Being able to predict things that will inevitably happen, no matter what, is particularly useful from a long-termist frame because it can point at specific things where intervention is knowably useful, in a world where it is highly unknown what the long-term impact of anything is.

Let's look at some potential default cases.

The default case, as civilisation grows and time passes, is that we discover more and more technology. This default case remains constant as long as the institutions and incentive structures that enable it are constant (or at least don't change much).

The default case is that a lot of this technology is exponential in nature. I'll define exponential technology to be technology that has components of speed and scale. Namely that it can be deployed quickly and it can impact large numbers of people once discovered. Note that, for the most part, this is a good thing. We want technology such that something invented today can within 20 years meaningfully improve the lives of 7 billion people.

The default case is that the innovation process that discovers such technology is highly unpredictable. Bostrom, in his paper on the Vulnerable World Hypothesis, models the human innovation process as an urn out of which balls are drawn at random. This default case can change. There are proposals such as differential tech progress that may make some types of innovation more predictable or slower. I'll get back to this later.
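Bostrom's urn metaphor can be made concrete with a tiny simulation. This is my own illustrative sketch, not anything from the paper: the per-draw probability `p_black` is an arbitrary made-up number. The point it shows is that even if every "draw" from the urn carries only a small fixed chance of being a black ball (a civilisation-threatening technology), the probability of never drawing one decays to zero as draws accumulate.

```python
import random

def draws_until_black(p_black, rng):
    """Count draws from the urn until the first 'black ball'
    (dangerous technology) appears, with a fixed per-draw
    probability p_black of any given draw being black."""
    n = 0
    while True:
        n += 1
        if rng.random() < p_black:
            return n

rng = random.Random(0)

# P(no black ball in n draws) = (1 - p_black)^n, which tends to 0
# as n grows - so by default a black ball eventually shows up.
samples = [draws_until_black(0.001, rng) for _ in range(10_000)]
print(sum(samples) / len(samples))  # ≈ 1 / 0.001 = 1000 draws on average
```

The expected wait is long when `p_black` is small, but the draw is never avoided outright - which is the sense in which the dangerous discovery is a "default case" under a constant innovation process.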

The default case is that some of this technology exposes new ways in which individuals can exert power over each other. For instance, stealing resources in the past required someone to physically break into your shelter and rob you. Today it might be a dictator locking you out of your bank account, or a fraudster stealing your identity.

The default case is that some of this power-exerting technology is exponential. Meaning again, that it can be deployed with speed and at scale. Weapons of war have gone from bullets to nuclear bombs to autonomous drone swarms - each subsequent innovation reduces the time and human resources that need to be expended to have impact at scale.

Current institutions break by default

Institutions that wield power require certain internal incentive and power structures between their members in order to function and avoid principal-agent problems. These include economic incentives - salaries and bonuses for following rules, and being fired or penalised for breaking them - and social incentives, such as being looked up to or looked down on based on how you behave.

The more power an institution needs to wield, the more potent its internal power structures may need to be as well. The captain of a nuclear submarine is unlikely to obey orders solely because they'll lose their salary if they don't. Social ostracisation, imprisonment or threat to life could be more powerful incentives. So could the person's intrinsic motivations - if they genuinely believe that following orders is the "right thing" for them to do.

Institutions today are not capable of responsibly wielding technology of arbitrary power; there is a finite power level beyond which principal-agent problems reappear. To some extent this has already happened: there is only so much control that voters today have over how their own country uses nukes or bioweapons or surveillance technology.

Random examples of very powerful technology we don't have yet: (biological) viruses that can activate lethality based on an electronic signal, drone swarms that can navigate hostile airspace and control populations with limited human input, narrow AI with sufficiently high capabilities at human persuasion, digital minds over which we have root access, and so on.

The default case is that we eventually invent a technology sufficiently powerful that our current institutions cannot wield it in a stable and responsible fashion.

What changes these default cases?

This is an open question that is likely already being considered by researchers, and not the main focus of my post.

The default case that we always have structures enabling innovation can change in a number of ways. Funding can be reduced, research may become socially unpopular, a catastrophic event may set our research process back in time, and so on.

The default case that this innovation is unpredictable, might be changeable by impacting researchers' ethics, by funding differential tech progress and so on. It's an open question whether differential tech progress is sustainable long-term.

The default case that innovation will be used by power-seeking actors and implemented in the real world, might be changeable. A responsible global governance institution could prevent such technology from being implemented, by controlling supply chains or the flow of information. Creating such a responsible institution is definitely something that falls within the scope of this post.

Institutions that need to control powerful technology that has both been invented and deployed in the real world also need to be designed to wield it responsibly.

What does it mean for an institution to wield power "responsibly"?

This is a difficult question that can easily break down to philosophical disputes, but some properties that seem mostly desirable are:

 - Democratic - It should be difficult for such institutions to be captured by the interests of the few against the interests of the many. For instance a majority ethnic group or even a single nation should not gain control over the rest of the world. Some level of democratic control may be desirable. Note however that this is an uphill battle, because control over any technology is somewhat centralised by default and decentralising this requires solving principal-agent problems. Solving this may also benefit from other interventions such as cultural shifts but that's outside the scope of the post.

 - Non-extremist - It should be difficult for such institutions to be captured by the interests of people that explicitly desire human catastrophe, or people who are willing to significantly risk human catastrophe at the altar of a better future, and so on.

 - Self-limiting - Institutions whose values are narrow in nature should not develop their power beyond what is necessary, because human values are not narrow. For instance an army should (usually) not act to dissolve all other democratic institutions (assuming we continue to retain such a division of responsibilities and values between institutions).

 - Monopolising - Some institutions will have to seek to ensure that the powerful tech they wield does not fall into the hands of people or groups outside of the institution itself.

 - No accidents - Institutions should reduce the likelihood of errors causing catastrophic outcomes, where "errors" refers to those that come into being without the deliberate intention of any human inside or outside of the institution.

There may be more desirable properties that are required, this is an open question.

Why design institutions from first-principles?

There have been suggestions that we should act to reform existing institutions so that they are more long-termist or can responsibly wield more power. This, however, inherits all the baggage of existing institutions that, for the most part, have been designed with different objectives in mind. Thinking from first principles allows for more creative thinking.

Thinking in terms of default cases or attractor states is another tool developed specifically by long-termists.

It's possible that first-principles thinking will invent more such tools and mental models.

Institutions designed on paper from first principles still need to be implemented in the real world. At this point there is a question of whether new institutions should be built or existing institutions should be modified. But this question can be tackled once some ideal designs have been obtained.

Why design institutions independent of the technology?

This is something I'm unsure of; it may be that institutions designed with very specific technology in mind are useful. For instance an institution designed deliberately to wield nukes, or an institution designed deliberately to prevent anyone from deploying misaligned AI.

But it could also be helpful to establish some base principles that are useful no matter how powerful the tech is. To borrow a mental model from AI alignment theorists: they often emphasise the importance of not assuming that AGI will be below a certain intelligence level, because you can't know that for certain. Similarly, given a highly unpredictable innovation process, perhaps we should not make assumptions about an upper limit on how powerful the tech we discover in the future can be. And therefore we should design institutions that can safeguard such tech too.

Potential objections to this research

 - "Differential tech progress is sustainable, and we can ensure we will never invent very powerful technology or need institutions to wield it." - I'd be keen on research that shows this is true, but until then I'm definitely inclined to think research producing dual-use technology will continue to exist.

 - "AI alignment will be solved, and then we won't need to safeguard the human innovation process ourselves." - I agree that if your priors on AI alignment being solvable are high, that would be a more promising direction. My proposal does assume we don't solve AI alignment soon.

 - "We should stop all technological progress and innovation." - If your reason for stopping tech progress is the lack of institutions that can responsibly use (or at least prevent misuse of) its outputs, that is all the more reason to study institution design. Even proving that designing such institutions is intractably hard requires work in institution design. If your reason for stopping technological progress is something else, I'd be keen to know what it is.

 - "First-principles design is intractable and misses important situation-specific details." - This could easily be true; I don't have a strong opinion on it, just intuitions.

I can't immediately think of more objections, but I haven't spent too much time on it - so I'm super keen to hear more viewpoints, both on objections and on anything else that adds value to this topic.


5 comments

Just stumbled upon this post. I like the general vein in which you're thinking. Not sure if you're aware of it already, but this post by Paul Christiano addresses the "inevitable dangerous technology" argument as it relates to AI alignment.

 - "First-principles design is intractable and misses important situation-specific details." - This could easily be true; I don't have a strong opinion on it, just intuitions.

I think this objection is pretty compelling. The specific tools that an institution can use to ensure that a technology is deployed safely will ultimately depend on the nature of that technology itself, its accessibility/difficulty of replication, the political/economic systems it's integrated into, and the incentives surrounding its deployment. (Not an exhaustive list.)

Usually, any type of regulation or "responsible power-wielding" comes with tradeoffs (to freedom, efficiency, equitability, etc.), and it'll be hard to assess whether accepting these tradeoffs is prudent without a specific technology in mind.

That said, I think it can still be a worthwhile exercise to think about how we can build governance practices that are robust to worst-case scenarios for all of the above. I can imagine some useful insights coming out of that kind of exercise!

Thanks for your reply. I did in fact realise the same thing later on! I could send you my attempt or link it here if you're interested.

Thanks for posting your attempt! Yeah, it does seem like you ran into some of those issues in your attempt, and it's useful information to know that this task is very hard. I guess one lesson here is that we probably won't be able to build perfect institutions on the first try, even in safety-critical cases like AGI governance.

I'm not sure how to upload a doc on my phone, so I'm just copy-pasting the content.

If this isn't valuable for the forum I'm also happy to take it down (either the comment or the post)

This is mostly a failed attempt at doing anything useful, that I nevertheless wish to record.

See also: https://forum.effectivealtruism.org/posts/AiH7oJh9qMBNmfsGG/institution-design-for-exponential-technology

There is a lot of complexity to understanding how tech changes the landscape of power and how bureaucracies can or should function. A lot of this complexity is specific to the tech itself.

For instance, nuances of how nukes must be distributed can affect what their command structure can look like, or what the uranium supply chain must look like. Or nuances of what near-term AI can or cannot do, for surveillance and enforcement, can affect the structure of bureaucracies that wish to wield this power. Or nuances of how compute governance can work, for AGI governance structures.

I wondered if it is possible to abstract away all of this complexity by simply assuming that any tech allows for a relation of power between two people, in which one exerts control over the other - and then build theory that is invariant to all possible relations of power, and therefore to all possible technologies we may invent in the future.

Toy example

I wished to build theory for how institutions can hold very powerful tech. Here’s a toy example I thought of.

A magical orb with the following properties:

 - The wearer can wear it for 10 seconds and instantaneously make a wish that destroys all other orbs still being formed in the universe.
 - The wearer can wear it for 60 seconds and become an all-knowing, all-intelligent, all-capable god.

Also assume:

 - Anybody anywhere in the universe can start chanting (publicly known) prayers for 60 seconds to create a new orb.
 - All chanting is audible to everyone in the universe.

Challenge: Design a stable social technology that uses an orb to ensure no other orb exists in the universe, while also ensuring this orb doesn't fall to someone who becomes a god.

Why define the problem this way?

Even when defining the problem, I realised I had to make more assumptions than I initially thought I did.

I had to assume all chanting is audible to everyone, to sidestep the problem of who controls surveillance tech and how powerful it is.

I had to assume anybody can chant, to sidestep the problem of who can chant, and also to avoid pre-emptive solutions such as keeping the prayers a secret. Note also that I made prayers the way to create orbs, rather than some existing physical resource. This is because physical resources allow for solutions that control their supply chain, which has more complexity (such as enriched uranium for nukes, or compute governance for AGI). I did not want those kinds of solutions; I only wanted solutions actually using the orb.

How to solve this problem?

Clearly the value of keeping the orb safe exceeds the value of the lives of those in the bureaucracy built around it (assuming a bureaucracy is built). Hence I will casually refer to death threats and deaths as a form of control. In an ideal world where it is possible, I would prefer less coercive or destructive ways to design the bureaucracy - this is again just a way to remove complexity when attempting a first solution.

For starters, you want one person using the orb. You also want people who can threaten to kill that person if they wear it for more than 10 seconds at a time. You want those making the threats to be further threatened if they don't carry out this threatening - either by each other or by other people. And you want a predictable second person to gain access to the orb if the first person attempts to misuse it and is killed, such that you can repeat the same game theory on the second person.
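The mutual-threat structure above can be sketched as a toy model. This is my own illustrative code, not anything from the original attempt - `stable_punishers` and the chain/ring topologies are made up for the example. The point it encodes: a finite linear chain of watchers unravels by backward induction, because the last watcher faces no threat for shirking, whereas a ring of mutual watchers can sustain "everyone punishes" as a self-consistent state.

```python
def stable_punishers(watchers):
    """watchers[i] = the agents who can punish agent i for shirking.
    Assume everyone starts out willing to punish, then repeatedly drop
    any agent none of whose own watchers remain willing - a purely
    self-interested agent shirks once no one would sanction them.
    Returns the largest self-consistent set of punishing agents."""
    punishing = set(watchers)
    changed = True
    while changed:
        changed = False
        for i in list(punishing):
            if not any(w in punishing for w in watchers[i]):
                punishing.discard(i)
                changed = True
    return punishing

# Linear chain: agent i is watched only by agent i+1, and the last
# agent is watched by no one - so deterrence unravels completely.
chain = {0: [1], 1: [2], 2: [3], 3: [4], 4: []}
print(stable_punishers(chain))  # set() - the chain unravels

# Ring: each agent is watched by their neighbour, so the mutual
# threats are self-sustaining.
ring = {i: [(i + 1) % 5] for i in range(5)}
print(stable_punishers(ring))  # all five agents keep punishing
```

This is only the greatest-fixed-point notion of stability for one-shot incentives; real institutions add repeated interaction, noise, and outside options. But it matches the intuition in the text that the threats must back each other up ("either by each other or by other people") rather than form an open-ended chain.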

This attempt at a solution already contains a huge amount of complexity.

One possible way to have these threats is to give everyone guns. You now need to look at errors in firing, and at errors in human behaviour (such as people who panic, or who are asleep and don't fire). You need to look at the physical positions of the people - they are no longer abstract agents, but agents with three-dimensional locations. You need line of sight for people to be able to surveil each other's actions. You need ways to safely rotate shifts, as people can't be awake 24x7.

You also need mechanisms for people outside this room, and society at large, to trust that what happens inside is legitimate and safe, and to not be willing or able to barge in or blow up the room.

Better technology could maybe yield better solutions here. A computer could fire shots more reliably than a human. A camera could remove the requirement for line of sight. But that just introduces more complexity: how will you replace the cameras, who controls the bureaucracy that manufactures them, how will you ensure they never run out, what if the cameras have errors, and so on.

And these problems might be solvable, but they're high-complexity, which is what I was trying to avoid. Even though the orb has its complexity abstracted away, I need to use technologies besides the orb to secure it, and this reintroduces all the complexity of very specific technologies.