The Rival AI Deployment Problem: a Pre-deployment Agreement as the least-bad response

HaydnBelfield

Introduction: The rival AI deployment problem

Imagine an actor is faced with highly convincing evidence that, with high probability (over 75%), a rival actor will be capable of deploying advanced AI within two years. Assume that the first actor is concerned that such deployment might threaten its values or interests. What could it do? Let us call this the ‘rival AI deployment problem’. Three responses present themselves: acquiescence, an agreement before deployment, and the threat of coercive action.

Acquiescence is inaction, and acceptance that the rival actor will deploy. It does not risk conflict, but does risk unilateral deployment, and therefore suboptimal safety precautions, misuse or value lock-in. An agreement before deployment (such as a treaty between states) would be an agreement on when and how advanced AI could be developed and deployed: for example, requirements on alignment and safety tests, and restrictions on uses/goals. We can think of this as a ‘Short Reflection’ - a negotiation on what uses/goals major states can agree to give advanced AI. This avoids unilateral deployment and conflict, but it may be difficult for rival actors to agree, and any agreement faces the credible commitment problem of sufficiently reassuring the actors that the agreement is being followed. Threat of coercive action involves threatening the rival actor with setbacks (such as state sanctions or cyberattacks) to delay or deter the development program. It is unilaterally achievable, but risks unintended escalation and conflict. All three responses have positives and negatives. However, I will suggest a pre-deployment agreement may be the least-bad option.

The rival AI deployment problem can be thought of as the flipside of (or an addendum to) what Karnofsky and Muehlhauser call the ‘AI deployment problem’: “How do we hope an AI lab - or government - would handle various hypothetical situations in which they are nearing the development of transformative AI?”. In a similar vein, OpenAI committed in its Charter that if a value-aligned, safety-conscious project “comes close to building” AGI before it does, it will “stop competing with and start assisting” that project, with a typical triggering condition being “a better-than-even chance of success in the next two years”. Likewise, the Short Reflection can be thought of as an addendum to the Long Reflection proposed by MacAskill and Ord.

Four assumptions

I make four assumptions.

First, I roughly assume a ‘classic’ scenario of discontinuous deployment of a singular AGI system, of the type discussed in Life 3.0, Superintelligence and Yudkowsky’s writings. Personally, I find a more continuous, Christiano-style takeoff more plausible, and a more distributed, Drexler-style Comprehensive AI Services scenario preferable. But the discontinuous, singular scenario makes the tensions sharper and clearer, so that is what I will use.

Second, I roughly assume that states are the key players, as opposed to sustained academic or corporate control over an advanced AI development and/or deployment project. Personally, state control of this strategically important technology/project seems more plausible to me. In any case, state control again makes the tensions sharper and clearer. I distinguish between development and deployment. By ‘deployment’ I mean something like ‘use in a way that affects the world’ materially, economically, or politically. This includes both ‘starting a training run that will likely result in advanced AI’ and ‘releasing some system from a closed-off environment or implementing its recommendations’.

Third, I assume that some states may be concerned about deployment by a rival state. They need not be. Almost all current policymakers are not particularly aware of, or concerned about, advanced AI. This is quite reasonable, given their context and the significant uncertainty over the future development of advanced AI. However, over the coming decades some states could come to view the deployment of sufficiently advanced AI by other states as threatening, for example to their economies, values or prestige. Their interests, sovereignty and status are typically key concerns for major states.

Fourth, I assume states have knowledge of their rival’s program. Of course, states may lack crucial information - they may not realise how close their rivals are. There are many cases of states having inaccurate information. The USA was surprised by how quickly the USSR gained a nuclear weapon in the 1940s, and the West’s failure to discover the Soviet biological weapons program in the 1970s and 80s has been described as the greatest intelligence failure of the Cold War. On the other hand, the US overestimated how close the Nazis were to a bomb, and how many missiles the USSR had in the late 1950s (the imagined ‘missile gap’). This type of intelligence failure may be less likely now, given improvements in cyber espionage. Nevertheless, the USA seemed surprised in recent years by Chinese hypersonic missiles and silo-building. An AI development project could be hidden in data centres/companies running many other systems, progress could be quicker than intelligence communities can operate, and intelligence analysts may overconfidently rely on cyberespionage or not pick up on crucial signs due to lack of specialist knowledge.

In any case, I will assume for the sake of this analysis a fairly discontinuous scenario with states leading development, and that other states are concerned about their rivals and know with high confidence the state of their rival’s development program. What are their options - what are possible responses to the rival AI deployment problem?

Responses

1. Acquiescence

The first response is acceptance that the other side will deploy its capability, and that there is not much one can do about it. Perhaps one will attempt to persuade or cajole the deployer not to deploy, or to limit how it deploys, but take little action beyond that. This is likely to be the response of countries that are not great powers, such as low- and middle-income countries. Could it also be the response of great powers or peer competitors, for example the P5?

This has typically been thought of as unlikely – major states are not generally comfortable with any (perceived) threat to their values or interests. They often (over)react to perceived threats to their status, security or economies. This may apply even more to great powers and hegemons. For example, Allison’s work on the ‘Thucydides Trap’ argues that challenges to, and transfers of, hegemony are often characterized by conflict: he finds that 12 of 16 cases ended in war, from Sparta and Athens to Imperial Germany and the British Empire. A leading state today is unlikely to feel neutral about the deployment of advanced AI.

However, there are a few historical cases of acquiescence. The most prominent is the USSR over the 1989-1991 period, when it accepted the end of its existence without a major external or civil war. Other examples of states accepting the end of their existence as unified entities include the postwar decolonisations, many of which did not involve major wars waged by the imperial power: the British, French and Dutch Empires essentially ceased to exist. Examples of states accepting the transfer of hegemony include the transfer from British to US hegemony in the mid-20th century (from Pax Britannica to Pax Americana). At a more granular level, there are also examples of states accepting their rivals acquiring significant new technologies and weapons capabilities without conflict. For example, the USA did not strike the USSR in the 1940s and 1950s during the two periods when it had first the atomic bomb, and then the thermonuclear bomb, and the USSR did not (see e.g. Tannenwald on the nuclear taboo). US R&D into missile defence did not provoke conflict with the USSR/Russia (though this could be because, after 40 years and $40bn, it doesn’t work).

2. Agreement before deployment

If a rival state will be capable of deploying advanced AI in two years, another response to the rival deployment problem could be negotiation – coming to some agreement as to when and how that rival will deploy that AI. This agreement could be between two leading states, most/all of the major states, or indeed all states – perhaps through the UN.

2.1 Agreement on Alignment and Uses/Goals – the Short Reflection

An agreement could have two key clauses. First, no deployment until the system is provably aligned. Second, some agreement on the uses to which this advanced AI would be put, and the goals it would pursue.

The first clause may be broadly acceptable, as no-one (apart from an omnicidal few) desires the ‘paperclip’ scenario of a misaligned AGI, though this clause needs more clarity on how provable alignment can and should be demonstrated. The long-term AI alignment and governance communities would be especially keen on an alignment clause. We have paid far less attention to the second clause, on uses and goals. But this ignores the fact that many states may have strong preferences over the form of deployment by a rival state.

On the one hand, some uses/goals seem relatively uncontroversial and broadly acceptable, such as medical science research (e.g. drug discovery, ‘solving cancer’), clean energy research (e.g. better solar, or fusion) or goods and services such as new animated films. On the other hand, some uses/goals are more controversial and broadly unacceptable, such as those promoting a particular narrow ideology or set of values. To offer an extreme caricature just to make the point, totalitarian ideologies like Nazism or despotic ideologies like “Worship the Kims” would not be acceptable to the vast majority of people. Deployment for such ends would be viewed as undesirable, much like the deployment of technically misaligned advanced AI.

Some uses/goals might be somewhere between the two. This especially applies to ‘dual-use’ research. Research on fundamental physics, for example, could give us new insight into the universe and new energy sources, but could also discover new weapons technologies. Transhumanism, human enhancement, uploading, and digital personhood all seem in the middle too. Unfortunately, “prevent global catastrophic and existential risks” - which to the alignment community might be a primary use - is likely to be controversial to states too, insofar as it could affect security (affecting nuclear and conventional forces) and sovereignty (surveillance and interference). Even something seemingly anodyne like asteroid deflection could be dual-use.

My purpose here is not to suggest which particular uses, purposes and goals should be in such an agreement. It is merely to suggest that such an agreement is likely to be needed, and seems achievable, though it may be complex and time-consuming to reach. One overall path that is sometimes discussed in the long-term AI alignment and governance communities is:

AGI development -> deployment -> long reflection.

I am suggesting that some of the work of the long reflection will have to be done before deployment. Let us call that the ‘short reflection’: a negotiation and agreement on some of the uses, purposes and goals of an AGI. The path is then:

AGI development -> short reflection + agreement -> deployment -> long reflection.

2.2 The credible commitment problem, monitoring and verification possibilities

One key question is whether a deploying state can make a credible commitment to abide by whatever agreement is reached. States seem unlikely to just take these commitments on trust – that is more the ‘acquiescence’ response. If they cannot trust and verify that these commitments are being followed, then we are back to coercive action and dangerous escalation. What assurances can be given?

Some assurances could be at the training data level – perhaps another state could have access to training data and programs to ensure, for example, that the data does not contain lots of fundamental physics papers, or that the system is not playing thousands of war games against itself. Another assurance could be at the compute level – preapproval, or a ‘heads up’ advance warning, of big experiments and training runs. Perhaps there could be something equivalent to the permissive action links (PALs) and ‘two-person rule’ used for nuclear weapons, physically requiring authorisation from at least two people to begin some usage of compute. One could even have the equivalent of international PALs, where two states mutually give one another permissions to prevent the beginning of some usage, or stop it in progress. There are many other possibilities that could be explored, such as leveraging trusted hardware, tamper-proof hardware, zero-knowledge proofs, shared secrets, or partitioned training runs between rivals.
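To make the ‘international PAL’ idea a little more concrete, here is a minimal sketch of a dual-key gate: a compute job starts only if every party has signed off on that exact job, and any single party can halt it once running. This is an illustration under strong assumptions, not a real design. It assumes each state holds a secret key provisioned out of band, that the scheduler enforcing the gate is itself trusted and tamper-proof (the genuinely hard part in practice), and it uses HMAC-SHA256 signatures as a stand-in for whatever cryptographic scheme a real agreement would use. The names `DualKeyGate` and `sign` are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


def sign(key: bytes, message: str) -> str:
    """Hypothetical helper: an HMAC-SHA256 authorisation token for a message."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


@dataclass
class DualKeyGate:
    """Hypothetical 'international PAL' for compute jobs.

    Starting a job requires a valid token from every party (joint consent);
    halting a running job requires a valid token from any one party (mutual veto).
    Assumes the gate itself runs on trusted, tamper-proof infrastructure.
    """
    keys: dict  # party name -> secret key, assumed provisioned out of band
    running: set = field(default_factory=set)

    def start(self, job_id: str, tokens: dict) -> bool:
        # Every party must have signed this exact job identifier.
        for party, key in self.keys.items():
            token = tokens.get(party)
            if token is None or not hmac.compare_digest(token, sign(key, job_id)):
                return False
        self.running.add(job_id)
        return True

    def halt(self, job_id: str, party: str, token: str) -> bool:
        # A single authenticated party can stop a job already in progress.
        if party not in self.keys or job_id not in self.running:
            return False
        if not hmac.compare_digest(token, sign(self.keys[party], "halt:" + job_id)):
            return False
        self.running.discard(job_id)
        return True


if __name__ == "__main__":
    keys = {"state_a": b"secret-a", "state_b": b"secret-b"}
    gate = DualKeyGate(keys)
    job = "training-run-001"
    # Only one state has consented, so the job does not start.
    print(gate.start(job, {"state_a": sign(keys["state_a"], job)}))
    # Both states have consented, so the job starts.
    print(gate.start(job, {p: sign(k, job) for p, k in keys.items()}))
    # Either state alone can stop it.
    print(gate.halt(job, "state_b", sign(keys["state_b"], "halt:" + job)))
```

The asymmetry is the point of the design: starting requires joint consent, while stopping requires only one party, mirroring the idea that each state can prevent or interrupt a usage of compute. Everything rests on the enforcing hardware actually honouring the gate, which is where trusted or tamper-proof hardware would come in.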

Other assurances include more traditional methods. An agreement would likely require extensive monitoring and verification.

Let us consider two forms that development could take: a central world project, or mutually monitored national projects. A central world project might function like a joint research project such as the International Space Station, CERN or ITER (see Fischer). These are big, costly, capital-intensive centralized projects. The analogy is perhaps most apt if advanced AI development requires substantial amounts of compute, such that pooling financial and material (e.g. chips) resources is appealing. This has advantages for credible commitments – other states have unfettered access to the development program, and indeed are mutually governing it.

But perhaps more likely is mutual monitoring of national projects. Such monitoring around nuclear weapons, and indeed chemical weapons, is (surprisingly?) extensive and intrusive. We can learn several lessons from successful intrusive arms control regimes like Nunn-Lugar Cooperative Threat Reduction, the Chemical Weapons Convention and New START.

Two main targets for monitoring suggest themselves: compute and ‘talent’ (AI experts), though data might also be a possibility. One could monitor a developer state’s compute stocks (where the big data centres are, what chips they have) and flows (how many chips it is producing and importing, where those are going, etc). One could also monitor what that compute is being used for – are there projects using more than some threshold of compute, and if so for what purpose? This could involve on-site inspections, and satellite/aerial surveillance of facilities and construction, including heat signatures that indicate electricity consumption. Sensors and cameras could be placed in data centres, or monitoring programs in software, to monitor compute and energy usage – both amounts and purposes.
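The compute-monitoring questions above (is a run above some threshold, and if so for what purpose? does observed electricity use match what was declared?) can be sketched as a simple inspector-side check. This is a minimal sketch assuming the developer state self-reports each large run while inspectors independently meter its energy use. The threshold value, the list of agreed purposes, the `RunReport` fields and the `flag_run` function are all hypothetical placeholders, not figures or mechanisms from any real agreement.

```python
from dataclasses import dataclass

# Hypothetical placeholder values; a real agreement would negotiate its own.
COMPUTE_THRESHOLD_FLOP = 1e26
AGREED_PURPOSES = {"medical research", "clean energy research"}


@dataclass
class RunReport:
    """Hypothetical record combining a declaration with inspectors' measurements."""
    run_id: str
    declared_flop: float   # compute the developer says the run uses
    declared_kwh: float    # energy the developer says the run draws
    observed_kwh: float    # energy measured by inspectors' meters or sensors
    declared_purpose: str


def flag_run(report: RunReport, tolerance: float = 0.1) -> list:
    """Return reasons an inspector might want to look more closely at a run."""
    reasons = []
    if report.declared_flop >= COMPUTE_THRESHOLD_FLOP:
        reasons.append("above compute threshold")
        # Purpose only matters for runs large enough to be covered by the agreement.
        if report.declared_purpose not in AGREED_PURPOSES:
            reasons.append("purpose not on agreed list")
    # Observed energy well above the declaration suggests under-reporting.
    if report.observed_kwh > report.declared_kwh * (1 + tolerance):
        reasons.append("energy use exceeds declaration")
    return reasons


if __name__ == "__main__":
    suspicious = RunReport("run-7", 3e26, 1_000_000, 1_500_000, "fundamental physics")
    print(flag_run(suspicious))
```

For the `run-7` example, all three checks fire. The energy cross-check matters because declared compute figures are only as good as the declarer's honesty, whereas electricity draw (and its heat signature) is harder to hide.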

One could also monitor a developer state’s talent stocks (who’s in what groups) and flows (how many new ML PhDs are being trained, how many are immigrating to the country, etc). One could also monitor what those people are working on. This could be more intrusive to people’s privacy than compute monitoring, but note that similar monitoring has occurred e.g. for Soviet scientists after the Cold War. Tools in this case could involve interviews with members of the national program, to catch inconsistencies between stories and other data. There could also be some protection of whistleblowers – ideally at a national level, but more likely rivals sheltering such whistleblowers. It is through whistleblowers that the Soviet biological program (Alibek) and the US cyber programs (Snowden, Sanger’s contacts) came to light.

It is unclear to me whether the credible commitment problem can be addressed sufficiently – significantly more research is needed.

3. (Threat of) coercive action

Faced with the rival deployment problem, and a possible or perceived threat to state or regime sovereignty and status, a state might be tempted to respond with the threat of coercive action. Three possible options present themselves: sanctions, clandestine operations, and further escalation. I will discuss each in turn. The major problem is that it may be hard to avoid inadvertent, unintended escalation to greater threats, raising international tensions and risking conflict.

Throughout I will return to the example of Iran’s attempt to acquire a nuclear weapon, which has long been regarded as an unacceptable risk by the international community. This is especially true of Israel: successive Israeli governments have viewed the prospect as a major (indeed in this case existential) risk to Israel, as part of the wider ‘Begin doctrine’. This example neatly demonstrates the three possible options. Iran was threatened with sanctions, clandestine operations, and further escalation.

3.1 Sanctions

One option is sanctions on the developing state. The sanctions regime has grown significantly in the last twenty years, with extensive anti-terrorist financing and anti-money laundering systems put in place (and with Western states’ decreased willingness to engage in humanitarian intervention). Sanctions might be financial, trade-related or digital. Many of these options will be much more familiar to readers now, following the Russian invasion of Ukraine and the coordinated global response.

Financial sanctions target the financial system. Banks and other financial institutions are blocked from lending to or handling the transactions of sanctioned individuals or companies. For example, many financial transactions are settled through SWIFT, and the USA is able to lock particular companies and states out of this system. The status of foreign exchange reserves is important, since large reserves can cushion a state against financial sanctions – for example, Russia and China have very large reserves (though reserves held abroad can themselves be frozen, as much of Russia’s were in 2022). In our case, sanctions could be brought against particular companies in the AI supply chain (fabs, cloud providers, etc) or more generally against the entire economy.

Trade sanctions overlap with financial sanctions, as they are often enforced through importers’ and exporters’ access to the financial system. In our case, trade sanctions could include export controls, either narrowly focused on semiconductors and rare earths, or more generally on many consumer and industrial goods.

Finally, we should consider a novel form of sanctions that might be relevant to our case. We can call these ‘digital sanctions’. Digital sanctions could impede a developer state’s access to compute or data. For example, three US cloud providers have at least 60% of the global cloud market. The US Government could compel cloud providers not to offer services to particular companies. States could also affect data flows. For example, many states have ‘data localisation’ requirements compelling tech companies to process their citizens’ data in their own territories. These could be strengthened to limit the flow of data to a developer state. At the limit, this could involve disabling or cutting undersea cables to the rest of the world (though this would be less effective given the growth of satellite internet, and the possibility of a ‘splinternet’).

There are ways around all these sanctions, and recalcitrant states have often been willing to endure years of pain – there is a vibrant debate about whether ‘sanctions work’. Sanctions are likely to be less effective the more powerful and rich the target state is. A developer state could seek to rely on its own financial, digital and trade networks and those of its allies. Sanctions are more likely to delay than to prevent a determined adversary. Sanctions might not work – and if they do not, the sanctioning state might escalate to further coercive action.

We should also note that this option is mainly available to the USA and its allies, as the issuer of the world’s reserve currency (the dollar) and home of major internet companies and ICANN. However, some limited sanctions might be possible from China or a wider international coalition against the USA – especially trade sanctions.

Sanctions were used by the Obama Administration and its allies to encourage Iran to agree to the ‘Iran Deal’ or JCPOA – limiting its nuclear program in exchange for sanctions relief. The sanctions were later partly reimposed by the USA after the Trump Administration pulled out of the Iran Deal. The Biden Administration and allies

3.2 Clandestine operation

A second option is a clandestine operation. This could involve tradecraft (human spies, special forces, etc) or offensive cyber operations. Ben Buchanan offers three categories of cyber attacks that can be usefully extended to clandestine operations in general: espionage (stealing information), sabotage (degrading some capability) and destabilisation (such as the Russian interference in the 2016 US election).

Espionage could involve exfiltrating (copying, stealing) the model/system or the code itself, so as to deploy it oneself, ideally before one’s rival. This could be done through human assets (defections, stealing plans as Fuchs did, etc) or cyber assets. Sabotage targets could include servers in data centres, the power plants or grids that power them, or the training data itself through data poisoning attacks. Destabilisation, through the use of dis/misinformation or targeted leaks, could target a rival’s population to undermine state funding for a project or the organisational leadership of the project. Both sabotage and destabilisation would arguably only delay rather than prevent a determined adversary. Again, clandestine operations might not work – and if they do not, the state might escalate to further coercive action.

This option would be analogous to the USA and Israel’s ‘Olympic Games’/Stuxnet cyber sabotage attack. In the late 2000s the worm targeted the Iranian nuclear program, successfully destroying a significant percentage of Iran’s centrifuges and delaying its enrichment program. The USA may also have interfered in North Korea’s missile launches. In both cases, this may only have delayed the rival states. (Luckily, there is no evidence yet of great powers interfering in one another’s nuclear systems.)

3.3 Further escalation

Further escalatory steps could involve the threat of conventional military strikes on infrastructure. For example, Israel struck a suspected Syrian nuclear reactor in 2007, and has threatened to strike the Iranian program.

Conclusion

In this short piece, I have argued that if the deployment of advanced AI by one state is viewed as threatening by another state, then that state is faced with the rival deployment problem: what should it do when faced with this possible challenge to its values and interests? Acquiescence seems unlikely, and threatening a coercive response could be catastrophic.

Given those two constraints, while it may seem unlikely at this stage, a pre-deployment agreement might be the least-bad option – and is at least worthy of more study and reflection. In particular, more research should be done into possible clauses in a pre-deployment agreement, and into possibilities for AI development monitoring and verification.

We can distinguish between naive and sophisticated partisan/booster strategies. A naive ‘partisan’/booster approach is to try to get your chosen side to ‘win’ some race. But this is not free: the other side gets a say, and that makes this policy dangerous, because the ‘losing’ side could try to coerce the ‘winning’ side with threats, which could inadvertently escalate to conflict. A more sophisticated version still supports a particular state (for a variety of reasons, including retaining negotiating leverage or having a fallback option) but pursues a different ultimate strategy than winning the race outright. I am not saying that people should not have preferred states, or that states should not now be investing in their capabilities, merely that attempting to ‘win an endgame’ is a dangerous crunch-time strategy.

Promising solutions might draw on all three responses. In the Cold War, arms control and deterrence (arguably) reinforced one another to prevent nuclear war. Unilateral deployment could be disastrous, locking in one set of values or sparking conflict. Deterrence of unilateral deployment through extremely careful escalatory threats could push states back to pre-deployment negotiations. And finally, any multilaterally agreed deployment is likely to involve some degree of acquiescence and trust.

OscarD🔸 · Oct 9, 2022

I agree that reaching a pre-deployment treaty is the best option (indeed, this seems outright good rather than just the least bad to me).

You touched on this in the conclusion, but I feel it is a sufficiently important point to foreground more: I think the threat (perhaps implicit) of violence is a key element of successful negotiation and agreement-building. You didn't discuss much why the deployer state would agree to be constrained by an agreement in section 2, but it seems to me the strongest reason for it to do so would be the credible threat of being invaded if it does not. I think such a threat would be credible: even if the deployer state is the pre-eminent global power at that time, as seems fairly likely, the harm of idle acquiescence to other nations' interests could be immense. Thus, rival countries would plausibly be highly motivated to band together to fight a rogue deployer nation before the AI is deployed and (putatively) invasion becomes impossible because the deployer has such an impressive lead in military and other technology.

I think given this credible threat, the deployer would likely be willing to negotiate in good faith, in which case your credible commitment issue comes to the fore.

There is an incomplete sentence at the end of 3.1: "The Biden Administration and allies".

Effective Altruism Forum