
This post discusses a promising model for verifying compliance with international regulations on AI development. It is written by Damin Curtis & Alexander M. Wyckoff, and discusses a proposal by Yonadav Shavit. Crossposted on the AI Alignment Forum.


The Verification Problem

To cope with emerging security challenges, states will have to create new regulatory frameworks to rein in the development of dangerous AI models/capabilities. To be effective, any laws or agreements will need a credible verification mechanism behind them, yet how to create such a mechanism is an open technical/policy question.

Frameworks to limit the proliferation of powerful weapons systems have been developed before, such as the Nuclear Non-Proliferation Treaty (NPT) and the inspection regime of the International Atomic Energy Agency (IAEA). Through internationally agreed-upon rules, inspections, and tracking of hazardous materials, the IAEA and NPT have successfully limited the development of nuclear weapons while allowing parties to develop useful technologies such as nuclear energy and biochemical research labs.

Developing a similar framework for verifying the peaceful development of AI models is, of course, difficult. The infrastructure and process for training safe and unsafe models can be nearly identical, and unsafe practices can be difficult to identify within a training process. What’s more, a method of verification must not endanger the privacy or intellectual property rights of the proving party (“the prover”). However, there is growing consensus that compute governance may be the keystone of AI governance, and there are strategic bottlenecks in the supply chain of compute-providing semiconductors that may allow for effective monitoring.


Proposal Overview

A recent (2023) proposal for such a framework by Yonadav Shavit seems promising on all of these fronts, as outlined in his paper, “What Does it Take to Catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training Via Compute Monitoring.” (Note that the following proposal is not our own; it is the work of Yonadav Shavit. The wording in this post, and the discussion surrounding it, is our own.)

Shavit proposes that all advanced chips be manufactured with a mechanism that occasionally saves “snapshots” of the neural network weights held in the chip’s memory, which inspectors can later retrieve. An international regulatory agency would carry out inspections to ensure chips are not being used to train illegal/dangerous models, and would simultaneously monitor chip supply chains and deliveries to stay informed about which parties are stockpiling enough chips to train large, new AI models. We believe this proposal holds promise as a potential underpinning of an (inter)national verification system for powerful AI model developments.


The proposed system works at three levels:

1. Chip level

In order to be approved for use in AI training infrastructure, new chips will be hard-wired to log snapshots of the model weights they hold in memory during training. These weight snapshots will be reported to regulators along with (hashes of) training transcripts, allowing regulators to verify whether the chips have been used for a rule-violating training run. Regulators can physically inspect random samples of chips to ensure that their snapshots were reported, and that no tampering has undermined the validity of remote snapshot taking & reporting.

2. Data Center level

The data center (the prover) provides the transcript of its training process, allowing the prover & verifier to confirm that retraining with that transcript reproduces the same logged weight snapshot, which shows that the training was done as claimed. The verifier then checks that the training transcript’s properties are in line with agreed regulations; for example, that none of the algorithm code is relevant to achieving breaches of data security or other unlawful use cases.
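
The chip-level and data-center-level checks described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not Shavit’s actual protocol: the “training run” is a tiny deterministic SGD loop, the transcript is a hypothetical dictionary holding a seed, a learning rate, and a step count, and the chip’s logged snapshot is stood in for by a SHA-256 hash of the final weights. All names (`snapshot_hash`, `train`, `verify`) are invented for this example.

```python
import hashlib
import random
import struct


def snapshot_hash(weights):
    """Hash a weight vector, standing in for a chip's logged weight snapshot."""
    payload = struct.pack(f"<{len(weights)}d", *weights)
    return hashlib.sha256(payload).hexdigest()


def train(transcript):
    """Toy deterministic training run: SGD fitting y = 2x - 1 with a linear model."""
    rng = random.Random(transcript["seed"])  # the seed fixes the data order
    w, b = 0.0, 0.0
    for _ in range(transcript["steps"]):
        x = rng.uniform(-1.0, 1.0)
        error = (w * x + b) - (2.0 * x - 1.0)
        w -= transcript["lr"] * error * x
        b -= transcript["lr"] * error
    return [w, b]


def verify(transcript, logged_hash):
    """Verifier side: replay the claimed transcript and compare with the chip's log."""
    return snapshot_hash(train(transcript)) == logged_hash


# What actually ran on the chip (hypothetical values).
actual_run = {"seed": 7, "lr": 0.1, "steps": 200}
logged = snapshot_hash(train(actual_run))

# An honest report replays to the logged snapshot; an under-reported one does not.
honest_report = dict(actual_run)
under_report = {**actual_run, "steps": 100}
print(verify(honest_report, logged))
print(verify(under_report, logged))
```

The point of the sketch is only the shape of the check: the prover cannot claim a smaller or different run than the one that produced the logged weights, because replaying the claimed transcript would yield a different snapshot. A real scheme would also have to handle issues this toy ignores, such as nondeterminism in large-scale training and the cost of replaying a full run.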

3. Supply Chain level

Cutting-edge chip supply chains have highly concentrated bottlenecks/choke points. These concentrated providers will record whom chips are sold to, and report this to the verifier. When doing checks, the verifier can randomly sample chips at training centers to confirm that they were acquired through legally reported sources/transactions and that none have been covertly acquired. Ensuring chips are not illegally sold would help prevent training runs from happening outside of this monitoring system.
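
The random sampling step above can be made concrete with a small sketch. It assumes a hypothetical sales ledger mapping chip IDs to the site they were reported as sold to; the function names and the chip and site identifiers are invented for illustration. The second function computes the standard probability that a sample drawn without replacement contains at least one unreported chip, which shows why inspecting only a fraction of chips can still deter covert stockpiling.

```python
import random
from math import comb


def audit(ledger, site, chips_on_site, sample_size, rng):
    """Sample chips found at a site; return those the ledger does not assign to it."""
    pool = sorted(chips_on_site)
    sample = rng.sample(pool, min(sample_size, len(pool)))
    return [chip for chip in sample if ledger.get(chip) != site]


def detection_probability(total, unreported, sample_size):
    """Chance that a sample drawn without replacement holds at least one unreported chip."""
    sample_size = min(sample_size, total)
    # Number of samples made up only of reported chips (0 if the sample is too large).
    clean_samples = comb(total - unreported, sample_size)
    return 1.0 - clean_samples / comb(total, sample_size)


# Hypothetical site: 95 chips bought through reported sales, 5 acquired covertly.
ledger = {f"chip-{i}": "site-A" for i in range(95)}
chips_on_site = set(ledger) | {f"grey-{i}" for i in range(5)}

flagged = audit(ledger, "site-A", chips_on_site, 20, random.Random(0))
print(flagged)
print(detection_probability(total=100, unreported=5, sample_size=20))
```

The more chips a party hides relative to the sample size, the closer the detection probability gets to one, so the verifier can trade inspection effort against the size of covert stockpile it wants to be able to catch.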



Reduces the Necessity of Unilateral Actions

In the absence of a verification mechanism, governments will increasingly deploy costly unilateral export restrictions to try to limit one another’s development of harmful models. For example, in 2022, the US enacted export controls to deny the People’s Republic of China access to advanced computer chips, in part to restrict development of AI models that might be put to harmful purposes (such as weapons development or human rights violations). This measure may not have been necessary if a mechanism had existed for the PRC to credibly commit to pursuing only models with agreed-upon capabilities, in which case China could continue with unobjectionable imports and the US would not have to restrict its own companies’ investments and sales to the PRC. This dynamic is similar to how the existence of nuclear verification systems allows more nations to build peaceful nuclear power plants without causing fear of weapons proliferation.

Empowers Safety/Regulatory Minimums, Increasing Security for All

This framework could also empower governments to enforce safety minimums. Without government-enforced safety precautions, companies are incentivized to skirt safety lest they fall behind their competitors. This also applies at the international level; as nations race to acquire advanced AI capabilities and compete for high-end research investments, governments are incentivized to “race to the bottom”, reducing regulations in order to speed up development and attract research lab investments. A verification framework could empower international agreements on safety minimums, setting a common floor for all and averting unsafe development norms. (Notably, such a development could slow the “race to the bottom” even if only a small number of countries agreed to join this verification framework. Today, countries have an incentive to sacrifice safety for speed due to lack of knowledge of whether their rivals have done the same. If even a few important countries joined the framework, those outside the framework could feel more confident in the safety behavior of those within, reducing their uncertainty and their incentive to preemptively cut down on safety.)

Respects Privacy & Sovereignty

This framework does not require AI developers to disclose sensitive, proprietary information about the models they train, nor does it require monitoring of individuals’ private computing devices (it only monitors large training centers). Participant countries must give their continued consent by allowing inspectors to access their chips’ snapshots and training process transcripts. A country or data center could refuse compliance at any point, which ensures this process respects national sovereignty. If the prover follows the verifier’s steps, rule-violation is unlikely to go undetected. If the prover does not comply with verifiers, this will itself be cause for suspicion, as is the case with nuclear facility verification processes. Participation should be driven by a universal interest in ensuring the international community upholds safety norms in advanced AI development.


Limitations & Weaknesses

We conclude by noting some potential weaknesses of this proposal. These may also serve as ideas for further work on developing credible verification systems. We divide the framework’s weaknesses into three major areas of concern:

1. Only Monitors Larger Compute Centers. 

This framework only calls for monitoring of large-scale compute clusters. It therefore does not prevent the training of smaller models, which don’t require large quantities of compute to train. These may still have concerning capabilities such as facial recognition, weapons targeting, or generating misinformation.

2. Pre-existing models and chips. 

This framework only prevents the training of new large models going forward; it does not prevent application of models that have already been created, nor does it necessarily monitor chips and data centers that were already on the market prior to the implementation of this framework. 

What’s more, this system only seeks to prevent the development, not the propagation, of illegal models. If a model were somehow created outside of this monitoring system, it could be easily copied and used.

3. Dual capability. 

We are also concerned about the dual capability of some microchips, especially in the future. As AI computation becomes cheaper, microchips meant for other applications, such as medical technology, could be repurposed for training harmful AI models. This framework would need to evaluate not only the advertised training capabilities of chips, but also their potential dual capability, ideally ensuring these chips are not illegally transferred for use at an unauthorized training center.

The framework does partially account for this, as verifiers can randomly sample chips at known training centers to ensure that all chips were obtained from verified sources in recorded transactions.



We believe that this framework has potential as a means for verifying compliance with international regulations on AI development, empowering governments to pursue beneficial agreements on AI development and substantially improving the AI safety/governance landscape. We hope that this post increases the visibility of this proposal and sparks further discussion of its feasibility and improvement. 

Written by Damin Curtis & Alexander M. Wyckoff






Cool post; researching these issues seems like one of the most important things in AI governance to me!

Some questions I have (for future research) are:

  1. How hard is it to distinguish approved from unapproved training runs with these snapshots that the chips would provide? Is this just about establishing that the length of the training run is below a certain threshold, or does it assess whether the training run follows a previously submitted recipe that was approved to be safe by an authorizing body? 
  2. How long would it take to implement these mechanisms at the hardware level, and who would have to be on board to make this happen? (E.g., if the US govt simply passed legislation that prohibited future chip innovation unless these mechanisms are installed, would that be enough to get it done?)

Thanks for your comment & questions! These are great questions for further research. I don't know enough to comment on the first question. But as for the second, we're lucky that right now, the advanced chip supply chain has multiple tight bottlenecks, and is largely controlled by US-allied advanced democracies (Taiwan, Korea, Japan, Netherlands, UK, US, etc). This is part of why the US was able to effectively cut off China's access to the most advanced chips. So there is a window of opportunity, where the most important countries could agree to require their companies to implement this framework, and require certain buyers to comply with the framework as well. Countries generally can require their companies to manufacture a certain way, and can also set import/export restrictions on chips to ensure transactions are compliant.
