Response to “Coordinated pausing: An evaluation-based coordination scheme for frontier AI developers”

Matthew Wearden

This is a linkpost for https://matthewwearden.co.uk/2023/10/30/response-to-coordinated-pausing-an-evaluation-based-coordination-scheme-for-frontier-ai-developers/

Watercolour image of a frontier AI lab, bathed in warm sunlight. People are working together to discuss pausing of frontier AI research — Generated by DALL-E 3

Note: this is a x-post from my blog, Thoughts on AI , where I discuss a variety of topics in AI governance, particularly corporate governance.

Introduction

The corporate governance team at the Centre for the Governance of AI recently published a great paper, “Coordinated pausing: An evaluation-based coordination scheme for frontier AI developers”, authored by Jide Alaga and Jonas Schuett. The following post contains a set of responses and comments to the paper - all such responses are based on personal insights and opinions that I have on the content that I hope add to the conversation. Any negative tone that may come across in the post does not represent my feelings on the paper overall - I think it is a fantastic, practical piece of research that I hope is read by policymakers both in frontier AI labs and governments.

A Note on Good Intentions

It may appear that some of my comments take a rather pessimistic view of frontier AI labs and their interests. In truth, I believe that many of these labs are full of individuals genuinely trying to do the right thing, who are aware of the risks they are dealing with. In my mind, this good faith should be given to almost any individual working at a frontier lab, but it absolutely should not be extended to the labs themselves. Any such organisation exists in an environment that strongly rewards certain types of decision making, and a collection of entirely justifiable, well-meant decisions can still lead to very bad outcomes. Good governance should not rely on the good intentions of organisations, but instead seek to make the exercising of those good intentions as likely as possible to align with good outcomes, whilst making any bad intentions as painful and cumbersome as possible to execute on.

Limitations of the Mutual Auditor Model

The main area where my current opinions disagree with those of the authors are on the efficacy and feasibility of the Mutual Auditor model in the paper. There are two key disagreements presented below.

It is unlikely that there will be a single auditor for the industry

Many of the strengths of the mutual auditor model lie in the coordination that is possible due to all frontier AI labs using a single auditor. This is a scenario that I believe is very unlikely to exist in practice, primarily because auditing is a profitable industry, with both demand and space for multiple organisations to enter the market.

Unless there are legal requirements to use one auditor, a frontier lab will seek to find any organisation that can a) evaluate their systems sufficiently to demonstrate they have found a reasonable auditor, and b) be loose enough with their audits that the lab’s own opinions on the safety of their models can heavily influence the outcome of the audit. This incentive mechanism has been shown through many other industries to be enough to attract new market entrants, and there is no compelling reason I can find to believe why this wouldn’t be true of frontier AI research. Amongst others, the Big Four are known to build teams of experts ready to enter new auditing markets in a variety of technical fields.

Given that there is no longer one single auditor, the coordinated pausing model breaks down. Agreements on the terms of coordinated pausing would need to be established between auditing firms, and there is no reason to assume that these would be sufficiently cautious to prevent the severe risk scenarios that the paper is intending to address. In such a world, a new race to the bottom may well begin between the auditors as they seek to attract firms away from their competitors.

There are two things that I can currently imagine would change my mind about this:

If I were to see examples of other industries that were largely audited by one or two firms, I would be much more optimistic that the single auditor model is feasible
If there were to be a set of practical and sound policies that could be implemented between multiple auditing firms, I would be much more convinced that the mutual auditor model could still work with multiple auditors in the market.

Auditors implementing pauses face too many legal risks

Any auditor that is asking frontier labs to pause model deployments, or even research, will face significant legal challenges from their clients if they attempt to enforce such policies. Organisations that attempt to restrict the competitiveness of private firms without very strong grounds for doing so may be held liable for the loss of profit they cause. Any pause will be met with strong challenges for why it was started, as well as challenges against the conditions for its ending. This can be seen often in the finance industry, with lengthy and expensive legal battles ensuing. This disincentivises an auditor to implement such pauses, decreasing their efficacy significantly.

There are significant legal questions to be answered here, and I am not qualified to give opinions here. I would be enthusiastic to see research demonstrating that this issue is less important than I currently believe it to be.

Limitations of the Voluntary Pausing and Pausing Agreement Models

I would first like to state that I agree with the authors of the paper that both the Voluntary Pausing and Pausing Agreement models are valuable intermediate steps to longer term solutions. However, there are a couple of limitations of the models that I don’t believe were addressed in the original paper and I would like to mention here.

Private Deployments

One issue with both of these models is that they do not sufficiently address the risks posed by the private deployments of dangerous models to individual clients of the frontier labs. As such deals are likely to be considered confidential, proprietary information, the enforcers of pauses in either model (the public and other AI labs) are unlikely to be aware of such deployments. Though I do not have financial information to back the following claim up, I think that such private deployments are likely to constitute major proportions of the revenue of frontier labs. As such, the incentives to make risky decisions are higher in such deals.

These risks would be less salient to me if regulations were introduced to force the disclosure of such private deployments to the public, or data showed that private deployments constituted much smaller proportions of revenue than I had imagined.

Contractual Penalties

Similar to the point raised above about auditors enforcing penalties and the anti-trust issues described in the paper, I have strong concerns over the efficacy of contractual penalties as they are described in the Pausing Agreements model. My main concern is that there are very few business incentives for AI firms to maintain and uphold such contractual agreements with their competitors. The point I made at the start of this post about good intentions is relevant here - as much as everyone at these companies wants to do the right thing, this is too often overridden by the nature of the environment they operate in - organisations that are at all driven by the incentives of capital markets cannot be relied on to self regulate when the dangers are so great. Most likely, disagreements on the terms of any such contractual penalties will inevitably arise and they will be quietly dropped to avoid reproach by authorities enforcing antitrust laws.

I am more optimistic about a model where enforcement comes through pausing or revoking memberships to industry bodies, where the enforcer is a related but separate entity to any of its individual members. Work to establish and deepen the relationships between a frontier AI industry body and its members would be valuable.

Observations on Key Research Areas

I am strongly encouraged by the work in this paper that there are feasible options for implementing a coordinated pausing policy in the future. From the paper, I see a few key research areas that require prioritisation before any such policy could be implemented, which I thought were worth listing below.

Model Evaluations

It hardly needs pointing out, but development of effective model evaluation methodologies are a fundamental requirement to the development of any pausing policies. For this and many other reasons, evaluations research and threshold definitions are a must for the industry.

Model Similarity and Research Relevance

For any pauses to be implemented, measures of model similarity must be created. Without them, it will be impossible to define what work at labs needs to be paused. This is probably the single largest bottleneck besides model evaluation research to any such policies being implemented.

Legal Research

Any enforcement of a pause is likely to be met with legal challenge, even targeted against regulators. Research into relevant case studies from other industries, as well as research into the development of strongly binding contracts will be extremely valuable going into the future.

Incident Reporting Schemes

In order for coordinated pausing strategies to work successfully, risk incidents must be correctly identified and reported to relevant organisations. Work to develop practical incident reporting, whistleblowing and safe harbour schemes should be developed as a priority to enable this.

Model Registers and Disclosure Requirements

One key requirement for the success of a pausing policy is the development of model registers. These registers should categorise models by their capabilities, architecture and deployment, and are ideally coordinated by regulators that can enforce disclosure requirements, especially at the pre-training and pre-deployment stages. Specific, practical policy proposals for disclosure and notification schemes should be considered a high priority, as should work to build infrastructure for a register of models and their capabilities.

Open-Sourcing Regulation

Once models become open sourced, work done to restrict their usage will become almost entirely useless. Further research into policy proposals to prevent the open sourcing of frontier models will be important for ensuring that the regulation of proprietary models remains relevant.

Corporate Governance

For pauses to effectively be implemented within organisations, strong corporate governance structures need to be developed. Without this, it is possible that research may be conducted despite the formal position of the company, potentially still leading to dangerous outcomes.

SummaryBot1y1

Executive summary: The post responds to a paper on coordinated pausing for AI labs, arguing it has limitations around feasibility of a single auditor, legal risks of pausing, and issues with voluntary and contractual approaches. It suggests key research areas like evaluations, similarity measures, legal issues, reporting schemes, registers, and governance.

Key points:

A single mutual auditor for all AI labs is unlikely; competition means multiple auditors, undermining coordination.
Auditors face legal risks trying to enforce pauses, disincentivizing this.
Voluntary and contractual pausing have loopholes around private deployments and weak incentives.
Key research areas include evaluations, model similarity measures, legal issues, incident reporting, model registers and disclosure, preventing open sourcing, and corporate governance.
The paper has good intentions but organizational incentives often override individual intentions.
Intermediate voluntary and contractual approaches are positive steps but have limitations.
Strong industry governance is needed for policies like pausing to work.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

Effective Altruism Forum
EA Forum