Watercolour image of a frontier AI lab, bathed in warm sunlight, with people working together to discuss a pause in frontier AI research. Generated by DALL-E 3.

Note: this is a cross-post from my blog, Thoughts on AI, where I discuss a variety of topics in AI governance, particularly corporate governance.

Introduction

The corporate governance team at the Centre for the Governance of AI recently published a great paper, “Coordinated pausing: An evaluation-based coordination scheme for frontier AI developers”, authored by Jide Alaga and Jonas Schuett. The following post contains a set of responses and comments on the paper, all based on my personal insights and opinions, which I hope add to the conversation. Any negative tone that may come across in the post does not represent my feelings on the paper overall: I think it is a fantastic, practical piece of research that I hope is read by policymakers in both frontier AI labs and governments.

A Note on Good Intentions

It may appear that some of my comments take a rather pessimistic view of frontier AI labs and their interests. In truth, I believe that many of these labs are full of individuals genuinely trying to do the right thing, who are aware of the risks they are dealing with. In my mind, this good faith should be extended to almost any individual working at a frontier lab, but it absolutely should not be extended to the labs themselves. Any such organisation exists in an environment that strongly rewards certain types of decision making, and a collection of entirely justifiable, well-meant decisions can still lead to very bad outcomes. Good governance should not rely on the good intentions of organisations; instead, it should make acting on those good intentions as likely as possible to align with good outcomes, whilst making any bad intentions as painful and cumbersome as possible to execute on.

 

Limitations of the Mutual Auditor Model

The main area where my current opinions disagree with those of the authors is the efficacy and feasibility of the Mutual Auditor model in the paper. There are two key disagreements, presented below.

It is unlikely that there will be a single auditor for the industry

Many of the strengths of the mutual auditor model lie in the coordination that is possible when all frontier AI labs use a single auditor. This is a scenario that I believe is very unlikely to exist in practice, primarily because auditing is a profitable industry, with both demand and space for multiple organisations to enter the market.


 

Unless there are legal requirements to use one auditor, a frontier lab will seek out any organisation that can a) evaluate its systems sufficiently to demonstrate it has found a reasonable auditor, and b) be loose enough with its audits that the lab’s own opinions on the safety of its models can heavily influence the outcome of the audit. This incentive mechanism has been shown in many other industries to be enough to attract new market entrants, and I can find no compelling reason to believe this wouldn’t be true of frontier AI research. Amongst others, the Big Four are known to build teams of experts ready to enter new auditing markets in a variety of technical fields.


 

Given that there is no longer a single auditor, the coordinated pausing model breaks down. Agreements on the terms of coordinated pausing would need to be established between auditing firms, and there is no reason to assume that these would be sufficiently cautious to prevent the severe risk scenarios the paper intends to address. In such a world, a new race to the bottom may well begin between the auditors as they seek to attract firms away from their competitors.

 

There are two things that I can currently imagine would change my mind about this:

  • If I were to see examples of other industries that are largely audited by one or two firms, I would be much more optimistic that the single auditor model is feasible.
  • If there were a set of practical and sound policies that could be implemented between multiple auditing firms, I would be much more convinced that the mutual auditor model could still work with multiple auditors in the market.

Auditors enforcing pauses will face significant legal challenges

Any auditor asking frontier labs to pause model deployments, or even research, will face significant legal challenges from its clients if it attempts to enforce such policies. Organisations that attempt to restrict the competitiveness of private firms without very strong grounds for doing so may be held liable for the loss of profit they cause. Any pause will be met with strong challenges over why it was started, as well as challenges against the conditions for its ending. This is often seen in the finance industry, where lengthy and expensive legal battles ensue. It strongly disincentivises an auditor from implementing such pauses, decreasing their efficacy significantly.

 

There are significant legal questions to be answered here, and I am not qualified to offer opinions on them. I would be enthusiastic to see research demonstrating that this issue is less important than I currently believe it to be.

 

Limitations of the Voluntary Pausing and Pausing Agreement Models

I would first like to state that I agree with the authors of the paper that both the Voluntary Pausing and Pausing Agreement models are valuable intermediate steps towards longer-term solutions. However, there are a couple of limitations of the models that I don’t believe were addressed in the original paper, which I would like to mention here.

Private Deployments

One issue with both of these models is that they do not sufficiently address the risks posed by the private deployment of dangerous models to individual clients of the frontier labs. As such deals are likely to be considered confidential, proprietary information, the enforcers of pauses in either model (the public and other AI labs) are unlikely to be aware of such deployments. Though I do not have financial information to back this claim up, I think that such private deployments are likely to constitute a major proportion of frontier labs’ revenue. As such, the incentives to make risky decisions in these deals are higher.

 

These risks would be less salient to me if regulations were introduced to force the disclosure of such private deployments to the public, or if data showed that private deployments constituted a much smaller proportion of revenue than I imagine.

Contractual Penalties

Similar to the point raised above about auditors enforcing penalties, and to the antitrust issues described in the paper, I have strong concerns about the efficacy of contractual penalties as described in the Pausing Agreements model. My main concern is that there are very few business incentives for AI firms to maintain and uphold such contractual agreements with their competitors. The point I made at the start of this post about good intentions is relevant here: as much as everyone at these companies wants to do the right thing, this is too often overridden by the nature of the environment they operate in. Organisations that are at all driven by the incentives of capital markets cannot be relied on to self-regulate when the dangers are so great. Most likely, disagreements on the terms of any such contractual penalties will arise, and the penalties will be quietly dropped to avoid reproach from authorities enforcing antitrust laws.

 

I am more optimistic about a model where enforcement comes through pausing or revoking membership of an industry body, where the enforcer is related to, but separate from, any of its individual members. Work to establish and deepen the relationships between a frontier AI industry body and its members would be valuable.

 

Observations on Key Research Areas

The work in this paper strongly encourages me that there are feasible options for implementing a coordinated pausing policy in the future. From the paper, I see a few key research areas that require prioritisation before any such policy could be implemented, which I thought were worth listing below.

Model Evaluations

It hardly needs pointing out, but the development of effective model evaluation methodologies is a fundamental requirement for the development of any pausing policy. For this and many other reasons, evaluations research and threshold definitions are a must for the industry.

Model Similarity and Research Relevance

For any pauses to be implemented, measures of model similarity must be created. Without them, it will be impossible to define what work at labs needs to be paused. Besides model evaluation research, this is probably the single largest bottleneck to any such policy being implemented.

Legal Challenges

Any enforcement of a pause is likely to be met with legal challenge, potentially even challenges targeted against regulators. Research into relevant case studies from other industries, as well as into the development of strongly binding contracts, will be extremely valuable going forward.

Incident Reporting Schemes

In order for coordinated pausing strategies to work successfully, risk incidents must be correctly identified and reported to the relevant organisations. Practical incident reporting, whistleblowing, and safe harbour schemes should be developed as a priority to enable this.

Model Registers and Disclosure Requirements

One key requirement for the success of a pausing policy is the development of model registers. These registers should categorise models by their capabilities, architecture, and deployment, and would ideally be coordinated by regulators that can enforce disclosure requirements, especially at the pre-training and pre-deployment stages. Specific, practical policy proposals for disclosure and notification schemes should be considered a high priority, as should work to build the infrastructure for a register of models and their capabilities.
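As a purely illustrative sketch of what a single register entry might contain, the snippet below shows one possible shape for such a record. The field names, lifecycle stages, and example values are my own assumptions for illustration, not a proposal from the paper or any regulator.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class LifecycleStage(Enum):
    """Hypothetical disclosure points at which a regulator might require notification."""
    PRE_TRAINING = "pre-training"
    PRE_DEPLOYMENT = "pre-deployment"
    DEPLOYED = "deployed"


@dataclass
class ModelRegisterEntry:
    """One hypothetical record in a regulator-run register of frontier models."""
    developer: str                      # organisation responsible for the model
    model_name: str
    lifecycle_stage: LifecycleStage
    architecture_summary: str           # high-level description of the model architecture
    evaluated_capabilities: dict[str, str] = field(default_factory=dict)  # evaluation -> result
    deployment_contexts: list[str] = field(default_factory=list)          # including private deployments
    last_disclosure: date | None = None


# Example usage: a lab files a pre-deployment disclosure with the regulator.
entry = ModelRegisterEntry(
    developer="Example Lab",
    model_name="example-model-v1",
    lifecycle_stage=LifecycleStage.PRE_DEPLOYMENT,
    architecture_summary="decoder-only transformer",
    evaluated_capabilities={"dangerous-capability-eval": "below threshold"},
    deployment_contexts=["public API (planned)", "private enterprise contract (planned)"],
    last_disclosure=date.today(),
)
print(entry.model_name, entry.lifecycle_stage.value)
```

Even a minimal schema like this makes clear why the pre-training and pre-deployment stages matter: without a disclosure requirement at those points, the register would only ever record models after the relevant risks had already materialised.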

Open-Sourcing Regulation

Once models are open-sourced, work done to restrict their usage becomes almost entirely useless. Further research into policy proposals to prevent the open-sourcing of frontier models will be important for ensuring that the regulation of proprietary models remains relevant.

Corporate Governance

For pauses to be implemented effectively within organisations, strong corporate governance structures need to be developed. Without them, research may be conducted despite the formal position of the company, potentially still leading to dangerous outcomes.


 

Comments


Executive summary: The post responds to a paper on coordinated pausing for AI labs, arguing it has limitations around feasibility of a single auditor, legal risks of pausing, and issues with voluntary and contractual approaches. It suggests key research areas like evaluations, similarity measures, legal issues, reporting schemes, registers, and governance. 

Key points:

  1. A single mutual auditor for all AI labs is unlikely; competition means multiple auditors, undermining coordination.
  2. Auditors face legal risks trying to enforce pauses, disincentivizing this.
  3. Voluntary and contractual pausing have loopholes around private deployments and weak incentives.
  4. Key research areas include evaluations, model similarity measures, legal issues, incident reporting, model registers and disclosure, preventing open sourcing, and corporate governance.
  5. The paper has good intentions but organizational incentives often override individual intentions.
  6. Intermediate voluntary and contractual approaches are positive steps but have limitations.
  7. Strong industry governance is needed for policies like pausing to work. 

 

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
