This paper is about (1) "government intervention" to protect "against the risks from frontier AI models" and (2) some particular proposed safety standards. It's by Markus Anderljung, Joslyn Barnhart (Google DeepMind), Jade Leung (OpenAI governance lead), Anton Korinek, Cullen O'Keefe (OpenAI), Jess Whittlestone, and 18 others.
Abstract
Advanced AI models hold the promise of tremendous benefits for humanity, but society needs to proactively manage the accompanying risks. In this paper, we focus on what we term “frontier AI” models — highly capable foundation models that could possess dangerous capabilities sufficient to pose severe risks to public safety. Frontier AI models pose a distinct regulatory challenge: dangerous capabilities can arise unexpectedly; it is difficult to robustly prevent a deployed model from being misused; and, it is difficult to stop a model’s capabilities from proliferating broadly. To address these challenges, at least three building blocks for the regulation of frontier models are needed: (1) standard-setting processes to identify appropriate requirements for frontier AI developers, (2) registration and reporting requirements to provide regulators with visibility into frontier AI development processes, and (3) mechanisms to ensure compliance with safety standards for the development and deployment of frontier AI models. Industry self-regulation is an important first step. However, wider societal discussions and government intervention will be needed to create standards and to ensure compliance with them. We consider several options to this end, including granting enforcement powers to supervisory authorities and licensure regimes for frontier AI models. Finally, we propose an initial set of safety standards. These include conducting pre-deployment risk assessments; external scrutiny of model behavior; using risk assessments to inform deployment decisions; and monitoring and responding to new information about model capabilities and uses post-deployment. We hope this discussion contributes to the broader conversation on how to balance public safety risks and innovation benefits from advances at the frontier of AI development.
Executive Summary
The capabilities of today’s foundation models highlight both the promise and risks of rapid advances in AI. These models have demonstrated significant potential to benefit people in a wide range of fields, including education, medicine, and scientific research. At the same time, the risks posed by present-day models, coupled with forecasts of future AI progress, have rightfully stimulated calls for increased oversight and governance of AI across a range of policy issues. We focus on one such issue: the possibility that, as capabilities continue to advance, new foundation models could pose severe risks to public safety, be it via misuse or accident. Although there is ongoing debate about the nature and scope of these risks, we expect that government involvement will be required to ensure that such "frontier AI models” are harnessed in the public interest.
Three factors suggest that frontier AI development may be in need of targeted regulation: (1) Models may possess unexpected and difficult-to-detect dangerous capabilities; (2) Models deployed for broad use can be difficult to reliably control and to prevent from being used to cause harm; (3) Models may proliferate rapidly, enabling circumvention of safeguards.
Self-regulation is unlikely to provide sufficient protection against the risks from frontier AI models: government intervention will be needed. We explore options for such intervention. These include:
- Mechanisms to create and update safety standards for responsible frontier AI development and deployment. These should be developed via multi-stakeholder processes, and could include standards relevant to foundation models overall, not exclusive to frontier AI. These processes should facilitate rapid iteration to keep pace with the technology.
- Mechanisms to give regulators visibility into frontier AI development, such as disclosure regimes, monitoring processes, and whistleblower protections. These equip regulators with the information needed to address the appropriate regulatory targets and design effective tools for governing frontier AI. The information provided would pertain to qualifying frontier AI development processes, models, and applications.
- Mechanisms to ensure compliance with safety standards. Self-regulatory efforts, such as voluntary certification, may go some way toward ensuring compliance with safety standards by frontier AI model developers. However, this seems likely to be insufficient without government intervention, for example by empowering a supervisory authority to identify and sanction non-compliance; or by licensing the deployment and potentially the development of frontier AI. Designing these regimes to be well-balanced is a difficult challenge; we should be sensitive to the risks of overregulation and stymieing innovation on the one hand, and moving too slowly relative to the pace of AI progress on the other.
Next, we describe an initial set of safety standards that, if adopted, would provide some guardrails on the development and deployment of frontier AI models. Versions of these could also be adopted for current AI models to guard against a range of risks. We suggest that at minimum, safety standards for frontier AI development should include:
- Conducting thorough risk assessments informed by evaluations of dangerous capabilities and controllability. This would reduce the risk that deployed models possess unknown dangerous capabilities, or behave unpredictably and unreliably.
- Engaging external experts to apply independent scrutiny to models. External scrutiny of the safety and risk profile of models would both improve assessment rigor and foster accountability to the public interest.
- Following standardized protocols for how frontier AI models can be deployed based on their assessed risk. The results from risk assessments should determine whether and how the model is deployed, and what safeguards are put in place. This could range from deploying the model without restriction to not deploying it at all. In many cases, an intermediate option—deployment with appropriate safeguards (e.g., more post-training that makes the model more likely to avoid risky instructions)—may be appropriate.
- Monitoring and responding to new information on model capabilities. The assessed risk of deployed frontier AI models may change over time due to new information, and new post-deployment enhancement techniques. If significant information on model capabilities is discovered post-deployment, risk assessments should be repeated, and deployment safeguards updated.
Going forward, frontier AI models seem likely to warrant safety standards more stringent than those imposed on most other AI models, given the prospective risks they pose. Examples of such standards include: avoiding large jumps in capabilities between model generations; adopting state-of-the-art alignment techniques; and conducting pre-training risk assessments. Such practices are nascent today, and need further development.
The regulation of frontier AI should only be one part of a broader policy portfolio, addressing the wide range of risks and harms from AI, as well as AI’s benefits. Risks posed by current AI systems should be urgently addressed; frontier AI regulation would aim to complement and bolster these efforts, targeting a particular subset of resource-intensive AI efforts. While we remain uncertain about many aspects of the ideas in this paper, we hope it can contribute to a more informed and concrete discussion of how to better govern the risks of advanced AI systems while enabling the benefits of innovation to society.
Commentary
It is good that this paper exists. It's mostly good because it's a step (alongside Model evaluation for extreme risks) toward making good actions for AI labs and government more mainstream/legible. It's slightly good because of its (few) novel ideas; e.g. Figure 3 helps me think slightly more clearly. I don't recommend reading beyond the executive summary.
Unfortunately, this paper's proposals are unambitious (in contrast, in my opinion, to Model evaluation for extreme risks, which I unreservedly praised), such that I'm on-net disappointed in the authors (and may ask some of them whether they agree it's unambitious and, if so, why). Some quotes below, but in short: it halfheartedly suggests licensing. It doesn't suggest government oversight of training runs or compute. It doesn't discuss when training runs should be stopped/paused (e.g., when model evaluations for dangerous capabilities raise flags). (It also doesn't say anything specific about international action, but it's very reasonable for that to be out of scope.)
On licensing, it correctly notes that
Enforcement by supervisory authorities penalizes non-compliance after the fact. A more anticipatory, preventative approach to ensuring compliance is to require a governmental license to widely deploy a frontier AI model, and potentially to develop it as well.
But then it says:
Licensing is only warranted for the highest-risk AI activities, where evidence suggests potential risk of large-scale harm and other regulatory approaches appear inadequate. Imposing such measures on present-day AI systems could potentially create excessive regulatory burdens for AI developers which are not commensurate with the severity and scale of risks posed. However, if AI models begin having the potential to pose risks to public safety above a high threshold of severity, regulating such models similarly to other high-risk industries may become warranted.
Worse, on after-the-fact enforcement, it says:
Supervisory authorities could “name and shame” non-compliant developers. . . . The threat of significant administrative fines or civil penalties may provide a strong incentive for companies to ensure compliance with regulator guidance and best practices. For particularly egregious instances of non-compliance and harm ["For example, if a company repeatedly released frontier models that could significantly aid cybercriminal activity, resulting in billions of dollars worth of counterfactual damages, as a result of not complying with mandated standards and ignoring repeated explicit instructions from a regulator"], supervisory authorities could deny market access or consider more severe penalties [viz. "criminal sentences"].
This is overdeterminedly insufficient for safety. "Not complying with mandated standards and ignoring repeated explicit instructions from a regulator" should not be allowed to happen, because it might kill everyone. A single instance of noncompliance should not be allowed to happen, and preventing it requires something like oversight of training runs. Not to mention that denying market access or threatening prosecution is inadequate. Not to mention that naming-and-shaming and fining companies are totally inadequate. This passage totally fails to treat AI as a major risk. I know the authors are pretty worried about x-risk; I notice I'm confused.
Next:
While we believe government involvement will be necessary to ensure compliance with safety standards for frontier AI, there are potential downsides to rushing regulation.
This is literally true, but I think it tends to mislead the reader about the urgency of strong safety standards and government oversight.
On open-sourcing, it's not terrible; it equivocates but says "proliferation via open-sourcing" can be dangerous and
prudent practices could include . . . . Having the legal and technical ability to quickly roll back deployed models on short notice if the risks warrant it, for example by not open-sourcing models until doing so appears sufficiently safe.
The paper does say some good things. It suggests that safety standards should exist, and that they should include model evals, audits & red-teaming, and risk assessment. But it suggests nothing strong or new, I think.
The authors are clearly focused on x-risk, but they tone that down. This is mostly demonstrated above, but also note that they phrase their target as mere "high severity and scale risks": "the possibility that continued development of increasingly capable foundation models could lead to dangerous capabilities sufficient to pose risks to public safety at even greater severity and scale than is possible with current computational systems." Their examples include AI "evading human control" but not killing everyone or disempowering humanity or any specific catastrophes.
I'd expect something stronger from these authors. Again, I notice I'm confused. Again, I might ask some of the authors, or maybe some will share their thoughts here or in some other public place.
Updates & addenda
Thanks to Justin, one of the authors, for replying. In short, he says:
I think your criticism that the tools are not ambitious is fair. I don't think that was our goal. I saw this project as a way of providing tools for which there is broad agreement and that given the current state of AI models we believe would help steer AI development and deployment in a better direction. I do think that another reading of this paper is that it's quite significant that this group agreed on the recommendations that are made. I consider it progress in the discussion of how to effectively govern increasingly powerful AI models, but it's not the last word either. :)
We also have a couple of disagreements about the text.
Thanks to Markus, one of the primary authors, for replying. His reply is worth quoting in full:
Thanks for the post and the critiques. I won't respond at length, other than to say two things: (i) it seems right to me that we'll need something like licensing or pre-approvals of deployments, ideally also decisions to train particularly risky models. Also that such a regime would be undergirded by various compute governance efforts to identify and punish non-compliance. This could e.g. involve cloud providers needing to check if a customer buying more than X compute [has] the relevant license or confirm that they are not using the compute to train a model above a certain size. In short, my view is that what's needed are the more intense versions of what's proposed in the paper. Though I'll note that there are lots of things I'm unsure about. E.g. there are issues with putting in place regulation while the requirements that would be imposed on development are so nascent.
(ii) the primary value and goal of the paper in my mind (as suggested by Justin) is in pulling together a somewhat broad coalition of authors from many different organizations making the case for regulation of frontier models. Writing pieces with lots of co-authors is difficult, especially if the topic is contentious, as this one is, and will often lead to recommendations being weaker than they otherwise would be. But overall, I think that's worth the cost. It's also useful to note that I think it can be counterproductive for calls for regulation (in particular regulation that is considered particularly onerous) to be coming loudly from industry actors, who people may assume have ulterior motives.
Note that Justin and Markus don't necessarily speak for the other authors.
GovAI has a blogpost summary.
Jess Whittlestone has a blogpost summary/commentary.
GovAI will host a webinar on the paper on July 20 at 8am PT.
Markus has a Twitter summary.
The paper is listed on the OpenAI research page and so is somewhat endorsed by OpenAI.