Luke Muehlhauser recently posted this list of ideas. See also this List of lists of government AI policy ideas and How major governments can help with the most important century.
The full text of the post is below.
About two years ago, I wrote that “it’s difficult to know which ‘intermediate goals’ [e.g. policy goals] we could pursue that, if achieved, would clearly increase the odds of eventual good outcomes from transformative AI.” Much has changed since then, and in this post I give an update on 12 ideas for US policy goals that I tentatively think would increase the odds of good outcomes from transformative AI.
I think the US generally over-regulates, and that most people underrate the enormous benefits of rapid innovation. However, when 50% of the experts on a specific technology think there is a reasonable chance it will result in outcomes that are “extremely bad (e.g. human extinction),” I think ambitious and thoughtful regulation is warranted.
First, some caveats:
- These are my own tentative opinions, not Open Philanthropy’s. I might easily change my opinions in response to further analysis or further developments.
- My opinions are premised on a strategic picture similar to the one outlined in my colleague Holden Karnofsky’s Most Important Century and Implications of… posts. In other words, I think transformative AI could bring enormous benefits, but I also take full-blown existential risk from transformative AI as a plausible and urgent concern, and I am more agnostic about this risk’s likelihood, shape, and tractability than e.g. a recent TIME op-ed.
- None of the policy options below have gotten sufficient scrutiny (though they have received far more scrutiny than is presented here), and there are many ways their impact could turn out — upon further analysis or upon implementation — to be net-negative, even if my basic picture of the strategic situation is right.
- To my knowledge, none of these policy ideas have been worked out in enough detail to allow for immediate implementation, but experts have begun to draft the potential details for most of them (not included here). None of these ideas are original to me.
- This post doesn’t explain much of my reasoning for tentatively favoring these policy options. All the options below have complicated mixtures of pros and cons, and many experts oppose (or support) each one. This post isn’t intended to (and shouldn’t) convince anyone. However, in the wake of recent AI advances and discussion, many people have been asking me for these kinds of policy ideas, so I am sharing my opinions here.
- Some of these policy options are more politically tractable than others, but, as I think we’ve seen recently, the political landscape sometimes shifts rapidly and unexpectedly.
With those caveats in mind, below are some of my current personal guesses about US policy options that would reduce existential risk from AI in expectation (in no particular order).
- Software export controls. Control the export (to anyone) of “frontier AI models,” i.e. models with highly general capabilities over some threshold, or (more simply) models trained with a compute budget over some threshold (e.g. as much compute as $1 billion can buy today). This would help limit the proliferation of the models that probably pose the greatest risk. Also restrict API access in some ways, since API access can potentially be used to generate an optimized dataset sufficient to train a smaller model to reach performance similar to that of the larger model.
- Require hardware security features on cutting-edge chips. Security features on chips can be leveraged for many useful compute governance purposes, e.g. to verify compliance with export controls and domestic regulations, monitor chip activity without leaking sensitive IP, limit usage (e.g. via interconnect limits), or even intervene in an emergency (e.g. remote shutdown). These functions can be achieved via firmware updates to already-deployed chips, though some features would be more tamper-resistant if implemented on the silicon itself in future chips.
- Track stocks and flows of cutting-edge chips, and license big clusters. Chips over a certain capability threshold (e.g. the one used for the October 2022 export controls) should be tracked, and a license should be required to bring together large masses of them (as required to cost-effectively train frontier models). This would improve government visibility into potentially dangerous clusters of compute. And without this, other aspects of an effective compute governance regime can be rendered moot via the use of undeclared compute.
- Track and require a license to develop frontier AI models. This would improve government visibility into potentially dangerous AI model development, and allow more control over their proliferation. Without this, other policies like the information security requirements below are hard to implement.
- Information security requirements. Require that frontier AI models be subject to extra-stringent information security protections (including cyber, physical, and personnel security), including during model training, to limit unintended proliferation of dangerous models.
- Testing and evaluation requirements. Require that frontier AI models be subject to extra-stringent safety testing and evaluation, including some evaluation by an independent auditor meeting certain criteria.
- Fund specific genres of alignment, interpretability, and model evaluation R&D. Note that if the genres are not specified well enough, such funding can effectively widen (rather than shrink) the gap between cutting-edge AI capabilities and available methods for alignment, interpretability, and evaluation. See e.g. here for one possible model.
- Fund defensive information security R&D, again to help limit unintended proliferation of dangerous models. Even the broadest funding strategy would help, but there are many ways to target this funding to the development and deployment pipeline for frontier AI models.
- Create a narrow antitrust safe harbor for AI safety & security collaboration. Frontier-model developers would be more likely to collaborate usefully on AI safety and security work if such collaboration were more clearly allowed under antitrust rules. Careful scoping of the policy would be needed to retain the basic goals of antitrust policy.
- Require certain kinds of AI incident reporting, similar to incident reporting requirements in other industries (e.g. aviation) or to data breach reporting requirements, and similar to some vulnerability disclosure regimes. Many incidents wouldn’t need to be reported publicly, but could be kept confidential within a regulatory body. The goal is to allow regulators, and perhaps others, to track certain kinds of harms and close calls from AI systems, so that they can see where the dangers are and rapidly evolve mitigation mechanisms.
- Clarify the liability of AI developers for concrete AI harms, especially clear physical or financial harms, including those resulting from negligent security practices. A new framework for AI liability should in particular address the risks from frontier models carrying out actions. The goal of clear liability is to incentivize greater investment in safety, security, etc. by AI developers.
- Create means for rapid shutdown of large compute clusters and training runs. One kind of “off switch” that may be useful in an emergency is a non-networked power cutoff switch for large compute clusters. As far as I know, most datacenters don’t have this. Remote shutdown mechanisms on chips (mentioned above) could also help, though they are vulnerable to interruption by cyberattack. Various additional options could be required for compute clusters and training runs beyond particular thresholds.
Of course, even if one agrees with some of these high-level opinions, I haven’t provided enough detail in this short post for readers to know what, exactly, to advocate for, or how to do it. If you have useful skills, networks, funding, or other resources that you might like to direct toward further developing or advocating for one or more of these policy ideas, please indicate your interest in this short Google Form. (The information you share in this form will be available to me [Luke Muehlhauser] and some other Open Philanthropy employees, but we won’t share your information beyond that without your permission.)
(Copied with permission.)
Many of these policy options would plausibly also be good to implement in other jurisdictions, but for most of them the US is a good place to start: the US is plausibly the most important jurisdiction anyway, given the location of leading AI companies, and many other countries often follow its lead. I also know much less about politics and policymaking in other countries.
For more on intermediate goals, see Survey on intermediate goals in AI governance.
(This paragraph was added on April 18, 2023.) Besides my day job at Open Philanthropy, I am also a Board member at Anthropic, though I have no shares in the company and am not compensated by it. Again, these opinions are my own, not Anthropic’s.
There are many other policy options I have purposely not mentioned here. These include:
- Hardware export controls. The US has already implemented major export controls on semiconductor manufacturing equipment and high-end chips. These controls have both pros and cons from my perspective, though it’s worth noting that they may be a necessary complement to some of the policies I tentatively recommend in this post. For example, the controls on semiconductor manufacturing equipment help to preserve a unified supply chain to which future risk-reducing compute governance mechanisms can be applied. These hardware controls will likely need ongoing maintenance by technically sophisticated policymakers to remain effective.
- “US boosting” interventions, such as semiconductor manufacturing subsidies or AI R&D funding. One year ago I was weakly in favor of these policies, but recent analyses have nudged me into weakly expecting these interventions are net-negative given e.g. the likelihood that they shorten AI timelines. But more analysis could flip me back. “US boosting” by increasing high-skill immigration may be an exception here because it relocates rather than creates a key AI input (talent), but I’m unsure, e.g. because skilled workers may accelerate AI faster in the US than in other jurisdictions. As with all the policy opinions in this post, it depends on the magnitude and certainty of multiple effects pushing in different directions, and those figures are difficult to estimate.
- AI-slowing regulation that isn’t “directly” helpful beyond slowing AI progress, e.g. a law saying that the “fair use” doctrine doesn’t apply to data used to train large language models. Some things in this genre might be good to do for the purpose of buying more time to come up with needed AI alignment and governance solutions, but I haven’t prioritized looking into these options relative to the options listed in the main text, which simultaneously buy more time and are “directly” useful to mitigating the risks I’m most worried about. Moreover, I think creating the ability to slow AI progress during the most dangerous period (in the future) is more important than slowing AI progress now, and most of the policies in the main text help with slowing AI progress in the future, whereas some policies that slow AI today don’t help much with slowing AI progress in the future.
- Launching new multilateral agreements or institutions to regulate AI globally. Global regulation is needed, but I haven’t yet seen proposals in this genre that I expect to be both feasible and effective. My guess is that the way to work toward new global regulation is similar to how the October 2022 export controls have played out: the US can move first with an effective policy on one of the topics above, and then persuade other influential countries to join it.
- A national research cloud. I’d guess this is unhelpful because it accelerates AI R&D broadly and creates a larger number of people who can train dangerously large models, though the implementation details matter.
See e.g. pp. 15–16 of the GPT-4 system card for an illustration.
E.g. the lack of an off switch exacerbated the fire that destroyed a datacenter in Strasbourg; see section VI.2.1 – iv of this report.