Prior to this EAG, I had only encountered fragments of proposals for AI governance: "something something national compute library," "something something crunch time," "something something academia vs industry," and that was about the size of it. I'd also heard the explicit claim that AI governance is devoid of policy proposals (especially vis-a-vis biosecurity), and I'd read Eliezer's infamous EAG DC Slack statement:
My model of how AI policy works is that everyone in the field is there because they don't understand which technical problems are hard, or which political problems are impossible, or both . . .
At this EAG, a more charitable picture of AI governance began to cohere for me. As I set about recalling and synthesizing what I learned, I realized I should share—both to provide a data point and to solicit input. Please help fill out my understanding of the area, refer me to information, and correct my inaccuracies!
Eight one-on-ones contributed to this picture of the governance proposal landscape, along with Katja's and Beth's presentations, Buck's and Richard Ngo's office hours, and eavesdropping on Eliezer corrupting the youth of EAthens. I'm sure I only internalized a small fraction of the relevant content in these talks, so let me know about points I overlooked. (My experience was that my comprehension and retention of these points improved over time: as my mental model expanded, new ideas were more likely to connect to it.) The post is also sprinkled with my own speculations. I'm omitting trad concerns like stop-the-bots-from-spreading-misinformation.
Crunch Time Friends
The idea: Help aligned people attain positions in government, or ally with people already in those positions. When shit hits the fan, we activate our friends in high places, who will swiftly unplug everything they can.
My problem: This story, even the less-facetious versions that circulate, strikes me as woefully under-characterized. Which positions wield the relevant influence, and are timelines long enough for EAs to enter those positions? How exactly do we propose they react? Additionally, FTX probably updated us away from deceptive long-con type strategies.
Residual questions: Is there a real and not-ridiculous name for this strategy?
Slow Down China
The chip export controls were so, so good. A further move would be to reduce the barriers to high-skill immigration from China to induce brain drain. Safety field-building is proceeding, but slowly. China is sufficiently far behind that these are not the highest priorities.
Compute Governance
I'm told there are many proposals in this category. They range in enforcement strictness from "labs have to report compute usage" to "labs are assigned a unique key to access a set amount of compute and then have to request a new key" to "labs face brick-wall limits on compute levels." Algorithmic progress motivates the need for an "effective compute" metric, though measuring even raw compute is surprisingly difficult as it is.
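To make the "effective compute" point concrete, here is a toy sketch (entirely my own illustration, not a metric from any of these proposals): a fixed cap on physical FLOPs corresponds to ever-growing effective compute if algorithmic efficiency improves exponentially. The two-year doubling time below is a made-up parameter.

```python
# Toy illustration (my own, not from any proposal): why a cap stated in
# raw FLOPs drifts over time if algorithmic efficiency keeps improving.

def effective_compute(physical_flops: float, years_since_baseline: float,
                      efficiency_doubling_years: float = 2.0) -> float:
    """Physical FLOPs scaled by an assumed exponential gain in algorithmic
    efficiency. The 2-year doubling time is a hypothetical parameter."""
    return physical_flops * 2 ** (years_since_baseline / efficiency_doubling_years)

# A static cap of 1e24 physical FLOPs buys 4x as much effective compute
# four years later under this (made-up) doubling assumption.
cap = 1e24
print(effective_compute(cap, 0.0))  # 1e+24
print(effective_compute(cap, 4.0))  # 4e+24
```

This is why a regulator fixing a raw-FLOP threshold would need to revise it continually, or else define the cap in effective-compute terms from the start.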
A few months ago I heard that another lever—in addition to regulating industry—is improving the ratio of compute in academia vs industry. Academic models receive faster diffusion and face greater scrutiny, but the desirability of these features depends on your perspective. I'm told this argument is subject to "approximately 17 million caveats and question marks."
Evaluations & Audits
The idea: Develop benchmarks for capabilities and design evaluations to assess whether a model possesses those capabilities. Conditional on a capability, evaluate for alignment benchmarks. Audits could verify evaluations.
Industry self-regulation: Three labs dominate the industry, an arrangement that promises to continue for a while, facilitating cooperation. Even if each believes itself locked in a secret arms race with the others, there are still opportunities to find common ground when engineers from different labs, like, bump into each other at EAG or whatever. Perhaps industry standards emerge due to public pressure, employee pressure, and each lab's desire for the others to not destroy the world. Journalism probably plays an important role here. Perhaps the threat of government regulation yields consensus: "if we don't solve this ourselves, the government will step in and tread on us." Beth suggested this dynamic inspired safety regulation among airlines. In general, ARC Evals is bullish on cooperation from labs, even for evaluations that impose significant costs. But we should move fast: we might be in a critical period when labs are willing to commit to restrictions on capabilities that they still regard as speculative/unlikely. This work raises some info hazard concerns, but it seems like people are aware of and managing the risks.
Government regulation: Regulators are often playing catch-up with industry, particularly in tech. Labs expecting to stay one step ahead of the government undermines the above argument about the threat of government intervention. On the other hand, government commands uniquely powerful enforcement abilities, and it can outsource the technical work to third parties. Government evals and audits might widen the revolving door, but whether this is good or bad probably turns on some questions about who gets reallocated away from capabilities research and how the talent pipeline responds. Policymakers might prefer different benchmarks or evaluations than industry selects; given that ARC Evals already occupies the industry niche, why not expand to government as well?
Interactions: Why not both? From what I heard, it seems likely that this is the correct approach. Multiple benchmarks and evaluations would accomplish more comprehensive coverage of the space of capabilities. Redundancy would also check against defections from the industry norms and exploitation of regulatory blindspots. One popular story is that government can allow industry consensus to coalesce, then codify those evaluations to bind market entrants. Going the other direction, maybe an agenda-setting role for government would contribute to faster/better norms. For example, in some finance firms, respect for regulators and "playing by the rules" is baked into the company culture. A different indirect channel might involve the government validating the media's suspicion toward AI, increasing public scrutiny.
These benefits aside, there could be more of a tradeoff between the two strategies if labs perceive government involvement as antagonistic, positioning the two as enemies with antithetical goals. In this event, corporate capture and lobbying might conceivably result in net-worse evals than if we had exclusively targeted industry self-regulation. I heard that in Europe, Meta is known for outright opposition to government oversight, whereas Microsoft takes a more insidious tack by proclaiming their desire for regulation, then proposing definitions so broad that even PyTorch is included, provoking broad industry resistance. I'm not too worried about this argument because it sounds like projecting individual psychology onto an organization, and suspicion is government's default posture toward big tech right now, so government regulation shouldn't surprise labs. Insofar as EA needs to prioritize right now, it might be worth considering whether industry or government is more promising, but many people asking this question seem significantly better-positioned to work on one or the other.
Residual questions: Do evals still fall under "AI governance" if they're exclusively implemented by labs? Buck talked about evals in the context of technical research. If we knew how to design alignment evaluations conditional on strong capabilities, wouldn't alignment be solved? How does our confidence in alignment evals decrease as capabilities improve?
Whistleblowing
The idea: Cultivate company cultures that encourage whistleblowing by supporting whistleblowers and framing it as a benefit to the company.
My problem: The longer you've worked somewhere, the more likely you are to be privy to juicy whistleworthy happenings. But those veteran employees are precisely the people who stand to lose the most from retaliation. To make matters worse, the fewer people involved in a misdeed, the less anonymity available to the whistleblower. All this amounts to the most important potential whistleblowers facing the highest costs. In contrast, a whistleblower's warm-and-fuzzies compensation is largely invariant to the importance of the whistleblowing. This leads me to believe we should focus on incentivizing whistleblowing by improving the payoffs such that whistleblowing is worth the costs, even for veteran employees. (Obviously you don't want a policy that potential whistleblowers can Goodhart by sitting on their information to qualify for a larger reward. We want something resembling a centipede game, so someone snitches at T_0, with payoffs calibrated to deter collusion.)
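To gesture at the calibration problem, here's a toy backward-induction model (entirely my own construction; the structure and payoffs are hypothetical, not any real reward schedule). Two employees alternate opportunities to report; reporting at period t pays rewards[t], the non-reporter gets a bystander payoff, and if nobody ever reports, both eventually eat a collusion penalty. Backward induction then tells you when the first report happens:

```python
# Toy centipede-style whistleblowing game (hypothetical model, my own
# construction). Two employees alternate turns; the mover at period t
# either reports (earning rewards[t], leaving the other the bystander
# payoff) or waits. If nobody ever reports, both suffer the collusion
# penalty when the misdeed is eventually exposed.

def first_report_period(rewards, collusion_payoff=-1.0, bystander_payoff=0.0):
    """Return the period of the first report under backward induction,
    or None if colluding forever is the equilibrium."""
    # (mover_val, other_val): continuation payoffs for whoever moves
    # next and for the other player, starting from "past the last period."
    mover_val, other_val = collusion_payoff, collusion_payoff
    first_report = None
    for t in reversed(range(len(rewards))):
        # Passing swaps roles: today's mover is tomorrow's bystander,
        # so waiting hands the mover the other player's continuation value.
        if rewards[t] > other_val:
            mover_val, other_val = rewards[t], bystander_payoff
            first_report = t
        else:
            mover_val, other_val = other_val, mover_val
    return first_report

# Even a *growing* reward schedule unravels to an immediate report,
# because waiting risks handing the reward to the other player.
print(first_report_period([1.0, 2.0, 3.0]))   # 0
# But if early rewards don't beat standing by, reporting is delayed.
print(first_report_period([-0.5, 2.0, 3.0]))  # 1
```

The point of the sketch: in a centipede-style structure, waiting is dominated at every node where the current reward beats bystanding, so the first informed employee snitches immediately rather than sitting on information for a bigger payout—exactly the anti-Goodhart property the reward schedule should be calibrated to produce.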
Residual questions: How does any of this work? What are the current reward schedules for whistleblowing? Who sets them and how are rewards determined? Do X-risks introduce new considerations in that process? How does whistleblowing interact with tort law?
Anti-Trust
The idea: Now we get to the galaxy-brain plays. If developing some AI model would violate anti-trust law, educating labs on the regulatory environment might preclude the development of such models. Importantly, this proposal focuses on proactively informing labs. Waiting for labs to stumble into legal trouble is less promising; anti-trust lawsuits take years, so by the time a lawsuit concludes, the damage will have already been done.
My problem: Anti-trust law is not optimized for AI safety applications, which raises the specter of unintended consequences. In particular, I think we should carefully attend to the counterfactual capabilities that labs substitute when they forego a project on anti-trust grounds. I'm all for another tool in our toolbox, but as the saying goes: when your only tool is an anti-trust hammer, every problem looks like an anti-competitive nail.
Residual questions: How does any of this work? What models and capabilities would run afoul of anti-trust law and which would not? How well-informed are labs already and how much of a delta could we really achieve here? Are there instances where we would prefer labs becoming ensnared in a lawsuit, perhaps to agitate public scrutiny?
Department of Defense
AI is more obviously dual-use than, say, nuclear technology, but its strategic applications still capture the Pentagon's imagination. We want to keep the DoD out of the AI game, both to limit domestic capabilities research and to avoid securitizing AI into an international arms race. Thankfully, current models like ChatGPT are too unreliable for military adoption in the US.
Military AI development would draw on massive military budgets and presumably employ less-aligned engineers than industry labs. Military AI products are unlikely to compete in the market with models trained in labs, due to restrictions on making military software public, even as open source. Apparently, for example, the Air Force has refused to release some software that would benefit commercial aviation. So DoD entry shouldn't exacerbate race dynamics among labs; to the extent the DoD competes with labs, it would be over talent. The bigger concern is the research performed by the DoD itself.
There is a lot of room for us to improve our influence here. As far as I know, there is only one aligned person in the DoD focused on AI. Moreover, a recent report from the National Security Commission on AI neglected X-risks, despite Jason Matheny serving on the commission, which does not inspire confidence.
Residual questions: How would DoD operations interact with government regulation efforts? With industry self-regulation standards? Does optimizing models for military applications yield different/worse alignment problems than for civilian use? How much would DoD training programs grow the number of capabilities researchers? Which positions wield the relevant influence?