
Glasswing was an overlooked governance precedent. Through it, Anthropic recognised that when capabilities rapidly advance and frontier models could cause serious harm, they have a responsibility to control who gets to access their tech.

Here, I deal with the governance implications of Anthropic taking on that responsibility. (There is also an interesting technical question about when that capability-leap threshold is crossed and public access to a model should be withheld, which I’m not qualified to contribute to.)

By deciding which organisations or state-adjacent institutions could use Mythos before it was released, Anthropic made themselves the effective arbiter of who has access to strategically important tech. They decided who could access a model with powerful offensive capabilities, and who could prepare themselves against it, making judgement calls based on no agreed criteria and with no accountability for their decisions. I have previously outlined how these decisions could plausibly have significant consequences, like increasing coup risk.

This post begins from the perspective that this seems like a bad governance arrangement. The decision over who has access to such valuable technology should probably be determined by a governing body, not whichever frontier lab develops the most capable model. In this case, Mythos was developed by a safety-conscious team at Anthropic; in future it could be developed elsewhere. At a minimum, labs should make these decisions based on agreed rules and be accountable to an external body for the decisions that they make.

While I believe these decisions should be made outside the lab, I want to first deal with the reality of the precedent set by Glasswing, before suggesting a better institutional arrangement than the one we have. In the world where access decisions continue to be made by whichever lab develops the most capable tech, what rules should govern their actions? What criteria should determine who is granted access to their model? And what sort of regulatory arrangement would incentivise them to make good decisions? I argue that a reasonable criterion generates decisions of such political complexity that no private actor has the legitimacy to make them, which is why a regulatory framework is needed.

Democratic resilience

Here I develop one example of a criterion we might want a lab to use when deciding how to control access to their model: promoting democratic resilience. (In this section, I am assuming that Anthropic are making decisions on the basis of some guiding principles, not just on the vibes of who they already work with or trust).

As Anthropic now decide who can or cannot defend critical infrastructure, they have acquired a form of structural geopolitical power that would historically trigger obligations to uphold international governance norms. Anthropic did not choose this responsibility, but the structure of Glasswing means they function as a geopolitical actor, and I propose we recognise the obligations that come with that status. Existing international governance frameworks assert that when actors control infrastructure states depend on for sovereign functions, they trigger certain obligations. Frontier labs are crossing that threshold.

What international governance norms are typically attached to this form of structural power?

The UN Guiding Principles on Business and Human Rights establish that corporations have a responsibility to avoid contributing to human rights violations and to undertake due diligence to prevent or mitigate adverse human rights impacts from their activities, even where they aren’t the direct perpetrator. A lab that grants access to a model knowing it could be used to undermine democratic institutions therefore has a complicity problem under existing governance frameworks. Democratic resilience isn’t a term the UNGPs use, but contributing to coup risk plausibly falls within the human rights harms they are designed to prevent. If Anthropic adopted democratic resilience as a principle to inform their access-control decision making, how might they proceed?

First, Anthropic would need to decide which states are sufficiently democratic to warrant the opportunity to bolster their defences before advanced models are deployed within their territory. There would need to be a nominated adjudicator or process to settle contested cases, where a government claims it ought to be equipped with the tools to defend critical government infrastructure from attack before a model is released within its territory.

In contested states, Anthropic would need to either pick winners or decide not to act and let power disputes play out without interference (inaction could itself prompt calls that they have a responsibility to act, where frontier tech could help an ally secure control of the state). The history of US governments or corporations picking winners in contested states is an infamous one. I doubt anyone would argue that Anthropic settling such disputes is a good idea.

Anthropic would also need a process for when a close democratic ally is backsliding into a non-democratic regime. Political scientists have long debated how to measure democracy; this is particularly difficult to do in real time, where interpretations are shaped by the events or allegiances of the day.

And how would labs navigate relationships with powerful non-democratic states that demand access to advanced models once they are shared with less powerful democratic allies? Could a host of middle powers feasibly ignore threats from Putin or Xi demanding that they share access to an advanced model, once a frontier lab has shared it with them to shore up their defences?

These are just some of the initial questions that Anthropic would need to answer. It is immediately obvious that a small number of private actors should not make decisions of such geopolitical importance, nor should they do so without democratic accountability. I will address the most likely counterargument to this before concluding.

A likely objection

One objection to my argument here is that promoting democratic resilience is a rather complicated or lofty principle for a lab to follow. When I asked for suggestions on LessWrong, one commenter suggested labs might follow the principle ‘“first, do no harm.” New models shall be available first to those who are credibly defenders of the common good’.

This is actually a very good, simple suggestion. It is also a good example to demonstrate why labs cannot unilaterally make these decisions. Whose conception of ‘harm’, ‘credible defenders’ and ‘the common good’ should they implement? Without an institutional regulator, it will be the staff at whichever frontier lab happens to develop the most capable model. Their interpretation of these concepts will almost certainly be shaded by some (un)conscious bias towards American or Chinese exceptionalism.

Taking the narrower view of distributing tech to ‘credible defenders’ within the cybersecurity industry is a more tractable solution, but still offers a technical solution to a political problem; to an American tech company, the credible defenders are those sitting on critical infrastructure used by American consumers. To the citizen of a middle power whose government could plausibly be challenged by the proliferation of dangerous cyberattacking capabilities, their government’s cybersecurity agency might be a more credible defender to equip with access to the most harmful AI model.

The fact that even a narrower, more tractable criterion immediately generates these scenarios is precisely the problem. These are questions that have deep roots in geopolitics and state sovereignty; the institutional apparatus that responsible access-control requires should reflect that. These decisions cannot, and should not, be made by well-meaning staff in a lab.

Towards a governance arrangement for access-control decisions

Controlling access to a harmful model was a useful governance precedent, but the access-control decision should not have been made by Anthropic. There is too much geopolitical sway at stake for that power to be concentrated in the hands of whichever frontier lab is winning the AI race.

How frontier labs should navigate access-control decisions in the interim - before institutional governance arrangements are established - is a question I haven’t fully answered here, and one I’ll return to.

In my next post, I plan to propose a sensible institutional design for governing access to harmful models. Some initial thoughts for criteria that the arrangements could be based on:

  1. Access decisions should be justified against criteria decided by an external body.
  2. They should be regulated by an independent, democratically accountable body (appointment by a democratic body would suffice).
  3. No single lab should be the sole decision-maker for access to the most advanced capabilities.
  4. AISI should have mandatory access to the most advanced models, and the ability to refer cases to a regulator who can prevent public access to advanced models until cybersecurity capabilities have caught up.

 

Thanks for reading, I would love to hear any thoughts.
