This post is written in my personal capacity, and does not necessarily represent the views of OpenAI or any other organization. Cross-posted to the Alignment Forum.

In the previous post of this sequence, I argued that intent-aligned AIs would, by default, have incentives to break the law. This post goes into one particularly bad consequence of that incentive: the increased difficulty of making credible pre-AGI commitments about post-AGI actions.

Image by OpenAI's DALL·E

In AGI policy and strategy, it would often be useful to adopt credible commitments about what various actors will do post-AGI. For example, it may be desirable for two leading nations in AGI to agree to refrain from racing to build AGI (at a potential cost to AGI safety) and instead split the economic upside from AGI, thereby transforming a negative-sum dynamic into a positive-sum one.[1] Nations might agree to forgo their own development of militarily useful AI systems if they receive security assurances from states that do choose to pursue such systems,[2] thus reducing the number of actors pursuing potentially dangerous military AI development, and therefore reducing the risk of a catastrophic accident. One can imagine similar types of pre-AGI agreements concerning space colonization post-AGI, the welfare of digital minds, democratic control of AGI, and many more important issues in the post-AGI world.

Lawless AI poses an enormous problem for such agreements. Agreements between entities, such as states or companies, require performance by agents of the parties: a state or corporation only acts through its agents. But if one party cannot convincingly show that their agents will carry out orders to comply with such agreements, the other party will be very reluctant to enter into an agreement with them. All the more so if the principal cannot prevent its agents from working against aspects of the agreement that are detrimental to the principal.

I will use the following as a toy problem to show what I mean more concretely: X and Y are two leading AGI development firms considering agreeing to split the profits from AGI development if they both agree to implement common minimum safety measures to prevent racing.[3] How might this play out?

X might worry that Y will develop an AGI agent, A(Y), with the objective of maximizing the value of Y's shares. A(Y) can do this in ways that would undermine X's recourse to Y's assets. For example, A(Y) could create a subagent that surreptitiously earns money and directly mails checks to the shareholders of Y, as a sort of pseudo-dividend. That money might never pass through Y, so X would have a hard time reaching it, even if X detected the scheme. X might also worry that A(Y) would corrupt the legal process in Y's favor, so that even if X had a good legal claim, it could not vindicate that claim in court at any reasonable cost.

With these concerns in mind, the deal is unlikely to happen.

How might LFAI improve the situation? X and Y could both agree that:

  1. All sufficiently advanced AI agents ("Qualifying Agents") that they train, use, benefit from, or deploy will be LFAI.
  2. The other party may, through a neutral third party bound by confidentiality,[4] inspect Qualifying Agents and their development to verify that the trained agent will be LFAI when deployed.
  3. Qualifying Agents will be specifically instructed to assume that they are bound by the agreement.[5]
  4. The parties will generally cause all of their agents to give full effect to the agreement.
  5. In any proceeding where the parties are adverse, neither party shall use any artificial intelligence technology to its legal or strategic benefit, unless the other party has free and equal access to the same technology.[6]
  6. In case of any ambiguity in the terms of the agreement, the ambiguity shall be resolved in favor of more constraints on Qualifying Agents and their principal(s).

The resulting agreement would probably not be watertight, but the parties would have much greater assurances of post-AGI performance than they would absent LFAI. Referring back to the example above, A(Y) could not surreptitiously earn money outside of Y and pass it on to Y's shareholders, nor could A(Y) illegally corrupt adjudications of the agreement. LFAI, then, could make the parties much more likely to be willing to enter stabilizing pre-AGI agreements that require post-AGI performance. This seems like a substantial improvement.


  1. Cf. Amanda Askell et al., The Role of Cooperation in Responsible AI Development (2019) (preprint), https://arxiv.org/abs/1907.04534. ↩︎

  2. Of course, this could be analogized to similar agreements regarding nuclear disarmament, such as Ukraine's fateful decision to surrender its post-Soviet nuclear arsenal in exchange for security assurances (which have since been violated by Russia). See, e.g., Editorial, How Ukraine Was Betrayed in Budapest, Wall St. J. (Feb. 23, 2022), https://www.wsj.com/articles/how-ukraine-was-betrayed-in-budapest-russia-vladimir-putin-us-uk-volodymyr-zelensky-nuclear-weapons-11645657263?reflink=desktopwebshare_permalink. Observers (especially those facing potential conflict with Russia) might reasonably question whether any such disarmament agreements are credible. ↩︎

  3. We will ignore antitrust considerations regarding such an agreement for the sake of illustration. ↩︎

  4. So that this inspection process cannot be used for industrial espionage. ↩︎

  5. This may not be the case as a matter of background contract and agency law, and so should be stipulated. ↩︎

  6. This is designed to guard against the case where one party develops AI super-lawyers, then wields them asymmetrically to their advantage. ↩︎


3 comments

This is a really valuable idea and is certainly an area we should research more heavily. I have some brief thoughts on the 'pros' and some ideas that aren't so much 'cons' as 'areas for further exploration (AFFE)'. The AFFE list will be longer due to the explanation necessary, not because there's more AFFE than Pros :)

Pros:

  • Law tends to be quite a precise field, which lends itself to CompSci more than many other areas
  • Law (generally) evolves to reflect society's current moral beliefs and values
  • Law has a huge focus on 'unintended consequences' which is a big alignment area too
  • Law has spent thousands of years grappling with the ideas surrounding the inclusion of varying degrees of sentience, intelligence, and personhood - both in civil and criminal aspects. Different rights, duties, obligations, levels of consent and levels of culpability for humans, corporations, animals, children, etc. Therefore the possibility exists to turn some of this experience to AI alignment and also to dealing with the consequences when AI goes wrong.

AFFE:

  • Law doesn't always reflect morals we want it to. In the 1940s, an AI system under this type of governance would have turned any Jews it found in to the Nazis because it was the lawful thing to do, despite the fact it is objectively to a human the wrong thing to do. Further examples are turning in escaped slaves it encountered, fully collaborating with Gestapo hunting partisans, full collaboration with Russians seizing Ukrainian territory, and more. These are all extreme examples sprinkled throughout the past - but the point is we don't know what the future holds. Currently here in the UK the government are throwing lots of effort and resources into passing laws to restrict basic protest rights under the Policing and Crime Bill. It's not impossible in 30 years time for the UK (or anywhere else) to be a police state or authoritarian regime. An AI rigidly sticking to the law would be misaligned morally. You did focus on this when discussing a balance of outcomes so it's not so much a weakness as an area we need to explore much more.
     
  • For Common Law systems such as the UK (and I believe USA? Please correct if wrong), the AI utilising case law for its moral alignment would change faster than we could reasonably foresee. Just look at the impact R v Cartwright (1569) had on Shanley v Harvey (1763) by not mentioning the race of the slave who was to be considered a person and freed because he was on English soil. For Civil Law systems such as those in mainland Europe this would be a much easier idea, but Common Law systems would need a bit more finesse. Or just not including case law and relying on legislation, which could be a bit more barebones.
     
  • Different countries have different laws. Some companies in the USA don't (or can't) operate in the EU and in the UK because GDPR affords Europeans more data rights than the USA affords its own citizens, which means that data processing companies that even allow access to their websites from European jurisdictions would face large fines for doing to EU resident data what they regularly do to US data with no problem. I can see this being an issue with moral alignment. When we say for AI to follow law - whose law? US rights and freedoms laws imposed on European AI would be a colossal backstep for human and civil rights in Europe not seen since WW2, whereas European copyright and patent laws imposed on US AI would be a huge backstep for the US (don't get me started on the EU's patent laws!). If the AI just follows the laws where the AI is based, it will be difficult to operate internationally for all except physical robot systems which could follow the same rules as people currently do without much effort given to changing its thinking as soon as it passed physical boundaries or borders. Perhaps we could create an international agreement on core concepts such as the Universal Declaration of Human Rights (UDHR), the International Covenant on Civil and Political Rights (ICCPR), the International Covenant on Economic, Social, and Cultural Rights (ICESCR), and the European Convention on Human Rights (ECHR). We did it for unifying laws about rights to avoid another Holocaust - why not about alignment to avoid a similar, yet potentially larger, catastrophe?




All in all this is a really well thought out idea for AI alignment and I am very hopeful it gets more exploration in the future. I've often felt that much current AI policy research is 'all jaw, no teeth' in that much of the focus is getting AI aligned in a simple lab or thought environment instead of a messy, complex human one. 

Potential benefits also include getting many more legal scholars into EA, a talent that we sorely lack as a community and that many other areas and projects would also benefit from in the future.
 

Thanks a ton for your substantive engagement, Luke! I'm sorry it took so long to respond, but I highly value it.

Law doesn't always reflect morals we want it to. In the 1940s, an AI system under this type of governance would have turned any Jews it found in to the Nazis because it was the lawful thing to do, despite the fact it is objectively to a human the wrong thing to do. Further examples are turning in escaped slaves it encountered, fully collaborating with Gestapo hunting partisans, full collaboration with Russians seizing Ukrainian territory, and more. These are all extreme examples sprinkled throughout the past - but the point is we don't know what the future holds. Currently here in the UK the government are throwing lots of effort and resources into passing laws to restrict basic protest rights under the Policing and Crime Bill. It's not impossible in 30 years time for the UK (or anywhere else) to be a police state or authoritarian regime. An AI rigidly sticking to the law would be misaligned morally. You did focus on this when discussing a balance of outcomes so it's not so much a weakness as an area we need to explore much more.

Yeah, definitely agree that this is tricky and should be analyzed more (especially drawing on the substantial existing literature about moral permissibility of lawbreaking, which I haven't had the time to fully engage in).

For Common Law systems such as the UK (and I believe USA? Please correct if wrong), the AI utilising case law for its moral alignment would change faster than we could reasonably foresee. Just look at the impact R v Cartwright (1569) had on Shanley v Harvey (1763) by not mentioning the race of the slave who was to be considered a person and freed because he was on English soil. For Civil Law systems such as those in mainland Europe this would be a much easier idea, but Common Law systems would need a bit more finesse. Or just not including case law and relying on legislation, which could be a bit more barebones.

Yeah, I do think there's an interesting thing here where LFAI would make apparent the existing need to adopt some jurisprudential stance about how to think about the evolution of law, and particularly of predicted changes in the law. As an example of how this already comes up in the US, judges sometimes regard higher courts' precedents as bad law, notwithstanding the fact that the higher court has not yet overruled them. The addition of AI into the mix (as both a predictor of and possible participant in the legal system, as well as a general accelerator of the rate of societal change) certainly threatens to stretch our existing ways of thinking about this. This is also why I'm worried about asymmetrical use of advanced AI in legal proceedings. See footnote 6.

(And yes, the US[1] is also common law. :-) )

Different countries have different laws. Some companies in the USA don't (or can't) operate in the EU and in the UK because GDPR affords Europeans more data rights than the USA affords its own citizens, which means that data processing companies that even allow access to their websites from European jurisdictions would face large fines for doing to EU resident data what they regularly do to US data with no problem. I can see this being an issue with moral alignment. When we say for AI to follow law - whose law? US rights and freedoms laws imposed on European AI would be a colossal backstep for human and civil rights in Europe not seen since WW2, whereas European copyright and patent laws imposed on US AI would be a huge backstep for the US (don't get me started on the EU's patent laws!). If the AI just follows the laws where the AI is based, it will be difficult to operate internationally for all except physical robot systems which could follow the same rules as people currently do without much effort given to changing its thinking as soon as it passed physical boundaries or borders. Perhaps we could create an international agreement on core concepts such as the Universal Declaration of Human Rights (UDHR), the International Covenant on Civil and Political Rights (ICCPR), the International Covenant on Economic, Social, and Cultural Rights (ICESCR), and the European Convention on Human Rights (ECHR). We did it for unifying laws about rights to avoid another Holocaust - why not about alignment to avoid a similar, yet potentially larger, catastrophe?

Definitely agree. I think the practical baby step is to develop the capability of AI to interpret and apply any given legal system. But insofar as we actually want AIs to be law-following, we obviously need to solve the jurisdictional and choice-of-law questions, as a policy matter. I don't think we're close to doing that; even many of the jurisdictional issues in cyber are currently contentious. And as I think you allude to, there's also a risk of regulatory arbitrage, which seems bad.


  1. Except the civil law of Louisiana, interestingly. ↩︎

No problem RE timescale of reply! Thank you for such a detailed and thoughtful one :)