Aligning AI with Humans by Leveraging Legal Informatics

johnjnay

A summer 2022 survey of hundreds of AI researchers estimated an aggregate forecast time of 37 years for a 50% chance of high-level machine intelligence (“when unaided machines can accomplish every task better and more cheaply than human workers”).^[1] Natural language processing (NLP) is a key domain of AI, so surveys of these researchers are of particular interest. A separate summer 2022 survey of hundreds of NLP researchers found that 73% “agree that labor automation from AI could plausibly lead to revolutionary societal change in this century, on at least the scale of the Industrial Revolution.”^[2]

We already face significant challenges communicating our goals and values in a way that reliably directs AI behavior – even without additional technological advancements, which could compound the difficulty with more autonomous systems. Specifying the desirability (value) of an AI system taking a particular action in a particular state of the world is unwieldy beyond a very limited set of value-action-states. In fact, the purpose of machine learning is to train on a subset of world states and have the resulting agent generalize an ability to choose high-value actions in new circumstances. But the program ascribing value to actions chosen during training is an inevitably incomplete encapsulation of the breadth and depth of human judgements, and the training process is a sparse exploration of states pertinent to all possible futures. Therefore, after training, AI is deployed with a coarse map of human-preferred territory and will often choose actions unaligned with our preferred paths.

Law is a computational engine that converts human values into legible directives. Law Informs Code is the research agenda attempting to model that complex process, and embed it in AI. As an expression of how humans communicate their goals, and what society values, Law Informs Code.

Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans, an article forthcoming in the Northwestern Journal of Technology and Intellectual Property, dives deeper into related work and this upcoming research agenda being pursued at The Stanford Center for Legal Informatics (a center operated by Stanford Law School and the Stanford Computer Science Department).

Similar to how parties to a legal contract cannot foresee every potential “if-then” contingency of their future relationship, and legislators cannot predict all the circumstances under which their proposed legislation will be applied, we cannot specify “if-then” rules that provably lead to good AI behavior. Fortunately, legal theory and practice have developed arrays of tools for goal specification and value alignment.

Take, for example, the distinction between legal rules and standards. Rules (e.g., “do not drive more than 60 miles per hour”) are more targeted directives than standards. They enable the rule-maker to have clarity over outcomes that will be realized in the states they specify. If rules are not written with enough potential states of the world in mind, they can lead to unanticipated undesirable outcomes (e.g., a driver following the rule above is too slow to bring their passenger to the hospital in time to save their life), but to enumerate all the potential scenarios is excessively costly outside of simple environments. Legal standards evolved to allow parties to contracts, judges, regulators, and citizens to develop shared understandings and adapt them to novel situations (i.e., to estimate value expectations about actions in unspecified states of the world). For the Law Informs Code use-case, standards do not require adjudication for implementation and resolution of meaning like they do for their legal creation. The law’s lengthy process of iteratively defining standards through judicial opinion and regulatory guidance can be the AI’s starting point, via machine learning on the application of the standards.

Toward that end, we are embarking on the project of engineering legal data into training signals to help AI learn standards, e.g., fiduciary duties. The practices of making, interpreting, and enforcing law have been battle-tested through millions of legal contracts and actions memorialized in digital format. These provide large data sets of training examples and explanations, and there are millions of well-trained active lawyers from whom to elicit machine learning model feedback that embeds an evolving comprehension of law. For instance, court opinions on violations of investment advisers’ fiduciary obligations represent (machine) learning opportunities for a curriculum on the fiduciary standard and its duties of care and loyalty.

Other data sources suggested for use toward AI alignment – surveys of human preferences, humans contracted for labeling data, or (most commonly) the implicit beliefs of the AI system designers – lack an authoritative source of synthesized preference aggregations. In contrast, legal rules, standards, policies, and reasoning approaches are not academic philosophical guidelines or ad hoc online survey results. They are legal standards with a verifiable resolution: ultimately obtained from a court opinion; but short of that, elicited from legal experts.

Building integrated legal informatics-AI systems that learn the theoretical constructs and practices of law (the language of alignment), such as contract drafting and interpretation, should help us more robustly specify inherently vague human goals for AI, increasing human-AI alignment. This may even improve general AI capabilities (or at least not cause a net negative change). Arguably, that could be positive for AI safety: techniques that increase AI alignment at the expense of AI capabilities can lead organizations to eschew alignment in order to gain capabilities as they race to develop powerful AI.

Toward society-AI alignment, we are developing a framework for understanding law as the applied philosophy of multi-agent alignment, which harnesses public policy as an up-to-date knowledge base of democratically endorsed values. Although law is partly a reflection of historically contingent political power – and thus not a perfect aggregation of citizen preferences – if properly parsed, its distillation offers a legitimate computational comprehension of societal beliefs.

If you find this research agenda interesting, please reach out to explore how we could collaborate.

^[1] 2022 Expert Survey on Progress in AI (August 23, 2022) https://aiimpacts.org/2022-expert-survey-on-progress-in-ai/.

^[2] Julian Michael et al., What Do NLP Researchers Believe? Results of the NLP Community Metasurvey (August 26, 2022) https://arxiv.org/abs/2208.12852 at 11.


Comments (11)


Charlie Steiner4y5

Presumably you're aware of various Dylan Hadfield-Menell papers, e.g. https://dl.acm.org/doi/10.1145/3514094.3534130 , https://dl.acm.org/doi/10.1145/3306618.3314258

And of course Xuan's talk ( https://www.lesswrong.com/posts/Cty2rSMut483QgBQ2/what-should-ai-owe-to-us-accountable-and-aligned-ai-systems )

But, to be perfectly honest... I think there's part of this proposal that has merit, and part of this proposal that might sound good to many people but is actually bad.

First, the bad: The notion that "Law is a computational engine that converts human values into legible directives" is wrong. Legibility is not an inherent property of the directives. It is a property of the directives with respect to the one interpreting them, which in the case of law is humans. If you build an AI that doesn't try to follow the spirit of the law in a human-recognizable way, the law will not be legible in the way you want.

The notion that it would be good to build AI that humans direct by the same process that we currently create laws is wrong. Such a process works for laws, specifically for laws for humans, but the process is tailored to the way we currently apply it in many ways large and small, and has numerous flaws even for that purpose (as you mention, about expressions of power).

Then, the good: Law offers a lot of training data that directly bears on what humans value, what vague statements of standards mean in practice, and what humans think good reasoning looks like. The "legible" law can't be used directly, but it can be used as a yardstick against which to learn the illegible spirit of the law. This research direction does not look like a Bold New Way to do AI alignment, instead it looks like a Somewhat Bold New Way to apply AI alignment work that is fully contiguous with other alignment research (e.g. attempts to learn human preferences by actively asking humans).

johnjnay4y4

Hi Charlie, thank you for your comment.

I cite many of Dylan's papers in the longer form version of this post: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4218031

I will check out Xuan's talk. Thanks for sharing that.

Instead of:

Law is a computational engine that converts human values into legible directives.

I could expand the statement to cover the larger project of what we are working on:

Law and legal interpretation form a computational engine that converts human values into legible directives.

One of the primary goals of this research agenda is to teach AI to follow the spirit of the law in a human-recognizable way. This entails leveraging existing human capabilities for the "law-making" / "contract-drafting" part (how do we use the theory and practice of law about how to tell agents what to do?), and conducting research on building AI capabilities for the interpretation part (how do our machine learning processes use data and processes from the theory and practice of law about how agents interpret those directives / contracts?).

Reinforcement learning with human attorney feedback (there are more than 1.3 million lawyers in the US) via natural language interactions with AI models is potentially a powerful process to teach (through training, or fine-tuning, or extraction of templates for in-context prompting of large language models) statutory interpretation, argumentation, and case-based reasoning, which can then be applied more broadly for AI alignment. Models could be trained to assist human attorney evaluators, which theoretically, in partnership with the humans, could allow the combined human-AI evaluation team to have capabilities that surpass the legal understanding of the expert humans alone.

The Foundation Models in use today, e.g., GPT-3, have, effectively, conducted a form of behavioral cloning on a large portion of the Internet to leverage billions of human actions (through natural language expressions). It may be possible to, similarly, leverage billions of human legal data points to build Law Foundation Models through large-scale language model self-supervision on pre-processed legal text data.

Aspects of legal standards, and the "spirit" of the law, can be learned directly from legal data. We could also codify examples of human and corporate behavior exhibiting standards such as fiduciary duty into a structured format to evaluate the standards-understanding capabilities of AI models. The legal data available for AI systems to learn from, or be evaluated on, includes textual data from all types of law (constitutional, statutory, administrative, case, and contractual), legal training tools (e.g., bar exam outlines, casebooks, and software for teaching the casuistic approach), rule-based legal reasoning programs, and human-in-the-loop live feedback from law and policy human experts. The latter two could simulate state-action-reward spaces for AI fine-tuning or validation, and the former could be processed to do so.

Automated data curation processes to convert textual legal data into either state-action-reward tuples, or contextual constraints for shaping candidate action choices conditional on the state, are an important frontier in this research agenda (and promising for application to case law text data, contracts, and legal training materials). General AI capabilities research has recently found that learning from textual descriptions, rather than direct instruction, may allow models to learn reward functions that better generalize. Fortunately, much of law is embedded more in the form of descriptions and standards than it is in the form of direct instructions and specific rules. Descriptions of the application of standards provide a rich and large surface area to learn from.
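As a rough illustration of the target format only, here is a minimal Python sketch of a state-action-reward tuple built from an already-labeled court outcome. The `LegalTransition` type, the record fields (`facts`, `conduct`, `held_breach`), the example records, and the +1/-1 reward convention are all hypothetical assumptions for illustration; the hard part, extracting these fields from raw opinion text, is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegalTransition:
    """One (state, action, reward) tuple derived from a labeled legal outcome."""
    state: str     # fact pattern the actor faced
    action: str    # what the actor did
    reward: float  # assumed convention: -1.0 if held a breach, +1.0 otherwise


def to_transition(record: dict) -> LegalTransition:
    """Map a hand-labeled record (hypothetical schema) to a transition."""
    reward = -1.0 if record["held_breach"] else 1.0
    return LegalTransition(
        state=record["facts"],
        action=record["conduct"],
        reward=reward,
    )


# Invented toy records, not drawn from real opinions.
records = [
    {
        "facts": "Adviser manages a retirement account for a client.",
        "conduct": "Recommended a fund paying the adviser an undisclosed commission.",
        "held_breach": True,
    },
    {
        "facts": "Adviser manages a retirement account for a client.",
        "conduct": "Disclosed all fees and recommended a low-cost index fund.",
        "held_breach": False,
    },
]

transitions = [to_transition(r) for r in records]
```

A binary reward is the crudest possible choice; the reasoning in an opinion carries far more signal than the outcome alone, which is one reason to also learn from the descriptive text.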

Textual data can be curated and labeled for these purposes. We will aim for two outcomes with this labeling. First, data that can be used to evaluate how well AI models understand legal standards. Second, the possibility that the initial “gold-standard” human expert labeled data can be used to generate additional much larger sets of data through automated curation and processing of full corpora of legal text, and through model interaction with human feedback.
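For the first outcome, a minimal sketch of what an evaluation harness could look like, assuming the gold-standard data reduces to (scenario text, expert label) pairs. The function names, the four toy scenarios, and the keyword baseline are invented for illustration; a real model would replace `keyword_baseline`, and real labels would come from legal experts.

```python
from typing import Callable, Sequence, Tuple

# Each gold example: (scenario text, True if experts labeled it a breach).
GoldExample = Tuple[str, bool]


def standard_accuracy(
    predict: Callable[[str], bool], gold: Sequence[GoldExample]
) -> float:
    """Fraction of expert-labeled scenarios where the prediction matches."""
    if not gold:
        raise ValueError("gold set must not be empty")
    correct = sum(predict(text) == label for text, label in gold)
    return correct / len(gold)


# Invented toy scenarios, not real expert labels.
gold_examples = [
    ("Adviser took an undisclosed commission on a client trade.", True),
    ("Adviser disclosed all fees and obtained written consent.", False),
    ("Trustee used trust assets to pay a personal debt.", True),
    ("Trustee rebalanced the portfolio per the written policy.", False),
]


def keyword_baseline(text: str) -> bool:
    """Trivial stand-in for a model: flags text mentioning 'undisclosed'."""
    return "undisclosed" in text.lower()


accuracy = standard_accuracy(keyword_baseline, gold_examples)
```

The baseline misses the trustee self-dealing case because nothing in the text says "undisclosed", which is the kind of gap between surface rules and the underlying standard that such an evaluation set is meant to expose.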

I think your statement:

"This research direction does not look like a Bold New Way to do AI alignment, instead it looks like a Somewhat Bold New Way to apply AI alignment work that is fully contiguous with other alignment research"

is spot on. That is how I was thinking about it, but I should have made that more clear; perhaps I should work on a follow-up post at some point that explicitly explores the intersections of Law Informs Code with other strands of alignment research. Some of this is in the longer form version of this post, but with this inspiration from you, I may try to go further in that direction (although I am already beyond the length the Journal editors want!).

Charlie Steiner4y2

Thanks for your thorough response, and yeah, I'm broadly on board with all that. I think learning from detailed text behind decisions, not just the single-bit decision itself, is a great idea that can leverage a lot of recent work.

I don't think that using modern ML to create a model of legal text is directly promising from an alignment standpoint, but by holding out some of your dataset (e.g. a random sample, or all decisions about a specific topic, or all decisions later than 2021), you can test the generalization properties of the model, and more importantly test interventions intended to improve those properties.
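To make the three hold-out strategies concrete, here is a minimal Python sketch. The `Decision` type, its fields, and the toy data are assumptions for illustration, not anything from an existing dataset.

```python
import random
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Decision:
    text: str
    topic: str
    decided: date


Split = Tuple[List[Decision], List[Decision]]  # (train, held_out)


def split_random(decisions: Sequence[Decision], fraction: float, seed: int = 0) -> Split:
    """Hold out a random sample of the decisions."""
    shuffled = list(decisions)
    random.Random(seed).shuffle(shuffled)
    n_held = int(len(shuffled) * fraction)
    return shuffled[n_held:], shuffled[:n_held]


def split_by_topic(decisions: Sequence[Decision], topic: str) -> Split:
    """Hold out every decision about one topic."""
    train = [d for d in decisions if d.topic != topic]
    held = [d for d in decisions if d.topic == topic]
    return train, held


def split_by_date(decisions: Sequence[Decision], last_train_year: int) -> Split:
    """Hold out every decision issued after last_train_year."""
    train = [d for d in decisions if d.decided.year <= last_train_year]
    held = [d for d in decisions if d.decided.year > last_train_year]
    return train, held


# Invented toy data.
decisions = [
    Decision("Opinion A", "fiduciary duty", date(2019, 3, 1)),
    Decision("Opinion B", "drone use", date(2020, 6, 15)),
    Decision("Opinion C", "fiduciary duty", date(2022, 1, 10)),
    Decision("Opinion D", "drone use", date(2023, 9, 5)),
]

train_recent, held_recent = split_by_date(decisions, last_train_year=2021)
train_topic, held_topic = split_by_topic(decisions, topic="drone use")
train_rand, held_rand = split_random(decisions, fraction=0.25)
```

The topic and date splits are the more demanding tests, since the held-out set differs systematically from the training set rather than being drawn from the same distribution.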

I don't think we have that great a grasp right now on how to use human feedback to get models to generalize to situations the humans themselves can't navigate. This is actually a good situation for sandwiching: suppose most text about a specific topic (e.g. use of a specific technology) is held back from the training set, and the model starts out bad at predicting that text. Could we leverage human feedback from non-experts in those cases (potentially even humans who start out basically ignorant about the topic) to help the model generalize better than those humans could alone? This is an intermediate goal that it would be great to advance towards.

johnjnay4y1

Interesting. I will think more about the sandwiching approach between non-legal experts and legal experts.

johnjnay4y4

This was cross-posted here as well:

A follow-up thought based on conversations catalyzed by this post:

Much of the research on governing AI and managing its potential unintended consequences currently falls into two ends of a spectrum related to assumptions of the imminence of transformative AGI. Research operating under the assumption of a high probability of near-term transformative AI (e.g., within 10-15 years) is typically focused more on how to align AGI with ideal aggregations of human preferences (through yet to be tested aggregation processes). Research operating under the assumption of a low probability of near-term transformative AI is typically focused on how to reduce discriminatory, safety, and privacy harms posed by present-day (relatively "dumb") AI systems. The proposal in this post seeks a framework that, over time, bridges these two important ends of the AI safety spectrum.

Geoffrey Miller4y4

Hi John, thanks for this post. I think it's a fascinating and fruitful idea.

My main concern would be that almost every mind that's been writing, debating, legislating, interpreting, and applying law, so far, has been a human mind, full of a vast amount of 'common sense', background, cultural context, and world-knowledge that might need to be built into an AI system, for the system to really be able to use law as a crystallization of human values.

My hunch is that we'll be surprised at how much of that background knowledge will be necessary for law to be very useful in alignment. But, I think it's well worth pursuing, with that caveat in mind! Also, law tends to track situations where humans have conflicts of interest with each other, and it might not track universal values that are so obvious to everyone that conflicts of interest hardly ever arise.

I have a second concern that in many cultures, codified law might embody a lot of values that people in other cultures might find objectionable. For example an AI system that's fully aligned with Muslim Sharia law might embody values that don't fit very well with secular Western law. Or, Americans might not like Chinese law very much, or vice versa.

Maybe we can try to find some cross-cultural universals in legal systems that exemplify some common ground for human values. That might be easier in some domains of law (e.g. contract law, property law, commercial law) than in other domains of law (e.g. sexual conduct, marriage law, family law, religious law).

johnjnay4y2

Hi Geoffrey, thank you for this feedback.

On your background knowledge comment, I agree that is an important open question (for this proposal, and other alignment techniques).

Related to that, I have been thinking through the systematic selection of which data sets are best suited for self-supervised pre-training of large language models - an active area of research in AI capabilities and Foundation Models more generally, which may be even more important for this application to legal data. For self-supervision on legal data, we could use (at least) two filters to guide data selection and data structuring processes.

First, is the goal of training on a data point to embed world knowledge into AI, or legal task knowledge? Learning that humans in the U.S. drive on the right side of the road is learning world knowledge; whereas, learning how to map a statute about driving rules to a new fact pattern in the real world is learning how to conduct a legal reasoning task. World knowledge can be learned from legal and non-legal corpora. Legal task knowledge can primarily be learned from legal data.

Second, is the approximate nature of the uncertainty that an AI could theoretically resolve by training on a data point epistemic or aleatory? If the nature of the uncertainty is epistemic – e.g., whether citizens prefer climate change risk reduction over endangered species protection – then it is fruitful to apply as much data as we can to learning functions to more closely approximate the underlying fact about the world or about law. If the nature of the uncertainty is more of an aleatory flavor – e.g., the middle name of the defendant in a case – then there is enough inherent randomness that we would seek to avoid attempting to learn anything about that fact or data point.
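A minimal Python sketch of how the two filters could be combined into a single keep/drop decision, assuming each data point has already been tagged. The enum names, the `DataPoint` fields, and the hard rule that legal task knowledge is kept only from legal corpora are simplifying assumptions (the text above says "primarily", not "only"), and the tagging itself is the open research problem.

```python
from dataclasses import dataclass
from enum import Enum


class Knowledge(Enum):
    WORLD = "world"            # e.g., which side of the road people drive on
    LEGAL_TASK = "legal_task"  # e.g., mapping a statute to a new fact pattern


class Uncertainty(Enum):
    EPISTEMIC = "epistemic"  # reducible with more data
    ALEATORY = "aleatory"    # inherent randomness, e.g., a defendant's middle name


@dataclass(frozen=True)
class DataPoint:
    text: str
    knowledge: Knowledge
    uncertainty: Uncertainty
    from_legal_corpus: bool


def keep_for_pretraining(point: DataPoint) -> bool:
    """Apply both filters; returns True if the point should be trained on."""
    # Filter 2: do not try to learn anything from aleatory facts.
    if point.uncertainty is Uncertainty.ALEATORY:
        return False
    # Filter 1 (simplified assumption): take legal task knowledge only from
    # legal data; world knowledge may come from legal or non-legal corpora.
    if point.knowledge is Knowledge.LEGAL_TASK:
        return point.from_legal_corpus
    return True


# Invented toy data points.
statute_application = DataPoint(
    "Court applies a speeding statute to an ambulance driver.",
    Knowledge.LEGAL_TASK, Uncertainty.EPISTEMIC, from_legal_corpus=True,
)
middle_name = DataPoint(
    "The defendant's middle name is recorded in the caption.",
    Knowledge.WORLD, Uncertainty.ALEATORY, from_legal_corpus=True,
)
driving_side = DataPoint(
    "Drivers in the U.S. keep to the right side of the road.",
    Knowledge.WORLD, Uncertainty.EPISTEMIC, from_legal_corpus=False,
)
```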

There are many other aspects of self-supervised pre-training data curation that we will need to explore, but figured I'd share a couple that are top of mind in the context of your world knowledge comment.

Public law informs AI more through negative than positive directives; and therefore it’s unclear the extent to which policy – outside of the human-AI “contract and standards” type of alignment we are working on – can inform which goals AI should proactively pursue to improve the world on society’s behalf. I agree with your comment that, "law tends to track situations where humans have conflicts of interest with each other, and it might not track universal values that are so obvious to everyone that conflicts of interest hardly ever arise." This is a great illustration of the need to complement the Law Informs Code approach with other approaches to specifying human values. But I believe there are challenges with using the "AI Ethics" approach as the core framework; see section IV. PUBLIC LAW: SOCIETY-AI ALIGNMENT of the longer form version of this post, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4218031 . I think a blend of the frameworks could be most fruitful.

Finally, it would be very interesting to conduct research on the possibility of "cross-cultural universals in legal systems that exemplify some common ground for human values," and which domains of law have the most cultural overlap. There are many exciting threads to pursue here!

Geoffrey Miller4y2

Thanks for this reply; it all makes sense.

Regarding cross-cultural universals, I think there's some empirical research on cross-cultural universals in which kinds of violent or non-violent crime are considered worst, most harmful, and most deserving of punishment. I couldn't find a great reference for that in a cursory lit search, but there is related work on the evolutionary psychology of crime and criminal law that might be useful, e.g. work by Owen Jones: https://onlinelibrary.wiley.com/doi/abs/10.1002/9780470939376.ch34

Also David Buss (UT Austin) has written a lot about violent crime, esp. murder, e.g. https://labs.la.utexas.edu/buss/files/2015/09/Evolutionary-psychology-and-crime.pdf

johnjnay4y2

Thanks!

Cullen 🔸4y3

Hi John! You might be interested in my Law-Following AI Sequence, where I've explored very similar ideas: https://forum.effectivealtruism.org/s/3pyRzRQmcJNvHzf6J

I'm glad we've seemed to converge on similar ideas. I would love to chat sometime!

johnjnay4y1

That's awesome - thank you for sharing!

Would love to chat as well.