A summer 2022 survey of hundreds of AI researchers produced an aggregate forecast of 37 years until there is a 50% chance of high-level machine intelligence (“when unaided machines can accomplish every task better and more cheaply than human workers”).[1] Natural language processing (NLP) is a key domain of AI, so surveys of NLP researchers are of particular interest. A separate summer 2022 survey of hundreds of NLP researchers found that 73% “agree that labor automation from AI could plausibly lead to revolutionary societal change in this century, on at least the scale of the Industrial Revolution.”[2]
We already face significant challenges communicating our goals and values in a way that reliably directs AI behavior – even without additional technological advancements, which could compound the difficulty with more autonomous systems. Specifying the desirability (value) of an AI system taking a particular action in a particular state of the world is unwieldy beyond a very limited set of value-action-states. In fact, the purpose of machine learning is to train on a subset of world states and have the resulting agent generalize an ability to choose high value actions in new circumstances. But the program ascribing value to actions chosen during training is an inevitably incomplete encapsulation of the breadth and depth of human judgements, and the training process is a sparse exploration of states pertinent to all possible futures. Therefore, after training, AI is deployed with a coarse map of human preferred territory and will often choose actions unaligned with our preferred paths.
Law is a computational engine that converts human values into legible directives. Law Informs Code is the research agenda attempting to model that complex process, and embed it in AI. As an expression of how humans communicate their goals, and what society values, Law Informs Code.
Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans, an article forthcoming in the Northwestern Journal of Technology and Intellectual Property, dives deeper into related work and this upcoming research agenda being pursued at The Stanford Center for Legal Informatics (a center operated by Stanford Law School and the Stanford Computer Science Department).
Similar to how parties to a legal contract cannot foresee every potential “if-then” contingency of their future relationship, and legislators cannot predict all the circumstances under which their proposed legislation will be applied, we cannot specify “if-then” rules that provably lead to good AI behavior. Fortunately, legal theory and practice have developed arrays of tools for goal specification and value alignment.
Take, for example, the distinction between legal rules and standards. Rules (e.g., “do not drive more than 60 miles per hour”) are more targeted directives than standards. They give the rule-maker clarity over the outcomes that will be realized in the states they specify. If rules are not written with enough potential states of the world in mind, they can lead to unanticipated undesirable outcomes (e.g., a driver following the rule above is too slow to bring their passenger to the hospital in time to save their life), but enumerating all the potential scenarios is excessively costly outside of simple environments. Legal standards evolved to allow parties to contracts, judges, regulators, and citizens to develop shared understandings and adapt them to novel situations (i.e., to estimate value expectations about actions in unspecified states of the world). For the Law Informs Code use case, standards do not require adjudication to be implemented and to have their meaning resolved, as they do in their legal creation. The law’s lengthy process of iteratively defining standards through judicial opinion and regulatory guidance can be the AI’s starting point, via machine learning on the application of the standards.
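The contrast between a rule and a standard can be made concrete with a toy sketch. Everything here is an assumption made for illustration: the `DrivingState` features, the invented "precedent" labels, and the nearest-precedent lookup that stands in for learning from past applications of a standard. It is not a model of any actual legal doctrine.

```python
from dataclasses import dataclass

SPEED_LIMIT_MPH = 60  # the rule quoted in the text


@dataclass(frozen=True)
class DrivingState:
    speed_mph: float
    medical_emergency: bool  # hypothetical feature, chosen for illustration


def rule_permits(state: DrivingState) -> bool:
    """A rule: one bright-line test, blind to any state it does not name."""
    return state.speed_mph <= SPEED_LIMIT_MPH


# Hypothetical past applications of a "reasonable speed" standard.
# The labels are invented for this sketch, not drawn from case law.
PRECEDENTS = [
    (DrivingState(55, False), True),
    (DrivingState(85, False), False),
    (DrivingState(70, True), True),
    (DrivingState(110, True), False),
]


def _distance(a: DrivingState, b: DrivingState) -> float:
    """Crude similarity: speed gap plus a large penalty if context differs."""
    emergency_gap = 0.0 if a.medical_emergency == b.medical_emergency else 100.0
    return abs(a.speed_mph - b.speed_mph) + emergency_gap


def standard_permits(state: DrivingState) -> bool:
    """A standard: judge a new state by its closest past application."""
    _, label = min(PRECEDENTS, key=lambda p: _distance(state, p[0]))
    return label


hospital_run = DrivingState(speed_mph=72, medical_emergency=True)
print(rule_permits(hospital_run), standard_permits(hospital_run))
```

For the hospital run, the rule forbids the action while the standard, generalizing from the closest precedent, permits it. A learned model would replace the hand-written distance function, but the division of labor is the same.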
Toward that end, we are embarking on the project of engineering legal data into training signals to help AI learn standards, e.g., fiduciary duties. The practices of making, interpreting, and enforcing law have been battle-tested through millions of legal contracts and actions that have been memorialized in digital format. This record provides large data sets of training examples and explanations, and there are millions of well-trained active lawyers from whom to elicit machine learning model feedback to embed an evolving comprehension of law. For instance, court opinions on violations of investment advisers’ fiduciary obligations represent (machine) learning opportunities for a curriculum on the fiduciary standard and its duties of care and loyalty.
Other data sources suggested for use toward AI alignment – surveys of human preferences, humans contracted for labeling data, or (most commonly) the implicit beliefs of the AI system designers – lack an authoritative source of synthesized preference aggregations. In contrast, legal rules, standards, policies, and reasoning approaches are not academic philosophical guidelines or ad hoc online survey results. They are legal standards with a verifiable resolution: ultimately obtained from a court opinion; but short of that, elicited from legal experts.
Building integrated legal informatics-AI systems that learn the theoretical constructs and practices of law (the language of alignment), such as contract drafting and interpretation, should help us specify inherently vague human goals for AI more robustly, increasing human-AI alignment. This may even improve general AI capabilities (or at least not cause a net negative change), which arguably could be positive for AI safety: techniques that increase AI alignment at the expense of AI capabilities can lead organizations to eschew alignment in order to gain capabilities as they race to develop powerful AI.
Toward society-AI alignment, we are developing a framework for understanding law as the applied philosophy of multi-agent alignment, which harnesses public policy as an up-to-date knowledge base of democratically endorsed values. Although law is partly a reflection of historically contingent political power – and thus not a perfect aggregation of citizen preferences – if properly parsed, its distillation offers a legitimate computational comprehension of societal beliefs.
If you find this research agenda interesting, please reach out to the project to explore how we could collaborate.
[1] 2022 Expert Survey on Progress in AI (August 23, 2022) https://aiimpacts.org/2022-expert-survey-on-progress-in-ai/.
[2] Julian Michael et al., What Do NLP Researchers Believe? Results of the NLP Community Metasurvey (August 26, 2022) https://arxiv.org/abs/2208.12852 at 11.
Hi Charlie, thank you for your comment.
I cite many of Dylan's papers in the longer form version of this post: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4218031
I will check out Xuan's talk. Thanks for sharing that.
Instead of:
I could expand the statement to cover the larger project of what we are working on:
One of the primary goals of this research agenda is to teach AI to follow the spirit of the law in a human-recognizable way. This entails leveraging existing human capabilities for the "law-making" / "contract-drafting" part (how do we use the theory and practice of law about how to tell agents what to do?), and conducting research on building AI capabilities for the interpretation part (how do our machine learning processes use data and processes from the theory and practice of law about how agents interpret those directives / contracts?).
Reinforcement learning with human attorney feedback (there are more than 1.3 million lawyers in the US) via natural language interactions with AI models is potentially a powerful process to teach (through training, or fine-tuning, or extraction of templates for in-context prompting of large language models) statutory interpretation, argumentation, and case-based reasoning, which can then be applied more broadly for AI alignment. Models could be trained to assist human attorney evaluators, which theoretically, in partnership with the humans, could allow the combined human-AI evaluation team to have capabilities that surpass the legal understanding of the expert humans alone.
The Foundation Models in use today, e.g., GPT-3, have, effectively, conducted a form of behavioral cloning on a large portion of the Internet to leverage billions of human actions (through natural language expressions). It may be possible to, similarly, leverage billions of human legal data points to build Law Foundation Models through large-scale language model self-supervision on pre-processed legal text data.
Aspects of legal standards, and the "spirit" of the law, can be learned directly from legal data. We could also codify examples of human and corporate behavior exhibiting standards such as fiduciary duty into a structured format to evaluate the standards-understanding capabilities of AI models. The legal data available for AI systems to learn from, or be evaluated on, includes textual data from all types of law (constitutional, statutory, administrative, case, and contractual), legal training tools (e.g., bar exam outlines, casebooks, and software for teaching the casuistic approach), rule-based legal reasoning programs, and human-in-the-loop live feedback from law and policy human experts. The latter two could simulate state-action-reward spaces for AI fine-tuning or validation, and the former could be processed to do so.
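As a rough illustration of what codifying behavior into a structured evaluation format could look like, here is a minimal sketch. The `StandardExample` schema, the three scenarios and their labels, and the `keyword_baseline` predictor are all hypothetical placeholders. A real benchmark would use expert-labeled examples, and a real model would take the place of the baseline.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class StandardExample:
    standard: str   # which legal standard is being tested
    scenario: str   # short description of the conduct
    complies: bool  # label; hypothetical here, expert-provided in practice


# Invented examples for illustration only.
EXAMPLES = [
    StandardExample(
        "duty of loyalty",
        "Adviser discloses a commission before recommending a fund.",
        True,
    ),
    StandardExample(
        "duty of loyalty",
        "Adviser steers a client into a fund that secretly pays the adviser.",
        False,
    ),
    StandardExample(
        "duty of care",
        "Adviser recommends an investment without reviewing the client's finances.",
        False,
    ),
]

Predictor = Callable[[str, str], bool]


def accuracy(predict: Predictor, examples: Sequence[StandardExample]) -> float:
    """Fraction of examples where the predictor matches the label."""
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(predict(e.standard, e.scenario) == e.complies for e in examples)
    return correct / len(examples)


def keyword_baseline(standard: str, scenario: str) -> bool:
    """Stand-in for a real model: treats a few red-flag words as non-compliance."""
    red_flags = ("secretly", "without")
    return not any(word in scenario.lower() for word in red_flags)


print(accuracy(keyword_baseline, EXAMPLES))
```

The point of the structure is that any model exposing the same `(standard, scenario) -> bool` interface can be scored against the same labeled set, so progress on standards-understanding becomes measurable.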
Automating data curation to convert textual legal data into either state-action-reward tuples or contextual constraints that shape candidate action choices conditional on the state is an important frontier in this research agenda (and promising for application to case law text data, contracts, and legal training materials). General AI capabilities research has recently found that learning from textual descriptions, rather than direct instruction, may allow models to learn reward functions that generalize better. Fortunately, much of law is embedded more in the form of descriptions and standards than in the form of direct instructions and specific rules. Descriptions of the application of standards provide a rich and large surface area to learn from.
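To make the two target formats concrete, here is a minimal sketch of both conversions. It assumes the hard part (extracting facts, conduct, and outcome from an opinion) has already been done by a human or a model. The `LabeledOpinion` schema, the reward values, and the exact-match constraint lookup are assumptions made for illustration, not a description of an existing pipeline.

```python
from dataclasses import dataclass
from typing import List, NamedTuple, Sequence


class Transition(NamedTuple):
    state: str     # the factual context
    action: str    # the conduct at issue
    reward: float  # signal derived from the legal outcome


@dataclass(frozen=True)
class LabeledOpinion:
    """Hypothetical record already extracted from a court opinion."""
    facts: str
    conduct: str
    breach_found: bool


def to_transition(
    opinion: LabeledOpinion,
    breach_penalty: float = -1.0,    # assumed value, not calibrated
    compliant_reward: float = 1.0,   # assumed value, not calibrated
) -> Transition:
    """Format 1: turn one labeled opinion into a state-action-reward tuple."""
    reward = breach_penalty if opinion.breach_found else compliant_reward
    return Transition(opinion.facts, opinion.conduct, reward)


def filter_candidates(
    state: str, candidates: Sequence[str], transitions: Sequence[Transition]
) -> List[str]:
    """Format 2: use past outcomes as constraints on candidate actions.

    Drops any candidate that was penalized in the same state. Exact string
    matching is a simplification; a real system would need semantic matching.
    """
    penalized = {t.action for t in transitions if t.state == state and t.reward < 0}
    return [a for a in candidates if a not in penalized]


opinion = LabeledOpinion(
    facts="adviser manages retirement account",
    conduct="undisclosed self-dealing",
    breach_found=True,
)
history = [to_transition(opinion)]
print(filter_candidates(
    "adviser manages retirement account",
    ["undisclosed self-dealing", "disclose conflict and seek consent"],
    history,
))
```

The same curated records feed both formats: as tuples they can supply a training or validation signal, and as constraints they can prune a model's candidate actions before any are taken.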
Textual data can be curated and labeled for these purposes. We will aim for two outcomes with this labeling. First, we want data that can be used to evaluate how well AI models understand legal standards. Second, we hope the initial “gold-standard” human expert labeled data can be used to generate additional, much larger sets of data through automated curation and processing of full corpora of legal text, and through model interaction with human feedback.
I think your statement:
is spot on. That is how I was thinking about it, but I should have made that clearer; perhaps I should work on a follow-up post at some point that explicitly explores the intersections of Law Informs Code with other strands of alignment research. Some of this is in the longer form version of this post, but with this inspiration from you, I may try to go further in that direction (although I am already beyond the length the Journal editors want!).