A summer 2022 survey of hundreds of AI researchers estimated an aggregate forecast time of 37 years for a 50% chance of high-level machine intelligence (“when unaided machines can accomplish every task better and more cheaply than human workers”).[1] Natural language processing (NLP) is a key domain of AI, so surveys of NLP researchers are of particular interest. A separate summer 2022 survey of hundreds of NLP researchers found that 73% “agree that labor automation from AI could plausibly lead to revolutionary societal change in this century, on at least the scale of the Industrial Revolution.”[2]
We already face significant challenges communicating our goals and values in a way that reliably directs AI behavior – even without additional technological advancements, which could compound the difficulty with more autonomous systems. Specifying the desirability (value) of an AI system taking a particular action in a particular state of the world is unwieldy beyond a very limited set of value-action-states. In fact, the purpose of machine learning is to train on a subset of world states and have the resulting agent generalize an ability to choose high-value actions in new circumstances. But the program ascribing value to actions chosen during training is an inevitably incomplete encapsulation of the breadth and depth of human judgements, and the training process is a sparse exploration of states pertinent to all possible futures. Therefore, after training, AI is deployed with a coarse map of human-preferred territory and will often choose actions unaligned with our preferred paths.
Law is a computational engine that converts human values into legible directives. Law Informs Code is the research agenda attempting to model that complex process, and embed it in AI. As an expression of how humans communicate their goals, and what society values, Law Informs Code.
Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans, an article forthcoming in the Northwestern Journal of Technology and Intellectual Property, dives deeper into related work and this upcoming research agenda being pursued at The Stanford Center for Legal Informatics (a center operated by Stanford Law School and the Stanford Computer Science Department).
Similar to how parties to a legal contract cannot foresee every potential “if-then” contingency of their future relationship, and legislators cannot predict all the circumstances under which their proposed legislation will be applied, we cannot specify “if-then” rules that provably lead to good AI behavior. Fortunately, legal theory and practice have developed arrays of tools for goal specification and value alignment.
Take, for example, the distinction between legal rules and standards. Rules (e.g., “do not drive more than 60 miles per hour”) are more targeted directives than standards. They give the rule-maker clarity over the outcomes that will be realized in the states they specify. If rules are not written with enough potential states of the world in mind, they can lead to unanticipated undesirable outcomes (e.g., a driver following the rule above is too slow to bring their passenger to the hospital in time to save their life), but enumerating all the potential scenarios is excessively costly outside of simple environments. Legal standards evolved to allow parties to contracts, judges, regulators, and citizens to develop shared understandings and adapt them to novel situations (i.e., to estimate value expectations about actions in unspecified states of the world). For the Law Informs Code use case, standards do not require adjudication to implement them and resolve their meaning, as they do when they are created in law. The law’s lengthy process of iteratively defining standards through judicial opinion and regulatory guidance can be the AI’s starting point, via machine learning on the application of the standards.
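To make the rule/standard distinction concrete, here is a minimal sketch in Python. It is purely illustrative and not part of the research agenda's implementation: the `DrivingState` type, the `PRECEDENTS` list, and the nearest-precedent lookup are all hypothetical stand-ins, and the "precedents" are invented toy data rather than real adjudicated outcomes. The rule checks only the one condition it enumerates; the standard is approximated by following the most similar previously decided case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrivingState:
    """Hypothetical, highly simplified state of the world."""
    speed_mph: float
    medical_emergency: bool


def rule_permits(state: DrivingState, limit_mph: float = 60.0) -> bool:
    """A bright-line rule: only the enumerated condition (speed) matters."""
    return state.speed_mph <= limit_mph


# Invented toy "precedents": (state, was the conduct judged acceptable?).
# These are assumptions for illustration, not real legal outcomes.
PRECEDENTS = [
    (DrivingState(55.0, False), True),
    (DrivingState(85.0, False), False),
    (DrivingState(75.0, True), True),
    (DrivingState(120.0, True), False),
]


def standard_permits(state: DrivingState, precedents=PRECEDENTS) -> bool:
    """Approximate a standard by following the most similar precedent
    (1-nearest neighbor under an ad hoc distance)."""
    def distance(past: DrivingState) -> float:
        # Cases that differ on the emergency flag are treated as far apart.
        emergency_gap = 0.0 if past.medical_emergency == state.medical_emergency else 100.0
        return abs(past.speed_mph - state.speed_mph) + emergency_gap

    _, outcome = min(precedents, key=lambda pair: distance(pair[0]))
    return outcome


hospital_run = DrivingState(speed_mph=72.0, medical_emergency=True)
joyride = DrivingState(speed_mph=90.0, medical_emergency=False)
```

In this toy setup, `rule_permits(hospital_run)` is false while `standard_permits(hospital_run)` is true, mirroring the hospital example above. A real system would learn from the text of opinions and guidance rather than from a hand-built distance function.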
Toward that end, we are embarking on the project of engineering legal data into training signals to help AI learn standards, e.g., fiduciary duties. The practices of making, interpreting, and enforcing law have been battle-tested through millions of legal contracts and actions that have been memorialized in digital format, providing large data sets of training examples and explanations, and millions of well-trained active lawyers from whom to elicit machine learning model feedback to embed an evolving comprehension of law. For instance, court opinions on violations of investment advisers’ fiduciary obligations represent (machine) learning opportunities for curriculum on the fiduciary standard and its duties of care and loyalty.
Other data sources suggested for use toward AI alignment – surveys of human preferences, humans contracted for labeling data, or (most commonly) the implicit beliefs of the AI system designers – lack an authoritative source of synthesized preference aggregations. In contrast, legal rules, standards, policies, and reasoning approaches are not academic philosophical guidelines or ad hoc online survey results. They are legal standards with a verifiable resolution: ultimately obtained from a court opinion; but short of that, elicited from legal experts.
Building integrated legal informatics-AI systems that learn the theoretical constructs and practices of law (the language of alignment), such as contract drafting and interpretation, should help us specify inherently vague human goals for AI more robustly, increasing human-AI alignment. This may even improve general AI capabilities (or at least not cause a net negative change), which, arguably, could be positive for AI safety: techniques that increase AI alignment at the expense of AI capabilities can lead organizations to eschew alignment in pursuit of additional capabilities as they race to develop powerful AI.
Toward society-AI alignment, we are developing a framework for understanding law as the applied philosophy of multi-agent alignment, which harnesses public policy as an up-to-date knowledge base of democratically endorsed values. Although law is partly a reflection of historically contingent political power – and thus not a perfect aggregation of citizen preferences – if properly parsed, its distillation offers a legitimate computational comprehension of societal beliefs.
If you find this research agenda interesting, please reach out to explore how we could collaborate.
[1] 2022 Expert Survey on Progress in AI (August 23, 2022) https://aiimpacts.org/2022-expert-survey-on-progress-in-ai/.
[2] Julian Michael et al., What Do NLP Researchers Believe? Results of the NLP Community Metasurvey (August 26, 2022) https://arxiv.org/abs/2208.12852 at 11.
Hi Geoffrey, thank you for this feedback.
On your background knowledge comment, I agree that is an important open question (for this proposal, and other alignment techniques).
Related to that, I have been thinking through the systematic selection of which data sets are best suited for self-supervised pre-training of large language models - an active area of research in AI capabilities and Foundation Models more generally, which may be even more important for this application to legal data. For self-supervision on legal data, we could use (at least) two filters to guide data selection and data structuring processes.
First, is the goal of training on a data point to embed world knowledge into AI, or legal task knowledge? Learning that humans in the U.S. drive on the right side of the road is learning world knowledge; whereas, learning how to map a statute about driving rules to a new fact pattern in the real world is learning how to conduct a legal reasoning task. World knowledge can be learned from legal and non-legal corpora. Legal task knowledge can primarily be learned from legal data.
Second, is the approximate nature of the uncertainty that an AI could theoretically resolve by training on a data point epistemic or aleatory? If the nature of the uncertainty is epistemic – e.g., whether citizens prefer climate change risk reduction over endangered species protection – then it is fruitful to apply as much data as we can to learning functions that more closely approximate the underlying fact about the world or about law. If the nature of the uncertainty is more of an aleatory flavor – e.g., the middle name of the defendant in a case – then there is enough inherent randomness that we would seek to avoid attempting to learn anything about that fact or data point.
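As a rough sketch of how these two filters might combine (an illustration only, not a pipeline we have built), the Python below assumes each data point has already been tagged by hand with a knowledge type and an uncertainty type. The `DataPoint` class, the enum names, and the `curate` function are all hypothetical. Aleatory data points are dropped, and the rest are grouped by the kind of knowledge they are meant to teach.

```python
from dataclasses import dataclass
from enum import Enum


class KnowledgeType(Enum):
    WORLD = "world"            # e.g., which side of the road people drive on
    LEGAL_TASK = "legal_task"  # e.g., mapping a statute to a new fact pattern


class UncertaintyType(Enum):
    EPISTEMIC = "epistemic"  # more data can reduce the uncertainty
    ALEATORY = "aleatory"    # inherent randomness; not worth learning


@dataclass(frozen=True)
class DataPoint:
    """Hypothetical record; the tags are assumed to be assigned upstream."""
    text: str
    knowledge: KnowledgeType
    uncertainty: UncertaintyType


def curate(points):
    """Drop aleatory points (second filter), then group the remainder by the
    kind of knowledge they are meant to teach (first filter)."""
    buckets = {kind: [] for kind in KnowledgeType}
    for point in points:
        if point.uncertainty is UncertaintyType.ALEATORY:
            continue
        buckets[point.knowledge].append(point)
    return buckets


corpus = [
    DataPoint("Drivers in the U.S. keep to the right side of the road.",
              KnowledgeType.WORLD, UncertaintyType.EPISTEMIC),
    DataPoint("Applying a driving statute to a new fact pattern.",
              KnowledgeType.LEGAL_TASK, UncertaintyType.EPISTEMIC),
    DataPoint("The defendant's middle name.",
              KnowledgeType.WORLD, UncertaintyType.ALEATORY),
]
curated = curate(corpus)
```

In practice, the hard part is assigning the tags in the first place; the hard cutoff for aleatory data is a simplifying assumption, and a softer down-weighting may turn out to be more appropriate.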
There are many other aspects of self-supervised pre-training data curation that we will need to explore, but I figured I'd share a couple that are top of mind in the context of your world knowledge comment.
Public law informs AI more through negative than positive directives; the extent to which policy – outside of the human-AI “contract and standards” type of alignment we are working on – can inform which goals AI should proactively pursue to improve the world on society’s behalf is therefore unclear. I agree with your comment that, "law tends to track situations where humans have conflicts of interest with each other, and it might not track universal values that are so obvious to everyone that conflicts of interest hardly ever arise." This is a great illustration of the need to complement the Law Informs Code approach with other approaches to specifying human values. But I believe there are challenges with using the "AI Ethics" approach as the core framework; see section IV. PUBLIC LAW: SOCIETY-AI ALIGNMENT of the longer form version of this post, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4218031 . I think a blend of the frameworks could be most fruitful.
Finally, it would be very interesting to conduct research on the possibility of "cross-cultural universals in legal systems that exemplify some common ground for human values," and which domains of law have the most cultural overlap. There are many exciting threads to pursue here!