An intervention to shape policy dialogue, communication, and AI research norms for AI safety




Lee_Sharkey

A PDF version of this article can be found here.[1]

Abstract

Discourse on AI safety suffers from heated disagreement between those sceptical and those concerned about existential risk from AI. Framing discussion using strategic choice of language is a subtle but potentially powerful method to shape the direction of AI policy and AI research communities. It is argued here that the AI safety community is committing the dual error of frequently using language that hinders constructive dialogue and missing the opportunity to frame discussion using language that assists their aims. It is suggested that the community amend usage of the term ‘AI risk’ and employ more widely the ‘AI accidents’ frame in order to improve external communication, AI policy discussion, and AI research norms. 

Contents

  • Abstract
  • The state of public discourse on AI safety
  • Why to care about terminology
  • Introducing ‘AI accidents’ and why use of ‘AI risk’ can be inaccurate
  • Why use of ‘AI risk’ is problematic and why use of ‘AI accidents’ is helpful
    • From the perspective of sceptics
    • From the perspective of the newcomer to the subject
    • Shaping policy discussion and research norms
  • Seizing the opportunity
  • Footnotes
  • Works Cited

The state of public discourse on AI safety

Contemporary public discourse on AI safety is often tense. Two technology billionaires have engaged in a regrettable public spat over existential risks from artificial intelligence (Samuelson, 2017); high-profile AI experts have volleyed loud opinion pieces making contradictory calls for concern or for calm (Dafoe & Russell, 2016) (Etzioni, 2016); both factions (the group sceptical of existential risk posed by AI and the group concerned about the risk) grow larger as interest in AI increases, and more voices join the debate. The divide shows little sign of narrowing. If surviving machine superintelligence will require strong coordination or even consensus, humanity’s prospects currently look poor.

In this polarised debate, both factions, especially the AI safety community, should look for ways to facilitate constructive policy dialogue and shape safety-conscious AI research norms. Though it is insufficient on its own, framing discussion using strategic choice of language is a subtle but potentially powerful method to help accomplish these goals (Baum, 2016).

Why to care about terminology

Language choice frames policy debate, assigns the focus of discussion, and thereby influences outcomes. It decides whether the conversation is “Gun control” (liberty reducing) or “Gun violence prevention” (security promoting); “Red tape” or “Safety regulations”; “Military spending” or “Defence spending”. If terminology does not serve discussion well, it should be promptly rectified while the language, the concepts it signifies, and the actions, plans, and institutions guided by those concepts are still relatively plastic. With that in mind, the below advocates that the AI safety community revise its use of the term ‘AI risk’ and employ the ‘AI accidents’ frame more widely.

It will help first to introduce what is argued to be the substantially better term, ‘AI accidents’. The inaccuracy of current language will then be explored, followed by discussion of the problems caused by this inaccuracy and the important opportunities missed by only rarely using the ‘AI accidents’ frame.

Introducing ‘AI accidents’ and why use of ‘AI risk’ can be inaccurate

An AI accident is “unintended and harmful behavior that may emerge from poor design of real-world AI systems” (Amodei, et al., 2016). The earliest description of misaligned AI as an ‘accident’ appears to be in Marvin Minsky’s 1984 afterword to Vernor Vinge's novel, True Names:

“The first risk is that it is always dangerous to try to relieve ourselves of the responsibility of understanding exactly how our wishes will be realized. Whenever we leave the choice of means to any servants we may choose then the greater the range of possible methods we leave to those servants, the more we expose ourselves to accidents and incidents. When we delegate those responsibilities, then we may not realize, before it is too late to turn back, that our goals have been misinterpreted, perhaps even maliciously. We see this in such classic tales of fate as Faust, the Sorcerer's Apprentice, or the Monkey's Paw by W.W. Jacobs.” (Minsky, 1984)

The term ‘AI accident’ seems to emerge publicly later, with Huw Price’s 2012 quotation of Jaan Tallinn:

“He (Tallinn) said that in his pessimistic moments he felt he was more likely to die from an AI accident than from cancer or heart disease,” (University of Cambridge, 2012).

There is some evidence that the term was used in the AI safety community prior to this (LessWrong commenter "Snarles", 2010), but other written evidence proved elusive through online search.

The first definition of ‘accidents in machine learning systems’ appears to be provided in the well-known paper Concrete Problems in AI Safety (Amodei, et al., 2016). This is the definition for ‘AI accident’ given above and used here throughout. 

Some examples of AI accidents may be illustrative: A self-driving car crash where the algorithm was at fault would be an AI accident; a housekeeping robot cooking the cat for dinner because it was commanded to “Cook something for dinner” would be an AI accident; using algorithms in the justice system that have inadvertently been trained to be racist would be an AI accident; the 2010 Flash Crash or similar future incidents would be AI accidents; deployment of a paperclip maximiser would be an AI accident. There is no presupposed upper bound for the size of AI accidents. AI safety seeks to reduce the risk of AI accidents.

 

Figure: AI accidents. The relative placement of instances of AI accidents may be subject to debate; the figure is intended for illustration only.

At significant risk of pedantry, close examination of terminology is worthwhile because, despite the appearance of hair-splitting, it yields what will prove to be useful distinctions.

‘AI risk’ has at least three uses.

  1. ‘An AI risk’ - The ‘count noun’ sense, meaning a member of the set of all risks from AI, ‘AI risks’, which can be used interchangeably with ‘dangers from AI’, ‘potential harms of AI’, ‘threats from AI’, etc. Members of the set of AI risks include:
    1. AI accidents
    2. Deliberate misuse of AI systems (e.g. autonomy in weapons systems)
    3. Risks to society deriving from intended use of AI systems, which may result from coordination failures in the deployment of AI (e.g. mass unemployment resulting from automation).
  2. ‘AI risk’ – The ‘mass noun’ sense, meaning some amount of risk from AI. In practice, using it means discussing at least one member of the above set of risks, but the source of risk is not implied. It can be used interchangeably with ‘danger from AI’, ‘potential harm of AI’, ‘AI threat’, etc.
  3. The third, inaccurate sense is employing ‘AI risk’ to mean specifically ‘Risk of a catastrophic AI accident’.

Observe that in the third usage, the label used for the second (mass noun) sense is used to refer to an instance of the first (count noun) sense. It would be easy to overlook this small discrepancy of ‘crossed labels’. Nevertheless, below it is argued that using the third sense causes problems and missed opportunities.

Before exploring why use of the third sense might cause problems, note that it has been employed frequently by many of the major institutions in the AI safety community (although the accurate senses are used even more commonly)[2]:

  • Cambridge Centre for the Study of Existential Risk (CSER) examples 1, 2[3];
  • Future of Humanity Institute (FHI) examples 1, 2;
  • Future of Life Institute (FLI) example 1;
  • Foundational Research Institute example 1;
  • Global Challenges Report 2017 example 1;
  • Machine Intelligence Research Institute (MIRI) examples 1, 2;
  • Open Philanthropy example 1;
  • Wikipedia entries: 1, 2, 3;
  • And others in popular blogs, community blogs, or media: Slate Star Codex, Import AI, LessWrong, Overcoming Bias, a high-profile AI expert in the New Yorker, WEF.

Why use of ‘AI risk’ is problematic and why use of ‘AI accidents’ is helpful

Use of the third sense could be defended on several grounds. It is conveniently short. In a way, it is not even especially inaccurate; if, like many in the AI safety community, one believes that the vast majority of AI risk comes from catastrophic AI accidents, one could be excused for conflating the labels.

Problems arise in the combination of the generality of the mass noun sense and the inaccuracy of the third use. An additional issue is the missed opportunity of not using ‘AI accidents’.

Generality: A key issue is that general terms like ‘AI risk’, ‘AI threat’, etc., when used in their mass noun senses, conjure the most available instances of ‘AI risk’, thus summoning in many listeners images of existential catastrophes induced by artificial superintelligence – this is perhaps one reason why the AI safety community came to employ the third, inaccurate use of ‘AI risk’. The generality of the term permits the psychological availability of existential risks from superintelligent AI to overshadow less sensational risks and accidents. A member of the AI safety community will not necessarily find this problematic; a catastrophic AI accident is indeed their main concern, so they might understandably not care much if general terms like ‘AI risk’, ‘AI threat’, etc. conjure their highest priority risk specifically. There are two groups for which this usage may cause problems: (1) sceptics of risks from catastrophic AI accidents and (2) newcomers to the subject who have not yet given issues surrounding AI much serious consideration. Aside from causing issues, not using a strategically selected frame misses opportunities to influence how groups such as policymakers and AI researchers think about existential risks from AI; using the AI accident frame should prove beneficial.

From the perspective of sceptics

Inaccuracy: Most informed sceptics of catastrophic AI accidents are plausibly still somewhat concerned about small AI accidents and other risks, but they may find it difficult to agree that they are concerned with what the AI safety community might, by the third sense, refer to as ‘AI risk’. The disagreement with ‘AI risk’ (third sense) does not reflect the fact that the two groups are in broad agreement on most risks, disagreeing only on risk from a part of the AI accidents scale. The crossed labels create the illusion of discord regarding mitigation of ‘AI risk’. The confusion helps drive the chorus of retorts that safety-proponents are wrong about ‘AI risk’ and that AI accidents are the ‘wrong risk’ to focus on (Sinders, 2017) (Nogrady, 2016) (Madrigal, 2015) (Etzioni, 2016), and presents AI safety work, which in fact mitigates risk of AI accidents of any size, as uniquely the domain of the superintelligence-concerned.

With the ‘AI accidents’ frame, otherwise-opposing factions can claim to be concerned with different but overlapping areas on the scale of AI accidents; the difference between those concerned about catastrophic AI accidents and those who are not is simply that the former camp sees reason to be cautious about the prospect of AI systems of arbitrary levels of capability or misalignment, while the latter chooses to discount perceived risk at higher levels of these scales. To observers of the debate, this partial unity is much easier to see within the AI accident frame than when the debate concerns ‘AI risk’ or ‘existential risk from AI’. There does not need to be agreement about the probability of accidents on the upper end of the scale to have consensus on the need to prevent smaller ones, thereby facilitating agreement to prioritise research that prevents AI accidents in general.

With both factions now working within the same conceptual category, the primary disagreement between groups becomes only the scope of their concerns rather than the existence of a principal concept. Using the ‘AI accidents’ frame helps find common ground where ‘AI risk’ struggles.

From the perspective of the newcomer to the subject

Missed opportunity: We should conservatively assume that a newcomer to the subject holds priors that are sceptical of existential risks from artificial superintelligence. For these individuals, current language misses an opportunity for sound communication. What ‘AI risk’ and even ‘existential risk from artificial superintelligence’ fail to communicate is the fundamental nature of the risk: that the true risk is of the simple accident of deploying a singularly capable machine with a poorly designed objective function – not something malicious or fantastical. This central point is not communicated by the label, giving the priors of the newcomer free rein over the interpretation, facilitating the ‘dismissal by science fiction’.

Using ‘AI accidents’, it is directly implied that the risk involves no malicious intent. Moreover, one can point to existing examples of AI accidents, such as racist algorithms or the 2010 Flash Crash. AI accidents slightly higher on the capability scale are believable accidents: a housekeeping robot cooking the cat for dinner is an accident well within reach of imagination; likewise the AI that fosters war to maximise its profit objective. Using ‘AI accidents’ thus creates a continuous scale populated by existing examples and facilitates arrival at the comprehension of misaligned superintelligence by simple, believable steps of induction. The framing as an accident on the upper, yet-to-be-realised part of a scale arguably makes the idea feel more tangible than ‘existential risk’.

Shaping policy discussion and research norms

Missed opportunity: This reframing should confer some immediate practical benefits. Since most policy-making organisations are likely to be composed of a mix of sceptics, the concerned, and newcomers to the subject, it may be socially difficult to have frank policy discussion on potential risks from artificial superintelligence; an ill-received suggestion of existential risk from AI may be dismissed as science fiction or ridiculed. If it exists, this difficulty would be especially marked in organisations with pronounced hierarchy (a common attribute of e.g. governments), where there is a greater perceived social cost to making poorly received suggestions. In such organisations, concerns of existential risk from artificial superintelligence may thus be omitted from policy discussion or relegated to a weaker mention than if framed in terms of AI accidents involving arbitrary levels of intelligence. The ‘AI accidents’ frame automatically introduces large-scale AI accidents, making it an opt-out discussion item, rather than opt-in.

 

Missed opportunity: Stuart Russell advocates a culturally-oriented intervention to improve AI safety:

“I think the right approach is to build the issue directly into how practitioners define what they do. No one in civil engineering talks about “building bridges that don't fall down.” They just call it “building bridges.” Essentially all fusion researchers work on containment as a matter of course; uncontained fusion reactions just aren't useful. Right now we have to say “AI that is probably beneficial,” but eventually that will just be called “AI.” [We must] redirect the field away from its current goal of building pure intelligence for its own sake, regardless of the associated objectives and their consequences.” (Bohannon, 2017).

How to realise Russell’s edict? Seth Baum discusses framing as an ‘intrinsic measure’ to influence social norms in AI research to pursue beneficial designs and highlights the importance not only of what is said, but how something is said (Baum, 2016). For engineers, it would be strangely vague to talk about ‘car risk’, ‘bridge risk’ or other broad terms. Instead, they talk about reducing the risk of car accidents or bridge collapses – referring explicitly to the event that they are responsible for mitigating and precluding competing ideas, e.g. the risks of air pollution from mass use of cars, or of disruption to a horse-driven economy. The same should be true for AI. The ‘AI accidents’ frame moves thinking away from abstract argument and analogy and brings the salient concepts closer to the material realm. Giving AI researchers a clear, available, and tangible idea of the class of events they should design to avoid will be important to engender safe AI research norms.

Seizing the opportunity

The count noun and mass noun senses of ‘AI risk’ and ‘existential risk from AI’ etc. still have their place. But opportunities should be sought for the ‘AI accidents’ frame where it is appropriate. Without being prescriptive (and cognisant that not all catastrophic AI risks are risks of catastrophic AI accidents), instead of ‘reducing AI risk’ or ‘reducing existential risk from AI’, the policy, strategy, and technical AI safety community could claim to work on reducing the risk of AI accidents, at least where they are not also working on other risks.

Shifting established linguistic habits requires effort. The AI safety community is relatively small and cohesive, so establishing this subtle but potentially powerful change in frame at a community level could be an achievable aim. By driving a shift in terminology, a goal of wider adoption by other groups such as policy makers, journalists, and AI researchers is within reach.

 

Footnotes

[1] For comment and review, I am grateful to Nick Robinson, Hannah Todd, and Jakob Graabak.

[2] In some links, other AI risks are discussed elsewhere in the texts, but nevertheless the sense in which ‘AI risk’ was used was actually the third sense. The list is not exhaustive. 

[3] By a co-founder - not an institutional use.

 

Works Cited

Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016, July 25). Concrete Problems in AI Safety. Retrieved from arXiv:1606.06565v2 [cs.AI]: https://arxiv.org/abs/1606.06565

Baum, S. (2016). On the promotion of safe and socially beneficial artificial intelligence. AI & Society, 1-9. Retrieved from https://link.springer.com/article/10.1007/s00146-016-0677-0

Bohannon, J. (2017, July 17). Fears of an AI pioneer. Science, Vol. 349, Issue 6245, p. 252. doi:10.1126/science.349.6245.252

Bostrom, N., Dafoe, A., & Flynn, C. (2016). Policy Desiderata in the Development of Machine Superintelligence.

Dafoe, A., & Russell, S. (2016, November 2). Yes, We Are Worried About the Existential Risk of Artificial Intelligence. Retrieved from MIT Technology Review: https://www.technologyreview.com/s/602776/yes-we-are-worried-about-the-existential-risk-of-artificial-intelligence/

Etzioni, O. (2016, September 20). No, the Experts Don’t Think Superintelligent AI is a Threat to Humanity. Retrieved from MIT Technology Review: https://www.technologyreview.com/s/602410/no-the-experts-dont-think-superintelligent-ai-is-a-threat-to-humanity/

LessWrong commenter "Snarles". (2010, May 19). Be a Visiting Fellow at the Singularity Institute. Retrieved from LessWrong: http://lesswrong.com/lw/29c/be_a_visiting_fellow_at_the_singularity_institute/

Madrigal, A. (2015, February 27). The case against killer robots, from a guy actually working on artificial intelligence. Retrieved from Splinternews: http://splinternews.com/the-case-against-killer-robots-from-a-guy-actually-wor-1793845735

Minsky, M. (1984). Afterword to Vernor Vinge's novel, "True Names". Retrieved from http://web.media.mit.edu/~minsky/papers/TrueNames.Afterword.html

Nogrady, B. (2016, November 10). The Real Risks of Artificial Intelligence. Retrieved from BBC: http://www.bbc.com/future/story/20161110-the-real-risks-of-artificial-intelligence

Samuelson, K. (2017, July 25). Elon Musk Just Dissed Mark Zuckerberg’s Understanding of Artificial Intelligence. Retrieved from Fortune: http://fortune.com/2017/07/25/elon-musk-just-dissed-mark-zuckerbergs-understanding-of-artificial-intelligence/

Sinders, C. (2017, August 25). Dear Elon – Forget Killer Robots. Here’s What You Should Really Worry About. Retrieved from Fastcodedesign: https://www.fastcodesign.com/90137818/dear-elon-forget-killer-robots-heres-what-you-should-really-worry-about

University of Cambridge. (2012, November 25). Humanity's last invention and our uncertain future. Retrieved from http://www.cam.ac.uk/research/news/humanitys-last-invention-and-our-uncertain-future