AI Research Considerations for Human Existential Safety (ARCHES)

by critch, 22nd Sep 2021



Originally published in May 2020, cross-posted by Aaron Gertler on 9/21/21. 


Framed in positive terms, this report examines how technical AI research might be steered in a manner that is more attentive to humanity’s long-term prospects for survival as a species. In negative terms, we ask what existential risks humanity might face from AI development in the next century, and by what principles contemporary technical research might be directed to address those risks. 

A key property of hypothetical AI technologies, called prepotence, is introduced; it is useful for delineating a variety of potential existential risks from artificial intelligence, even as AI paradigms might shift. A set of twenty-nine contemporary research directions are then examined for their potential benefit to existential safety. Each research direction is explained with a scenario-driven motivation and with examples of existing work on which to build. The research directions present their own risks and benefits to society that could occur at various scales of impact, and in particular are not guaranteed to benefit existential safety if major developments in them are deployed without adequate forethought and oversight. As such, each direction is accompanied by a consideration of potentially negative side effects. 

Taken more broadly, the twenty-nine explanations of the research directions also illustrate a highly rudimentary methodology for discussing and assessing potential risks and benefits of research directions, in terms of their impact on global catastrophic risks. This impact assessment methodology is very far from maturity, but seems valuable to highlight and improve upon as AI capabilities expand.


At the time of writing, the prospect of artificial intelligence (AI) posing an existential risk to humanity is not a topic explicitly discussed at length in any technical research agenda known to the present authors. Given that existential risk from artificial intelligence seems physically possible, and potentially very important, there are a number of historical factors that might have led to the current paucity of technical-level writing about it: 

  1. Existential safety involves many present and future stakeholders (Bostrom, 2013), and is therefore a difficult objective for any single researcher to pursue.
  2. The field of computer science, with AI and machine learning as subfields, has not had a culture of evaluating, in written publications, the potential negative impacts of new technologies (Hecht et al., 2018).
  3. Most work potentially relevant to existential safety is also relevant to smaller-scale safety and ethics problems (Amodei et al., 2016; Cave and ÓhÉigeartaigh, 2019), and is therefore more likely to be explained with reference to those applications for the sake of concreteness.
  4. The idea of existential risk from artificial intelligence was first popularized as a science-fiction trope rather than a topic of serious inquiry (Rees, 2013; Bohannon, 2015), and recent media reports have leaned heavily on these sensationalist fictional depictions, a deterrent for some academics.

We hope to address (1) not by successfully unilaterally forecasting the future of technology as it pertains to existential safety, but by inviting others to join in the discussion. Counter to (2), we are upfront in our examination of risks. Point (3) is a feature, not a bug: many principles relevant to existential safety have concrete, present-day analogues in safety and ethics with potential to yield fruitful collaborations. Finally, (4) is best treated by simply moving past such shallow examinations of the future, toward more deliberate and analytical methods. 

Our primary intended audience is that of AI researchers (of all levels) with some preexisting level of intellectual or practical interest in existential safety, who wish to begin thinking about some of the technical challenges it might raise. For researchers already intimately familiar with the large volume of contemporary thinking on existential risk from artificial intelligence (much of it still informally written, non-technical, or not explicitly framed in terms of existential risk), we hope that some use may be found in our categorization of problem areas and the research directions themselves. 

Our primary goal is not to make the case for existential risk from artificial intelligence as a likely eventuality, or existential safety as an overriding ethical priority, nor do we argue for any particular prioritization among the research directions presented here. Rather, our goal is to illustrate how researchers already concerned about existential safety might begin thinking about the topic from a number of different technical perspectives. In doing this, we also neglect many non-existential safety and social issues surrounding AI systems. The absence of such discussions in this document is in no way intended as an appraisal of their importance, but simply a result of our effort to keep this report relatively focused in its objective, yet varied in its technical perspective.

Read the rest of the paper

