AI safety
Studying and reducing the existential risks posed by advanced artificial intelligence

AI safety is the study of ways to reduce risks posed by artificial intelligence.

Interventions that aim to reduce these risks can be split into:

  • Technical alignment - research on how to align AI systems with human or moral goals
  • AI governance - reducing AI risk through, for example, global coordination on regulating AI development or incentives for corporations to be more cautious in their AI research
  • AI forecasting - predicting AI capabilities ahead of time


Reading on why AI might be an existential risk

Hilton, Benjamin (2023) Preventing an AI-related catastrophe, 80,000 Hours, March 2023

Cotra, Ajeya (2022) Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover, Effective Altruism Forum, July 18

Carlsmith, Joseph (2022) Is Power-Seeking AI an Existential Risk?, arXiv, June 16

Yudkowsky, Eliezer (2022) AGI Ruin: A List of Lethalities, LessWrong, June 5

Ngo et al. (2023) The alignment problem from a deep learning perspective, arXiv, February 23


Arguments against AI safety

Concern about AI safety and AI risk is sometimes described as a Pascal's Mugging [1], implying that the risks are tiny and that, for any stated level of ignorable risk, the payoffs could be exaggerated to force it to still be a top priority. A response to this is that in a survey of 700 ML researchers, the median answer for the probability that the long-run effect of advanced AI on humanity will be "extremely bad (e.g., human extinction)" was 5%, with 48% of respondents giving 10% or higher.[2] These probabilities are too high (by at least 5 orders of magnitude) to be considered Pascalian.


Further reading on arguments against AI safety

Grace, Katja (2022) Counterarguments to the basic AI x-risk case, EA Forum, October 14

Garfinkel, Ben (2020) Scrutinising classic AI risk arguments, 80,000 Hours Podcast, July 9


AI safety as a career

80,000 Hours' medium-depth investigation rates technical AI safety research a "priority path", among the most promising career opportunities the organization has identified so far.[3][4] Richard Ngo and Holden Karnofsky also offer advice for those interested in working on AI safety.[5][6]


