
The Intro to ML Safety course covers foundational techniques and concepts in ML safety, with a focus on empirical research. We think it's a good fit for people with ML backgrounds who want to pursue empirical research careers in AI safety.

Intro to ML Safety is run by the Center for AI Safety and designed and taught by Dan Hendrycks, a UC Berkeley ML PhD and director of the Center for AI Safety.

Apply to be a participant by January 29th, 2023

Website: mlsafety.org/intro-to-ml-safety

About the Course

Intro to ML Safety is an 8-week virtual course that aims to introduce students with a deep learning background to the latest empirical AI Safety research. The program introduces foundational ML safety concepts such as robustness, alignment, monitoring, and systemic safety.

The course takes 5 hours a week, and consists of a mixture of:

  • Assigned readings and lecture videos (publicly available at course.mlsafety.org)
  • Homework and coding assignments
  • A facilitated discussion session with a TA and weekly optional office hours

The course will be virtual by default, though in-person sections may be offered at some universities.

The Intro to ML Safety curriculum

The course covers:

  1. Hazard Analysis: an introduction to concepts from the field of hazard analysis and how they can be applied to ML systems; and an overview of standard models for modelling risks and accidents.
  2. Robustness: Robustness focuses on ensuring models behave acceptably when exposed to abnormal, unforeseen, unusual, highly impactful, or adversarial events. We cover techniques for generating adversarial examples and making models robust to them; benchmarks for measuring robustness to distribution shift; and approaches to improving robustness via data augmentation, architectural choices, and pretraining techniques.
  3. Monitoring: We cover techniques to identify malicious use, hidden model functionality and data poisoning, and emergent behaviour in models; metrics for OOD detection; confidence calibration for deep neural networks; and transparency tools for neural nets.
  4. Alignment: We define alignment as reducing inherent model hazards. We cover measuring honesty in models; power aversion; an introduction to ethics; and imposing ethical constraints in ML systems.
  5. Systemic Safety: In addition to directly reducing hazards from AI systems, there are several ways that AI can be used to make the world better equipped to handle the development of AI by improving sociotechnical factors like decision-making ability and safety culture. We cover using ML for improved epistemics; ML for cyberdefense; and ways in which AI systems could be made to better cooperate.
  6. Additional X-Risk Discussion: The last section of the course explores the broader importance of the concepts covered: namely, existential risk and possible existential hazards. We cover specific ways in which AI could potentially cause an existential catastrophe, such as weaponization, proxy gaming, treacherous turn, deceptive alignment, value lock-in, and persuasive AI. We introduce some considerations for influencing future AI systems; and introduce research on selection pressures. 
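
To give a flavor of the empirical material, here is a minimal sketch of one idea from the Monitoring unit: measuring confidence calibration with expected calibration error (ECE). This is an illustration written for this post, not code from the course materials. The function name `expected_calibration_error` is hypothetical, and equal-width binning with 10 bins is an assumed choice (a common one, but others exist).

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy over equal-width bins.

    confidences: predicted probability of the chosen class, each in [0, 1]
    correct:     booleans, True where the prediction was right
    (Hypothetical helper for illustration; not taken from the course.)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Assign each prediction to a bin by its confidence; 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        # Weight each bin's gap by the fraction of predictions it holds.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece
```

A model that reports 95% confidence but is right only 75% of the time on those predictions has a large gap in that bin, so its ECE is high. A well-calibrated model has an ECE near zero.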

How is this program different from AGISF?

If you are interested in an empirical research career in AI safety, then you are in the target audience for this course. The ML Safety course does not overlap much with AGISF, so we expect participants to get a lot out of Intro to ML Safety whether or not they have previously done AGISF.

Intro to ML Safety is focused on ML empirical research rather than conceptual work. Participants are required to watch recorded lectures and complete homework assignments that test their understanding of the technical material. 

You can read more about the ML safety approach in Open Problems in AI X-risk.

Time Commitment

The program will last 8 weeks, beginning on February 20th and ending on April 21st. Participants are expected to commit at least 5 hours per week. This includes ~1 hour of recorded lectures, ~1-2 hours of readings, ~1-2 hours of written assignments, and 1 hour of discussion.

We understand that 5 hours per week is a large time commitment, so to make our program more inclusive and remove financial barriers, we will provide a $500 stipend upon completion of the course.


The prerequisites for the course are:

  • Familiarity with deep learning (e.g. a college course)
  • Linear algebra or introductory statistics (e.g. AP Statistics)
  • Multivariate differential calculus

Apply to be a participant by January 29th, 2023.

Website: mlsafety.org/intro-to-ml-safety






I wanted to mention that I went through the first week's lectures and exercises and I was really impressed at the quality!


Is there a list anywhere of ways to upskill in ML for AI Safety Engineering, such as MLAB too?

Do you have an opinion on when someone should pick your course over some other course?

(I'm asking because I often hear from people trying to upskill in ML and I'm not sure what to tell them, I hope someone can comment here and help)


I'll refer someone to this post right now

I originally helped design the course and I ran the first iteration of a similar program. I'm not really involved with the course now but I think I'm qualified to answer. However, I did AGI safety fundamentals a long time ago and haven't done MLAB, so my knowledge of those could be wrong (though I don't think so).

In comparison to AGI Safety Fundamentals, this course is a lot more technical and less conceptual. AGISF is not going to include the latest in machine learning on a technical level, and this course doesn't include as many conceptual readings.

In comparison with MLAB, this course is more focused on reading papers and understanding research, and less focused on teaching particular frameworks or engineering skills.

There's a bit of overlap between all three, but it's pretty minimal. I think anyone who has done any of these programs would learn something from doing the others. It mostly depends on what people want to take out of the course: knowledge of a lot of different conceptual research directions (AGISF), skills in engineering with PyTorch (MLAB), or knowledge of the frontier of ML safety research and paper reading skills (Intro to ML Safety).
