The Center for AI Safety is running another iteration of Intro to ML Safety this Spring for people who want to learn about empirical AI safety research topics.

Apply to be a participant by January 29th, 2023.

Apply to be a facilitator by December 30th.

Website: mlsafety.org/intro-to-ml-safety

About the Course

Introduction to ML Safety is an 8-week course that aims to introduce students with a deep learning background to empirical AI Safety research. The program is designed and taught by Dan Hendrycks, a UC Berkeley ML PhD and director of the Center for AI Safety, and provides an introduction to robustness, alignment, monitoring, systemic safety, and conceptual foundations for existential risk.

Each week, participants will be assigned readings, lecture videos, and required homework assignments. The materials are publicly available at course.mlsafety.org.

There are two tracks:

  • The introductory track: for people who are new to AI Safety. This track aims to familiarize students with the AI X-risk discussion alongside empirical research directions.
  • The advanced track: for people who already have a conceptual understanding of AI X-risk and want to learn more about existing empirical safety research so they can start contributing.

The course will be virtual by default, though in-person sections may be offered at some universities.

How is this program different from AGISF?

Intro to ML Safety generally focuses more on empirical topics than on conceptual work. Participants are required to watch recorded lectures and complete homework assignments that test their understanding of the technical material. If you’ve already taken AGISF and are interested in empirical research, then you are the target audience for the advanced track.

Intro to ML Safety also emphasizes different ideas and research directions than AGISF does. Examples include:

  • Detecting trojans: this is a current security issue but also a potential microcosm for detecting deception and testing monitoring tools.
  • Adversarial robustness: it is helpful for reward models to be adversarially robust. Otherwise, the models they are used to train can ‘overoptimize’ them and exploit their deficiencies instead of performing as intended. This applies whenever an AI system is used to evaluate another AI system. For example, an ELK reporter must also be highly robust if its output is used as a training signal.
  • Power averseness: Arguments for taking AI seriously as an existential risk often focus on power-seeking behavior. Can we train language models to avoid power-seeking actions in text-based games?

You can read about more examples in Open Problems in AI X-risk.

Time Commitment

The program will last 8 weeks, beginning on February 20th and ending on April 14th. Participants are expected to commit at least 5 hours per week. This includes ~1 hour of recorded lectures (which will take more than one hour to digest), ~1-2 hours of readings, ~1-2 hours of written assignments, and 1 hour of discussion. 

We understand that 5 hours per week is a large time commitment, so to make our program more inclusive and remove any financial barriers, we will provide a $500 stipend upon completion of the course. (Edit: reduced from $1000)


Anyone is eligible to apply. The prerequisites are:

  • Deep learning: you can gauge the background knowledge required by skimming the week 1 slides: deep learning review.
  • Linear algebra or introductory statistics (e.g., AP Statistics)
  • Multivariate differential calculus

If you are not sure whether you meet these prerequisites, err on the side of applying. We will review applications on a case-by-case basis.

Facilitating a section

To be a facilitator, you must have a strong background in deep learning and AI Safety. Note that if you are not familiar with the content, you will have to learn it in advance of each week.

The time commitment for running one cohort is ~2-4 hours per week, depending on prior familiarity with the material: 1 hour of discussion and 1-3 hours of prep. Discussion times are flexible.

We will pay facilitators a stipend corresponding to roughly $30 per hour (subject to legal constraints).

Apply by December 30th. We are especially interested in finding facilitators for in-person groups.

You can post questions here or reach out to introcourse@mlsafety.org.






will add this opportunity to the EA Opportunity board!