
The Intro to ML Safety course covers foundational techniques and concepts in ML safety for those interested in pursuing research careers in AI safety, with a focus on empirical research. 

We think it's a particularly good fit for people who already have an ML background and want to move into empirical AI safety research.

Intro to ML Safety is run by the Center for AI Safety and designed and taught by Dan Hendrycks, a UC Berkeley ML PhD and director of the Center for AI Safety.

Apply to be a participant by May 22nd

Website: https://course.mlsafety.org/ 

About the Course

Intro to ML Safety is an 8-week virtual course that aims to introduce students with a deep learning background to the latest empirical AI Safety research. The program introduces foundational ML safety concepts such as robustness, alignment, monitoring, and systemic safety.

The course takes 5 hours a week, and consists of a mixture of:

  • Assigned readings and lecture videos (publicly available at course.mlsafety.org)
  • Homework and coding assignments
  • A facilitated discussion session with a TA and weekly optional office hours

The course will be virtual by default, though in-person sections may be offered at some universities.
 

The Intro to ML Safety curriculum

The course covers:

  1. Hazard Analysis: an introduction to concepts from the field of hazard analysis and how they can be applied to ML systems; and an overview of standard models of risks and accidents.
  2. Robustness: Ensuring models behave acceptably when exposed to abnormal, unforeseen, unusual, highly impactful, or adversarial events. We cover techniques for generating adversarial examples and for defending models against them (a minimal adversarial-example sketch appears after this list); benchmarks for measuring robustness to distribution shift; and approaches to improving robustness via data augmentation, architectural choices, and pretraining techniques.
  3. Monitoring: We cover techniques to identify malicious use, hidden model functionality, data poisoning, and emergent behaviour in models; metrics for OOD detection (see the sketch after this list); confidence calibration for deep neural networks; and transparency tools for neural networks.
  4. Alignment: We define alignment as reducing inherent model hazards. We cover measuring honesty in models; power aversion; an introduction to ethics; and imposing ethical constraints in ML systems.
  5. Systemic Safety: In addition to directly reducing hazards from AI systems, AI can be used to make the world better equipped to handle AI development by improving sociotechnical factors such as decision-making ability and safety culture. We cover using ML to improve epistemics; ML for cyberdefense; and ways in which AI systems could be made to cooperate better.
  6. Additional X-Risk Discussion: The last section of the course explores the broader importance of the concepts covered: namely, existential risk and possible existential hazards. We cover specific ways in which AI could potentially cause an existential catastrophe, such as weaponization, proxy gaming, treacherous turns, deceptive alignment, value lock-in, and persuasive AI. We also introduce considerations for influencing future AI systems and research on selection pressures.
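
To give a flavour of the material in the robustness unit, here is a minimal sketch of the Fast Gradient Sign Method (FGSM), one standard technique for generating adversarial examples. This is an illustration rather than course material; the `model`, `loss_fn`, and the assumption that inputs lie in [0, 1] are placeholders.

```python
# Minimal FGSM sketch (illustrative only, not taken from the course).
# Assumes a differentiable classifier `model`, a loss function `loss_fn`
# (e.g. cross-entropy), and inputs `x` scaled to the range [0, 1].
import torch

def fgsm_attack(model, loss_fn, x, y, epsilon=0.03):
    """Return a perturbed copy of x that increases the model's loss on labels y."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # Step in the direction of the sign of the input gradient,
    # then clamp back to the valid input range.
    perturbed = x_adv + epsilon * x_adv.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()
```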
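
Similarly, for the monitoring unit, below is a minimal sketch of the maximum softmax probability (MSP) baseline for out-of-distribution detection: inputs on which a classifier is least confident are flagged as possibly OOD. Again, `model` and the threshold value are placeholder assumptions rather than course code.

```python
# Minimal MSP out-of-distribution detection sketch (illustrative only).
import torch
import torch.nn.functional as F

@torch.no_grad()
def msp_scores(model, x):
    """Maximum softmax probability per input; lower scores suggest OOD inputs."""
    probs = F.softmax(model(x), dim=-1)
    return probs.max(dim=-1).values

def flag_ood(model, x, threshold=0.5):
    """Flag inputs whose confidence falls below the (hypothetical) threshold."""
    return msp_scores(model, x) < threshold
```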

How is this program different from AGISF?

If you are interested in an empirical research career in AI safety, you are in the target audience for this course. The ML Safety course does not overlap much with AGISF, so we expect participants to get a lot out of Intro to ML Safety whether or not they have previously done AGISF.

Intro to ML Safety is focused on ML empirical research rather than conceptual work. Participants are required to watch recorded lectures and complete homework assignments that test their understanding of the technical material. 

You can read more about the ML safety approach in Open Problems in AI X-risk.

Time Commitment

The program will last 8 weeks, beginning on June 12th and ending on August 14th. 

Participants are expected to commit around 5-10 hours per week. This includes ~1-2 hours of recorded lectures, ~2-3 hours of readings, ~2 hours of written assignments, and 1.5 hours of facilitated discussion.

In order to give more people the opportunity to study ML safety, we will provide a $500 stipend to eligible students who complete the course.

Eligibility

This is a technical course. A solid background in deep learning is required. 

If you don’t have this background, we recommend Weeks 1-6 of MIT 6.036, followed by either Lectures 1-13 of the University of Michigan’s EECS498 or Weeks 1-6 and 11-12 of NYU’s Deep Learning course.

Apply to be a participant by May 22nd

Website: https://course.mlsafety.org/
