[Rescheduled to May 5]
Talk by Dylan Hadfield-Menell, MIT
About this talk
One of the most useful developments in the field of artificial intelligence is the use of incentives to program target behaviors into systems. This creates a convenient way for system designers to specify useful, goal-driven behaviors. However, it also creates issues due to the inherent incompleteness of these goals: we often observe that optimizing a proxy for one's intended goal eventually leads to counterintuitive and undesired results. This is sometimes ascribed to Goodhart's Law: "when a measure becomes a target, it ceases to be a good measure". In this talk, I will present theoretical results that characterize situations where Goodhart's Law holds and discuss approaches to managing this incompleteness. I will conclude with a discussion of how incomplete specifications are managed in recommendation systems and propose research directions for safe AI systems that have affordances for updating and maintaining aligned incentives.
About the speaker
Hadfield-Menell is a Professor of EECS at MIT and previously received his Ph.D. in Computer Science from UC Berkeley. His research focuses on the value alignment problem in artificial intelligence and aims to help create algorithms that pursue the intended goals of their users.
About the speaker series
With the advancement of Machine Learning research and the expected appearance of Artificial General Intelligence (AGI) in the near future, positively shaping the development of AI and aligning AI values with human values has become an extremely important problem.
In this speaker series, we bring state-of-the-art research on AI alignment into focus for audiences interested in contributing to this field. We will kick off the series by closely examining the potential risks posed by AGIs and making the case for prioritizing the mitigation of those risks now. Later in the series, we will hear more technical talks on concrete proposals for AI alignment.
See the full schedule and register at https://www.harvardea.org/agathon.
You can attend the talks in person at Harvard and MIT, or remotely through the webinar by registering ahead of time (link above). All talks take place at 5 pm EST (2 pm PST, 10 pm GMT) on Thursdays. Dinner is provided at in-person venues.