An Introduction to Developmental Interpretability with Matthew Farrugia-Roberts

This month we're excited to be hosting a presentation by Matthew Farrugia-Roberts.

Matthew is quickly emerging as one of Australia's leading AI safety researchers; he recently coauthored an ICML conference paper with Stuart Russell from the Center for Human-Compatible AI, and prior to that, he completed an impactful Master's thesis with the Melbourne Deep Learning Group (MDLG).

MDLG are pioneering an emerging form of interpretability research called Developmental Interpretability. This new approach is based on Singular Learning Theory (SLT), a little-known branch of statistical learning theory that is equipped to describe deep learning models.

Despite its esoteric origins in algebraic geometry and Bayesian statistics, SLT provides insights into neural networks that are both intuitive and impactful. For instance, early research into SLT already explains some of the observations made by mechanistic interpretability researchers (such as those raised in our presentation by Joseph Bloom).

Perhaps the most important implication of SLT is that it can be used to monitor and predict capabilities that emerge as neural networks are trained - an approach known as 'Developmental Interpretability' (Dev-Interp). This new research direction was recently announced at the "SLT and Alignment Summit" in Berkeley, which Matthew attended.

We're keen to hear Matthew's thoughts on the potential impact of Dev-Interp, as well as other personal insights from his experience in the field.

We encourage you to check out Matthew's weekly AI safety discussion group, which is held each Thursday: https://metauni.org/ai-safety/
You can also find links to his research, presentations, and other outputs at his website: https://far.in.net/

For those interested in background information on SLT and Developmental Interpretability, here are some resources:
"Neural Networks Generalise Because of this One Weird Trick": https://www.lesswrong.com/.../mqwA5Fc.../p/fovfuFdpuEwQzJu2w
"Towards Developmental Interpretability": https://www.alignmentforum.org/.../towards-developmental...

For those with an ongoing interest in Dev-Interp or SLT, there are also weekly discussion groups facilitated by MDLG's Daniel Murfet: https://metauni.org/slt/

Schedule:
6:00pm Doors
6:15pm Free pizza dinner
7:00pm Presentation
8:00pm Discussion

See you there!