Join us for an enlightening evening exploring the capabilities and interpretability of state-of-the-art (SOTA) models. Our speaker, Joseph Bloom, is an independent researcher in mechanistic interpretability (more detail below). His talk will cover broader trends in the capabilities and limitations of SOTA models, trends and challenges in alignment-centric interpretability efforts, and a brief overview of his own research agenda at the intersection of AI alignment and interpretability. The talk will be followed by a Q&A and discussion hosted by James Fodor, providing an opportunity to clarify concepts and dig deeper into areas of interest. To top off the night, we will be providing pizza for dinner. Please book a free ticket (via the Eventbrite link below) so that we can order the right amount of pizza.
https://www.eventbrite.com/.../capabilities-and...
The content of the presentation will be somewhat technical, so it will be most accessible to those with some familiarity with the technical basics of AI and interpretability. However, we welcome all levels of experience, and the Q&A will provide an opportunity for those less familiar with the topic to deepen their understanding. If you would like to read up on the basics, we have included links to some resources below, but you don't need to read them in order to attend.
Joseph’s talk will also provide useful background and context for the event AI Safety Melbourne will be running in a few weeks to help people prepare submissions for the Government’s consultation on responsible AI (https://consult.industry.gov.au/supporting-responsible-ai).
Schedule:
6pm – Doors open. Snacks, mingle and chat.
6:30pm – 8:00pm – Joseph’s talk followed by Q&A and discussion.
8:00pm onwards – Pizza, mingle and chat.
The event is primarily intended to be in-person, but you can also attend via Zoom. (A Zoom link will be posted here soon.)
Looking forward to seeing you! Please get in touch if you have any questions.
About Joseph:
Joseph is an independently funded alignment research engineer studying the mechanistic interpretability of gridworld agents. He is also the current maintainer of TransformerLens, a popular open-source package for mechanistic interpretability of transformers. Joseph's work on Decision Transformers was recently mentioned in the Transformer Circuits Thread's May update, a popular publication in mechanistic interpretability, and he recently led the career development program at the Alignment Research Engineering Accelerator (ARENA). Before working in AI alignment, Joseph studied computational biology and worked for two years as a data scientist at a proteomics startup.
Lastly, we'd like to say a big thank you to Bellroy for generously allowing us to use their space to host this event.
Optional pre-reading material:
- https://www.lesswrong.com/.../what-will-gpt-2030-look-like
- https://distill.pub/2020/circuits/zoom-in/
Facebook group: https://www.facebook.com/groups/503645528219169/
Calendar link: https://calendar.google.com/calendar/event...
