With the rapid advancement of artificial intelligence, AI safety has become an increasingly prominent cause area in the EA community.
For our week 2 event, our speaker David Quarel will be presenting a talk on the current state of AI safety research.
Title: Where is AI safety and capabilities at right now?
What to Expect
- Demystifying Transformers: Understand the architecture behind the large language model (LLM) hype
- Insights from papers on what LLMs understand and where their limitations lie
- The latent spaces of models and how they can be explored
- Major discoveries about model internals, such as "grokking" and "superposition"
- Goal misgeneralisation discovered in real-world reinforcement learning models
Bio: David Quarel was a PhD student at the Australian National University, studying Universal Artificial Intelligence, a theoretical framework for intelligence developed by Marcus Hutter. David is based in the UK and has worked as a Research Assistant at Cambridge University with David Krueger.