This is a linkpost for https://docs.google.com/document/d/1-58zgC2lRMbMK-CXU44VR3ApGYbTI0aJKX-cKkxDeyo/edit?usp=sharing
EDIT 3/17/2023: I've reorganized the doc and added some governance projects.
I intend to maintain this list in the linked doc. I'll paste the current state of the doc (as of January 19th, 2023) below. I encourage people to comment with suggestions.
- Levelling Up in AI Safety Research Engineering [Public] (LW)
  - Highly recommended list of AI safety research engineering resources for people at various skill levels.
- AI Alignment Awards
- Alignment jams / hackathons from Apart Research
  - Past / upcoming hackathons: LLM, interpretability 1, AI test, interpretability 2
  - Projects on AI Safety Ideas: LLM, interpretability, AI test
  - Resources: black-box investigator of language models, interpretability playground (LW), AI test
  - Examples of past projects; interpretability winners
  - How to run one as an in-person event at your school
- Neel Nanda: 200 Concrete Open Problems in Mechanistic Interpretability (doc and previous version)
- Project page from AGI Safety Fundamentals and their Open List of Project ideas
- AI Safety Ideas by Apart Research; EAF post
- Most Important Century writing prize (Superlinear page)
- Center for AI Safety
  - Competitions like SafeBench
  - Student ML Safety Research Stipend Opportunity – provides stipends for doing ML research.
  - course.mlsafety.org projects – CAIS is looking for someone to add details about these projects on course.mlsafety.org
- Distilling / summarizing / synthesizing / reviewing / explaining
- Forming your own views on AI safety (without stress!) – also see Neel's presentation slides and "Inside Views Resources" doc
- Answer some of the application questions from the winter 2022 SERI-MATS, such as Vivek Hebbar's problems
- 10 exercises from Akash in “Resources that (I think) new alignment researchers should know about”
- [T] Deception Demo Brainstorm has some ideas (message Thomas Larsen if these seem interesting)
- Upcoming 2023 Open Philanthropy AI Worldviews Contest
- Alignment research at ALTER – interesting research problems, many have a theoretical math flavor
- Open Problems in AI X-Risk [PAIS #5]
- Amplify creative grants (old)
- Evan Hubinger: Concrete experiments in inner alignment, ideas someone should investigate further, sticky goals
- Richard Ngo: Some conceptual alignment research projects, alignment research exercises
- Buck Shlegeris: Some fun ML engineering projects that I would think are cool, The case for becoming a black box investigator of language models
- Implement a key paper in deep reinforcement learning (a minimal starting skeleton is sketched after this list)
- “Paper replication resources” section in “How to pursue a career in technical alignment”
- Daniel Filan idea
- Summarize a reading from Reading What We Can
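For the deep RL paper-replication idea above, here is a rough sketch of what a starting skeleton might look like: REINFORCE on CartPole, assuming PyTorch and Gymnasium are installed. The environment, network size, and hyperparameters are illustrative choices of mine, not something taken from the linked resources; a replication of a specific paper (e.g. DQN or PPO) would swap in that paper's update rule and architecture.

```python
# Minimal REINFORCE on CartPole – a starting skeleton for a paper-replication
# project. Assumes PyTorch and Gymnasium; hyperparameters are illustrative.
import gymnasium as gym
import torch
import torch.nn as nn
from torch.distributions import Categorical

env = gym.make("CartPole-v1")
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 64),
    nn.Tanh(),
    nn.Linear(64, env.action_space.n),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        # Sample an action from the current policy and remember its log-prob.
        dist = Categorical(logits=policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted returns, computed backwards through the episode, then normalized.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + 0.99 * g
        returns.insert(0, g)
    returns = torch.as_tensor(returns, dtype=torch.float32)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Policy-gradient update: increase log-prob of actions with high return.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if episode % 50 == 0:
        print(f"episode {episode}, return {sum(rewards):.0f}")
```

Much of the value of a replication project comes from extending a skeleton like this toward the paper's actual contribution (replay buffers, target networks, clipped objectives, the reported benchmarks), so treat the above as scaffolding rather than the project itself.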