Probably good projects for the AI safety ecosystem

Ryan Kidd

At EAGxBerkeley 2022, I was asked several times what new projects might benefit the AI safety and longtermist research ecosystem. I think that several existing useful-according-to-me projects (e.g., SERI MATS, REMIX, CAIS) could urgently absorb strong management and operations talent, but I think the following projects would also probably be useful to the AI safety/longtermist project. Criticisms are welcome.

Projects I might be excited to see, in no particular order:

A London-based MATS clone to build the AI safety research ecosystem there, leverage mentors in and around London (e.g., DeepMind, CLR, David Krueger, Aligned AI, Conjecture, etc.), and allow regional specialization. This project should probably only happen once MATS has ironed out the bugs in its beta versions and grown too large for one location (possibly by Winter 2023). Please contact the MATS team before starting something like this to ensure good coordination and to learn from our mistakes.
Rolling admissions alternatives to MATS’ cohort-based structure for mentors and scholars with different needs (e.g., to support alignment researchers who suddenly want to train/use research talent at irregular intervals but don’t have the operational support to do this optimally).
A combined research mentorship and seminar program that aims to do for AI governance research what MATS is trying to do for technical AI alignment research.
A dedicated bi-yearly workshop for AI safety university group leaders that teaches them how to recognize talent, foster useful undergraduate research projects, and build a good talent development pipeline or “user journey” (including a model of alignment macrostrategy and where university groups fit in).
An organization that does for the Open Philanthropy worldview investigations team what GCP did to supplement CEA's workshops and 80,000 Hours’ career advising calls.
Further programs like ARENA that aim to develop ML safety engineering talent at scale by leveraging good ML tutors and proven curricula like CAIS’ Intro to ML Safety, Redwood Research’s MLAB, and Jacob Hilton’s DL curriculum for large language model alignment.
More contests like ELK that have well-operationalized research problems (i.e., that clearly explain what builder/breaker steps look like), clear metrics of success, a well-considered target audience (who is being incentivized to apply and why?), and a well-considered user journey (where do prize winners go next?). Possible contest seeds:
- Evan Hubinger’s SERI MATS deceptive AI challenge problem;
- Vivek Hebbar’s and Nate Soares’ SERI MATS diamond maximizer selection problem;
- Alex Turner’s and Quintin Pope’s SERI MATS training stories selection problem.
More "plug-and-play" curricula for AI safety university groups, like AGI Safety Fundamentals, Alignment 201, and Intro to ML Safety.
A well-considered "precipism" university course template that critically analyzes Toby Ord's “The Precipice,” Holden Karnofsky's “The Most Important Century,” Will MacAskill's “What We Owe The Future,” some Open Philanthropy worldview investigations reports, some Global Priorities Institute ethics papers, etc.
Hackathons in which people with strong ML knowledge (not ML novices) write good-faith critiques of AI alignment papers and worldviews (e.g., what Jacob Steinhardt’s “ML Systems Will Have Weird Failure Modes” does for Hubinger et al.’s “Risks From Learned Optimization”).
A New York-based alignment hub that aims to provide talent search and logistical support for NYU Professor Sam Bowman’s planned AI safety research group.
More organizations like CAIS that aim to recruit established ML talent into alignment research with clear benchmarks, targeted hackathons/contests with prizes, and offers of funding for large compute projects that focus on alignment. To avoid accidentally furthering AI capabilities, this type of venture needs strong vetting of proposals, possibly from extremely skeptical and doomy alignment researchers.
A talent recruitment and onboarding organization targeting cyber security researchers to benefit AI alignment, similar to Jeffrey Ladish’s and Lennart Heim's theory of change. A possible model for this organization is the Legal Priorities Project, which aims to recruit and leverage legal talent for longtermist research.

Effective Altruism Forum