
Crossposted to LessWrong [Link]

TLDR: I’ve written an overview of the AI safety space, tagged by keywords and subject/field references (short version / long version). The aim is to let ML researchers quickly gauge their interest in the field based on their existing subfield skills and interests.


When ML researchers first hear about AI alignment and are interested in learning more, they often wonder how their existing skills and interests could fit within the research already taking place. With expertise in specific subfields, and momentum in their careers and projects, interested researchers want a picture of the overall AI alignment space and of which research projects they could invest in relatively easily. As one step towards addressing this, the AISFB Hub commissioned a collection of resources that could be provided to technical researchers trying to quickly assess which areas seem like promising candidates for them to investigate further (Short version / Long version).

These documents list a subset of the various organizations and researchers involved in the AI safety space, along with major papers. To allow quick scanning, I focused on keywords and subject/field references. As this was targeted at researchers who already have experience with ML, the summaries provided are primarily meant to allow the reader to quickly gauge interest in the subject based on their existing subfield skills and interests. 

This work contains papers and posts up through the end of 2022. Please contact me or Vael Gates if you would be willing to keep it updated!

Details and Disclaimers 

As an attempt at collecting alignment research, I generally see this post as complementary to Larsen’s post on technical alignment. Neither entirely includes the other: Larsen’s post has a slightly stronger and more curated focus on fields and projects, while this collection emphasizes general resources and example areas of work for new researchers.

Overall, this list took a little over 40 hours of work to put together. It primarily involved looking into and summarizing the work of organizations I already knew about. This was supplemented by investigating a list of researchers provided by the AISFB Hub, work referenced in various posts on LessWrong and the EA Forum, and the websites and papers of the organizations and researchers themselves.

More specifically, these lists include various AI organizations (e.g. DeepMind’s safety team, MIRI, OpenAI…) and individual researchers (both academic and independent) currently working on the subject, summaries of papers and posts they have produced, and a number of guides and other resources for those trying to get into the field. All of these include keyword tags for quicker scanning. Unfortunately, it is impossible to include every research direction and relevant piece of work while keeping this concise. Instead, I tried to limit paper selection to representative samples of the ideas being actively worked on, or explicit overviews of their agendas, while providing as many links as possible for those interested in looking deeper.

Still, with all of that said, I believe these documents can provide an easily shareable resource for anyone who is interested in transitioning into alignment research (or knows someone who is) but lacks information about how to approach, learn about, or contribute to the field. Of course, if you just want to use it to check out some papers, that would work too. Thank you for reading!


This seems super useful! Would you be willing to let Rob Miles's aisafety.info use this as seed content? Our backend is already in Google Docs, so if you moved those files to this drive folder we could rename them to have a question-shaped title and they'd be synced in and kept up to date by our editors, or we could copy these if you'd like to have your original separate.

Sounds like a good idea to me! I'm perfectly fine with moving the original documents into the back-end folder (though for now it's read only and I'm unable to do so). Copying is also fine with me, and it might be easier to coordinate as I will be fairly busy over the next few days. 

Whoops, forgot I was the owner. I tried moving those files to the drive folder, but also had trouble with it? So I'm happy to have them copied instead. 

Thanks plex, this sounds great!

Oops, forgot to share edit access; I sent you an invitation to the subfolder, so you should be able to move it now. I can also copy if you'd prefer, but I think having one canonical version is best.

Thanks for this, I think it's really needed!
