As part of my interview series, I’m considering interviewing AI safety technical researchers at several of the main organizations on what they would recommend newcomers do to excel in the field. If you would like to see more interviews on this topic, please let me know in the comments.
Ryan Carey is an AI Safety Research Fellow at the Future of Humanity Institute. Ryan also sometimes coaches people interested in getting into AI safety research for 80,000 Hours. The following takeaways are from a conversation I had with Ryan Carey last June on how to transition from being a software engineer to a research engineer at a safety team.
A lot of people talk to Ryan and ask “I’m currently a software engineer, and I would like to quit my job to apply to AI safety engineering jobs. How can I do it?”
To these people, Ryan usually says the following: For most people transitioning from software engineering into AI safety, becoming a research engineer at a safety team is often a realistic and desirable goal. The bar for safety engineers seems high, but not insanely so. E.g. if you’ve already been a Google engineer for a couple of years, and have an interest in AI, you have a fair chance of getting a research engineer role at a top industry lab. If you have a couple of years of somewhat less-prestigious industry work, there’s a fair chance of getting a valuable research engineer role at a top academic lab. If you don’t make it, there are a lot of regular machine learning engineering jobs to go around.
How would you build your CV in order to make a credible application? Ryan suggests the following:
1. First, spend a month trying to replicate a paper from the NeurIPS safety workshop. It’s normal to take 1-6 weeks full time to replicate a paper when starting out. Some papers are harder or easier than that, but if it’s taking much longer, you probably would need to build those skills before you could work in the field.
2. You might simultaneously apply for internships at AI safety orgs or a MIRI workshop.
3. If you’re not able to get an internship and replicate papers yet, maybe try to progress further in regular machine learning engineering first. Try to get internships or jobs at any of the big companies/trendy startups, just as you would if you were pursuing a regular ML engineering career.
4. If you’re not there yet, maybe consider a master’s degree in ML if you have the money. People commonly want to avoid formal studies by self-studying and then carving a path to a less-orthodox safety startup such as MIRI. If you are super bright and math-y, then this can work, but it is a riskier path.
5. If you can’t get any of steps 2-4, one option is to take three months to build up your GitHub of replicated papers. Maybe go to a machine learning conference. (6 months of building your GitHub is much more often the right answer than 6 months of math.) Then repeat steps 2-4.
6. If you’re not able to get any of the internships or reasonably good ML industry jobs or into master’s programs (top 50 in the world), then it may be that ML research engineering is not going to work out for you. In this case, you could look at other directly useful software work, or earning to give.
While doing these steps, it’s reasonably useful to be reading papers. Rohin Shah’s Alignment Newsletter is amazing if you want to read things. The sequences on the Alignment Forum are another good option.
As for textbooks, the Goodfellow ML textbook is okay. Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David is worth reading if you want to work at MIRI or do math.
There are no great lit reviews yet for safety research. If you are trying to do theoretical research, Tom Everitt’s paper on observation incentives is good. If you are trying to do experimental research, Paul Christiano’s Deep Reinforcement Learning from Human Preferences paper is good.
Good schools for doing safety:
- Best: Berkeley
- Amazing: Oxford, Toronto, Stanford
- Great: Cambridge, Columbia, Cornell, UCL, CMU, MIT, Imperial, other Ivies
General advice:
People shouldn’t take crazy risks that would be hard to recover from (e.g. they shouldn’t quit their job unless it would be easy to get a new one).
If you are trying to do research on your own, get feedback early, e.g. by posting on the Alignment Forum or LessWrong, or by sharing Google Docs with people. Replications are fine to share; they pad CVs but aren’t of great interest otherwise.
We ran the above past Daniel Ziegler, who previously transitioned from software engineering to working at OpenAI. Daniel said he agrees with this advice and added:
“In addition to fully replicating a single paper, it might be worth reading a variety of papers and at least roughly reimplementing a few of them (without trying to get the same performance as the paper). e.g. from https://spinningup.openai.com/en/latest/spinningup/keypapers.html.”
If you liked this post, I recommend you check out 80,000 Hours’ podcast with Catherine Olsson and Daniel Ziegler.
This piece is cross-posted on my blog here.
Not sure if he hasn't seen it or it just isn't what he's looking for, but there's AI Alignment Research Overview.
On a short skim, this seems more like a research agenda? There are a few research agendas by now...
The only lit review I've seen is [1]. I probably should've said I haven't seen any great lit reviews, because I felt this one was OK - it covered a lot of ground. However, it is a couple of years old, and it didn't organize the work in a way that was satisfying for me.
1. Everitt, Tom, Gary Lea, and Marcus Hutter. "AGI safety literature review." arXiv preprint arXiv:1805.01109 (2018).
I intended the document to be broader than a research agenda. For instance I describe many topics that I'm not personally excited about but that other people are and where the excitement seems defensible. I also go into a lot of detail on the reasons that people are interested in different directions. It's not a literature review in the sense that the references are far from exhaustive but I personally don't know of any better resource for learning about what's going on in the field. Of course as the author I'm biased.
Okay, thought it might be something like that. Thought I'd post just in case.
I've read "Understanding ML" and have only a superficial understanding of what MIRI does, but the two don't seem related. The book is for those interested in statistical machine learning theory.
I'm not sure what the metric for the "good schools" list is but the ranking seemed off to me. Berkeley, Stanford, MIT, CMU, and UW are generally considered the top CS (and ML) schools. Toronto is also top-10 in CS and particularly strong in ML. All of these rankings are of course a bit silly but I still find it hard to justify the given list unless being located in the UK is somehow considered a large bonus.
Yep, I'd actually just asked to clarify this. I'm listing schools that are good for doing safety work in particular. They may also be biased toward places I know about. If people are trying to become professors, or are not interested in doing safety work in their PhD then I agree they should look at a usual CS university ranking, which would look like what you describe.
That said, at Oxford there are ~10 CS PhD students interested in safety, and a few researchers, and FHI scholarships, which is why it makes it to the Amazing tier. At Imperial, there are 2 students and one professor. But happy to see this list improved.
Okay, thanks for the clarification. I now see where the list comes from, although I personally am bearish on this type of weighting. For one, it ignores many people who are motivated to make AI beneficial for society but don't happen to frequent certain web forums or communities. Secondly, in my opinion it underrates the benefit of extremely competent peers and overrates the benefit of like-minded peers.
While it's hard to give generic advice, I would advocate for going to the school that is best at the research topic one is interested in pursuing, or where there is otherwise a good fit with a strong PI (though basing the decision on a single PI rather than one's top two or three can sometimes backfire). If one's interests are not developed enough to have a good sense of topic or PI, then I would go with general strength of program.