What is it like doing AI safety work?

Kat Woods; peterbarnett

How do you know if you’ll like AI safety work? What’s the day-to-day work like? What are the best parts of the job? What are the worst?

To better answer these questions, we talked to ten AI safety researchers in a variety of organizations, roles, and subfields. If you’re interested in getting into AI safety research, we hope this helps you be better informed about what pursuing a career in the field might entail.

The first section is about what people do day-to-day and the second section describes each person’s favorite and least favorite aspects of the job.

Of note, the people we talked with are not a random sample of AI safety researchers, and it is also important to consider the effects of survivorship bias. However, we still think it's useful and informative to hear about their day-to-day lives and what they love and hate about their jobs.

Also, these interviews were done about a year ago, so they may no longer represent what the researchers are currently doing.

Reminder that you can listen to LessWrong and EA Forum posts like this on your podcast player using the Nonlinear Library.

This post is part of a project I’ve been working on at Nonlinear. You can see the first part of the project here where I explain the different ways people got into the field.

What do people do all day?

John Wentworth

John describes a few different categories of days.

He sometimes spends a day writing a post; this usually takes about a day if all the ideas are developed already.
He might spend a day responding to comments on posts or talking to people about ideas. This can be a bit of a chore but is also necessary and useful.
He might spend his day doing theoretical work. For example, if he’s stuck on a particular problem, he can spend a day working with a notebook or on a whiteboard. This means going over ideas, trying out formulas and setups, and trying to make progress on a particular problem.

Over the past month he’s started working with David Lorell. David’s a more active version of the programmer's "rubber duck". As John’s thinking through the math on a whiteboard, he’ll explain to David what's going on. David will ask for clarifications, examples, how things tie into the bigger picture, why did/didn't X work, etc.

John estimates that this has increased his productivity at theoretical work by a factor somewhere between 2 and 5.

Ondrej Bajgar

Ondrej starts the day by cycling to the office. He has breakfast there and tries to spend as much time as possible at a whiteboard away from his computer. He tries to get into a deep-thinking mindset, where the answers aren’t all easily available. Ideally, mornings are completely free of meetings and reserved for this deep-thinking work.

Deep thinking involves a lot of zooming in and out, working on sub-goals while periodically zooming out to check on the higher-level goal every half hour. He switches between trying to make progress and reflecting on how this is actually going. This is to avoid getting derailed on something unproductive but cognitively demanding.

Once an idea is mostly formed, he’ll try to implement things in code. Sometimes seeing things in action can make you see new things you wouldn’t get from just the theory. But he also says that it’s important to not get caught in the trap of writing code, which can feel fun and feel productive even when it isn’t that useful.

Scott Emmons

Scott talked about a few different categories of day-to-day work:

Research, which involves brainstorming, programming, writing & communicating, and collaborating with people
Reading papers to stay up-to-date with the literature
Administrative work
Service, such as giving advice to undergrads, talking about AI safety, and reviewing other people’s papers

An example of a typical day might look like this: he’ll start work in the morning by reading papers because this is his best time for getting into deep work. This is followed by a research meeting with some collaborators and answering some emails. After that, he has the CHAI weekly lab meeting, where someone presents their research. After this he might spend a few hours coding on an existing project, followed by some admin work.

Alex Turner

Alex has historically specialized in whiteboard-centric math and theory work but has pivoted to empirical and mentoring work (coding, experiment design, frontend, management).

He spends most days focused on specific questions. For example, by what processes might a diamond-motivated AI improve itself? What precautions might it take, given such-and-such resources?

He’ll occasionally zoom out to check that he's still focusing on questions he expects to most increase the probability of alignment success.

On empirical research days, he spends about six hours coding, one reading, and one to two in conversations. On theory days, it's rather evenly mixed between reading, writing, and thinking. He will periodically spend a few days to weeks communicating large bundles of ideas and results, whether in blog post or paper format.

William Saunders

OpenAI, where William works, has three days a week where there are meetings and two days a week where they try to avoid meetings. William’s day starts by getting into the office, checking Slack to see how various projects are going, and leaving comments if he has useful ideas for any of them. If it’s a meeting day, he will usually have 1 or 2 meetings with people working on a project and sometimes other people who are interested in talking about the higher-level ideas. There will be discussions about what experiments to run next, or whether any of the code needs to be refactored.

There are two types of experiments that he works on:

Experiments where data needs to be collected from human contractors. In these cases, William needs to decide how to collect the data and build an interface for the contractors to use. This involves both frontend and backend development, as well as monitoring quality to make sure contractors aren’t misunderstanding things.
Experiments where the data has already been collected. In these cases, his job is to change algorithms and hyperparameters to improve performance.

Ethan Perez

Ethan describes his projects as having three phases: brainstorming, implementation, and writing up. The brainstorming phase usually lasts about a month and is all about working out what to do next. This involves a lot of reading, taking notes, and talking with people. Each week he’ll meet with his advisor to discuss directions that seem promising to pursue and to get feedback.

Once they’ve settled on a project, he’ll move on to the ‘implementation’ phase, which starts with figuring out how to get things to work in practice. Then it's a matter of putting the ideas into code and running experiments on a GPU cluster, getting feedback from these experiments, and deciding what to run next. During this process, he’ll have weekly meetings with his advisor to talk about which research directions he should continue to pursue. This implementation phase takes around one to six months, ending when he either gets something to work or decides to shift directions.

Once something works, he’ll take about a month to run the final experiments, write it up in a paper, put it on arXiv, and submit it to a conference.

Justin Shovelain

Justin runs Convergence, an organization which does work in AI and x-risk strategy. A normal week usually involves:

1 day of reading
1 day of operations and logistics: emails, grants, paperwork
1 to 2 days of discussions with people: advising, mentoring, or discussing a topic
1 to 1.5 days of solo thinking
Half a day of thinking with his colleague in a discussion format

Days are usually devoted to one activity rather than splitting them up too much, but sometimes this isn’t possible when working with other people, especially in different time zones. Usually he spends the start of the day doing deep work and then has calls later in the day.

Stephen Casper

Stephen usually has at least a couple of things to work on at once, so that when one task gets boring, he can switch to another. He usually works from the offices at MIT, MAIA, and HAIST, bouncing between various tasks including, for example:

Reading papers
Drafting papers or posts
Checking on experiments
Writing and debugging code

As well as the standard work, Stephen has the habit of reading and taking notes on at least one paper a day. This is to explore the literature more widely, so he tries to choose papers that aren’t too related to what he would have read anyway. You can read more about his daily routine here.

Ramana Kumar

At DeepMind the work can be quite varied and things can change a lot in 6 months. Ramana usually works from 10 am to 6 pm and tries not to think too much about work outside of these hours, although this isn’t always possible. With creative work, you can’t always say, “I’m going to be working now, and I’m not going to be working now”. Sometimes you just have to follow your mood.

About half of Ramana’s work time is spent on tasks that can be done alone or by closely collaborating with someone else. These include:

Reading things like papers, books, and the Alignment Forum
Technical coding work, debugging, and making figures. This type of work is especially fun to do with someone else
Writing papers, outlines, and presentations

The other half of the time is spent with larger groups, for example:

Meetings or long collaboration days to talk about research priorities and directions
Project discussions to talk about what people have been doing, problems, and where to go next
Reading groups

Dan Hendrycks

N.B.: this interview was done in early 2022 before Dan had finished his PhD.

Dan has an exceptional academic and publication record and, as a result, spends much of his time as a PhD student managing other people rather than coding himself. He has been managing people since the third year of his undergraduate degree. This involves a lot of meetings with people and thinking about ideas for projects.

Part of his management work involves applying some pressure and motivating his colleagues to make sure that projects actually get done, especially applying “start-up torque” to get projects started in the first place.

Favorite and least favorite parts

We asked people what their favorite and least favorite parts of the day-to-day work were. If you think the best bits sound great and you could handle the not-so-good parts, then you might be a great fit for AI safety work.

Ethan Perez

Favorite

Fun feedback loops where you can implement something during the day, run it during the night, and then check back and iterate the next morning.
Sometimes there are even shorter feedback loops, like testing what you can do with prompts in the OpenAI playground
Ethan prefers working on existing codebases, making modifications rather than implementing entire ideas from scratch. If you implement from scratch, and it doesn’t work, it’s hard to know which part of your implementation isn’t working.

Least favorite

Sometimes the feedback loops can be long, and there are a lot of steps that can’t be automated
Initially, Ethan had an aversion to the software engineering side of projects, but this was possibly due to insecurities about his coding ability. The aversion has mostly gone away since he addressed those insecurities.

Ondrej Bajgar

Favorite

Mornings: Ondrej dedicates this time to deep thinking, mainly in front of a whiteboard working on problems
Lunches at FHI where there are lots of exciting people to talk to, random encounters, and good conversations

Least favorite

Afternoons: he is usually low on energy and often doesn’t end up making much progress
Since the original interview, Ondrej has learned to accept this and take a break in the afternoon with a nap and a run to restore energy, so it’s no longer his least favorite bit.

Scott Emmons

Favorite

Reading and brainstorming. Scott thinks that this is one of the most fun parts of research: you can think pretty widely and you feel like anything is possible

Least favorite

Dealing with the annoying small details that are involved in getting a project to completion. For example, making sure all the font sizes on your figures look nice, and that all your experiments have a consistent set of hyperparameters

William Saunders

Favorite

William likes doing pure coding tasks where you can just implement something and see if it works. Often for experiments or ML tasks, it can be more hit-or-miss and the feedback loops aren’t as satisfying.

Least favorite

He enjoys generating ideas but finds it less enjoyable to prioritize those ideas and choose the specific thing to do next. For example, “We have several different ways to collect this data. Which should we do first?”
Getting stuck on projects and becoming anxious about whether things will work or whether the project is not going to go anywhere. One of the benefits of working with people is that if your project is stuck, you can go and talk to someone. You can either talk about their project or ask them for help getting yours unstuck

Justin Shovelain

Favorite

The feeling that he is ‘resolving mysteries’ and learning new, cool things
The sense of ‘a deed well done for the world’. Looking at what you’ve done and realizing ‘Oh I actually did improve things. This is wonderful’.

Least favorite

Dealing with bureaucracy
Dealing with the politics involved in running and representing an organization

Stephen Casper

Favorite

Looking at cool visualizations, and producing nice-looking figures. One of the upsides of working in the subfield of adversarial robustness is that he gets to make and work with interesting visualizations.
Getting things to work when they weren’t previously working
Working with people in person

Least favorite

When something is not working and you don’t know why
Working with other people’s code when you don’t like how it’s written
Reviewer #2. (“Reviewer 2 symbolizes the peer reviewer who is rude, vague, smug, committed to pet issues, theories, and methodologies, and unwilling to treat the authors as peers.”)

John Wentworth

Favorite

These are really interesting problems to think about. Having insightful ‘shower thoughts’.

Least favorite

Being stuck on a problem for a long time isn’t very fun, although it is hard to separate the good parts (working on a fun problem) from the bad parts here.
Needing to communicate something when there’s a large inferential gap and so it’s difficult to get the whole idea across.

Alex Turner

Favorite

“When I’m trying to prove something that’s interesting, and I don’t know how to do it yet. Well, actually the favorite part is that instant when I figure out how to do it, but besides that, the process of attacking it is really fun.”

Least favorite

Things like dealing with emails or meetings which aren’t very important, but these can be minimized.
Being in a state of mind that doesn’t feel very quiet internally, where your attention is being pulled by various things like social media.

Ramana Kumar

Favorite

When you make progress on something and see what you’ve produced. For example, coming up with an idea or working with people on a presentation or paper. Seeing the final product and thinking, “Oh, that’s really nice”.
Being in a reading group and feeling on top of what’s going on and that you’re making useful contributions
Spending a while reading or trying to write a comment and reaching a place where you understand something or know how to make a specific point well

Least favorite

It can be hard when you’re not satisfied with any of the options and it feels like you don’t really know what to do next. You can end up flitting between options because it seems better than nothing, but then changing direction because one thing didn’t seem quite right. When in this state it’s useful to step back and either do longer-term prioritization, or realize you’ve already done the prioritization and so do (and stick to) what you’ve previously decided.

If you liked this post, you might also like:

How to become an AI safety researcher, the first part of this series
A curated list of links on career advice for AI safety researchers from AI Safety Support
Career coaching from AI Safety Support or 80,000 Hours

Thanks to Amber for editing this post. If you find writing/editing tedious or can never find the time to write, you can contract her to write or edit your EA posts here.

Vasco Grilo🔸, Feb 26 2023

Thanks, great post!

programmer's "rubber duck"

In case anyone is wondering, from Wikipedia:

In software engineering, rubber duck debugging (or rubberducking) is a method of debugging code by articulating a problem in spoken or written natural language. The name is a reference to a story in the book The Pragmatic Programmer in which a programmer would carry around a rubber duck and debug their code by forcing themselves to explain it, line-by-line, to the duck.

Effective Altruism Forum