This is a linkpost for https://gleave.me/post/why-do-phd/

Doing a PhD is a strong option to get great at developing and evaluating research ideas. These skills are necessary to become an AI safety research lead, one of the key talent bottlenecks in AI safety, and are helpful in a variety of other roles. By contrast, my impression is that currently many individuals with the goal of being a research lead pursue options like independent research or engineering-focused positions instead of doing a PhD. This post details the reasons I believe these alternatives are usually much worse at training people to be research leads.

I think many early-career researchers in AI safety are undervaluing PhDs. Anecdotally, I think it’s noteworthy that people in the AI safety community were often surprised to find out I was doing a PhD, and positively shocked when I told them I was having a great experience. In addition, I expect many of the negatives attributed to PhDs are really negatives on any pathway involving open-ended, exploratory research that is key to growing to become a research lead.

I am not arguing that most people contributing to AI safety should do PhDs. In fact, a PhD is not the best preparation for the majority of roles. If you want to become a really strong empirical research contributor, then start working as a research engineer on a great team: you will learn how to execute and implement faster than in a PhD. There are also a variety of key roles in communications, project management, field building and operations where a PhD is of limited use. But I believe a PhD is excellent preparation for becoming a research lead with your own distinctive research direction that you can clearly communicate and ultimately supervise junior researchers to work on.

However, career paths are highly individual and involve myriad trade-offs. Doing a PhD may or may not be the right path for any individual person: I simply think it has a better track record than most alternatives, and so should be the default for most people. In this post I’ll also consider counter-arguments to a PhD, as well as reasons why particular people might be better fits for alternative options. I also discuss how to make the most of a PhD if you do decide to pursue this route.

Author Contributions: This post primarily reflects the opinion of Adam Gleave, so is written in the first person. Alejandro Ortega and Sean McGowan made substantial contributions writing the initial draft of the post based on informal conversations with Adam. The resulting draft was then lightly edited by Adam, incorporating feedback & suggestions from Euan McLean and Siao Si Looi.

Why be a research lead?

AI safety progress can be substantially accelerated by people who can develop and evaluate new ideas, and mentor new people to develop this skill. Other skills are also in high demand, such as entrepreneurial ability, people management and ML engineering. But being one of the few researchers who can develop a compelling new agenda is one of the best roles to fill. This ability also pairs well with other skills: for example, someone with a distinct agenda who is also entrepreneurial would be well placed to start a new organisation.

Inspired by Rohin Shah’s terminology, I will call this kind of person a research lead: someone who generates (and filters) research ideas and determines how to respond to results.

Research leads are expected to propose and lead research projects. They need strong knowledge of AI alignment and ML. They also need to be at least competent at executing on research projects: for empirically focused projects, this means adequate programming and ML engineering ability, whereas a theory lead would need stronger mathematical ability. However, what really distinguishes research leads is that they are very strong at developing research agendas: i.e., generating novel research ideas and then evaluating them so the best ideas can be prioritized.

This skill is difficult to acquire. It can take a long time to develop and it doesn’t happen by default. Moreover, you can’t aim for it directly: just being an “ideas person” in a highly technical field rarely pans out. You need to get your hands dirty working on a variety of research projects and trying out different ideas to learn what does and doesn’t work. Being a really strong ML engineer or mathematician will help a lot, since you can iterate faster and test out more ideas – but this only gets you more “training data”; you still have to learn from it. Apart from experience and iteration speed, the things that seem to matter most for getting good at research agenda generation are the people you’re surrounded by (peers and mentors) and the environment (e.g. are you supported in trying out new, untested ideas?).

It may not be worth becoming a research lead under many worldviews. For one, there’s a large time cost: it typically takes around 5 years to gain the requisite skills and experience. So this option looks unattractive if you think transformative AI systems are likely to be developed within the next 5 years. However, on a 10-year timeframe the case looks much stronger: you would still have around 5 years to contribute as a research lead. Another possibility is that creating more AI safety agendas may not be that useful. If the current AI safety approaches are more or less enough, the most valuable work may lie in implementing and scaling them up.

In the rest of the post, we’ll assume your goal is to become a research lead and learn to generate great research agendas. The main options available to you are a PhD, working as a research contributor, or independent research. What are the main considerations for and against each of these options?

Why do a PhD?

People

Having a mentor is a key part of getting good at generating research agendas. Empirically testing an idea could easily take you 6 months of work. But an experienced mentor should immediately have a sense of how promising the idea is, and so be able to steer you away from dead ends. This lets you massively increase the amount of training data you get: rather than getting meaningful feedback only every 6 months when you finish a project, you get it every week when you propose an idea.

You don’t just get to learn from your advisor’s predictions of project outcomes, but also the reasoning behind them. In fact, you probably want to learn to predict the judgement and reasoning of as many good researchers as you can – not just your official advisor, but other professors, post-docs, promising senior PhD students, and so on. Over time, you’ll learn to analyze research projects from a variety of different frames. At some point, you’ll probably find that many of these frames, as well as your own judgement, disagree with your advisor’s – congratulations, you’re now on your way to being a research lead. A (good) advisor is more of a mentor than a boss, so you will have the freedom to try different things.

For this reason, it matters enormously where you do your PhD: if you are surrounded by mediocre researchers, your learning opportunities will be significantly diminished. However, universities still have some of the best AI talent in the world: professors in top departments are leaders in the field and have 10+ years of research experience. They are comparable to senior team leads at the top industry labs. If you can get directly advised by a professor of this calibre, that’s a great deal for you.

Environment

Within a PhD program you’re incentivized to come up with your own research ideas and execute on them. Moreover, the program guarantees at least some mentorship from your supervisor. Your advisor’s incentives are reasonably aligned with yours: they get judged by your success in general, so want to see you publish well-recognized first-author research, land a top research job after graduation and generally make a name for yourself (and by extension, them).

Doing a PhD also pushes you to learn how to communicate with the broader ML research community. The “publish or perish” imperative means you’ll get good at writing conference papers and defending your work. This is important if you want your research to get noticed outside of a narrow group of people such as your colleagues or LessWrong readers. It’ll also help you influence other ML researchers’ work and build a consensus that safety is important.

You’ll also have an unusual degree of autonomy: you’re basically guaranteed funding and a moderately supportive environment for 3-5 years, and if you have a hands-off advisor you can work on pretty much any research topic. This is enough time to try two or more ambitious and risky agendas.

But freedom can be a double-edged sword: some people struggle with the lack of structure, and a lot of people fritter the opportunity away doing safe, incremental work. If you grasp it, though, this is an excellent opportunity.

Alternatives to PhDs

Doing independent research

As an independent researcher, you get time to think and work on ideas. And you’ll feel none of the bad incentives that industry or academia place on you.

But by default you’ll be working alone and without a mentor. Both of these things are bad.

Working by yourself is bad for motivation. Entrepreneurs are unusually self-motivated and have “grit”, but are still strongly recommended to find a co-founder. If you think being isolated doesn’t matter, you’re probably fooling yourself.

Moreover, without a mentor your feedback loop will be a lot longer: rather than getting regular feedback on your ideas and early-stage results, you’ll need to develop your research ideas to the point where you can tell if they’re panning out or not. In some fields like mechanistic interpretability that have fast empirical feedback loops this may be only a modest cost. In research fields with longer implementation times or a lack of empirical feedback, this will be much more costly.

And mentor-time is hard to come by. There aren’t many people who are able to i) impart the skills of research idea generation and evaluation and ii) devote enough time to actually help you learn good taste. That’s not to say it isn’t possible to find someone happy to mentor you, but getting comments on your Google Docs every 3 months is unlikely to be good enough. I think an hour every other week is the minimum mentorship most people need, although some people are exceptionally quick independent learners.

Working as a research contributor

As a research contributor you execute on other people’s ideas, for example as a research engineer in an industry lab. This is often an excellent way of getting good at execution as well as learning some basic research skills. But it is not usually sufficient for getting good at developing research agendas.

Industry lab agendas are often set top-down, so your manager likely won’t give you opportunities to practice exploring your own research ideas. It’s also worth noting that most research leads at these organizations seem to have PhDs anyway. But that’s not to say there aren’t firms or teams where working as a research engineer would be better than doing a PhD.

Similarly, non-profit alignment organizations (like Redwood, Apollo, METR, ARC) often have pre-set research agendas. Furthermore, these organizations are often staffed by more junior researchers, who may not be able to provide good mentorship.

Working as an RA at an academic lab also usually involves just executing on other people’s ideas. However, it is a bit better optimized for PhD applications: professors are well-placed to write a strong recommendation letter, and RA projects are usually designed to be publishable.

Working as a research contributor can be a good starting point for the first year or two of a prospective research lead’s career. In particular, engineering skills are often acquired faster and better in a company than in a PhD. So even if a PhD is your end goal, it may be worth spending some time in a research contributor role. Indeed, many well-run academic labs more or less have an apprenticeship system where junior PhD students initially work closely with more senior PhD students or post-docs before operating more independently. Starting a PhD a bit later but with greater independence could let you skip this step.

However, if you do opt to start working as a research contributor, choose your role carefully. You’ll want to ensure you develop a strong PhD portfolio (think: can you publish in this role, and get a strong recommendation letter?). Additionally, be honest with yourself as to whether you’ll be willing to take a pay cut in the future. Going from undergraduate to a PhD will feel like getting richer, whereas going from an industry role to a PhD will involve taking a massive pay cut. Although you might have a higher standard of living with supplemental savings from an industry role, it won’t feel like you do. Setting yourself a relatively strict budget to prevent your expenses from expanding to fill your (temporarily elevated) salary can help here.

Things to be wary of when doing a PhD

Although I am in favour of more people doing PhDs, I do think they fall far short of an ideal research training program. In particular, the quality of mentorship varies significantly between advisors. Many PhD students experience mental health issues during their programme, often with limited support.

I think most criticisms of PhDs are correct, but as it currently stands the other options are usually worse. I’d be excited to see people develop alternative, better ways of becoming research leads, but until that happens I think people should not be discouraged from doing PhDs.

Your work might have nothing to do with safety

By default, a PhD will do an excellent job of training you to predict the outcome of a research project and to get research ideas to work. But it will do very little to help you judge whether the outcome of a research project actually matters for safety. In other words, PhDs do not train you to evaluate the theory of impact for a research project.

Academic incentives are mostly unrelated to real-world impact. The exception is if you’re in a program where other students or academics care about alignment, in which case you’ll probably get practice at evaluating theories of impact. See below if you want some specific recommendations on how to make this happen.

But for most people, this won’t be the case and you’ll have to supplement with other sources. The easiest way is to attend AI safety focused conferences and workshops, co-work from an AI safety hub (mostly located in the SF Bay Area & London) and/or intern at an AI safety non-profit or an industry org’s safety team.

Your mental health might suffer

The mental health of graduate students is notoriously bad. Some PhD programs are better than others at giving students more guidance early on, or at training supervisors to be better managers. But even in the best case, learning how to do research is hard. If you think you are at high risk of mental health issues, then you should choose your PhD program and advisor carefully, and may want to seriously consider alternatives to a PhD.

Anecdotally, it seems like mental health amongst independent researchers or in some alignment non-profits might be as bad as in PhD programs. However, mental health is often better in more structured roles, and at organizations that champion a healthy management culture.

So what should you do?

There are multiple options available for getting good at developing research agendas, and I am definitely not suggesting that doing a PhD is the correct choice for everyone. Weighing up what’s best for you will depend on your background and circumstances.

But it’ll also depend on what specific options you have available to you. I’d stress that it’s worth exploring multiple paths (e.g. PhD and research engineering) in parallel. Even if one path is on average more impactful or a better fit for you, the best option in a given track usually dwarfs the median option in other tracks. Doing a PhD might be better for most people, but working as an ML engineer at a top AI safety non-profit probably beats doing a PhD at a low-ranked program with no one working on safety.

To try and work out how good a PhD is likely to be, ask:

  • How good a researcher is your supervisor?
  • How good a mentor are they? (Visit their lab and ask current grad students!)
  • How interested are they in AI safety?
  • How much flexibility do you have to choose your own projects?

If you’re doing independent research, then ask:

  • Do you already have most of the skills needed for this research project?
  • Have you thrived in independent environments with limited accountability in the past?
  • Do you already have a research track record?
  • What are your sources of mentorship and feedback? How much of their time are they able to give?

Advice for making the most of a PhD

Improving execution: I would suggest initially prioritizing high-bandwidth, object-level feedback from mentors to improve your execution and general knowledge of the field. You could get this by working with a junior professor who has a lot of time, or with a post-doc or senior PhD student. You'll learn a lot about how to execute on a project, including implementation, experiments, and write-up. At this point it’s fine to work on other people's ideas, and on non-safety projects.

Improving idea generation: In the background, read up on safety and try to keep an eye on what's going on. Form opinions on what's good and bad, and what’s missing. Keep a list of ideas and don't worry too much about whether they're good or bad. Flesh out the ones you think are best into one-to-two-page proposals. Ask safety researchers for feedback on your theory of change, and ask non-safety AI researchers for feedback on general tractability and technical interest.

Improving idea evaluation: If other students or academics in your program are interested in alignment, you could set up a reading group. One format which seems to work well is for one person to go deep on the research agenda of another safety researcher, and to start the meeting by explaining and justifying this agenda. Then the rest of the meeting is the group engaging in spirited debate and discussion about the agenda. This feels less personal than if the agenda of someone in the room is being critiqued.

I also sometimes recommend a reading group format where people present their own ideas and get feedback. I think it's good if these are low-stakes – for example, where the norm is that it’s acceptable to present half-baked ideas. It's easy to get demotivated if you put a lot of work into an idea and it gets shot down. Another good format is cycles of "clarify, correct, critique", where you start by understanding what someone else is proposing, try to improve/correct any issues with it, then critique this stronger version of it.

Increase your independence: After the first year or two (depending on how much prior experience you have and how long the PhD program is), switch to working more on your own ideas and working autonomously. Now it's time to put the pieces together. Your time spent ideating and evaluating will have given you a list of ideas that are safety-relevant and which you & your advisor agree are strong. Your time spent developing execution skills will have enabled you to rapidly test these ideas.

Increase your ambition: Gradually start being more ambitious. Rather than aiming for individual project ideas, can you start to craft an overarching agenda? What is your worldview, and how does it differ from others’? This won't happen overnight, so thinking about it little and often is probably the best approach.

Conclusion

Doing a PhD is usually the best way to get great at the key skills of generating and evaluating research ideas. At a top PhD program you’ll be mentored by world-class researchers and get practice developing and executing on your own ideas. PhD programs are by no means ideal, but I think they are usually the best option for those aiming to be one of the few researchers who can develop a compelling new research agenda.

In particular, I think most people are unlikely to become research leads by working as a research contributor or by doing independent research. However, other roles can make equal or greater contributions to AI safety research, and there are a number of reasons why doing a PhD might not be the best option for any individual person.

Comments

“It may not be worth becoming a research lead under many worldviews.”

I'm with you on almost all of your essay, regarding the advantages of a PhD, and the need for more research leads in AIS, but I would raise another kind of issue - there are not very many career options for a research lead in AIS at present. After a PhD, you could pursue:

  1. Big RFPs. But most RFPs from large funders have a narrow focus area - currently it tends to be prosaic ML safety and mechanistic interpretability. And having to submit to grantmakers' research direction somewhat defeats the purpose of being a research lead.
  2. Joining an org working on an adjacent research direction. But they may not exist, depending on what you're excited to work on.
  3. Academia. But you have to be willing to travel, teach a lot, and live on well below the salary for a research contributor.
  4. Little funders (like LTFF). But their grants may take 3+ months to apply for, only last a year at a time, and they won't respond to your emails asking for an explanation of this.
  5. Get hired as a researcher at OpenPhil? But very few will be hired and given research autonomy there.

For many research leads, these options won't be very attractive, and I find it hard to feel positive about convincing people to become research leads until better opportunities are in place. What would make me excited? I think we should have:

A. Research agenda agnostic RFPs. There needs to be some way for experienced AI safety researchers to figure out whether AI safety is actually a viable long-term career for them. Currently, there's no way to get OpenPhil's opinion on this - you simply have to wait years until they notice you. But there aren't very many AI safety researchers, and there should be a way for them to run this test so that they can decide which way to direct their lives.

Concrete proposal: OpenPhil should say "we want applications from AIS researchers who we might be excited about as individuals, even if we don't find their research exciting" and should start an RFP along these lines.

B. MLGA (Make LTFF great again). I'm not asking much here, but they should be faster, be calibrated on their timelines, respond to email in case of delays, and offer multi-year grants.

Concrete proposal: LTFF should say "we want to fund people for multiple years at a time, and we will resign if we can't get our grantmaking process to work properly".

C. At least one truly research agenda-agnostic research organisation, that will hire research leads to pursue their own research interests.

Concrete proposal: Folks should found an academic department-style research organisation that hires research leads, gets them office space and visas, and gives them a little support to apply for grants to support their teams. Of course this requires a level of interest from OpenPhil and other grantmakers in supporting this organisation. 

Finally, I conclude on a personal note. As Adam knows, and other readers may deduce, I myself am a research lead underwhelmed with options (1-5). I would like to fix C (or A-B) and am excited to talk about ways of achieving this, but a big part of me just wants to leave AIS for a while, as the options outside it are so much stronger from a selfish perspective. Given that AIS has been this way for years, I suspect many others might leave before these issues are fixed.

(A) Call this "Request For Researchers" (RFR). OpenPhil has tried a more general version of this in the form of the Century Fellowship, but they discontinued this. That in turn is a Thiel Fellowship clone, like several other programs (e.g. Magnificent Grants). The early years of the Thiel Fellowship show that this can work, but I think it's hard to do well, and it does not seem like OpenPhil wants to keep trying.

(B) I think it would be great for some people to get support for multiple years. PhDs work like this, and good research can be hard to do over a series of short few-month grants. But also the long durations just do make them pretty high-stakes bets, and you need to select hard not just on research skill but also the character traits that mean people don't need external incentives.

(C) I think "agenda-agnostic" and "high quality" might be hard to combine. It seems like there are three main ways to select good people: rely on competence signals (e.g. lots of cited papers, works at a selective organisation), rely on more-or-less standardised tests (e.g. a typical programming interview, SATs), or rely on inside-view judgements of what's good in some domain. New researchers are hard to assess by the first, I don't think there's a cheap programming-interview-but-for-research-in-general that spots research talent at high rates, and therefore it seems you have to rely a bunch on the third. And this is very correlated with agendas; a researcher in domain X will be good at judging ideas in that domain, but less so in others.

The style of this that I'd find most promising is:

  1. Someone with a good overview of the field (e.g. at OpenPhil) picks a few "department chairs", each with some agenda/topic.
  2. Each department chair picks a few research leads who they think have promising work/ideas in the direction of their expertise.
  3. These research leads then get collaborators/money/ops/compute through the department.

I think this would be better than a grab-bag of people selected according to credentials and generic competence, because I think an important part of the research talent selection process is the part where someone with good research taste endorses the agenda takes of someone else on agenda-specific inside-view grounds.

This is an important point. There's a huge demand for research leads in general, but the people hiring & funding often have pretty narrow interests. If your agenda is legibly exciting to them, then you're in a great position. Otherwise, there can be very little support for more exploratory work. And I want to emphasize the legible part here: you can do something that's great & would be exciting to people if they understood it, but novel research is often time-consuming to understand, and these are time-constrained people who will not want to invest that time unless they have a strong signal it's promising.

A lot of this problem is downstream of very limited grantmaker time in AI safety. I expect this to improve in the near future, but not enough to fully solve the problem.

I do like the idea of a more research-agenda-agnostic research organization. I'm striving to have FAR be more open-minded, but we can't support everything, so we are still pretty opinionated, prioritizing agendas that we're most excited by & which are a good fit for our research style (engineering-intensive empirical work). I'd like to see another org in this space set up to support a broader range of agendas, and am happy to advise people who'd like to set something like this up.

Executive summary: Doing a PhD is usually the best way to develop the skills needed to become an AI safety research lead, such as generating and evaluating research ideas, despite some drawbacks.

Key points:

  1. Research leads who can develop new research agendas are a key bottleneck in AI safety, but this skill is difficult to acquire.
  2. A PhD provides mentorship from experienced researchers, an environment to develop and communicate ideas, and autonomy to pursue ambitious agendas.
  3. Independent research often lacks sufficient mentorship and feedback, while research contributor roles don't provide enough opportunities to develop and pursue one's own ideas.
  4. Potential drawbacks of a PhD include work unrelated to AI safety, mental health challenges, and varying advisor quality. Careful selection of the program and advisor is important.
  5. Making the most of a PhD involves initially prioritizing execution skills and knowledge, gradually developing research ideas, and increasing independence and ambition over time.
  6. A PhD is not ideal or the best fit for everyone, but is usually better preparation to become a research lead compared to the alternatives.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
