The Center for AI Safety (CAIS) is announcing the CAIS Philosophy Fellowship, a program for philosophy PhD students and postdoctoral researchers to work on conceptual problems in AI safety.

Why Philosophers?

Conceptual AI safety researchers aim to help orient the field and clarify its ideas, but in doing so, they must wrestle with imprecise, hard-to-define problems. Part of the difficulty is that conceptual AI safety research requires abstract thinking about future systems that have yet to be built. Additionally, the concepts involved (e.g., “power”, “intelligence”, “optimization”, “agency”, etc.) can be particularly challenging to work with.

Philosophers specialize in working through such nebulous problems; in fact, many active fields within philosophy present a similar type of conceptual challenge and have nothing empirical to lean on. As an example, a philosopher may take up the question of whether ethical claims can possess truth values. Questions such as this one cannot be approached by looking carefully at the world, making accurate measurements, or monitoring the ethical behavior of real people. Instead, philosophers must grapple with intuitions, introduce multiple perspectives, and provide arguments for selecting between these perspectives. We think that this skill set makes philosophers especially well suited for conceptual research.

Philosophy has already proven itself to be useful for orienting the field of conceptual AI safety. Many of the foundational arguments for AI risk were philosophical in nature. (As an example, consider Bostrom’s Superintelligence.) More recently, philosophy has directly influenced important research directions in AI safety. Joseph Carlsmith’s work on power-seeking AI, for example, has directly influenced research currently being conducted by Beth Barnes and, separately, Dan Hendrycks. Peter Railton’s lectures on AI have provided a compelling justification for further research on cooperative behavior in AI agents. Evans et al.'s exploration of truthful AI prompted further technical work on truthful and honest AI.

Since philosophers have historically produced valuable conceptual AI safety work, we believe that bringing more philosophical talent into this research space has the potential to be highly impactful. By offering strong incentives, we hope to attract talented philosophers with a high likelihood of producing quality conceptual research.

The Program

Our program will be a paid, in-person opportunity running from January to August 2023. Our ideal candidate is a philosophy PhD student or graduate with an interest in AI safety, exceptional research abilities, demonstrated philosophical rigor, self-motivation, and a willingness to spend time working with more technical subjects. No prior experience in AI or machine learning is necessary for this fellowship. There will be an in-depth onboarding program at the start of the fellowship to get the researchers up to speed on the current state of AI/AI safety.

Fellows will receive a $60,000 grant, covered student fees, and a housing stipend to relocate to San Francisco, CA. The program will feature guest lectures from top philosophers and AI safety experts, including Nick Bostrom, Peter Railton, Hilary Greaves, Jacob Steinhardt, Rohin Shah, David Krueger, and Victoria Krakovna, among others.

As an organization that places a high value on good conceptual researchers, we plan on extending full-time employment offers at our organization to top-performing fellows. Additionally, many institutions such as the Center for Human-Compatible AI (UC Berkeley), the Kavli Center for Ethics, Science, and the Public (UC Berkeley), the Centre for the Governance of AI, the Center on Long-Term Risk, the Global Priorities Institute (University of Oxford), and the Future of Humanity Institute (University of Oxford) have expressed an interest in the skill set this fellowship aims to develop. We plan on connecting fellows with these organizations after the program concludes.

Feel free to contact us at contact@safe.ai with questions, comments, or suggestions. We are looking to get to know more philosophers within EA, so don't hesitate to leave a comment or reach out.

For more information, visit our website: philosophy.safe.ai


We would like to thank Owain Evans, Hilary Greaves, Nick Beckstead, Andreas Mogensen, Emily Perry, Joe Carlsmith, Cameron Buckner, Thomas Woodside, and Ben Levinstein for their assistance in shaping this program.



Comments

Really cool to see you're running this, but sad to see that only PhD students can apply. I hope to see someone else fill the gap.

By default, we're looking to primarily bring philosophy PhDs into our program. However, if an undergraduate with significant prior research experience is interested in applying, we would also consider their application.

I don't think only PhD students can apply. On the website it says either philosophy PhD students, or graduates of a philosophy program, can apply. So I assume e.g. early-career professors would also be welcome to apply.

Oh, I meant PhD students or above.

What's the gap you're referring to? Philosophy undergrads?

Anything below? Undergraduates, people who majored in philosophy, master's students.

Also people who are talented at philosophy but didn’t study it formally in academia

Have you seen Problems in AI Alignment that philosophers could potentially contribute to? (See also additional suggestions in the comments.) Might give your fellows some more topics to research or think about.

ETA: Out of those problems, solving metaphilosophy is currently the highest on my wish list. See this post for my reasons why.

Thanks for the suggestion, Wei, we'll check these out!

Will this program recur, or is this a one-off opportunity? (I'm quite interested, but unfortunately unsure whether I can take seven months off my PhD during this particular academic year.)

Whether the program recurs likely depends on a few different factors, including the results of the first iteration of the program. Assuming all goes well, however, we would be excited to run this again next year.

Thanks, Oliver! And am I reading the website correctly that the fellowship is full time, such that participants won't be able to devote any time to their current research agendas (aside from weekends/evenings etc.)? 

The program is full-time, so we do expect fellows to devote full-time working hours to fellowship research. Of course, if the research agenda of a participant is aligned with the sort of work we're excited to see in the program, it could be worked on as part of the fellowship. Otherwise, if a participant's research is unrelated to the work being done in the fellowship, they will need to pursue it in their free time.