Conceptual AI safety researchers aim to help orient the field and clarify its ideas, but in doing so they must wrestle with imprecise, hard-to-define problems. Part of the difficulty is that conceptual AI safety research requires abstract thinking about future systems that have yet to be built. Additionally, the concepts it relies on (e.g., “power”, “intelligence”, “optimization”, “agency”) can be particularly challenging to work with.
Philosophers specialize in working through such nebulous problems; in fact, many active fields within philosophy present a similar type of conceptual challenge and have nothing empirical to lean on. As an example, a philosopher may take up the question of whether ethical claims can possess truth values. Questions like this cannot be approached by looking carefully at the world, making accurate measurements, or monitoring the ethical behavior of real people. Instead, philosophers must grapple with intuitions, introduce multiple perspectives, and provide arguments for selecting between those perspectives. We think this skill set makes philosophers especially well suited to conceptual research.
Philosophy has already proven itself useful for orienting the field of conceptual AI safety. Many of the foundational arguments for AI risk were philosophical in nature (consider, for example, Bostrom’s Superintelligence). More recently, philosophy has directly shaped important research directions in AI safety. Joseph Carlsmith’s work on power-seeking AI, for example, has influenced research currently being conducted by Beth Barnes and, separately, by Dan Hendrycks. Peter Railton’s lectures on AI have provided a compelling justification for further research on cooperative behavior in AI agents. Evans et al.'s exploration of truthful AI prompted further technical work on truthful and honest AI.
Since philosophers have historically produced valuable conceptual AI safety work, we believe that introducing more philosophy talent into this research space has the potential to be highly impactful. By offering good incentives, we hope to attract strong philosophy talent with a high likelihood of producing quality conceptual research.
Our program will be a paid, in-person opportunity running from January to August 2023. Our ideal candidate is a philosophy PhD student or graduate with an interest in AI safety, exceptional research abilities, demonstrated philosophical rigor, self-motivation, and a willingness to engage with more technical subjects. No prior experience in AI or machine learning is necessary for this fellowship. There will be an in-depth onboarding program at the start of the fellowship to get researchers up to speed on the current state of AI and AI safety.
Fellows will receive a $60,000 grant, covered student fees, and a housing stipend to relocate to San Francisco, CA. The program will feature guest lectures from top philosophers and AI safety experts, including Nick Bostrom, Peter Railton, Hilary Greaves, Jacob Steinhardt, Rohin Shah, David Krueger, and Victoria Krakovna, among others.
As an organization that places a high value on good conceptual researchers, we plan on extending full-time employment offers at our organization to top-performing fellows. Additionally, many institutions such as the Center for Human-Compatible AI (UC Berkeley), the Kavli Center for Ethics, Science, and the Public (UC Berkeley), the Centre for the Governance of AI, the Center on Long-Term Risk, the Global Priorities Institute (University of Oxford), and the Future of Humanity Institute (University of Oxford) have expressed an interest in the skill set this fellowship aims to develop. We plan on connecting fellows with these organizations after the program concludes.
Feel free to contact us at firstname.lastname@example.org with questions, comments, or suggestions. We are also looking to get to know more philosophers within EA, so please feel free to leave a comment below.
For more information, visit our website: philosophy.safe.ai.
We would like to thank Owain Evans, Hilary Greaves, Nick Beckstead, Andreas Mogensen, Emily Perry, Joe Carlsmith, Cameron Buckner, Thomas Woodside, and Ben Levinstein for their assistance in shaping this program.