A grand strategy to recruit AI capabilities researchers into AI safety research

Peter S. Park

Apr 15, 2022

Comments

Max_Daniel

Thanks for sharing. I love the spirit of aiming to come up with a strategy that, if successful, would have a shot at significantly moving the needle on prospects for AI alignment. I think this is an important but (perhaps surprisingly) hard challenge, and that a lot of work labeled as AI governance or AI safety is not as impactful as it could be by virtue of not being tied into an overall strategy that aims to attack the full problem.

That being said, I feel fairly strongly that this strategy as stated is not viable, and that your attempting to implement it would carry sufficiently severe risks of harming our prospects for achieving aligned AI that I would strongly advise against moving forward with it.

I know that you have emailed several people (including me) asking for their opinion on your plan. I want to share my advice and reasoning publicly so other people you may be talking to can save time by referring to my comment and indicating where they agree or disagree.

Here is why I'm very skeptical that moving forward with your plan is a good idea:

I think you massively understate how unlikely your strategy is to succeed:
- There are a lot of AI researchers. More than 10,000 people attended the machine learning conference NeurIPS (in 2019), and if we include engineers the number is in the hundreds of thousands. Having one-on-one conversations with all of them would require at least hundreds of thousands to millions of person-hours from people who could otherwise do AI safety research or do movement building aimed at potentially more receptive audiences.
- Several prominent AI researchers have engaged with AI risk arguments for likely dozens of hours if not more (example), and yet they remain unconvinced. So there would very likely be significant resistance against your scheme by leaders in the field, which makes it seem dubious whether you could convince a significant fraction of the field to change gears.
- There are significant financial, status, and 'fun' incentives for most AI researchers to keep doing what they are doing. You allude to this, but seem to fail to grasp the magnitude of the problem and how hard it would be to solve. Have you ever seen "marketing specialists" convince hundreds of thousands of people to leave high-paying and intellectually rewarding jobs to work on something else (let alone in a field that is likely pretty frustrating, if not impossible, to enter)? (Not to mention that any such effort would be competing against the 'marketing' of trillion-dollar companies like Google that have strong incentives to portray themselves as, and actually become, good places to work.)
- AI safety arguably isn't a field that can absorb many people right now. Your post sort of acknowledges this when briefly mentioning mentoring bottlenecks, but again in my view fails to grapple with the size and implications of this problem. (And it's not just about mentoring bottlenecks, but also a lack of strategic clarity, much of the required research being 'preparadigmatic', etc.)
Your plan comes with significant risks, which you do not acknowledge at all. Together with the other flaws and gaps I perceive in your reasoning, I consider this a red flag for your fit for executing any project in the vicinity of what you outline here.
- Poorly implemented versions of your plan can easily backfire: AI researchers might either be substantively unconvinced and become more confident in dismissing AI risk or – and this seems like a rather significant risk to me – perceive an organized effort to seek one-on-one conversations with them in order to convince them of a particular position as weird or even ethically objectionable.
- Explaining to a lot of people why AI might be a big deal comes with the risk of putting the idea that they should race toward AGI into the heads of malign or reckless actors.
There are many possible strategic priorities for how to advance AI alignment. For instance, an alternative to your strategy would be: 'Find talented and EA-aligned students who are able to contribute to AI alignment or AI governance despite both of these being ill-defined fields marred by wicked problems.' (I.e., roughly speaking, precisely the strategy that significant parts of the longtermist EA movement are executing.) And there are significant flaws and gaps in the reasoning that leads you to pick out 'move AI capabilities researchers to AI safety' as the preferred strategy.
- You make it sound like a majority or even "monopoly" of AI researchers needs to work on safety rather than capabilities. However, in fact (and simplifying a bit), we only need as many researchers to work on AI safety as is required to solve the AI alignment problem in time. We don't know how hard this is. It could be that we need twice as much research effort as on AI capabilities, or that we only need one millionth of that.
- There are some reasons to think that expected returns to both safety and capabilities progress toward AGI are logarithmic. That is, each doubling of total research effort produces the same amount of expected returns. Given that AI capabilities research is a field several orders of magnitude larger than AI safety, the marginal returns of moving people from capabilities to safety research are almost entirely due to increasing AI safety effort, while the effect of slowing down capabilities is very small. This suggests that the best overall strategy is to scale up AI safety research by targeting whichever audiences lead to the most quality-adjusted expected AI safety research hours.
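To make the logarithmic-returns argument concrete, here is a minimal Python sketch. It assumes, purely for illustration, that expected returns scale with log2 of research effort, and it uses hypothetical headcounts (1,000 safety researchers, 100,000 capabilities researchers) that do not come from the comment; only the "orders of magnitude larger" gap is taken from it. The helper names are likewise hypothetical.

```python
import math
from typing import Tuple


def doublings_gained(effort_before: float, effort_after: float) -> float:
    """Change in log2(effort).

    Under the (assumed) log-returns model, each +1 is one 'doubling'
    worth of expected returns, regardless of the field's absolute size.
    """
    return math.log2(effort_after / effort_before)


def effect_of_moving(safety: float, capabilities: float, moved: float) -> Tuple[float, float]:
    """Return (safety doublings gained, capabilities doublings lost)
    when `moved` researchers switch from capabilities to safety."""
    safety_gain = doublings_gained(safety, safety + moved)
    capabilities_loss = -doublings_gained(capabilities, capabilities - moved)
    return safety_gain, capabilities_loss


# Hypothetical headcounts, chosen only to illustrate a ~100x size gap.
SAFETY = 1_000
CAPABILITIES = 100_000

gain, loss = effect_of_moving(SAFETY, CAPABILITIES, moved=1_000)
print(f"safety: +{gain:.3f} doublings, capabilities: -{loss:.4f} doublings")
```

With these made-up numbers the same 1,000 people double the smaller field (+1.000 doublings) but shrink the larger one by only 1% (about -0.0145 doublings), so nearly all of the effect comes from the safety side. This is a toy model of the claim, not a forecast; it inherits whatever uncertainty attaches to the log-returns premise itself.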

(I also think that, in fact, there is not a clean division between "AI capabilities" and "AI safety" research. For instance, work on interpretability or learning from human feedback arguably significantly contributes to both capabilities and safety. I have bracketed this point because I don't think it is that relevant for the viability of your plan, except perhaps indirectly by providing evidence about your understanding of the AI field.)

To be clear, I think that some of the specific ideas you mention are very good if implemented well. For instance, I do think that better AI safety curricula are very valuable.

However, these viable ideas are usually things that are already happening. There are AI alignment curricula, there are events aimed at scaling up the field, there are efforts to make AI safety seem prestigious to mainstream AI talent, and there are even efforts partly aimed at increasing the credibility of AI risk ideas to AI researchers, such as TED talks or books by reputable AI professors, or high-profile conferences at which AI researchers can mingle with people more concerned about AI risk.

If you wanted to figure out to which of these efforts you are best placed to contribute, or whether there might be any gaps among current activities, then I'm all for it! I just don't think that trying to tie them into a grand strategy that to me seems flawed in all those places at which it is new and specific, and not new in all those places where it makes sense to me, will be a productive approach.

Peter S. Park

Thank you very much for the constructive criticisms, Max! I appreciate your honest response, and agree with many of your points.

I am in the process of preparing a (hopefully) well-thought-out response to your comment.

Logan Riggs

Thanks for the in-depth response!

The most valuable part of this project I’m interested in personally is a document with the best arguments for alignment and how to effectively go about these conversations (i.e., finding cruxes).

You made a logarithmic claim about improving capabilities, but my model is that 80% of progress is made by a few companies and top universities. Fewer than 1,000 people are pushing general capabilities, so convincing these people to pivot (or the people in charge of these people’s research direction) is high impact.

You linked the debate between AI researchers, and I remember being extremely disappointed in the way the debate was handled (e.g., why was Stuart using metaphors? Though I did appreciate Yoshua’s responses). The ideal product I’m thinking of says obvious things like “don’t use metaphors as arguments”, “don’t have a 10-person debate”, and “be kind”, along with the actual arguments to present and the most common counterarguments.

This could have negative effects if done wrong, so the next step is to practice on lower-stakes people while building the argument doc. Then higher-stakes people can be approached.

Additionally, a list of why certain “obvious solutions to alignment” fail is useful for pointing out dead ends in research. For example, any project that relies on the orthogonality thesis being wrong is doomed to fail imo.

This is a tangent: the linked programs for scaling alignment are very inadequate (though I’m very glad they exist!). MLAB had what, 30/500 applicants accepted? AISC had 40/200 accepted (and I talked to one rejected applicant who was very high quality!). Richard’s course is scaling much faster, though, and I’m excited about that. I do believe that none of the courses handle “how to do great research” unless you do a mentorship, but I think we can work around that.

Max_Daniel

The most valuable part of this project I’m interested in personally is a document with the best arguments for alignment and how to effectively go about these conversations (i.e., finding cruxes).

I agree that this would be valuable, and I'd be excited about empirically informed work on this.

You are most likely aware of this, but in case not I highly recommend reaching out to Vael Gates who has done some highly relevant research on this topic.

I do think it is important to keep in mind that (at least according to my model) what matters is not just the content of the arguments themselves but also the context in which they are made and even the identity of the person making them.

(I also expect significant variation in which arguments will be most convincing to whom.)

You made a logarithmic claim about improving capabilities, but my model is that 80% of progress is made by a few companies and top universities. Fewer than 1,000 people are pushing general capabilities, so convincing these people to pivot (or the people in charge of these people’s research direction) is high impact.

Yes, I agree that the number of people who are making significant progress toward AGI/TAI specifically is much smaller, and that this makes the project of convincing all of those more feasible.

For the reasons mentioned in my original comment (incentives, failed past attempts, etc.) I suspect I'm still much more pessimistic than you might be that it is possible to convince them if only we found the right arguments, but for all I know it could be worth a shot. I certainly agree that we have not tried to do this as hard as possible (at least not that I'm aware of), and that it's at least possible that a more deliberate strategy could succeed where past efforts have failed.

(This is less directly relevant, but fwiw I don't think that this point counts against expected research returns being logarithmic per se. I think instead it is a point about what counts as research inputs – we should look at doublings of 'AGI-weighted AI research hours', whatever that is exactly.)
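The distinction between raw headcount and 'AGI-weighted' research hours can be illustrated with a toy log-returns model. The sketch below is hypothetical: it assumes returns scale with log2 of effort, takes Logan's rough estimate of fewer than 1,000 frontier researchers at face value as an upper bound, and picks 100,000 (all AI researchers) and 100 (researchers persuaded to switch) as purely illustrative figures. The function name is made up for this example.

```python
import math


def capabilities_doublings_lost(pool_size: float, moved: float) -> float:
    """Doublings of capabilities effort removed when `moved` researchers
    leave a pool of `pool_size` (assumed log2-returns model)."""
    if not 0 <= moved < pool_size:
        raise ValueError("moved must be in [0, pool_size)")
    return -math.log2((pool_size - moved) / pool_size)


MOVED = 100  # hypothetical number of researchers persuaded to switch

# Counting all AI researchers (hypothetical ~100,000) vs. only the
# AGI-relevant frontier (Logan's rough "fewer than 1,000" estimate).
broad = capabilities_doublings_lost(100_000, MOVED)
frontier = capabilities_doublings_lost(1_000, MOVED)
print(f"broad pool: {broad:.4f} doublings, frontier pool: {frontier:.3f} doublings")
```

Under these assumptions, moving the same 100 people removes roughly 100 times more capabilities 'doublings' when inputs are counted as the small frontier pool (about 0.152) than when they are counted as the whole field (about 0.0014). The model stays logarithmic in both cases; only the definition of the research input changes.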

That being said, my guess is that, in addition to 'trying harder' and empirically testing which arguments work in what contexts, it would be critical for any new strategy to be informed by an analysis of why past efforts have not been successful (I expect there are useful lessons here) and by close coordination with those in the AI alignment and governance communities who have experience interacting with AI researchers and who care about their relationships with them, how they and AI safety/governance are being perceived as fields by mainstream AI researchers, etc. - both to learn from their experience trying to engage AI researchers and to mitigate risks.

FWIW my intuition is that the best version of a persuasion strategy would also include a significant component of preparing to exploit windows of opportunity – i.e., capitalizing on people being more receptive to AI risk arguments after certain external events like intuitively impressive capability advances, accident 'warning shots', high-profile endorsements of AI risk worries, etc.

Logan Riggs

I really like the window of opportunity idea.

I am talking to Vael currently thanks to a recommendation from someone else. If there’s other people you know or sources of failed attempts in the past, I’d also appreciate those!

I also agree that a set of really good arguments is great to have but not always sufficient.

Although convincing the top few researchers is important, convincing the remaining tens of thousands is also important for movement building. The answer to the counterargument that “we can’t handle that many people switching careers” is to scale our programs.

Another answer is simply to trust them to figure it out themselves (I want to compare this with COVID research, but I’m not sure how well that research went or what incentives there were to make it better or worse), though this isn’t my argument but someone else’s intuition. I think an additional structure of “we can give quick feedback on your alignment proposal” would help with this.

MaxRa

Thanks for writing this. I think that attempts to get more people to work on AI Safety seem pretty worth exploring.

One thought that came to my mind was that it would be great if we could identify the AI researchers with the most potential to contribute to the most bottlenecked issues in AI safety research. One relevant skill seems to be something like "finding structure/good questions/useful framing in a preparadigmatic field". Maybe we could also identify researchers from other technical fields who have shown themselves to be skilled at this type of work and convince them to give AI safety a shot. Furthermore, this might help with scouting junior research talent.

Peter S. Park

Thank you so much for your kind words, Max! I'm extremely grateful.

I completely agree that if (a big if!) we could identify and recruit AI capabilities researchers who could quickly "plug in" to the current AI safety field, and ideally could even contribute novel and promising directions for "finding structure/good questions/useful framing", that would be extremely effective. Perhaps a maximally effective use of time and resources for many people.

I also completely agree that experiential learning on how to talent-scout and recruit AI capabilities researchers is likely to also be helpful for recruiting for the AI safety field in general. The transfer will be quite high. (And of course, recruiting junior research talent, etc. will be "easy mode" compared to recruiting AI capabilities researchers.)

Chris Leong

I'm definitely in favour of improving the mentoring pipeline and maximising the prestige difference, although I suspect that we can't pull away capabilities researchers at anything near the scale you're envisioning. I do think it is possible to slow capabilities somewhat by pulling away top researchers, and some effort should go into this strategy, but I wouldn't overinvest in this for the purposes of slowing progress since I expect we could only slow things on the order of years. That said, drawing top capabilities researchers will provide us with vital talent for alignment and also make it much easier to recruit other people.

Jay Bailey🔸

I think this is a fantastic idea, and I wish I had something to add to this post. I wonder if it lacks comments just because it's sufficiently comprehensive and compelling that there isn't really much to criticise or add.

Peter S. Park

Thank you so much, Jay, for your kind words!

If you happen to think of any suggestions, any blind spots of the post, or any constructive criticisms, I'd be extremely excited to hear them! (Either here or in private conversation, whichever you prefer.)

Peter Slattery 🔸

Hey Peter, thanks for writing this up.

I agree with (and really appreciate) Max's comment, so maybe there isn't a need for a grand strategy. However, I suspect that there are probably still many good opportunities to do research to understand and change attitudes and behaviours related to AI safety if that work is carefully co-designed with experts.

With that in mind, I just wanted to comment to ask that READI be kept in the loop about anything that comes out of this.

We might be interested in helping in some way. For instance, that could be a literature/practice review of what is known about influencing desired behaviours, surveys to understand related barriers and enablers, experimental work to test the impact of potential/ongoing interventions, and/or brainstorming and disseminating approaches for 'systemic change' that might be effective.

Ideally, anything we did together would be in collaboration with, or supervised by, individuals with more domain-specific expertise (e.g., Max and other people working in the field) who could make sure it is well planned and useful in expectation, and who could leverage and disseminate the resulting insights. We have a process that has worked well with other projects and that could potentially make sense here too.

Peter Slattery 🔸

Also, have you seen this? https://docs.google.com/document/d/1KqbASWSxcGH1WjXrgfFTaDqmOxn3RWzfVw28mrFP74k/edit#

Peter S. Park

Thank you so much for your feedback on my post, Peter! I really appreciate it.

It seems like READI is doing some incredible and widely applicable work! I would be extremely excited to collaborate with you, READI, and people working in AI safety on movement-building. Please keep an eye out for a future forum post with some potential ideas on this front! We would love to get your feedback on them as well.

(And thank you very much for letting me know about Vael's extremely important write-up! It is brilliant, and I think everyone in AI safety should read it.)
