Edit: I was told privately that the grand strategy was unclear. The proposed grand strategy is to:

  1. Convince everyone. Recruit all AI capabilities researchers to transition into AI safety until the alignment problem is solved.
  2. Scale 1-on-1 outreach to AI capabilities researchers (what seems to be working best so far), and learn from experience to optimize outreach strategies.
  3. Exponentially grow the AI safety movement by mentoring newcomers to mentor newcomers to mentor newcomers (especially AI capabilities researchers).
  4. Maximize the social prestige of the AI safety movement (especially from the perspective of AI capabilities researchers).


Original post: AI capabilities research seems to be substantially outpacing AI safety research. It is most likely true that successfully solving the AI alignment problem before the successful development of AGI is critical for the continued survival and thriving of humanity.


Assuming that AI capabilities research continues to outpace AI safety research, the former will eventually result in the most negative externality in history: a significant risk of human extinction. Despite this, a free-rider problem causes AI capabilities research to myopically push forward, both because of market competition and great power competition (e.g., U.S. and China). AI capabilities research is thus analogous to the societal production and usage of fossil fuels, and AI safety research is analogous to green-energy research. We want to scale up and accelerate green-energy research as soon as possible, so that we can halt the negative externalities of fossil fuel use.


My claim: A task that seems extremely effective in expectation (and potentially, maximally effective for a significant number of people) is to recruit AI capabilities researchers into AI safety research.


More generally, the goal is to convince people (especially, but not limited to, high-impact decision-makers like politicians, public intellectuals, and leaders in relevant companies) of why it is so important that AI safety research outpace AI capabilities research. 


Logan has written an excellent post on the upside of pursuing trial-and-error experience in convincing others of the importance of AI alignment. The upside includes, but is not limited to: the development of an optimized curriculum for how to convince people, and the use of this curriculum to help numerous and widely located movement-builders pitch the importance of AI alignment to high-impact individuals. Logan has recently convinced me to collaborate on this plan. Please let us know if you are interested in collaborating as well! You can email me at pspark@math.harvard.edu.


I would also like to propose the following additional suggestions:


1) For behavioral-science researchers like myself, prioritize research that is laser-focused on (a) practical persuasion, (b) movement-building, (c) how to solve coordination problems (e.g., market competition, U.S.-China competition) in high-risk domains, (d) how to scale up the AI safety research community, (e) how to recruit AI capabilities researchers into AI safety, and (f) how to help them transition. Examples include RCTs (and data collection in general; see Logan’s post!) that investigate effective ways to scale up AI safety research at the expense of AI capabilities research. More generally, scientific knowledge on how to help transition specialists of negative-externality or obsolete occupations would be broadly applicable.


2) For meta-movement-builders, brainstorm and implement practical ways for decentralized efforts around the world to scale up the AI safety research community and to recruit talented people from AI capabilities research. Examples of potential contributions include the development of optimized curricula for training AI safety researchers or for persuading AI capabilities researchers, a well-thought-out grand strategy, and coordination mechanisms. 


3) For marketing specialists, brainstorm and implement practical ways to make the field of AI safety prestigious among the AI researcher community. Potential methods to recruit AI capabilities researchers include providing fellowships and other financial opportunities, organizing enjoyable and well-attended AI-safety social events like retreats and conferences, and inviting prestigious experts to said social events. Wining-and-dining AI researchers might help too, to the extent that effective altruists feel comfortable doing so. Note that a barrier to AI safety researchers’ enthusiasm is the lack of traditional empirical feedback loops. (“I’m proud of this tangible contribution, so I’m going to do this more!”) So, keeping the AI safety research community enthusiastic, social, and prestigious will be in many ways a unique challenge.


An especially strong priority for the AI safety research community is to attract talented AI researchers from all over the world. What really matters is that AI safety research has a monopoly on the relevant talent globally, not just in the San Francisco Bay Area. Ideally, this means continually communicating with AI research communities in other countries, facilitating the immigration of AI researchers, persuading them to pursue AI safety research rather than capabilities research, and creating an environment that encourages immigrant AI researchers to stay (in, for example, the San Francisco Bay Area’s thriving AI safety community).


We should also seek to maximize the long-term prestige differential between AI safety research and AI capabilities research, especially in Silicon Valley (but also among the general public). Building this prestige differential would be difficult but potentially very impactful, given that the strategy of offering financial incentives to poach AI researchers may hit a wall at some point. This is because many of the industry-leading research teams are employed by deep-pocketed companies that are enthusiastic about AI capabilities research. But given that working at a fossil fuel company is not prestigious despite its high pay, a long-term prestige differential between AI safety research and AI capabilities research might be feasible even without a large pay increase in favor of the former.


Finally, we really need to scale up the AI safety mentoring pipeline. I’ve heard anecdotally that many enthusiastic and talented people interested in transitioning into AI safety are “put on hold” due to a lack of mentors. They have to wait for a long time to be trained, and an even longer time to make any contributions that they are proud of. Sooner or later, they are probably going to lose their enthusiasm. This seems like a substantial loss of value. We should instead aim to supercharge the growth of the AI safety community. This means that streamlining and widening the availability of AI safety training opportunities, especially to potential and current AI capabilities researchers, should be an absolute top priority. Please feel free to email me at pspark@math.harvard.edu if you have any ideas on how to do this!


tl;dr. It is extremely important in expectation that AI safety research outpace AI capabilities research. Practically speaking, this means that AI safety research needs to scale up to the point of having a monopoly on talented AI researchers, and this must be true globally rather than just in the San Francisco Bay Area. We should laser-focus our efforts of persuasion, movement-building, and mentoring on this goal. Our grand strategy should be to continue doing so until the bulk of global AI research is on AI safety rather than AI capabilities, or until the AI alignment problem is solved. To put it bluntly, we should, on all fronts, scale up efforts to recruit talented AI capabilities researchers into AI safety research, in order to slow down the former in comparison to the latter.






Thanks for sharing. – I love the spirit of aiming to come up with a strategy that, if successful, would have a shot at significantly moving the needle on prospects for AI alignment. I think this is an important but (perhaps surprisingly) hard challenge, and that a lot of work labeled as AI governance or AI safety is not as impactful as it could be in virtue of not being tied into an overall strategy that aims to attack the full problem.

That being said, I feel fairly strongly that this strategy as stated is not viable, and that you aiming to implement the strategy would come with sufficiently severe risks of harming our prospects for achieving aligned AI that I would strongly advise against moving forward with this. 

I know that you have emailed several people (including me) asking for their opinion on your plan. I want to share my advice and reasoning publicly so other people you may be talking to can save time by referring to my comment and indicating where they agree or disagree.


Here is why I'm very skeptical that you moving forward with your plan is a good idea:

  • I think you massively understate how unlikely your strategy is to succeed:
    • There are a lot of AI researchers. More than 10,000 people attended the machine learning conference NeurIPS (in 2019), and if we include engineers the number is in the hundreds of thousands. Having one-on-one conversations with all of them would require at least hundreds of thousands to millions of person-hours from people who could otherwise do AI safety research or do movement building aimed at potentially more receptive audiences.
    • Several prominent AI researchers have engaged with AI risk arguments for likely dozens of hours if not more (example), and yet they remain unconvinced. So there would very likely be significant resistance against your scheme by leaders in the field, which makes it seem dubious whether you could convince a significant fraction of the field to change gears. 
    • There are significant financial, status, and 'fun' incentives for most AI researchers to keep doing what they are doing. You allude to this, but seem to fail to grasp the magnitude of the problem and how hard it would be to solve. Have you ever seen "marketing specialists" convince hundreds of thousands of people to leave high-paying and intellectually rewarding jobs to work on something else (let alone a field that is likely pretty frustrating if not impossible to enter)? (Not even mentioning the issue that any such effort would be competing against the 'marketing' of trillion-dollar companies like Google that have strong incentives to portray themselves as, and actually become, good places to work at.)
    • AI safety arguably isn't a field that can absorb many people right now. Your post sort of acknowledges this when briefly mentioning mentoring bottlenecks, but again in my view fails to grapple with the size and implications of this problem. (And also it's not just about mentoring bottlenecks, but a lack of strategic clarity, much required research being 'preparadigmatic', etc.)
  • Your plan comes with significant risks, which you do not acknowledge at all. Together with the other flaws and gaps I perceive in your reasoning, I consider this a red flag for your fit for executing any project in the vicinity of what you outline here.
    • Poorly implemented versions of your plan can easily backfire: AI researchers might either be substantively unconvinced and become more confident in dismissing AI risk or – and this seems like a rather significant risk to me – would perceive an organized effort to seek one-on-one conversations with them in order to convince them of a particular position as weird or even ethically objectionable.
    • Explaining to a lot of people why AI might be a big deal comes with the risk of putting the idea that they should race toward AGI into the heads of malign or reckless actors.
  • There are many possible strategic priorities for how to advance AI alignment. For instance, an alternative to your strategy would be: 'Find talented and EA-aligned students who are able to contribute to AI alignment or AI governance despite both of these being ill-defined fields marred with wicked problems.' (I.e., roughly speaking, precisely the strategy that is being executed by significant parts of the longtermist EA movement.) And there are significant flaws and gaps in the reasoning that makes you pick out 'move AI capabilities researchers to AI safety' as the preferred strategy.
    • You make it sound like a majority or even "monopoly" of AI researchers needs to work on safety rather than capabilities. However, in fact (and simplifying a bit), we only need as many researchers to work on AI safety as is required to solve the AI alignment problem in time. We don't know how hard this is. It could be that we need twice as much research effort as on AI capabilities, or that we only need one millionth of that.
    • There are some reasons to think that expected returns to both safety and capabilities progress toward AGI are logarithmic. That is, each doubling of total research effort produces the same amount of expected returns. Given that AI capabilities research is a field that is several orders of magnitude larger than AI safety, this means that the marginal returns of moving people from capabilities to safety research are almost all due to increasing AI safety effort, while the effect from slowing down capabilities is very small. This suggests that the overall best strategy is to scale up AI safety research by targeting whatever audiences lead to the most quality-adjusted expected AI safety research hours.
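
To make the logarithmic-returns point concrete, here is a minimal sketch in Python. It assumes returns equal the base-2 logarithm of research effort, so each doubling of effort adds one unit of returns. The head counts below are hypothetical numbers picked only for illustration, and `log_returns_change` is an illustrative helper name, not an established model.

```python
import math


def log_returns_change(effort_before: float, effort_after: float) -> float:
    """Change in returns, measured in doublings, assuming returns = log2(effort)."""
    return math.log2(effort_after / effort_before)


# Hypothetical head counts, chosen only for illustration (assumptions, not data).
safety_researchers = 300
capabilities_researchers = 100_000
moved = 100  # researchers who switch from capabilities to safety

# Safety effort grows from 300 to 400 researchers.
safety_gain = log_returns_change(safety_researchers, safety_researchers + moved)

# Capabilities effort shrinks from 100,000 to 99,900 researchers.
capabilities_loss = log_returns_change(
    capabilities_researchers, capabilities_researchers - moved
)

print(f"safety gain:       {safety_gain:+.4f} doublings")
print(f"capabilities loss: {capabilities_loss:+.4f} doublings")
```

Under these assumed numbers, the same 100 people add roughly 0.42 doublings to safety effort while removing only about 0.0014 doublings from capabilities effort. That gap is why, on this model, the value of the move comes almost entirely from the safety side rather than from slowing capabilities.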

(I also think that, in fact, there is not a clean division between "AI capabilities" and "AI safety" research. For instance, work on interpretability or learning from human feedback arguably significantly contributes to both capabilities and safety. I have bracketed this point because I don't think it is that relevant for the viability of your plan, except perhaps indirectly by providing evidence about your understanding of the AI field.)


To be clear, I think that some of the specific ideas you mention are very good if implemented well. For instance, I do think that better AI safety curricula are very valuable.

However, these viable ideas are usually things that are already happening. There are AI alignment curricula, there are events aimed at scaling up the field, there are efforts to make AI safety seem prestigious to mainstream AI talent, and there even are efforts that are partly aimed at increasing the credibility of AI risk ideas to AI researchers, such as TED talks or books by reputable AI professors or high-profile conferences at which AI researchers can mingle with people more concerned about AI risk.

If you wanted to figure out to which of these efforts you are best placed to contribute, or whether there might be any gaps among current activities, then I'm all for it! I just don't think that trying to tie them into a grand strategy that to me seems flawed in all those places at which it is new and specific, and not new in all those places where it makes sense to me, will be a productive approach.

Thank you very much for the constructive criticisms, Max! I appreciate your honest response, and agree with many of your points.

I am in the process of preparing a (hopefully) well-thought-out response to your comment.

Thanks for the in-depth response!

The most valuable part of this project I’m interested in personally is a document with the best arguments for alignment and how to effectively go about these conversations (i.e., finding cruxes).

You made a logarithmic claim of improving capabilities, but my model is that 80% of progress is made by a few companies and top universities. Less than 1000 people are pushing general capabilities, so convincing these people to pivot (or the people in charge of these people’s research direction) is high impact.

You linked the debate between AI researchers, and I remember being extremely disappointed in the way the debate was handled (eg why is Stuart using metaphors? Though I did appreciate Yoshua’s responses). The ideal product I’m thinking of says obvious things like “don’t use metaphors as arguments” and “don’t have a 10 person debate” and “be kind”, along with the actual arguments to present and the most common counter arguments.

This could have negative effects if done wrong, so the next step is to practice on lower stakes people while building the argument-doc. Then, higher stakes people can be approached.

Additionally, a list of why certain “obvious solutions to alignment” fail is useful for pointing out dead-ends in research. For example, any project that relies on the orthogonality thesis being wrong is doomed to fail imo.

This is a tangent: The links for scaling alignment are very inadequate (though I’m very glad they exist!). MLAB had what? 30/500 applicants accepted. AISC had 40/200 accepted (and I talked to one rejected who was very high quality!) Richard's course is scaling much faster though and I’m excited about that. I do believe that none of the courses handle “how to do great research” unless you do a mentorship, but I think we can work around that.

The most valuable part of this project I’m interested in personally is a document with the best arguments for alignment and how to effectively go about these conversations (ie finding cruxes).

I agree that this would be valuable, and I'd be excited about empirically informed work on this.

You are most likely aware of this, but in case not I highly recommend reaching out to Vael Gates who has done some highly relevant research on this topic.

I do think it is important to keep in mind that (at least according to my model) what matters is not just the content of the arguments themselves but also the context in which they are made and even the identity of the person making them. 

(I also expect significant variation in which arguments will be most convincing to whom.)

You made a logarithmic claim of improving capabilities, but my model is that 80% of progress is made by a few companies and top universities. Less than 1000 people are pushing general capabilities, so convincing these people to pivot (or the people in charge of these people’s research direction) is high impact.

Yes, I agree that the number of people who are making significant progress toward AGI/TAI specifically is much smaller, and that this makes the project of convincing all of those more feasible.

For the reasons mentioned in my original comment (incentives, failed past attempts, etc.) I suspect I'm still much more pessimistic than you might be that it is possible to convince them if only we found the right arguments, but for all I know it could be worth a shot. I certainly agree that we have not tried to do this as hard as possible (at least not that I'm aware of), and that it's at least possible that a more deliberate strategy could succeed where past efforts have failed.

(This is less directly relevant, but fwiw I don't think that this point counts against expected research returns being logarithmic per se. I think instead it is a point about what counts as research inputs – we should look at doublings of 'AGI-weighted AI research hours', whatever that is exactly.)

That being said, my guess is that in addition to 'trying harder' and empirically testing which arguments work in what contexts, it would be critical for any new strategy to be informed by an analysis of why past efforts have not been successful (I expect there are useful lessons here) and by close coordination with those in the AI alignment and governance communities who have experience interacting with AI researchers and who care about their relationships with them, how they and AI safety/governance are being perceived as fields by mainstream AI researchers, etc. This would serve both to learn from their experience trying to engage AI researchers and to mitigate risks.

FWIW my intuition is that the best version of a persuasion strategy would also include a significant component of preparing to exploit windows of opportunity – i.e., capitalizing on people being more receptive to AI risk arguments after certain external events like intuitively impressive capability advances, accident 'warning shots', high-profile endorsements of AI risk worries, etc.

I really like the window of opportunity idea.

I am talking to Vael currently thanks to a recommendation from someone else. If there are other people you know or sources of failed attempts in the past, I’d also appreciate those!

I also agree that a set of really good arguments is great to have but not always sufficient.

Although convincing the top few researchers is important, convincing the bottom 10,000s is also important for movement building. The response to the counterargument of “we can’t handle that many people switching careers” is to scale our programs.

Another is just trusting them to figure it out themselves (I want to compare with COVID research, but I’m not sure how well that research went or what incentives there were to make it better or worse), but this isn’t my argument but another’s intuition. I think an additional structure of “we can give quick feedback on your alignment proposal” would help with this.

Thanks for writing this. I think that attempts to get more people to work on AI Safety seem pretty worth exploring.

One thought that came to my mind was that it would be great if we could identify AI researchers who have the most potential to contribute to the most bottlenecked issues in AI safety research. One skill seems to be something like "finding structure/good questions/useful framing in a preparadigmatic field". Maybe we could also identify researchers from other technical fields who have shown themselves to be skilled at this type of work and convince them to give AI safety a shot. Furthermore, it would maybe help with scouting junior research talent.

Thank you so much for your kind words, Max! I'm extremely grateful.

I completely agree that if (a big if!) we could identify and recruit AI capabilities researchers who could quickly "plug in" to the current AI safety field, and ideally could even contribute novel and promising directions for "finding structure/good questions/useful framing", that would be extremely effective. Perhaps a maximally effective use of time and resources for many people.

I also completely agree that experiential learning on how to talent-scout and recruit AI capabilities researchers is also likely to be helpful for recruiting for the AI safety field in general. The transfer will be quite high. (And of course, recruiting junior research talent, etc. will be "easy mode" compared to recruiting AI capabilities researchers.)

I'm definitely in favour of improving the mentoring pipeline and maximising the prestige difference, although I suspect that we can't pull away capabilities researchers at anything near the scale you're envisioning. I do think it is possible to slow capabilities somewhat by pulling away top researchers, and some effort should go into this strategy, but I wouldn't overinvest in this for the purposes of slowing progress since I expect we could only slow things on the order of years. That said, drawing top capabilities researchers will provide us with vital talent for alignment and also make it much easier to recruit other people.

I think this is a fantastic idea, and I wish I had something to add to this post. I wonder if it lacks comments just because it's sufficiently comprehensive and compelling that there isn't really much to criticise or add.

Thank you so much Jay for your kind words! 

If you happen to think of any suggestions, any blind spots of the post, or any constructive criticisms, I'd be extremely excited to hear them! (Either here or in private conversation, whichever you prefer.)

Hey Peter, thanks for writing this up. 

I agree with (and really appreciate) Max's comment, so maybe there isn't a need for a grand strategy. However, I suspect that there are probably still many good opportunities to do research to understand and change attitudes and behaviours related to AI safety if that work is carefully co-designed with experts.

With that in mind, I just wanted to comment to ask that READI be kept in the loop about anything that comes out of this. 

We might be interested in helping in some way. For instance, that could be a literature/practice review of what is known on influencing desired behaviours, surveys to understand related barriers and enablers, experimental work to test the impact of potential/ongoing interventions, and/or brainstorming and disseminating approaches for 'systemic change' that might be effective.

Ideally, anything we did together would be in collaboration with, or supervised by, individuals with more domain-specific expertise (e.g., Max and other people working in the field) who could make sure it is well-planned and useful in expectation, and who could leverage and disseminate resultant insights. We have a process that has worked well with other projects and that could potentially make sense here also.

Also, have you seen this? https://docs.google.com/document/d/1KqbASWSxcGH1WjXrgfFTaDqmOxn3RWzfVw28mrFP74k/edit#

Thank you so much for your feedback on my post, Peter! I really appreciate it.

It seems like READI is doing some incredible and widely applicable work! I would be extremely excited to collaborate with you, READI, and people working in AI safety on movement-building. Please keep an eye out for a future forum post with some potential ideas on this front! We would love to get your feedback on them as well.

(And thank you very much for letting me know about Vael's extremely important write-up! It is brilliant, and I think everyone in AI safety should read it.)
