
Summary

As part of EA Strategy Fortnight, I am sharing a reflection on my experience doing AI safety movement building over the last year, and why I am more excited about further efforts in that space than about EA movement building. This is mostly due to the relative success of AI safety groups compared to EA groups at universities that have both (e.g. read about the Harvard and MIT updates from this past year here). I expect many of the takeaways to extend beyond the university context. The main reasons AI safety field building seems more impactful are:

  • Empirical evidence from universities that have put substantial effort into both EA and AI safety groups: higher engagement overall, and from individuals with relevant expertise, interests, and skills
  • Stronger object-level focus encourages skill and knowledge accumulation, offers better career capital, and lends itself to engagement from more knowledgeable and senior individuals (including graduate students and professors). 
  • Impartial/future-focused altruism not being a crux for many people working on AI safety
  • Recent developments increasing the salience of potential risks from transformative AI, and decreasing the appeal of the EA community/ideas. 

I also discuss some hesitations and counterarguments, the most salient of which is the large decrease in the neglectedness of existential risk from AI (I have not yet reflected much on its implications, though I still stand by the high-level takes this post argues for).

Context/Why I am writing about this

I helped set up and run the Cambridge Boston Alignment Initiative (CBAI) and the MIT AI Alignment group this past year. I also helped out with Harvard’s AI Safety team programming, along with some broader university AI safety programming (e.g. a retreat, two MLAB-inspired bootcamps, and a 3-week research program on AI strategy). Before this, I ran the Stanford Existential Risks Initiative and effective altruism student group and have supported many other university student groups.

Why AI Safety Field Building over EA Community Building

From my experiences over the past few months, it seems that AI safety field building is generally more impactful than EA movement building for people able to do either well, especially at the university level (assuming, as I do throughout this article, that reducing AI x-risk is probably the most effective way to do good). Here are some reasons for this:

  1. AI-alignment-branded outreach is empirically attracting many more students with relevant skill sets and expertise than EA-branded outreach at universities. 
    1. Anecdotal evidence: At MIT, we received ~5x the number of applications for AI safety programming compared to EA programming, despite similar levels of outreach last year. This ratio was even higher when just considering applicants with relevant backgrounds and accomplishments. Around two dozen winners and top performers of international competitions (math/CS/science olympiads, research competitions) and students with significant research experience engaged with AI alignment programming, but very few engaged with EA programming. 
    2. This phenomenon at MIT has also been roughly matched at Harvard, Stanford, and Cambridge, and I'd guess at several other universities (though I think the relevant ratios are slightly lower than at MIT). 
    3. It makes sense that programming marketed under a specific cause area (e.g. AI rather than EA) is more likely to attract individuals who are highly skilled, experienced, and interested in topics relevant to that cause area.
  2. Effective cause-area specific direct work and movement building still involves learning, understanding, and applying many important EA principles and concepts:
    1. Prioritization and optimization remain relevant for maximally reducing existential risk.
      1. Relatedly, consequentialism/effectiveness/focusing on producing the best outcomes and what actually works, as well as willingness to pivot, seem important to emphasize as part of strong AI safety programming and discussions. 
      2. Intervention neutrality—Even within AI alignment, there are many ways to contribute: conceptual alignment research, applied technical research, lab governance, policy/government, strategy research, field-building/communications/advocacy, etc. Wisely determining which of these to focus on requires engagement with many principles core to EA.   
      3. (Low confidence) So far, I’ve gotten the impression that the students who have gotten most involved with AIS student groups are orienting to the problem with a “How can I maximally reduce x-risk?” frame, not “Which aspect of the problem seems most intellectually stimulating?”.
    2. The distinction between existential and non-existential risks remains relevant for prioritizing mitigation of the former.
      1. This distinction also naturally leads to discussion about population ethics, moral philosophy, altruism (towards future generations), and other related ideas.
    3. Truth-seeking and strong epistemics remain relevant.
      1. Caveat: Empirically, maintaining strong epistemics and a culture of truth-seeking has not been emphasized as much in AIS groups in my experience, and it feels slightly unnatural to do so (though I think the case for its importance can be made pretty straightforwardly given how confusing AI and alignment are, the paucity of feedback loops, and the importance of prioritization given limited time and resources). 
    4. When much of the cause-area specific field-building work is done by EAs, and much of the research/content engaged with is from EAs, people will naturally interact with EAs, and some will be sympathetic to the ideas. 
  3. Cause-area specific movement building incentivizes a strong understanding of cause area object-level content, which both acts as a selection filter (which standard EA community building lacks), and helps make movement-builders better suited to pivot to object-level work. This makes organizing especially appealing for students who might not want to commit to movement building work long-term.
    1. I think it is useful for people running cause-area specific movement building projects (including student groups) to be pretty motivated to have their group maximally mitigate existential risk/improve the long-term future, since doing the aforementioned prioritization well and creating/maintaining strong culture (with e.g. high levels of truth-seeking, and a results-focused framework) is difficult and unlikely without these high-level goals. 
    2. A stronger object-level focus also makes engagement more appealing to individuals with subject matter expertise, like graduate students and professors. Empirically, grad student and professor engagement has been much stronger and more successful with AI safety groups than EA/existential risk focused groups so far. 
  4. The words “effective altruism” do not really elicit what I believe is most important and exciting about EA principles and the community, and what many of us currently think is most important to work on (e.g. global/universal impartial focus, prioritization/optimization, navigating and improving technological development and addressing its risks, etc).
    1. AI risk, existential risk, and longtermism get at some items listed above, but maybe don’t get at prioritization/optimization well. Still, perhaps STEM-heavy cause area programming naturally attracts people interested in applying optimization to real life. 
  5. The reputation of the EA community and name has (justifiably) taken a big hit in light of several recent scandals, making EA community building (CB) look worse. On the other hand, AI alignment has been getting a ton of positive attention and concern from the general public and relevant stakeholders. 
    1. That being said, the effects of the scandals on top university students’ perception of EA seem much smaller than I initially expected (e.g. most people think of the FTX crash as an example of crypto being crazy/fake). According to a Rethink Priorities survey, only 20% of people who have heard about EA have heard about FTX. 
  6. Not needing to externally justify expenditures on common-sense altruistic grounds: Many of the community building interventions that seem most exciting involve spending money in ways that seem unusual in a university or common-sense altruistic context (e.g. group organizer salaries and costs, organizing workshops at large venues, renting office space). Some of these expenditures seem more socially acceptable when not done in the name of ‘altruism’ or charity, even if the group’s culture and motivations are similar to those of EA groups (or, at the very least, this framing helps insulate EA from some negative reputational effects).
  7. Anecdotally, impartial/future-focused altruism is not the primary motivation for a large portion of individuals working full-time on AI existential risk reduction (and maybe the majority). Impartial altruism does not seem like the most compelling way to get people to seriously consider working on existential risk reduction, as is discussed here, here, and here.

Counterarguments and Hesitations

  • I have not been working on AI safety/cause-area specific movement building for long enough (and AIS groups in general have not been very active for long enough) to feel confident that exciting leading indicators will translate into long-term impact. EA community building has a longer track record. The small sample sizes also reduce my confidence in the above takeaways.
  • Perhaps strong philosophical/ethical commitments (as opposed to say visceral urgency/concern and amazement at the capabilities of AI, or its rate of improvement) end up being more important than I currently estimate for long-term changes to career plans and behavior more generally. 
  • Maybe the non-altruistic case for existential risk mitigation isn’t sound, e.g. because someone’s likelihood of being able to contribute is too low to justify working on x-risk reduction rather than pursuing their goals another way. If so, maybe insufficiently altruistically motivated people will realize this and pivot to something else.
  • Figuring out what is true and helpful in the context of AI safety might be sufficiently difficult that the downsides of movement building and outreach (e.g. lower epistemic standards and lower-quality content on LessWrong/the Alignment Forum) might outweigh the upsides (e.g. more motivated/talented people working on AI alignment). 
  • AI safety is getting more mainstream than EA. Many of the people I expect to be most impactful would not have initially gotten involved with an AI safety group, but got into EA first and eventually switched to AI (though others like Open Philanthropy would have a better sense of this). The huge increase in discourse and attention on advanced AI might make the usefulness of proactive outreach and education about AI safety much lower moving forward than it was half a year ago. 
  • Historically, AI-alignment-driven writing and field-building seems to have significantly contributed to (speeding up) AI capabilities—potentially more than it has contributed to alignment/making the future better. AI alignment field-building might continue to (or start to) have this effect. 
    • My current intuition is: AGI hype has gotten high enough that the ratio of median capabilities researchers to safety researchers that would be beneficial from CB is pretty high (maybe >10:1, not sure), and definitely higher than what leading indicators suggest is produced by field-building at the moment.
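To spell out the rough arithmetic behind that last bullet, here is a minimal sketch; the symbols are hypothetical placeholders rather than measured quantities, and this is only an illustration of the breakeven framing, not a model I am endorsing.

```latex
% Minimal breakeven sketch (hypothetical symbols, not estimates):
%   n_s, n_c : counterfactual safety / capabilities researchers produced by field building
%   v_s      : expected x-risk reduction per median safety researcher
%   h_c      : expected x-risk increase per median capabilities researcher
% On this crude model, field building is net beneficial when
\[
  n_s \, v_s \;>\; n_c \, h_c
  \quad\Longleftrightarrow\quad
  \frac{n_c}{n_s} \;<\; \frac{v_s}{h_c}.
\]
% The ">10:1" guess above is about the breakeven ratio v_s / h_c, while the "leading
% indicators" claim is that the ratio actually produced, n_c / n_s, currently sits well
% below it.
```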

Conclusion 

On the margin, I’d direct more resources towards AI safety movement building, though I still think EA movement-building can be very valuable and should continue to some extent. I’d be interested in hearing others’ experiences and thoughts on AI safety and other cause area field building compared to EA CB in the comments. 

Comments (16)



Explicitly switching to only AI seems like a case of putting all our eggs in one highly speculative basket. We don't know how the case for AI safety will stack up in 10 years: if we commit too hard and it turns out to be overblown, will EA as a movement be over? 

I think the premise that EA will be over because of AI safety community building is confused, given that this is on the margin and EA movement building still exists. There's literally a companion piece by Jessica McCurdy about EA community building to go alongside this post on AI-safety-specific community building. I also don't think this piece makes the case for every resource to go to AI safety community building. 

In case anyone is interested, here is that piece

Anecdotal evidence: At MIT, we received ~5x the number of applications for AI safety programming compared to EA programming, despite similar levels of outreach last year. This ratio was even higher when just considering applicants with relevant backgrounds and accomplishments. Around two dozen winners and top performers of international competitions (math/CS/science olympiads, research competitions) and students with significant research experience engaged with AI alignment programming, but very few engaged with EA programming. 

Dunno what the exact ratio would look like (since the different groups run somewhat different kinds of events), but we've definitely seen a lot of interest in AIS at Carnegie Mellon as well. There's also not very much overlap between the people who come to AIS things and those who come to EA things.

Thanks, you make a compelling argument for AI safety movement building. I especially like that you have a lot of community building experience to draw these conclusions from. However, I think you might be (perhaps unintentionally) setting up the impression of a false dichotomy here between general EA community building and AI safety community building.

I might be wrong, but perhaps you are saying that EA should intentionally support the budding AI alignment community more heavily than it does now, and that in some cases this community should be prioritised for funding over other EA groups? That would seem reasonable to me at least. Your conclusion of "On the margin, I’d direct more resources towards AI safety movement building, though I still think EA movement-building can be very valuable and should continue to some extent."  seems to back up my take?

It makes sense to me that EA funds could experiment with investing a decent amount in communities built specifically around AI safety, then gather data for a couple of years and see if it produces both a consistent community and fruitful counterfactual AI safety efforts. It seems likely these communities could be intertwined and connected with current EA communities to different extents in different places, but they could also be very separate. This might already be an explicit plan which is happening and I've missed it.

Also, initial recruitment numbers only tell part of the effectiveness story. One of the strengths of EA is that people, once they join the community, often...
1. Devote a decent part of their life/time/resources to the community and the work
2. Have a decent likelihood of being in it for the long term (This must be quantified somewhere too)

Whether these features would also be present in an AI safety community remains to be seen.

Like titotal said, I don't think a drastic pivot pulling a huge amount of money away from EA community building and towards AI safety groups would be a great strategic move. Putting all our eggs in one basket and leaving established communities high and dry seems like a bad move; mind you, I don't think that will happen anyway.

Final question: "Anecdotally, impartial/future-focused altruism is not the primary motivation for a large portion of individuals working full-time on AI existential risk reduction (and maybe the majority)." If not this, then what is their motivation, outside of perhaps selfish fear for themselves or their family? I'm genuinely intrigued here.

Nice one!

the ratio of median capabilities researchers to safety researchers that would be beneficial from CB is pretty high (maybe >10:1, not sure), and definitely higher than what leading indicators suggest is produced by field-building at the moment.

What's your current best-guess for what the leading indicators would suggest?

I would guess the ratio is pretty skewed in the safety direction (since uni AIS CB is generally not counterfactually getting people interested in AI when they previously weren't; if anything, EA might have more of that effect), so maybe something in the 1:10 to 1:50 range (1:20ish point estimate for the ratio of median capabilities research to median safety research contributed by AIS CB)?

I don't really trust my numbers though. This ratio is also more favorable now than I would have estimated a few months/years ago, when contribution to AGI hype from AIS CB would have seemed much more counterfactual (but also AIS CB seems less counterfactual now that AI x-risk is getting a lot of mainstream coverage). 

I would be surprised if the true number were as low as 1:20 or even 1:10. I wish there were more data on this, though it seems a bit difficult to collect, since at least for university groups most of the impact (on both capabilities and safety) will occur a few or more years after the students start engaging with the group. 

I also think it depends a lot on what the best opportunities available to them are: what opportunities exist in the near future to work on AI safety versus on AI capabilities for people with their aptitudes. 

I agree with this; e.g. I think I know specific people who went through AIS CB (though not the recent uni groups, because they are younger and there's more lag) and either couldn't or wouldn't find AIS jobs, so ended up working in AI capabilities.

Yeah, same. I know of recent university graduates interested in AI safety who are applying for jobs in AI capabilities alongside AI safety jobs. 

It makes me think that what matters more is changing the broader environment to care more about AI existential risk (via better arguments, more safety orgs focused on useful research/policy directions, better resources for existing ML engineers who want to learn about it etc.) rather than specifically convincing individual students to shift to caring about it.

I've also heard people doing SERI MATS, for example, explicitly talk/joke about this, about how they'd have to work in AI capabilities now if they don't get AI safety jobs.

I'm impressed the ratio is that favourable! One note to be careful of is that just because people start off hyped about AI safety doesn't mean they stay there; there's a decent chance they will swing to the dark side of capabilities, as we saw with OpenAI and probably others as well. Just making the point that the starting ratio might look more favourable than the ratio after a few years.

Thanks, this is helpful!

Not worsening the current ratio would be a reasonable first guess, and although it depends a lot on how you define safety researchers, I'd say it's effectively somewhere around 20:1.

Sorry, are you saying that the current ratio of capabilities researchers to safety researchers produced by AIS field-building is 20:1, or that the current ratio of researchers overall is 20:1?

(If the latter, then I think my original question was insufficiently clear and I should probably edit it).

The second one: I'm addressing what ratio would be beneficial, but maybe you wanted to understand what it actually is? 
