Edit: Wow, it seems like a lot of people misconstrued this post as saying that we shouldn't criticize EAs who work on cutting-edge AI capabilities. I included some confusing wording in the original version of this piece and have crossed it out. To be utterly clear, I am talking about people who work on AI safety at large AI labs.

While I was at a party in the Bay Area during EAG, I overheard someone jokingly criticizing their friend for working at a large AI safety org. Since the org is increasing AI capabilities - so the reasoning goes - anyone who works at that org is "selling out" and increasing x-risk.

Although this interaction was a (mostly) harmless joke, I think it reflects a concerning and possibly growing dynamic in the EA community, and my aim in writing this post is to nip it in the bud before it becomes a serious problem. While it is fine to criticize organizations in the EA community for actions that may cause harm, EAs should avoid scrutinizing other community members' personal career choices unless those individuals ask them for feedback. This is for a few reasons:

  1. People take jobs for a variety of reasons. For example, they might find the compensation packages at OpenAI and Anthropic appealing, relative to those they could get at other AI safety organizations. Also, they might have altruistic reasons to work at the organization: for example, they might sincerely believe that the organization they are working for has a good plan to reduce x-risk, or that their work at the org would be beneficial even if the org as a whole causes harm. If you don't know a person well, you don't have much visibility into what factors they're considering and how they're weighing those factors as they choose a job. Therefore, it is not your place to judge them. Rather than passing judgment, you can ask them why they decided to take a certain job and try to understand their motivations (cf. "Approach disagreements with curiosity").
  2. Relatedly, people don't respond well to unsolicited feedback. For instance, I have gotten a lot of unsolicited advice throughout my adult life, and I find it grating because it reflects a lack of understanding of my specific needs and circumstances. I do seek out advice, but only from people I trust, such as my advisor at 80,000 Hours. It is more polite to ask a person before giving them individual feedback, or to refrain from giving feedback unless they ask for it. You can also phrase advice in a more humble way, such as "doing X works well for me because Y", rather than "you should do X because Y" (cf. "Aim to explain, not persuade").
  3. Finally, pitting ~~"AI capabilities" people~~ people who work on safety at big AI labs against "true" AI safety people creates unnecessary division in the EA community. Different AI safety orgs have different strategies for ensuring AGI will be safe, and we don't know which ones will work. In the face of this uncertainty, I think we should be kind and cooperative toward everyone who is trying in good faith to reduce AI risk. In particular, while we can legitimately disagree with an AI org's strategy, we shouldn't pass judgment on individuals who work for those organizations or ostracize them from the community.
Comments (21)



EAs should avoid scrutinizing other community members' personal career choices unless those individuals ask them for feedback.

I don't think this is a good norm. I think our career choices matter a lot, this community thrives on having norms and a culture of trying to make the world better, and it seems clear to me that someone's standing in the EA community should be affected by their career choice; that is, after all, where the vast majority of their impact on the world will likely come from.

I think it's also important to not be a dick about it, and to not pressure people with some kind of implied social consensus. I think it's good for people to approach other EAs who they see having careers that seem harmful or misguided to them, and then have conversations in which they express their concerns and also are honest about whether they think the person is overall causing harm. 

If you don't know a person well, you don't have much visibility into what factors they're considering and how they're weighing those factors as they choose a job. Therefore, it is not your place to judge them.

I don't understand the reasoning here. I agree that there can occasionally be good reasons to work in AI capabilities, and it's hard to tell in advance, with very high confidence, whether any given individual lacked a good reason for working in the space. But the net effect of people working on AI capabilities seems clearly extremely negative to me, and if I hear that someone works on AI capabilities, I think they are probably causing great harm, and I think it makes sense to act on that despite the remaining uncertainty.

There is currently no career that seems to me to be as harmful as working directly on cutting-edge AGI capabilities. I will respect you less if you work in this field, and I honestly don't want you to be a member in good standing in the EA community if you do this without a very good reason or without contributing majorly in some other way. I might still be interested in having you occasionally contribute intellectually or provide advice in various ways, and I am of course very open to trade of various forms, but I am not interested in you benefitting from the infrastructure, trust and institutions that usually come from membership in the community.

Indeed I am worried that the EA community overall will be net-negative due to causing many people to work on AI capabilities with flimsy justifications just to stay close to the plot of what will happen with AGI, or just for self-enrichment (or things like earning-to-give, which I would consider even more reprehensible than SBF stealing money from FTX customers and then donating that to EA charities). 

Thank you for explaining your position. Like you, I am concerned that organizations like OpenAI and the capabilities race they've created have robbed us of the precious time we need to figure out how to make AGI safe. However, I think we're talking past each other to an extent: importantly, I said that we mostly shouldn't criticize people for the organizations they work at, not for the roles they play in those organizations.

Most ML engineers have many options for where to work, so choosing to do AI capabilities research when there are plenty of alternatives outside of AI labs seems morally wrong. (On the other hand, given that AI capabilities teams exist, I'd rather they be staffed by engineers who are concerned about AI safety than engineers who aren't.) However, I think there are many roles that plausibly advance AI safety that you could only do at an AI lab, such as promoting self-regulation in the AI industry. I've also heard arguments that advancing AI safety work sometimes requires advancing AI capabilities first. I think this was more true earlier: GPT-2 taught the AI safety community that they need to focus on aligning large language models. But I am really doubtful that it's true now.

In general, if someone is doing AI safety technical or governance work at an AI lab that is also doing capabilities research, it is fair game to tell them that you think their approach will be ineffective or that they should consider switching to a role at another organization to avoid causing accidental harm. It is not acceptable to tell them that their choice of where to work means they are "AI capabilities people" who aren't serious about AI safety. Given that they are working on AI safety, it is likely that they have already weighed the obvious objections to their career choices.

There is also risk of miscommunication: in another interaction I had at another EA-adjacent party, I got lambasted after I told someone that I "work on AI". I quickly clarified that I don't work on cutting-edge stuff, but I feel that I shouldn't have had to do this, especially at a casual event.

Rereading your post, it does make sense now that you were thinking of safety teams at the big labs, but both the title about "selling out" and point #3 about "capabilities people" versus "safety people" made me think you had working on capabilities in mind.

If you think it's "fair game to tell them that you think their approach will be ineffective or that they should consider switching to a role at another organization to avoid causing accidental harm," then I'm confused about the framing of the post as being "please don't criticize EAs who 'sell out'," since this seems like "criticizing" to me. It also seems important to sometimes do this even when unsolicited, contra point 2. If the point is to avoid alienating people by making them feel attacked, then I agree, but the norms proposed here go a lot further than that.

Rereading your post, it does make sense now that you were thinking of safety teams at the big labs, but both the title about "selling out" and point #3 about "capabilities people" versus "safety people" made me think you had working on capabilities in mind.

Yes! I realize that "capabilities people" was not a good choice of words. It's a shorthand based on phrases I've heard people use at events.

In general, if someone is doing AI safety technical or governance work at an AI lab that is also doing capabilities research, it is fair game to tell them that you think their approach will be ineffective or that they should consider switching to a role at another organization to avoid causing accidental harm. It is not acceptable to tell them that their choice of where to work means they are "AI capabilities people" who aren't serious about AI safety. Given that they are working on AI safety, it is likely that they have already weighed the obvious objections to their career choices.

I think this perspective makes more sense than my original understanding of the OP, but I do think it is still misguided. Sadly, it is not very difficult for an organization to just label a job "AI Safety" and then have the people in it work on stuff whose primary aim is to make the organization more money, in this case by working on things like AI bias, or setting up RLHF pipelines, which might help a bit with some safety, but where the primary result is still billions of additional dollars flowing into AI labs primarily doing scaling-related work.

I sadly do not think that just because someone is working on "AI Safety" that they have weighed and properly considered the obvious objections to their career choices. Indeed, safety-washing seems easy and common, and if you can just hire top EAs by slapping a safety label on a capabilities position, then we will likely make the world worse.

I do react differently to someone working in a safety position, but I do actually have a separate additional negative judgement if I find out that someone is actually working in capabilities but calling their work safety. I think that kind of deception is increasingly happening, and additionally makes coordinating and working in this space harder.

I have a very uninformed view on the relative alignment and capabilities contributions of things like RLHF. My intuition is that RLHF is positive for alignment, but I'm almost entirely uninformed on that. If anyone's written a summary on where they think these grey-area research areas lie, I'd be interested to read it. Scott's recent post was not a bad entry into the genre, but obviously just worked at a very high level.

earning-to-give, which I would consider even more reprehensible than SBF stealing money from FTX customers and then donating that to EA charities

 

AI capabilities EtG being morally worse than defrauding-to-give sounds like a strong claim. 

There exist worlds where AI capabilities work is net positive. I appreciate that you may believe that we're unlikely to be in one of those worlds (and I'm sure lots of people on this forum agree).

However, given this uncertainty, it seems surprising to see language as strong as "reprehensible" being used.

AI capabilities EtG being morally worse than defrauding-to-give sounds like a strong claim. 

I mean, I do think causing all of humanity to go extinct is vastly worse than causing large-scale fraud. I of course think both are deeply reprehensible, but I also think that causing humanity's extinction is vastly worse and justifies a much stronger response. 

Of course, working on capabilities is a much smaller probabilistic increase in humanity's extinction than SBF's relatively direct fraudulent activities, and I do think this means the average AI capabilities researcher is causing less harm than Sam. But someone founding an organization like OpenAI seems to me to have substantially worse consequences than Sam's actions (of course, for fraud we often have clearer lines we can draw, and norm enforcement should take into account uncertainty and ambiguity as well as a whole host of other considerations, so I don't think most people should react to someone working at a capabilities lab to make money the same way they would react to hearing that someone had participated in fraud, though I think both are deserving of a quite strong response).

There exist worlds where AI capabilities work is net positive.

I know very few people who have thought a lot about AI x-risk who think that marginally speeding up capabilities work is good.

There was a bunch of disagreement on this topic over the last few years, but I think we are now close enough to AGI that almost everyone I know in the space would wish for more time, and for things to marginally slow down. The people who do still think that marginally speeding up is good exist, and there are arguments remaining, but there are of course also arguments for participating in many other atrocities; the mere existence of someone of sane mind supporting an endeavor should not protect it from criticism, and should not serve as a strong excuse to do it anyway.

There were extremely smart and reasonable people supporting the rise of the Soviet Union and the communist experiment, and I of course think those people should be judged extremely negatively in hindsight, given the damage that caused.

I've never seriously entertained the idea that EA is like a sect - until now. This is really uncanny.

Overall though, I agree with the point that it's possible to raise questions about someone's personal career choices without being unpleasant about it, and that doing this in a sensitive way is likely to be net positive.

Setting aside the questions of the impacts of working at these companies, it seems to me like this post prioritizes the warmth and collegiality of the EA community over the effects that our actions could have on the entire rest of the planet in a way that makes me feel pretty nervous. If we're trying in good faith to do the most good, and someone takes a job we think is harmful, it seems like the question should be "how can I express my beliefs in a way that is likely to be heard, to find truth, and not to alienate the person?" rather than "is it polite to express these beliefs at all?" It seems like at least the first two reasons listed would also imply that we shouldn't criticize people in really obviously harmful jobs like cigarette advertising.

It also seems quite dangerous to avoid passing judgment on individuals within the EA community based on our impressions of their work, which, unless I'm missing something, is what this post implies we should do. Saying we should "be kind and cooperative toward everyone who is trying in good faith to reduce AI risk" kind of misses the point, because a lot of the evidence for them "trying in good faith" comes from our observations of their actions. And, if it seems to me that someone's actions make the world worse, the obvious next step is "see what happens if they're presented with an argument that their actions are making the world worse." If they have responses that make sense to me, they're more likely to be acting in good faith. If they don't, this is a significant red flag that they're not trustworthy, regardless of their inner motivations: either factors besides the social impact of their actions are dominating in a way that makes it hard to trust them, or their judgment is bad in a way that makes it hard to trust them. I don't get this information just by asking them open-ended questions; I get it by telling them what I think, in a polite and safe-feeling way.

I think the norms proposed in this post result in people not passing judgment on the individuals working at FTX, which in turn leads to trusting these individuals and trusting the institution that they run. (Indeed, I'm confused at the post's separation between criticizing the decisions/strategies made by institutions and those made by the individuals who make the decisions and choose to further the strategies.) If people had suspicions that FTX was committing fraud or otherwise acting unethically, confronting individuals at FTX with these suspicions -- and forming judgments of the individuals and of FTX -- could have been incredibly valuable.

Weaving these points together: if you think leading AGI labs are acting recklessly, telling this to individuals who work at these labs (in a socially competent way) and critically evaluating their responses seems like a very important thing to do. Preserving a norm of non-criticism also denies these people the information that (1) you think their actions are net-negative and (2) you and others might be forming judgments of them in light of this. If they are acting in good faith, it seems extremely important that they have this information -- worth the risk of an awkward conversation or hurt feelings, both of which are mitigable with social skills.

(Realizing that it would be hypocritical for me not to say this, so I'll add: if you're working on capabilities at an AGI lab, I do think you're probably making us less safe and could do a lot of good by switching to, well, nearly anything else, but especially safety research.)

A single data point: At a party at EAG, I met a developer who worked at Anthropic. I asked for his p(doom), and he said 50%. He told me he was working on AI capabilities.

I inquired politely about his views on AI safety, and he frankly did not seem to have given the subject much thought. I do not recall making any joke about "selling out", but I may have asked what effect he thought his actions would have on X-risk. 

I don't recall anyone listening, so this was probably not the situation OP is referring to. 

It wasn't 🙂

Hmm, I expected to agree with this post based on the title, but actually find I disagree with a lot of your reasoning. 

For example, if someone has chosen to work for a very harmful organisation because it pays well, that seems like a totally respectable reason to criticise them; it is not acceptable to inflict large harms on humanity for the sake of personal enrichment. Similarly, if someone is doing something very harmful, it is totally acceptable to give them unsolicited feedback! To take an unambiguous example, if someone told me they planned to drunk drive, I would definitely tell them not to, regardless of whether or not they requested feedback, and regardless of whatever secret justifications they might have. Your more humble suggested phrasing - 'not working for OpenAI works well for me because I prefer to not cause the end of the world' - seems absurd in this context. Overall this seems like a recipe for a paralysis of humility.

I agree with the title, because I think it's pretty plausible that EAs working for these organisations is very good. Similarly, I don't think "selling out" is a good mental model for why EAs work for such places, vs the more charitable "they disagree about optimal strategy". My theory of change for how we can successfully navigate AI relies on AI workers being convinced to worry about safety, so I think EAs working for these orgs is (often) good. But if this thesis were wrong, and they were simply endangering mankind with no offsetting benefit, then it seems absurd to think we should bite our tongues.

While it is fine to criticize organizations in the EA community for actions that may cause harm, EAs should avoid scrutinizing other community members' personal career choices unless those individuals ask them for feedback.

Do you feel like there are some clearly harmful (legal) jobs where personal criticism is appropriate, or is it that you don't think AI capabilities clears this bar?

If the former this doesn't sound right to me? I agree there are ways of approaching this sort of interaction that are more or less likely to go well, but if I was telling you about my plan to make the world better through my direct work in breeding more feed-efficient chickens so more people could afford meat, or about my plans to engineer avian flu viruses for mammal-to-mammal transmission so we could plan vaccines, I think you'd be pretty reasonable trying to convince me that my work was harmful and I should do something else?

More example harmful jobs, from 80k in 2015:

  • Marketing and R&D for compulsive behaviours such as smoking, alcoholism, gambling, and payday loans

  • Factory farming

  • Homeopathy and other fraudulent medical technologies

  • Patent trolls

  • Lobbying for rent-seeking businesses or industries

  • Weapons research

  • Borderline fraudulent lending or otherwise making a financial firm highly risky

  • Fundraising for a charity that achieves nothing, or does harm

  • Forest clearing

  • Tax minimisation for the super rich

I think it depends a lot on the number of options the person has. Many people in the tech community, especially those from marginalized groups, have told me that they don't have the luxury to avoid jobs they perceive as harmful, such as many jobs in Big Tech and the military. But I think that doesn't apply to the case of someone applying to a capabilities position at OpenAI when they could apply literally anywhere else in the tech industry.

I downvoted this post originally because it originally appeared to be about not criticizing people who are working on AI capabilities at large labs. Now that it's edited to be about not offering unsolicited criticism for people working on AI safety at large labs (with arguments about why we should avoid unsolicited criticism in general), I still disagree, but I've removed my downvote.

If you believe that certain organizations increase the chances of human extinction significantly, then

  • it is fine to criticize these orgs
  • it is fine to criticize the people working at these orgs
  • it is fine to criticize the people working at these orgs even when those people are EAs
  • it is fine to apply additional scrutiny if these people receive extraordinarily high compensation for working at these orgs

That being said, in a lot of cases it is not obvious whether a person's work at an AI org is net positive or not. It might very well be the case that the overall org is net negative, while an individual person who works at this org makes a net positive contribution. But discussing when an individual's contribution to AI is net positive and when it is not should not be off-limits.

Wow, it seems like a lot of people misconstrued this post as saying that we shouldn't criticize EAs who work on cutting-edge AI capabilities. I included some confusing wording in the original version of this piece and have crossed it out. To be utterly clear, I am talking about people who work on AI safety at large AI labs.

I'm still confused, though: your key bolded claim, "While it is fine to criticize organizations in the EA community for actions that may cause harm, EAs should avoid scrutinizing other community members' personal career choices unless those individuals ask them for feedback", isn't specific to "people who work on AI safety at large AI labs"? Maybe part of the reaction was people thinking you were talking about AI capabilities work, but I think part of it is also your arguments naturally applying to all sorts of harmful work?

"While it is fine to criticize organizations in the EA community for actions that may cause harm, EAs should avoid scrutinizing other community members' personal career choices unless those individuals ask them for feedback" isn't specific to "people who work on AI safety at large AI labs"?

That's true. It applies to a wide range of career decisions that could be considered "harmful" or suboptimal from the point of view of EA, such as choosing to develop ML systems for a mental health startup instead of doing alignment work. (I've actually been told "you should work on AI safety" several times, even after I started my current job working on giving tech.)
