Edit: Wow, it seems like a lot of people misconstrued this post as saying that we shouldn't criticize EAs who work on cutting-edge AI capabilities. I included some confusing wording in the original version of this piece and have crossed it out. To be utterly clear, I am talking about people who work on AI safety at large AI labs.
While I was at a party in the Bay Area during EAG, I overheard someone jokingly criticizing their friend for working at a large AI safety org. Since the org is increasing AI capabilities - so the reasoning goes - anyone who works at that org is "selling out" and increasing x-risk.
Although this interaction was a (mostly) harmless joke, I think it reflects a concerning and possibly growing dynamic in the EA community, and my aim in writing this post is to nip it in the bud before it becomes a serious problem. While it is fine to criticize organizations in the EA community for actions that may cause harm, EAs should avoid scrutinizing other community members' personal career choices unless those individuals ask them for feedback. This is for a few reasons:
- People take jobs for a variety of reasons. For example, they might find the compensation packages at OpenAI and Anthropic appealing, relative to those they could get at other AI safety organizations. They might also have altruistic reasons to work at the organization: they might sincerely believe that the organization they are working for has a good plan to reduce x-risk, or that their work at the org would be beneficial even if the org as a whole causes harm. If you don't know a person well, you don't have much visibility into what factors they're considering and how they're weighing those factors as they choose a job. Therefore, it is not your place to judge them. Rather than passing judgment, you can ask them why they decided to take a certain job and try to understand their motivations (cf. "Approach disagreements with curiosity").
- Relatedly, people don't respond well to unsolicited feedback. For instance, I have gotten a lot of unsolicited advice throughout my adult life, and I find it grating because it reflects a lack of understanding of my specific needs and circumstances. I do seek out advice, but only from people I trust, such as my advisor at 80,000 Hours. It is more polite to ask a person before giving them individual feedback, or to refrain from giving feedback unless they ask for it. You can also phrase advice more humbly, such as "doing X works well for me because Y" rather than "you should do X because Y" (cf. "Aim to explain, not persuade").
- Finally, pitting ~~"AI capabilities" people~~ people who work on safety at big AI labs against "true" AI safety people creates unnecessary division in the EA community. Different AI safety orgs have different strategies for ensuring AGI will be safe, and we don't know which ones will work. In the face of this uncertainty, I think we should be kind and cooperative toward everyone who is trying in good faith to reduce AI risk. In particular, while we can legitimately disagree with an AI org's strategy, we shouldn't pass judgment on individuals who work for those organizations or ostracize them from the community.
I don't think this is a good norm. Our career choices matter a lot, and this community thrives on having norms and a culture of trying to make the world better. It seems clear to me that someone's standing in the EA community should be affected by their career choice; that is, after all, where the vast majority of their impact on the world will likely come from.
I also think it's important not to be a dick about it, and not to pressure people with some kind of implied social consensus. I think it's good for people to approach other EAs whose careers seem harmful or misguided to them, and to have conversations in which they express their concerns and are honest about whether they think the person is, on net, causing harm.
I don't understand the reasoning here. I agree that there can occasionally be good reasons to work on AI capabilities, and that it's hard to tell in advance, with very high confidence, whether any given individual lacks such a reason. But the net effect of people working on AI capabilities seems clearly and extremely negative to me. If I hear that someone works on AI capabilities, I think they are probably causing great harm, and I think it makes sense to act on that despite the remaining uncertainty.
There is currently no career that seems to me as harmful as working directly on cutting-edge AGI capabilities. I will respect you less if you work in this field, and I honestly don't want you to be a member in good standing of the EA community if you do this without a very good reason or some other major contribution. I might still be interested in having you occasionally contribute intellectually or provide advice in various ways, and I am of course very open to trade of various forms, but I am not interested in you benefitting from the infrastructure, trust, and institutions that usually come with membership in the community.
Indeed, I am worried that the EA community overall will be net-negative by causing many people to work on AI capabilities with flimsy justifications, whether just to stay close to the plot of what will happen with AGI or just for self-enrichment (or for things like earning-to-give, which I would consider even more reprehensible than SBF stealing money from FTX customers and then donating it to EA charities).
I have a very uninformed view on the relative alignment and capabilities contributions of things like RLHF. My intuition is that RLHF is positive for alignment, but I'm almost entirely uninformed on that. If anyone has written a summary of where they think these grey-area research areas lie, I'd be interested to read it. Scott's recent post was not a bad entry into the genre, but obviously just worked at a very high level.