Edit: Wow, it seems like a lot of people misconstrued this post as saying that we shouldn't criticize EAs who work on cutting-edge AI capabilities. I included some confusing wording in the original version of this piece and have crossed it out. To be utterly clear, I am talking about people who work on AI safety at large AI labs.

While I was at a party in the Bay Area during EAG, I overheard someone jokingly criticizing their friend for working at a large AI safety org. Since the org is increasing AI capabilities - so the reasoning goes - anyone who works at that org is "selling out" and increasing x-risk.

Although this interaction was a (mostly) harmless joke, I think it reflects a concerning and possibly growing dynamic in the EA community, and my aim in writing this post is to nip it in the bud before it becomes a serious problem. While it is fine to criticize organizations in the EA community for actions that may cause harm, EAs should avoid scrutinizing other community members' personal career choices unless those individuals ask them for feedback. This is for a few reasons:

  1. People take jobs for a variety of reasons. For example, they might find the compensation packages at OpenAI and Anthropic appealing, relative to those they could get at other AI safety organizations. Also, they might have altruistic reasons to work at the organization: for example, they might sincerely believe that the organization they are working for has a good plan to reduce x-risk, or that their work at the org would be beneficial even if the org as a whole causes harm. If you don't know a person well, you don't have much visibility into what factors they're considering and how they're weighing those factors as they choose a job. Therefore, it is not your place to judge them. Rather than passing judgment, you can ask them why they decided to take a certain job and try to understand their motivations (cf. "Approach disagreements with curiosity").
  2. Relatedly, people don't respond well to unsolicited feedback. For instance, I have gotten a lot of unsolicited advice throughout my adult life, and I find it grating because it reflects a lack of understanding of my specific needs and circumstances. I do seek out advice, but only from people I trust, such as my advisor at 80,000 Hours. It is more polite to ask a person before giving them individual feedback, or to refrain from giving feedback unless they ask for it. You can also phrase advice in a more humble way, such as "doing X works well for me because Y", rather than "you should do X because Y" (cf. "Aim to explain, not persuade").
  3. Finally, pitting ~~"AI capabilities" people~~ people who work on safety at big AI labs against "true" AI safety people creates unnecessary division in the EA community. Different AI safety orgs have different strategies for ensuring AGI will be safe, and we don't know which ones will work. In the face of this uncertainty, I think we should be kind and cooperative toward everyone who is trying in good faith to reduce AI risk. In particular, while we can legitimately disagree with an AI org's strategy, we shouldn't pass judgment on individuals who work for those organizations or ostracize them from the community.

Comments

EAs should avoid scrutinizing other community members' personal career choices unless those individuals ask them for feedback.

I don't think this is a good norm. I think our career choices matter a lot, this community thrives on having norms and a culture of trying to make the world better, and it seems clear to me that someone's standing in the EA community should be affected by their career choice; that is, after all, where the vast majority of their impact on the world will likely come from.

I think it's also important to not be a dick about it, and to not pressure people with some kind of implied social consensus. I think it's good for people to approach other EAs whose careers seem harmful or misguided to them, and then have conversations in which they express their concerns and are also honest about whether they think the person is overall causing harm.

If you don't know a person well, you don't have much visibility into what factors they're considering and how they're weighing those factors as they choose a job. Therefore, it is not your place to judge them.

I don't understand the reasoning here. I agree that there can occasionally be good reasons to work in AI capabilities, and it's hard to tell in advance, with very high confidence, whether any given individual lacked a good reason for working in the space. But the net effect of people working on AI capabilities seems clearly extremely negative to me, and if I hear that someone works on AI capabilities, I think they are probably causing great harm, and I think it makes sense to act on that despite the remaining uncertainty.

There is currently no career that seems to me to be as harmful as working directly on cutting-edge AGI capabilities. I will respect you less if you work in this field, and I honestly don't want you to be a member in good standing of the EA community if you do this without a very good reason, or without contributing majorly in some other way. I might still be interested in having you occasionally contribute intellectually or provide advice in various ways, and I am of course very open to trade of various forms, but I am not interested in you benefitting from the infrastructure, trust, and institutions that usually come with membership in the community.

Indeed, I am worried that the EA community overall will be net-negative due to causing many people to work on AI capabilities with flimsy justifications, just to stay close to the plot of what will happen with AGI, or just for self-enrichment (or for things like earning-to-give, which I would consider even more reprehensible than SBF stealing money from FTX customers and then donating that to EA charities).

Thank you for explaining your position. Like you, I am concerned that organizations like OpenAI and the capabilities race they've created have robbed us of the precious time we need to figure out how to make AGI safe. However, I think we're talking past each other to an extent: importantly, I said that we mostly shouldn't criticize people for the organizations they work at, not for the roles they play in those organizations.

Most ML engineers have plenty of options for where to work, so choosing AI capabilities research when they have alternatives outside of AI labs seems morally wrong. (On the other hand, given that AI capabilities teams exist, I'd rather they be staffed by engineers who are concerned about AI safety than by engineers who aren't.) However, I think there are many roles that plausibly advance AI safety that you could only do at an AI lab, such as promoting self-regulation in the AI industry. I've also heard arguments that advancing AI safety work sometimes requires advancing AI capabilities first. I think this was more true earlier: GPT-2 taught the AI safety community that it needed to focus on aligning large language models. But I am really doubtful that it's true now.

In general, if someone is doing AI safety technical or governance work at an AI lab that is also doing capabilities research, it is fair game to tell them that you think their approach will be ineffective or that they should consider switching to a role at another organization to avoid causing accidental harm. It is not acceptable to tell them that their choice of where to work means they are "AI capabilities people" who aren't serious about AI safety. Given that they are working on AI safety, it is likely that they have already weighed the obvious objections to their career choices.

There is also a risk of miscommunication: in another interaction at another EA-adjacent party, I got lambasted after I told someone that I "work on AI". I quickly clarified that I don't work on cutting-edge stuff, but I feel that I shouldn't have had to do this, especially at a casual event.

Rereading your post, it does make sense now that you were thinking of safety teams at the big labs, but both the title about "selling out" and point #3 about "capabilities people" versus "safety people" made me think you had working on capabilities in mind.

If you think it's "fair game to tell them that you think their approach will be ineffective or that they should consider switching to a role at another organization to avoid causing accidental harm," then I'm confused about the framing of the post as being "please don't criticize EAs who 'sell out'," since this seems like "criticizing" to me. It also seems important to sometimes do this even when unsolicited, contra point 2. If the point is to avoid alienating people by making them feel attacked, then I agree, but the norms proposed here go a lot further than that.

Rereading your post, it does make sense now that you were thinking of safety teams at the big labs, but both the title about "selling out" and point #3 about "capabilities people" versus "safety people" made me think you had working on capabilities in mind.

Yes! I realize that "capabilities people" was not a good choice of words. It's a shorthand based on phrases I've heard people use at events.

In general, if someone is doing AI safety technical or governance work at an AI lab that is also doing capabilities research, it is fair game to tell them that you think their approach will be ineffective or that they should consider switching to a role at another organization to avoid causing accidental harm. It is not acceptable to tell them that their choice of where to work means they are "AI capabilities people" who aren't serious about AI safety. Given that they are working on AI safety, it is likely that they have already weighed the obvious objections to their career choices.

I think this perspective makes more sense than my original understanding of the OP, but I do think it is still misguided. Sadly, it is not very difficult for an organization to just label a job "AI Safety" and then have the person work on things whose primary aim is to make the organization more money, in this case by working on things like AI bias, or setting up RLHF pipelines, which might help a bit with some safety, but where the primary result is still billions of additional dollars flowing into AI labs primarily doing scaling-related work.

I sadly do not think that just because someone is working on "AI Safety", they have weighed and properly considered the obvious objections to their career choices. Indeed, safety-washing seems easy and common, and if you can hire top EAs just by slapping a safety label on a capabilities position, then we will likely make the world worse.

I do react differently to someone working in a safety position, but I have a separate, additional negative judgement if I find out that someone is actually working on capabilities but calling their work safety. I think that kind of deception is increasingly happening, and it additionally makes coordinating and working in this space harder.

I have a very uninformed view on the relative alignment and capabilities contributions of things like RLHF. My intuition is that RLHF is positive for alignment, but I'm almost entirely uninformed on that. If anyone's written a summary of where they think these grey-area research areas lie, I'd be interested to read it. Scott's recent post was not a bad entry into the genre, but obviously it just worked at a very high level.

earning-to-give, which I would consider even more reprehensible than SBF stealing money from FTX customers and then donating that to EA charities


AI capabilities EtG being morally worse than defrauding-to-give sounds like a strong claim. 

There exist worlds where AI capabilities work is net positive. I appreciate that you may believe that we're unlikely to be in one of those worlds (and I'm sure lots of people on this forum agree).

However, given this uncertainty, it seems surprising to see language as strong as "reprehensible" being used.

AI capabilities EtG being morally worse than defrauding-to-give sounds like a strong claim. 

I mean, I do think causing all of humanity to go extinct is vastly worse than causing large-scale fraud. I of course think both are deeply reprehensible, but I also think that causing humanity's extinction is vastly worse and justifies a much stronger response. 

Of course, working on capabilities is a much smaller probabilistic increase in humanity's extinction risk than SBF's relatively direct fraudulent activities, and I do think this means the average AI capabilities researcher is causing less harm than Sam. But someone founding an organization like OpenAI seems to me to have substantially worse consequences than Sam's actions. (Of course, for fraud we often have clearer lines we can draw, and norm enforcement should take into account uncertainty and ambiguity, as well as a whole host of other considerations, so I don't actually think most people should react to someone working at a capabilities lab to make money the same way they would react to hearing that someone participated in fraud, though I think both deserve quite a strong response.)

There exist worlds where AI capabilities work is net positive.

I know very few people who have thought a lot about AI X-Risk who think that marginally speeding up capabilities work is good.

There was a bunch of disagreement on this topic over the last few years, but I think we are now close enough to AGI that almost everyone I know in the space would wish for more time, and for things to marginally slow down. The people who do still think that marginally speeding up is good exist, and there are arguments remaining, but there are of course also arguments for participating in many other atrocities, and the mere existence of someone of sane mind supporting an endeavor should not protect it from criticism or serve as a strong excuse to do it anyway.

There were extremely smart and reasonable people supporting the rise of the Soviet Union and the communist experiment, and I of course think those people should be judged extremely negatively in hindsight, given the damage it caused.

I've never seriously entertained the idea that EA is like a sect - until now. This is really uncanny.

Overall, though, I agree with the point that it's possible to raise questions about someone's personal career choices without being unpleasant about it, and that doing this in a sensitive way is likely to be net positive.

Setting aside the questions of the impacts of working at these companies, it seems to me like this post prioritizes the warmth and collegiality of the EA community over the effects that our actions could have on the entire rest of the planet in a way that makes me feel pretty nervous. If we're trying in good faith to do the most good, and someone takes a job we think is harmful, it seems like the question should be "how can I express my beliefs in a way that is likely to be heard, to find truth, and not to alienate the person?" rather than "is it polite to express these beliefs at all?" It seems like at least the first two reasons listed would also imply that we shouldn't criticize people in really obviously harmful jobs like cigarette advertising.

It also seems quite dangerous to avoid passing judgment on individuals within the EA community based on our impressions of their work, which, unless I'm missing something, is what this post implies we should do. Saying we should "be kind and cooperative toward everyone who is trying in good faith to reduce AI risk" kind of misses the point, because a lot of the evidence for them "trying in good faith" comes from our observations of their actions. And, if it seems to me that someone's actions make the world worse, the obvious next step is "see what happens if they're presented with an argument that their actions are making the world worse." If they have responses that make sense to me, they're more likely to be acting in good faith. If they don't, this is a significant red flag that they're not trustworthy, regardless of their inner motivations: either factors besides the social impact of their actions are dominating in a way that makes it hard to trust them, or their judgment is bad in a way that makes it hard to trust them. I don't get this information just by asking them open-ended questions; I get it by telling them what I think, in a polite and safe-feeling way.

I think the norms proposed in this post result in people not passing judgment on the individuals working at FTX, which in turn leads to trusting these individuals and trusting the institution that they run. (Indeed, I'm confused at the post's separation between criticizing the decisions/strategies made by institutions and those made by the individuals who make the decisions and choose to further the strategies.) If people had suspicions that FTX was committing fraud or otherwise acting unethically, confronting individuals at FTX with these suspicions -- and forming judgments of the individuals and of FTX -- could have been incredibly valuable.

Weaving these points together: if you think leading AGI labs are acting recklessly, telling this to individuals who work at these labs (in a socially competent way) and critically evaluating their responses seems like a very important thing to do. Preserving a norm of non-criticism also denies these people the information that (1) you think their actions are net-negative and (2) you and others might be forming judgments of them in light of this. If they are acting in good faith, it seems extremely important that they have this information -- worth the risk of an awkward conversation or hurt feelings, both of which are mitigable with social skills.

(Realizing that it would be hypocritical for me not to say this, so I'll add: if you're working on capabilities at an AGI lab, I do think you're probably making us less safe and could do a lot of good by switching to, well, nearly anything else, but especially safety research.)

Hmm, I expected to agree with this post based on the title, but actually find I disagree with a lot of your reasoning. 

For example, if someone has chosen to work for a very harmful organisation because it pays well, that seems like a totally legitimate reason to criticise them; it is not acceptable to inflict large harms on humanity for the sake of personal enrichment. Similarly, if someone is doing something very harmful, it is totally acceptable to give them unsolicited feedback! To take an unambiguous example, if someone told me they planned to drive drunk, I would definitely tell them not to, regardless of whether or not they requested feedback, and regardless of whatever secret justifications they might have. Your more humble suggested phrasing - "not working for OpenAI works well for me because I prefer to not cause the end of the world" - seems absurd in this context. Overall this seems like a recipe for a paralysis of humility.

I agree with the title, because I think it's pretty plausible that EAs working for these organisations is very good. Similarly, I don't think "selling out" is a good mental model for why EAs work for such places, versus the more charitable "they disagree about optimal strategy". My theory of change for how we can successfully navigate AI relies on AI workers being convinced to worry about safety, so I think EAs working for these orgs is (often) good. But if this thesis were wrong, and they were simply endangering mankind with no offsetting benefit, then it seems absurd to think we should bite our tongues.

A single data point: at a party at EAG, I met a developer who worked at Anthropic. I asked for his p(DOOM), and he said 50%. He told me he was working on AI capabilities.

I inquired politely about his views on AI safety, and he frankly did not seem to have given the subject much thought. I do not recall making any joke about "selling out", but I may have asked what effect he thought his actions would have on X-risk. 

I don't recall anyone listening, so this was probably not the situation OP is referring to. 

It wasn't 🙂

While it is fine to criticize organizations in the EA community for actions that may cause harm, EAs should avoid scrutinizing other community members' personal career choices unless those individuals ask them for feedback.

Do you feel like there are some clearly harmful (legal) jobs where personal criticism is appropriate, or is it that you don't think AI capabilities clears this bar?

If the former this doesn't sound right to me? I agree there are ways of approaching this sort of interaction that are more or less likely to go well, but if I was telling you about my plan to make the world better through my direct work in breeding more feed-efficient chickens so more people could afford meat, or about my plans to engineer avian flu viruses for mammal-to-mammal transmission so we could plan vaccines, I think you'd be pretty reasonable trying to convince me that my work was harmful and I should do something else?

More examples of harmful jobs, from 80k in 2015:

  • Marketing and R&D for compulsive behaviours such as smoking, alcoholism, gambling, and payday loans

  • Factory farming

  • Homeopathy and other fraudulent medical technologies

  • Patent trolls

  • Lobbying for rent-seeking businesses or industries

  • Weapons research

  • Borderline fraudulent lending or otherwise making a financial firm highly risky

  • Fundraising for a charity that achieves nothing, or does harm

  • Forest clearing

  • Tax minimisation for the super rich

I think it depends a lot on the number of options the person has. Many people in the tech community, especially those from marginalized groups, have told me that they don't have the luxury to avoid jobs they perceive as harmful, such as many jobs in Big Tech and the military. But I think that doesn't apply to the case of someone applying to a capabilities position at OpenAI when they could apply literally anywhere else in the tech industry.

I downvoted this post originally because it originally appeared to be about not criticizing people who are working on AI capabilities at large labs. Now that it's edited to be about not offering unsolicited criticism for people working on AI safety at large labs (with arguments about why we should avoid unsolicited criticism in general), I still disagree, but I've removed my downvote.

If you believe that certain organizations increase the chances of human extinction significantly, then

  • it is fine to criticize these orgs
  • it is fine to criticize the people working at these orgs
  • it is fine to criticize the people working at these orgs even when those people are EAs
  • it is fine to apply additional scrutiny if these people receive extraordinarily high compensation for working at these orgs

That being said, in a lot of cases it is not obvious whether a person's work at an AI org is net positive or not. It might very well be the case that the overall org is net negative, while an individual person who works at it makes a net positive contribution. But discussing when an individual's contribution to AI is net positive and when it is not should not be off-limits.

Wow, it seems like a lot of people misconstrued this post as saying that we shouldn't criticize EAs who work on cutting-edge AI capabilities. I included some confusing wording in the original version of this piece and have crossed it out. To be utterly clear, I am talking about people who work on AI safety at large AI labs.

I'm still confused, though: your key bolded claim, "While it is fine to criticize organizations in the EA community for actions that may cause harm, EAs should avoid scrutinizing other community members' personal career choices unless those individuals ask them for feedback", isn't specific to "people who work on AI safety at large AI labs"? Maybe part of the reaction was people thinking you were talking about AI capabilities work, but I think part of it is also that your arguments naturally apply to all sorts of harmful work?

"While it is fine to criticize organizations in the EA community for actions that may cause harm, EAs should avoid scrutinizing other community members' personal career choices unless those individuals ask them for feedback" isn't specific to "people who work on AI safety at large AI labs"?

That's true. It applies to a wide range of career decisions that could be considered "harmful" or suboptimal from the point of view of EA, such as choosing to develop ML systems for a mental health startup instead of doing alignment work. (I've actually been told "you should work on AI safety" several times, even after I started my current job working on giving tech.)
