by Agrippa · 1 min read · 24th Feb 2022 · 40 comments

Why do I keep meeting so many damned capabilities researchers and AI salespeople? 
I thought that we agreed capabilities research was really bad. I thought we agreed that increasing the amount of economic activity in capabilities was really bad. To me it seems like the single worst thing that I could even do!

This really seems like a pretty consensus view among EA orthodoxy. So why do I keep meeting so many people who, as far as I can tell, are doing the single worst thing that it's even in their power to do? If there is any legal thing that could get you kicked out of EA spaces, that isn't sexual misconduct, wouldn't it be this?

I'm not even talking about people who maintain that safety/alignment research requires advancing capabilities or might do so. I'm just talking about people who do regular OpenAI or OpenAI competitor shit. 

If you're supposed to be high status in EA for doing good, aren't you supposed to be low status if you do the exact opposite? It honestly makes me feel like I'm going insane. Do EA community norms really demand that I'm supposed to act like something is normal and okay even though we all seem to believe that it really isn't okay at all? 

And yes I think there is a strong argument for ostracization. It seems like you would ostracize somebody for being a nuclear arms industry lobbyist. This seems worse. It's not behaviorally clear that these people care about anything except mild fun and status incentives, so IDK why in the community we would at all align fun and status with doing the most evil thing you can do.

Of course it does seem like 80k is somewhat to blame here since they continue to promote regular-ass jobs at OpenAI in the jobs board as far as I know. Not very clear to me why they do this.

For a lot of people, working on capabilities is the best way to gain skills before working on safety. And if, across your career, you spend half your effort on each goal, that is probably much better than not working on AI at all.

It would be nice to know more about how many EAs are getting into this plan and how many end up working in safety. I don't have the sense that most of them get to the safety half. I also think it is reasonable to believe that no amount of safety research can prevent armageddon, because the outcome of the research may just be "this is not safe", as EY seems to report, and have no impact (the capabilities researchers don't care, or, the fact that we aren't safe yet means they need to keep working in capabilities so that they can help with the safety problem). 

You seem frustrated that some EAs are working at leading AI labs, because you see that as accelerating AI timelines when we are not ready for advanced AI.

Here are some cruxes that might explain why working at leading AI labs might be a good thing:

1. We are uncertain of the outcomes of advanced AI

AI can be used to solve many problems, including e.g. poverty and health. It is plausible that we would be harming people who would benefit from this technology by delaying it.

Also, accelerating progress on space colonization can ultimately give you access to a vast amount of resources, which otherwise we would not be able to physically reach because of the expansion of the universe. Under some worldviews (which I don't personally share), this is a large penalty to waiting.

2. Having people concerned about safety in leading AI labs is important to ensure a responsible deployment

If EAs systematically avoid working for top AI labs, they will be replaced by less safety-conscious staff.

Safety-conscious researchers and engineers have done incredible work setting up safety teams in OpenAI and DeepMind.

I expect they will also be helpful for coordinating a responsible deployment of advanced AI in the future.

3. Having a large lead might be helpful to avoid race dynamics

If multiple labs are on the brink of transformative AI, they will be incentivized to cut corners to be the first to cross the finish line. Having fewer leaders can help them coordinate and delay deployment.

4. There might not be much useful safety research to be done now

Plausibly, AI safety research will need some experimentation and knowledge of future AI paradigms. So there might just not be much you can do to address AI risk right now.


Overall I think crux 2 is very strong, and I lend some credence to crux 1 and crux 3. I don't feel very moved by crux 4. I think it's too early to give up on current safety research, even if only because the current DL paradigm might scale to TAI already.

In any case, I am enormously glad to have safety-conscious researchers in DM and OpenAI. I think ostracizing them would be a huge error.

I agree "having people on the inside" seems useful. At the same time, it's hard for me to imagine what an "aligned" researcher could have done at the Manhattan Project to lower nuclear risk. That's not meant as a total dismissal, it's just not very clear to me.

> Safety-conscious researchers and engineers have done incredible work setting up safety teams in OpenAI and DeepMind.

I don't know much about what successes here have looked like, I agree this is a relevant and important case study.

> I think ostracizing them would be a huge error.

My other comments better reflect my current feelings here.

You know in some sense I see EA as a support group for crazies. Normie reality involves accepting a lot of things as OK that are not OK. If you care a lot in any visceral sense about x risk, or animal welfare, then you are in for a lot of psychic difficulty coping with the world around you. Hell, even just caring about the shit that isn't remotely weird, like effective poverty interventions, is enough to cause psychic damage trying to cope with the way that your entire environment claims to care about helping people and behaviorally just doesn't.

So when I see similar patterns and norms applied to capabilities research, that outside of EA just get applied to everything ("oh you work in gain of function? that sounds neat"), it gives me the jeebs. 

This doesn't invalidate the kind of math @richard_ngo is doing ala "well if we get 1 safety researcher for each 5 capabilities researchers we tolerate/enable, that seems worth it". But I would like less jeebs. 

Is ostracization strategically workable? It seems like the safety community is much smaller than the capabilities community, and so ostracization (except of the most reckless capabilities researchers) could lead to capabilities people reacting in such a way that net turns people away from alignment work, or otherwise hurts the long-term strategic picture.

As a recent counterpoint to some collaborationist messages: https://forum.effectivealtruism.org/posts/KoWW2cc6HezbeDmYE/greg_colbourn-s-shortform?commentId=Cus6idrdtH548XSKZ

"It was disappointing to see that in this recent report by CSET, the default (mainstream) assumption that continued progress in AI capabilities is important was never questioned. Indeed, AI alignment/safety/x-risk is not mentioned once, and all the policy recommendations are to do with accelerating/maintaining the growth of AI capabilities! This coming from an org that OpenPhil has given over $50M to set up."

I'm comfortable publicly criticising big orgs (I feel that I am independent enough for this), but would be less comfortable publicly criticising individual researchers (I'd be more inclined to try and persuade them to change course toward alignment; I have been trying to sow some seeds in this regard recently with some people keen on creating AGI that I've met).

yeah this is really alarming and aligns with the least charitable possible interpretation of my feelings / data.

it would help if i had a better picture of the size of the EA -> capabilities pipeline relative to not-EA -> capabilities pipeline.

to this point, why don't we take the opposite strategy? [even more] celebration of capabilities research and researchers. this would probably do a lot to ingratiate us.

> It seems like the safety community is much smaller than the capabilities community

my model is that EAs are the coolest and smartest people in the world and that status among them matters to people. so this argument seems weird to me for the same reason that it would be weird if you argued that young earth creationists shouldn't be low status in the community since there are so many of them. 

i mean there seems to be a very considerable EA to capabilities pipeline, even.

i mean if i understand your argument, it can just be applied to anything. shitheads are in the global majority on like any dimension. 

EAs may be the smartest people in your or my social circle, but they are likely not the smartest people in the social circles of top ML people, for better or for worse. I suspect "coolest" is less well-defined and less commonly shared as a concept, as well.

yes i don't actually think that EAs are the globally highest-status group in the world. my point here is that local status among EAs does matter to people; absolute numbers of "people in the world who agree with x" seems like a consideration that can be completely misleading in many cases. an implicit theory of change probably needs to be quite focused on local status.

i mean there's a compelling argument i'm vegan due to social pressure from the world's smartest and coolest people. i want the smartest and coolest people in the world to like me and being vegan sure seems to matter there. i don't buy an argument that the smartest and coolest people in the world should do less to align status among them with animal welfare. they seem to be quite locally effective at persuading people. 

like if you think about the people you personally know, who seem to influence people around them (including yourself) to be much more ethical, i would be quite surprised to learn that hugbox norms got them there.

To me the core tension here is: even if, in a direct impact sense, pure capabilities work is one of the most harmful things you can do (something which I feel fairly uncertain about), it's still also one of the most valuable things you can do, in an upskilling sense. So at least until the point where it's (ballpark) as effective and accessible to upskill in alignment by doing alignment directly rather than by doing capabilities, I think current charitability norms are better than the ostracism norms you propose. (And even after that point, charitability may still be better for talent acquisition, although the tradeoffs are more salient.)

I think this might be reasonable under a charitability vs ostracism dichotomy.

However I think we can probably do better. I run a crypto venture group and we take "founders pledge" type stuff very seriously. We want to make strong, specific commitments, before it's time to act on them (specifically, all upside past 2M post-tax for any member has to go towards EA crap).

Furthermore, when we talk to people, we don't really expect them (normatively speaking) to think we are aligned unless we emphasize these commitments. I would say we actively push the norm that we shouldn't receive charitability without a track record.

I would really advocate for the same thing here; if anything it seems of greater importance.

That's not to say it's obvious what these commitments should be, since it's more straightforward for making money.

My real point is that in normie land, charitability vs ostracism is the dichotomy. But I think in many cases EA already achieves more nuance: the norms demand proof of altruism in order to cash in on status.

Does that make sense? I think charitability is too strong of a norm and makes it too easy to be evil. I don't even apply it to myself! Even if there are good reasons to do things that are indistinguishable from just being bad, that doesn't mean everyone should just get the benefit of the doubt. I do think that specific pledges matter. The threat of conditional shunning matters.

I can only see this backfiring and pushing people further away.

So much for open exchange of ideas

This is a very good point. I think the current sentiment comes from two sides:

  • Not wanting to alienate or make enemies with AI researchers, because alienating them from safety work would be even more catastrophic (this is a good reason)
  • Being intellectually fascinated by AI, and finding it really cool in a nerdy way (this is a bad reason, and I remember someone remarking that Bostrom's book might have been hugely net-negative because it made many people more interested in AGI)

I agree that the current level of disincentives for working on capabilities is too low, and I resolve to tell AI capabilities people that I think their work is very harmful, while staying cordial with them.

I also basically feel like the norm is that I can't even begin to have these conversations bc it would violate charity norms. 

I don't think charity norms are good for talking to gain of function researchers or nuclear arms industry lobbyists. Like there are definitely groups of people that, if you just apply charity to them, you're gonna thoughtkill yourself, because they are actually doing bad shit for no good reason. 

I don't wanna be in an environment where I meet gain of function researchers at parties and have to act like they don't scare the shit out of me. 

Maybe I'm just off here about the consensus and nobody cares about what I understand to be the Yudkowsky line. In which case I'd have to ask why people think it's cool to do capabilities work without even a putative safety payoff. IDK I'd just expect at least some social controversy over this crap lol. 

like if at least 20% of the community thinks mundane capabilities work is actually really terrible (and at least 20% does seem to think this, to me), you would think that there would be pretty live debate over the topic? seems pressing and relevant? 

maybe the phrase i'm looking for is "missing moods" or something. it would be one thing if there was a big fight, everyone drew lines in the sand, and then agreed to get along. but nothing like that happened, i just talked to somebody tonight about their work selling AI and basically got a shrug in response to any ethical questions. so i'm going crazy.

I, for one, am really glad you raised this.

It seems plausible that some people caught the “AI is cool” bug along with the “EA is cool and nice and well-resourced” bug, and want to work on whatever they can that is AI-related. A justification like “I’ll go work on safety eventually” could be sincere or not.

Charity norms can swing much too far.

I’d be glad to see more 80k and forum talk about AI careers that point to the concerns here.

And I’d be glad to endorse more people doing what Richard mentioned — telling capabilities people that he thinks their work could be harmful while still being respectful.

Well, Holden says in his Appendix to his last post:

 

I don't get it. https://www.lesswrong.com/posts/N6vZEnCn6A95Xn39p/are-we-in-an-ai-overhang?commentId=o58cMKKjGp87dzTgx 

I won't associate with people doing serious capabilities research.

https://www.openphilanthropy.org/focus/global-catastrophic-risks/potential-risks-advanced-artificial-intelligence/openai-general-support 

To me at this point the expected impact of the EA phenomenon as a whole is negative. Hope we can right this ship, but things really seem off the rails.

Eliezer said something similar, and he seems similarly upset about it: https://twitter.com/ESYudkowsky/status/1446562238848847877

(FWIW I am also upset about it, I just don't know that I have anything constructive to say)

Eliezer's tweet is about the founding of OpenAI, whereas Agrippa's comment is about a 2017 grant to OpenAI (OpenAI was founded in 2015, so this was not a founding grant). It seems like to argue that Open Phil's grant was net negative (and so strongly net negative as to swamp other EA movement efforts), one would have to compare OpenAI's work in a counterfactual world where it never got the extra $30 million in 2017 (and Holden never joined the board) with the actual world in which those things happened. That seems a lot harder to argue for than what Eliezer is claiming (Eliezer only has to compare a world where OpenAI didn't exist vs the actual world where it does exist).

Personally, I agree with Eliezer that the founding of OpenAI was a terrible idea, but I am pretty uncertain about whether Open Phil's grant was a good or bad idea. Given that OpenAI had already disrupted the "nascent spirit of cooperation" that Eliezer mentions and was going to do things, it seems plausible that buying a board seat for someone with quite a bit of understanding of AI risk is a good idea (though I can also see many reasons it could be a bad idea).

One can also argue that EA memes re AI risk led to the creation of OpenAI, and that therefore EA is net negative (see here for details). But if this is the argument Agrippa wants to make, then I am confused why they decided to link to the 2017 grant.

Has Holden written any updates on outcomes associated with the grant? 

> One can also argue that EA memes re AI risk led to the creation of OpenAI, and that therefore EA is net negative (see here for details). But if this is the argument Agrippa wants to make, then I am confused why they decided to link to the 2017 grant.

I am not making this argument but certainly I am alluding to it. EA strategy (weighted by impact) has been to do things that in actuality accelerate timelines, and even cooperate with doing so under the "have a good person standing nearby" theory.

I don't think that lobbying against OpenAI, or other adversarial action, would have been that hard. But OpenPhil and other EA leadership of the time decided to ally and hope for the best instead. This seems off the rails to me.

> Has Holden written any updates on outcomes associated with the grant?

Not to my knowledge.

> I don't think that lobbying against OpenAI, or other adversarial action, would have been that hard.

It seems like once OpenAI was created and had disrupted the "nascent spirit of cooperation", even if OpenAI went away (like, the company and all its employees magically disappeared), the culture/people's orientation to AI stuff ("which monkey gets the poison banana" etc.) wouldn't have been reversible. So I don't know if there was anything Open Phil could have done to OpenAI in 2017 to meaningfully change the situation in 2022 (other than like, slowing AI timelines by a bit). Or maybe you mean some more complicated plan like 'adversarial action against OpenAI and any other AI labs that spring up later, and try to bring back the old spirit of cooperation, and get all the top people into DeepMind instead of spreading out among different labs'.

I don't mean to say anything pro DeepMind and I'm not sure there is anything positive to say re: DeepMind.

I think that once the nascent spirit of cooperation is destroyed, you can indeed take the adversarial route. It's not hard to imagine successful lobbying efforts that lead to regulation, among other things known to slow progress and hinder organizations. Most people are in fact skeptical of tech giants wielding tons of power using AI! It is beyond me why such things are so rarely discussed or considered. I'm sure that Open Phil's and 80k's open cooperation with OpenAI has a big part in shaping the narrative away from this kind of thing.

This post includes some great follow up questions for the future. Has anything been posted re: these follow up questions?

As far as I can tell liberal nonviolence is a very popular norm in EA. At the same time I really cannot think of anything more mortally violent I could do than to build a doomsday machine. Even if my doomsday machine is actually a 10%-chance-of-doomsday machine or 1% or etcetera (nobody even thinks it's lower than that). How come this norm isn't kicking in? How close to completion does the 10%-chance-of-doomsday machine have to be before gentle kindness is not the prescribed reaction?

[x-post from a comment]

You know in some sense I see EA as a support group for crazies. Normie reality involves accepting a lot of things as OK that are not OK. If you care a lot in any visceral sense about x risk, or animal welfare, then you are in for a lot of psychic difficulty coping with the world around you. Hell, even just caring about the shit that isn't remotely weird, like effective poverty interventions, is enough to cause psychic damage trying to cope with the way that your entire environment claims to care about helping people and behaviorally just doesn't.

So when I see similar patterns and norms applied to capabilities research, that outside of EA just get applied to everything ("oh you work in gain of function? that sounds neat"), it gives me the jeebs. 

This doesn't invalidate the kind of math @richard_ngo is doing ala "well if we get 1 safety researcher for each 5 capabilities researchers we tolerate/enable, that seems worth it". But I would like less jeebs. 

[original comment: https://forum.effectivealtruism.org/posts/qjsWZJWcvj3ug5Xja/agrippa-s-shortform?commentId=bgf3BJZEyYik9gCti]

My favorite thing about EA has always been the norm that in order to get cred for being altruistic, you actually are supposed to have helped people. This is a great property, just align incentives. But now re: OpenAI I so often hear people say that gentle kindness is the only way, if you are openly adversarial then they will just do the opposite of what you want even more. So much for aligning incentives.