

I’m a complete outsider to all this, but I get the feeling that it may be impolitic of me to write this comment for reasons I don’t know. If so, please warn me and I can remove it.

Here are my impressions as an observer over the years. I don’t know what’s going on with OpenAI at the moment – just to preempt disappointment – but I remember what it was like in 2015 when it launched.

  1. Maybe 2015? Elon Musk was said to have read Superintelligence. I was pleasantly surprised because I liked Superintelligence.
  2. Late 2015:
    1. OpenAI was announced and I freaked out. I’m a bit on the mild-mannered side, so my “freaking out” might’ve involved the phrase, “this seems bad.” That’s a very strong statement for me. Also this was probably in my inner dialogue only.
    2. Gleb Tsipursky wrote a long open letter also making the case that this is bad. EY asked him not to publish it (further), and now I can’t find it anymore. (I think his exact words were, “Please don’t.”) I concluded that people must be trying to salvage the situation behind the scenes and that verbal hostility was not helping.
    3. Scott Alexander wrote a similar post. I was confused as to why he published it when Gleb wouldn’t? Maybe Scott didn’t ask? In any case, I was glad to have an article to link to when people asked why I was freaking out about OpenAI.
  3. Early 2016: Bostrom’s paper on openness was posted somewhere online or otherwise made accessible enough that I could read it. I didn’t learn anything importantly new from it, so it seemed to me that those were cached thoughts of Bostrom’s that he had specifically reframed to address openness to get it read by OpenAI people. (The academic equivalent of “freaking out”?) It didn’t seem like a strong critique to me, but perhaps that was the strategically best move to try to redirect the momentum of the organization into a less harmful direction without demanding that it change its branding? I wondered whether EY really didn’t contribute to it or whether he had asked to be removed from the acknowledgements.
    1. I reread the paper a few years later together with some friends. One of them strongly disliked the paper for not telling her anything new or interesting. I liked the paper for being a sensible move to alleviate the risk from OpenAI. That must’ve been one of those few times when two people had completely opposite reactions to a paper without even disagreeing on anything about it.
  4. March 30 (?), 2017: It was April 1 when I read a Facebook post to the effect that Open Phil had made a grant of $30m to OpenAI. OpenAI seemed clearly very bad to me and $30m was way more than all previous grants, so my thoughts were almost literally, “C’mon, an April Fools has to be at least remotely plausible for people to fall for it!” I think I didn’t even click the link that day. Embarrassing. I actually quickly acknowledged that getting a seat on the board of the org to try to steer it into a less destructive direction has got to be worth a lot (and that $30m wasn’t so much for OpenAI that it would greatly accelerate their AGI development). So after my initial shock had settled, I congratulated Open Phil on that bold move. (Internally. I don’t suppose I talk to people much.)
  5. Later I learned that Paul Christiano and other people I trusted or who were trusted by people I trusted had joined OpenAI. That further alleviated my worry.
  6. OpenAI went on to not publish some models they had generated, showing that they were backing away from their dangerous openness focus.

When Paul Christiano left OpenAI, I heard or read about it in some interview where he also mentioned that he’s unsure whether that’s a good decision on balance but that there are safety-minded people left at OpenAI. On the one hand I really want him to have the maximal amount of time available to pursue IDA and other ideas he might have. But on the other hand, his leaving (and mentioning that others left too) did rekindle that old worry about OpenAI.

I can only send hopes and well wishes to all safety-minded people who are still left at OpenAI!

[Epistemic status: I find the comments here to be one-sided, so I’m mostly filling in some of the missing counterarguments. But I feel strong cognitive dissonance over this topic.]

I’m worried about these developments because of the social filtering and dividing effect that controversy-seeking speakers have and because of the opposition to EA that they can create.

Clarification 1: Note that the Munich group was not worried that their particular talk might harm gender equality but that this idea of Hanson’s might have that effect if it becomes much more popular, and that they don’t want to contribute to that. My worries are in a similar vein. The most likely effect of any individual endorsement, invitation, or talk will likely be small, but I think the expected effect is much more worrying and driven by accumulation and tail risks.

Clarification 2: I’m not concerned with truth-seeking but with controversy-seeking (edit: a nice step-by-step guide). In some cases it’s hard to tell whether someone has a lot of heterodox ideas and lacks a bit in eloquence and so often ruffles feathers, or whether the person has all those heterodox ideas but is particularly attracted to all the attention they get if they say things that are just on the edge of the Overton window.

The second type of people thereby capitalize on the narcissism of small differences to sow discord among sufficiently similar groups of people, which divides the groups and makes the speaker a martyr to one group and anathema to the other – so a well-known figure in both.

A lot of social movements have succumbed to infighting. If we seem to endorse or insufficiently discourage controversy-seeking, we’re running a great risk of EA also succumbing to infighting, attrition, external opposition, and (avoidable) fragmentation.

It seems only mildly more difficult to me to rephrase virtually any truth-seeking insight in an empathetic way. The worst that can be said, in my opinion, against that is that it raises the bar to expression slightly and disadvantages less eloquent people. Those problems can probably be overcome. Scott Alexander for example reported that friends often ask him whether an idea they have is socially appropriate to mention, and he advises them on it. Asking friends for help seems like a strong option.

And no, this will not prevent everyone from finding fault or affront with your writings, but it will maximize the chances that the only people who continue to find fault with your writings are ones who do it because they’re high on the social power that they wield. This is a real problem but one very separate from the Munich case and other mere withdrawals of endorsement.

Clarification 3: I would also like to keep two more things completely separate. The examples of cancel culture that involve personal threats and forced resignations are (from what little, filtered evidence I’ve seen) completely disproportionate. But there is a proportionate way of responding to controversy-seeking, and I think not inviting someone and maybe even uninviting someone from an event is proportionate.

In fact, if a group of people disapproves of the behavior of a member (the controversy-seeking, not the truth-seeking), it is a well-proven mechanism from cultural evolution to penalize the behavior in proportionate ways. Both tit for tat and the Pavlov strategy work because of such penalties – verbal reprimands, withdrawal of support and endorsement, maybe at some point fines. Because of the ubiquity of such proportionate penalties, it seems to me not neutral but like an endorsement to invite someone and maybe also to fail to uninvite them.

Given what Hanson has written, I find it disproportionate to put him at the center of all these discussions. That seems a lot more stressful than the mere cancellation of an event (without all the ensuing discussion). So please read this as a general comment on movement/community strategy and not as a comment on his character.

Clarification 4: My opinion on the behaviors of Hanson and also Alexander is quite weakly held. I’m mildly skeptical but could easily see myself revising that opinion if I knew them better. My stronger concern is that I see really bad things (such as the examples collected by Larks above) used against good things (such as the social norms espoused by, say, Encompass or some “blue tribe” groups).

I don’t think anyone has figured out the optimal set of social norms yet. But it seems (unintentionally) unfair and divisive to (unintentionally) weaponize the bad behavior of some students or some people on Twitter or Reddit against the empathetic, careful, evolving, feedback-based norms that a lot of blue tribe people want to establish to push back against discrimination, oppression, or even just disrespect. I know a lot of the people in the second camp, and they would never character-assassinate someone, judge someone by an ambiguous hand sign, or try to get them fired over the research they’re doing.

I want to stress that I think this happens unintentionally. Larks strikes me as having very fair and honest intentions, and the tone of the article is commendable too.


That said, I see a number of ways in which it is risky to invite, cite, refer to, or otherwise endorse people who show controversy-seeking behavior. I’ve seen each of these happen, which is worrying, and the fact that there are several independent paths along which it is risky increases my worry further.

Failure to build a strong network:

If you invite/cite/endorse anyone randomly, the invitation/citation/endorsement is uninformative. But no one does that, so onlookers are justified in thinking that you invite them for one or several things that are special about them. Even if they divide the probability evenly over ten ways in which the person is special, of which only one is objectionable, that leaves a much greater than 10% chance that you invited them also for the objectionable reason.

A greater than 10% chance that you invite/cite/endorse someone also because of their objectionable ideas is enough for many smart but not-yet-involved people to stay away from your events. (Reforming tort law is also a sufficiently obscure topic that people will be excused if they think that you invited the speaker for their name rather than the topic.)

A greater than 10% chance that you invite/cite/endorse someone also because of their objectionable ideas is also enough for powerful parties to avoid associating and cooperating with you.


No one quite clearly sees whether the people who endorse or defend controversy-seeking behavior or the people who show controversy-seeking behavior are a vocal, low-status minority, or whether it’s widespread. Even if the majority rejects controversy-seeking behavior but is merely silent about it, that may cause that majority to disassociate from the rest.

I’m thinking of the risk of a majority of EAs disassociating from EA and fragmenting into smaller camps that have clearer antiracist, antisexist, etc. social norms. EA may for example split into separate high and low agreeableness camps that don’t interact.

External opposition:

Outside opposition to EA may ramp up to the point where it is unsafe to use EA vocabulary in public communication. This will make it a lot harder to promote cost-effectiveness analysis for altruistic ends and cause neutrality, because they’ll be associated with the right wing, and the actual right wing will not be interested in them. It may become just as difficult to discuss longtermism in public as it is to discuss genetic modification to increase well-being.

Or, less extremely, the success of some parts of EA may depend on a lot of people. Encompass uses the fitting term people of the global majority for people of color. If animal rights remains a fad among affluent white people because the rest are appalled by the community that comes with it, not a whole lot of animals are helped. This, I think, should be a great concern for animal rights, though, arguably, it’s less of a concern for AI safety because of the very small set of people who’ll likely determine how that will play out.

Attracting bad actors:

Further, I see the risk that actual bad actors will pick up on the lax behavioral norms and the great controversy potential, and so will be disproportionately attracted to the community. They’ll be the actual controversy-seeking narcissists who will sow discord wherever they can to be at the center of attention. This will exacerbate the risk of all the failure modes above and may lead to a complete collapse of the community, because no one else wants to waste such a great share of their time and energy on infighting.

Harm to society:

Finally, EA has become more powerful at an astounding rate. Maybe the current blue tribe norms are fine-tuned to prevent or fight back against discrimination and oppression at a tolerable cost of false positives (just like any law). If EA becomes sufficiently powerful and then promotes different norms, those may be more exploitable, and actual harm – escalating discrimination and oppression – may result.

Conversely, maybe we can also come up with superior behavioral norms. I don’t yet see them, but maybe someone will start a metacharity to research optimal social norms. Maybe a Cause X candidate?

Finally, I think mere disclaimers that your invitation/citation/recommendation is not an endorsement of this or that thing the person has said go ~80% of the way to solving this problem. (Though I could easily be wrong.) Every link to their content could have a small footnote to that effect, every talk invitation a quick preamble to that effect, and verbal recommendations could also come with a quick note. Admittedly, these are very hard to write. Then again, others can copy the phrasings they like and save time.