I have a very uninformed view on the relative Alignment and Capabilities contributions of things like RLHF. My intuition is that RLHF is positive for alignment, but I'm almost entirely uninformed on that. If anyone's written a summary of where they think these grey-area research areas lie, I'd be interested to read it. Scott's recent post was not a bad entry into the genre, but obviously it just worked at a very high level.

Can you describe exactly how much you think the average person, or average AI researcher, is willing to sacrifice on a personal level for a small chance at saving humanity? Are they willing to halve their income for the next ten years? Reduce it by 90%?

I think in a world where there was a top-down societal effort to try to reduce alignment risk, you might see different behavior. In the current world, I think the "personal choice" framework really is how it works because (for better or worse) there are not (yet) strong moral or social values attached to capability vs safety work.

Here's something else I'd like to know on that survey:

  • what proportion of respondents want to post on EAF or engage in other discussions they think are important for EA's goals, but don't, or will only do so anonymously, because they are worried about the consequences?
  • how does that compare to the proportion who feel free to contribute without fear of retribution?
  • what proportion think they have in fact been passed over for an opportunity because they have criticized EA or said something else "politically incorrect" here?

Surveys of these types are often anonymous, because:

  • while it is possible for people to make false responses, that doesn't happen very much, because it is time-consuming and unethical, and there just aren't that many people out there who are unethical, have lots of time on their hands, and want to manipulate our survey, all at once. Manipulated responses are generally more of a danger for short polls (e.g., "which political party would you vote for"), and less of an issue for surveys that take 10 minutes or more.
  • there are means of probabilistically filtering false responses out, including eliminating identical copies of responses, comparing IP addresses, and so on
  • it is quite expensive, difficult, and risky to verify identities while at the same time guaranteeing anonymity
  • for that reason, verifying identities can discourage genuine responses
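
To make the filtering point concrete, here is a minimal sketch of what that kind of check might look like. Everything in it is an assumption for illustration: the `flag_suspicious` function, the `ip_hash` and `answers` field names, and the `max_per_ip` threshold are all hypothetical, and I don't know how any real survey actually does this. It only flags responses for a human to look at; it doesn't prove anything about any individual response.

```python
from collections import Counter


def flag_suspicious(responses, max_per_ip=3):
    """Return indices of responses that look like repeats.

    Each response is assumed (hypothetically) to be a dict with:
      - "ip_hash": a hashed IP address, so raw IPs need not be stored
      - "answers": a dict mapping question id -> answer string
    """
    # Count how many responses came from each (hashed) IP address.
    ip_counts = Counter(r["ip_hash"] for r in responses)

    seen_answers = set()
    flagged = []
    for i, r in enumerate(responses):
        # An order-independent fingerprint of the full set of answers.
        fingerprint = tuple(sorted(r["answers"].items()))
        is_duplicate = fingerprint in seen_answers
        seen_answers.add(fingerprint)

        # Flag exact repeats of an earlier response, and every response
        # from an IP that submitted more than max_per_ip times.
        if is_duplicate or ip_counts[r["ip_hash"]] > max_per_ip:
            flagged.append(i)
    return flagged
```

The first copy of a duplicated response is deliberately left unflagged, since one of the copies may be genuine. The IP threshold should be generous, because shared networks (offices, universities) will produce legitimate clusters.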

This is a great idea. EA already runs an annual community survey. So it wouldn't be necessary to create a whole new survey to get this data: just add some questions to the existing community survey. If they aren't already on there, it would be great to see them on the next survey.

EA has copped a lot of media criticism lately. Some of it (especially the stuff more directly associated with FTX) is well-deserved. There are some other loud critics who seem to be motivated by personal vendettas and/or seem to fundamentally object to the movement's core aims and values, but rather than tackling those head-on, seem to be simply throwing everything they can and seeing what sticks, no matter how flimsy.

None of that excuses dismissal of the concerning patterns of abuse you've raised, but I think it explains some of the defensiveness around here right now.


It sounds like you want to engage constructively to reduce abuse in the community, and I appreciate that. The community will be stronger in the long run if it can be a safer and more welcoming space.

I know we're a bunch of weirdos with a very specific set of subcultural tics, but I hope everyone appreciates your efforts to help. I think people here really are unusually motivated to do good and there is a lot of goodwill as a result. On the other hand, I think a lot of that is ego-driven. And it's a very nerdy, male-dominated culture, and many people here probably have a predictable set of blind spots as a result.

Wish I had more to say, or could do more to help, but I'm not in the Bay Area, don't work in tech, and don't have very much context for the cultural problems you're encountering.


Thank you for your explanation. I appreciate you taking the time to explain your reasoning on that point and find it useful for being confident in the rest of what you have to say here.


I don't mean this as a comment on the particular case reported in the TIME article, though I'd reject using naive base rate calculations as a last word on someone's probabilistic guilt. But the "only 2-3% of allegations are false" stuck out to me because I've read that a better estimate is probably more like 2-10%. There's a lot of ambiguity here: for example, not every "report" is an "allegation", because sometimes reports don't name a perpetrator. I have no idea what the correct figure is, but it seems to me the 2-3% figure probably gets bandied around a lot with a sense of precision and finality that isn't warranted by the evidence. Happy to see new evidence or information to the contrary, and whether the rate is 2% or 10% it can certainly be described as "low".

The article is nearly a decade old, and for all I know there might be newer research. But I hope when people are thinking about best practices for the future, they do so on the basis of the best evidence available.

