Can you describe exactly how much you think the average person, or the average AI researcher, is willing to sacrifice on a personal level for a small chance at saving humanity? Are they willing to halve their income for the next ten years? Reduce it by 90%?
I think in a world where there was a top-down societal effort to reduce alignment risk, you might see different behavior. In the current world, I think the "personal choice" framework really is how it works, because (for better or worse) there are not (yet) strong moral or social values attached to capability vs. safety work.
Here's something else I'd like to know on that survey:
Surveys of these types are often anonymous, because people tend to answer more honestly when their responses can't be tied back to them.
EA has copped a lot of media criticism lately. Some of it (especially the stuff more directly associated with FTX) is well-deserved. There are some other loud critics who seem to be motivated by personal vendettas and/or to fundamentally object to the movement's core aims and values, but rather than tackling those head-on, they seem to be throwing everything at the wall to see what sticks, no matter how flimsy.
None of that excuses dismissal of the concerning patterns of abuse you've raised, but I think it explains some of the defensiveness around here right now.
It sounds like you want to engage constructively to reduce abuse in the community, and I appreciate that. The community will be stronger in the long run if it can be a safer and more welcoming space.
I know we're a bunch of weirdos with a very specific set of subcultural tics, but I hope everyone appreciates your efforts to help. I think people here really are unusually motivated to do good, and there is a lot of goodwill as a result. On the other hand, I think a lot of that is ego-driven. And it's a very nerdy, male-dominated culture, and many people here probably have a predictable set of blind spots as a result.
Wish I had more to say, or could do more to help, but I'm not in the Bay Area, don't work in tech, and don't have much context for the cultural problems you're encountering.
I don't mean this as a comment on the particular case reported in the TIME article, though I'd reject using naive base rate calculations as the last word on someone's probabilistic guilt. But the claim that "only 2-3% of allegations are false" stuck out to me, because I've read that a better estimate is probably more like 2-10%: https://slatestarcodex.com/2014/02/17/lies-damned-lies-and-social-media-part-5-of-∞/ There's a lot of ambiguity here (for example, not every "report" is an "allegation," because sometimes reports don't name a perpetrator). I have no idea what the correct figure is, but it seems to me the 2-3% figure gets bandied around with a sense of precision and finality that isn't warranted by the evidence. I'm happy to see new evidence or information to the contrary, and whether the rate is 2% or 10%, it can certainly be described as "low".
The article is nearly a decade old, and for all I know there might be newer research. But I hope that when people are thinking about best practices for the future, they do so on the basis of the best evidence available.
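As a purely illustrative sketch (the numbers below are made up and this isn't a claim about any real case): a population base rate functions as a prior, not a verdict, and Bayes' rule shows how case-specific evidence can move the posterior well away from it in either direction.

$$
P(\text{false} \mid E) = \frac{P(E \mid \text{false})\,P(\text{false})}{P(E \mid \text{false})\,P(\text{false}) + P(E \mid \text{true})\,P(\text{true})}
$$

For example, with a prior $P(\text{false}) = 0.05$ (somewhere in that 2-10% range) and hypothetical evidence $E$ that is four times as likely if the report is false, the posterior is $\frac{4 \times 0.05}{4 \times 0.05 + 0.95} \approx 0.17$; evidence pointing the other way would instead push the posterior well below the base rate. That's the sense in which a naive base rate calculation shouldn't be the last word.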
I have a very uninformed view on the relative Alignment and Capabilities contributions of things like RLHF. My intuition is that RLHF is net positive for alignment, but I'm almost entirely uninformed on that. If anyone's written a summary of where they think these grey-area research areas lie, I'd be interested to read it. Scott's recent post was not a bad entry in the genre, but obviously it only worked at a very high level.