I'm Buck Shlegeris. I am the CTO of Redwood Research, a nonprofit focused on applied alignment research. Read more about us here: https://www.redwoodresearch.org/
Can you give some examples of other strategies you think seem better?
I think it was unhelpful to refer to “Harry Potter fanfiction” here instead of perhaps “a piece of fiction”—I don’t think it’s actually more implausible that a fanfic would be valuable to read than some other kind of fiction, and your comment ended up seeming to me like it was trying to use the dishonest rhetorical strategy of implying without argument that the work is less likely to be valuable to read because it’s a fanfic.
I found Ezra's grumpy complaints about EA amusing and useful. Maybe 80K should arrange to have more of their guests' children get sick the day before they tape the interviews.
I agree that we should tolerate people who are less well read than GPT-4 :P
For what it’s worth, GPT-4 knows what rat means in this context: https://chat.openai.com/share/bc612fec-eeb8-455e-8893-aa91cc317f7d
I think this is a great question. My answers:
My attitude, and the attitude of many of the alignment researchers I know, is that this problem seems really important and neglected, but we overall don't want to stop working on alignment in order to work on this. If I spotted an opportunity for research on this that looked really surprisingly good (e.g. if I thought I'd be 10x as productive as usual when working on it, for some reason), I'd probably take it.
It's plausible that I should spend a weekend sometime trying to really seriously consider what research opportunities are available in this space.
My guess is that a lot of the skills involved in doing a good job of this research are the same as the skills involved in doing good alignment research.
Thanks Lizka. I think you mean to link to this video:
Holden's beliefs on this topic have changed a lot since 2012. See here for more.
I really like this frame. I feel like EAs are somewhat too quick to roll over and accept attacks from dishonest bad actors who hate us for whatever unrelated reason.