Chief Technology Officer @ Redwood Research
6144 karma · Joined Sep 2014 · Berkeley, CA, USA


I'm Buck Shlegeris. I am the CTO of Redwood Research, a nonprofit focused on applied alignment research. Read more about us here: https://www.redwoodresearch.org/


Can you give some examples of other strategies you think seem better?


I think it was unhelpful to refer to “Harry Potter fanfiction” here rather than, say, “a piece of fiction.” I don’t think a fanfic is actually any less likely to be valuable to read than some other kind of fiction, and your comment came across to me as using the dishonest rhetorical strategy of implying, without argument, that the work is less likely to be valuable to read because it’s a fanfic.

I found Ezra's grumpy complaints about EA amusing and useful. Maybe 80K should arrange to have more of their guests' children get sick the day before they tape the interviews.

I agree that we should tolerate people who are less well read than GPT-4 :P

For what it’s worth, GPT-4 knows what “rat” means in this context: https://chat.openai.com/share/bc612fec-eeb8-455e-8893-aa91cc317f7d

I think this is a great question. My answers:

  • Some plausible alignment schemes could involve causing suffering to the AIs. I think it would be pretty bad to inflict huge amounts of suffering on AIs, both because it's unethical and because it seems potentially inadvisable to make AIs justifiably mad at us.
  • If unaligned AIs are morally valuable, then it's less bad to get overthrown by them, and perhaps we should be aiming to produce successors we'd be happier to be overthrown by. See here for discussion. (Obviously plan A is to align the AIs, but it seems good to know how important it is to succeed at this, and making unaligned but valuable successors seems like a not-totally-crazy plan B.)
Answer by Buck, Apr 27, 2023

My attitude, and the attitude of many of the alignment researchers I know, is that this problem seems really important and neglected, but overall we don't want to stop working on alignment in order to work on it. If I spotted a research opportunity here that looked surprisingly good (e.g. if for some reason I thought I'd be 10x as productive as usual when working on it), I'd probably take it.

It's plausible that I should spend a weekend sometime trying to really seriously consider what research opportunities are available in this space.

My guess is that a lot of the skills involved in doing a good job of this research are the same as the skills involved in doing good alignment research.

Thanks, Lizka. I think you meant to link to this video:

Holden's beliefs on this topic have changed a lot since 2012. See here for more.


I really like this frame. I feel like EAs are somewhat too quick to roll over and accept attacks from dishonest bad actors who hate us for whatever unrelated reason.
