Aron

I agree the implicit strategy here doesn't seem like it will make progress on knowing whether the hard problems are real. Many of the hard problems (generalising well out of distribution, the existence of sharp left turns) just don't seem very related to the easier problems (like making LLMs say nice things), and unless you're explicitly looking for evidence of the hard problems, I think you'll be able to solve the easier problems in ways that won't generalise (e.g. hammering LLMs with enough human supervision in ways that aren't scalable, but are sufficient to 'align' them).