Cross-posting a good question from Reddit. Answer there, here, or in both places; I'll make sure the Reddit author knows about this post.
Eric Herboso's answer on Reddit (the only one so far) includes these examples:
Scott Garrabrant on Finite Factored Sets (May)
Paul Christiano on his Research Methodology (March)
Rob Miles on Misaligned Mesa-Optimisers (Feb part 1, May part 2; both describe a paper from 2019)
Nit: the link on "reasons" was pasted twice. For others, it's https://www.lesswrong.com/posts/PZtsoaoSLpKjjbMqM/the-case-for-aligning-narrowly-superhuman-models
Also hadn't seen that paper. Thanks!