Hi friends! I just wrote a post on Substack to share some of my thoughts on the prospects of mech interp and CoT usefulness for safety/alignment. It’s inspired by the 80,000 Hours podcast episode with Neel Nanda on the same subject.
https://open.substack.com/pub/mostlyharmlessmachines/p/thinking-machines?r=pm1lz&utm_medium=ios
Would love your thoughts and feedback.