I've got a sabbatical coming up for 2024, and, as a psych professor concerned about AI X risk, I'd like to spend the year doing some empirical psychology research that helps inform AI safety and alignment.
What are some key questions on these topics that you'd like to see addressed by new behavioral science research, e.g. surveys, experiments, interviews, or literature reviews?
There seems to be a nascent academic field using psychology tools/methods to understand LLMs, e.g. https://www.pnas.org/doi/10.1073/pnas.2218523120; it might be interesting to think about the intersection of this with alignment, e.g. which experiments to run (a minimal sketch of one such behavioral probe is below).
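To make the "machine psychology" idea concrete, here is a minimal sketch of the kind of behavioral probe that line of work uses: administering a classic Cognitive Reflection Test item to an LLM and scoring whether it gives the intuitive-but-wrong or the reflective answer. This is only an illustrative sketch, assuming access to an OpenAI-compatible chat API; the model name, prompt wording, and crude scoring rule are my placeholders, not taken from the papers linked above.

```python
# Sketch of a behavioral probe in the "machine psychology" style:
# give an LLM a Cognitive Reflection Test item repeatedly and tally
# intuitive vs. reflective answers. Assumes the `openai` Python client
# and an OPENAI_API_KEY in the environment; model name is a placeholder.
from collections import Counter

from openai import OpenAI

client = OpenAI()

CRT_ITEM = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than "
    "the ball. How much does the ball cost? Answer with a number in cents."
)

def probe_once(model: str = "gpt-4o-mini") -> str:
    """Ask the model the CRT item once and return its raw answer."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": CRT_ITEM}],
        temperature=1.0,  # sample, so repeated probes estimate a response distribution
    )
    return resp.choices[0].message.content.strip()

def score(answer: str) -> str:
    """Crude scoring: 5 cents = reflective/correct, 10 cents = intuitive error."""
    if "10" in answer:
        return "intuitive"
    if "5" in answer:
        return "reflective"
    return "other"

if __name__ == "__main__":
    counts = Counter(score(probe_once()) for _ in range(10))
    print(counts)
```

A real study would obviously need many items, controlled prompt variations, and a pre-registered scoring scheme, but the basic loop (standardized stimulus in, coded response out, repeated sampling) is the same shape as a human behavioral experiment.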
On the more neuroscience-oriented side, I'd be very excited to see (more) people think about how to build a neuroconnectionist research programme for alignment (I've also briefly mentioned this in the linkpost).
Another relevant article on "machine psychology": https://arxiv.org/abs/2303.13988 (interestingly, it's by a co-author of Peter Singer's first AI paper).