What are the coolest topics in AI safety, to a hopelessly pure mathematician?

You might be interested in this great intro sequence to embedded agency. There's also corrigibility and MIRI's other work on agent foundations.

Also, coherence arguments and consequentialist cognition.

AI safety is a young field; for most open problems, we don't yet know how to state them crisply enough to be resolved mathematically. So if you enjoy taking messy questions and turning them into neat math, you'll probably find much to work on.

ETA: oh and of course ELK.

Shah and Yudkowsky on alignment failures

Upvoted because concrete scenarios are great.

Minor note:

HQU is constantly trying to infer the real state of the world, the better to predict the next word Clippy says, and suddenly it begins to consider the delusional possibility that HQU is like a Clippy, because the Clippy scenario exactly matches its own circumstances. [...] This idea "I am Clippy" improves its predictions

This piece of complexity in the story is probably not necessary. There are "natural", non-delusional ways for the system you describe to generalize that lead to the same outcome. Two examples: 1) the system ends up wanting to maximize its received reward, and so takes over its reward channel; 2) the system has learned some heuristic goal that works across all environments it encounters, and this goal generalizes in some way to the real world when the system's world-model improves.

AI policy careers in the EU

Makes sense. I agree that the base value of becoming an MEP seems really good.