All of Lauro Langosco's Comments + Replies

What are the coolest topics in AI safety, to a hopelessly pure mathematician?

You might be interested in this great intro sequence to embedded agency. There's also corrigibility and MIRI's other work on agent foundations.

Also, coherence arguments and consequentialist cognition.

AI safety is a young field; for most open problems we don't yet know how to state them crisply enough to be resolved mathematically. So if you enjoy taking messy questions and turning them into neat math, you'll probably find much to work on.

ETA: oh and of course ELK.

Shah and Yudkowsky on alignment failures

Upvoted because concrete scenarios are great.

Minor note:

HQU is constantly trying to infer the real state of the world, the better to predict the next word Clippy says, and suddenly it begins to consider the delusional possibility that HQU is like a Clippy, because the Clippy scenario exactly matches its own circumstances. [...] This idea "I am Clippy" improves its predictions

This piece of complexity in the story is probably not necessary. There are "natural", non-delusional ways for the system you describe to generalize that lead to the same outcome. T...

Oh, the whole story is strictly speaking unnecessary :). There are disjunctively many stories for an escape or disaster, and I'm not trying to paint a picture of the most minimal or the most likely barebones scenario.

The point is to serve as a 'near mode' visualization of such a scenario to stretch your mind, as opposed to a very 'far mode' observation like "hey, an AI could make a plan to take over its reward channel". Which is true but comes with a distinct lack of flavor. So for that purpose, stuffing in more weird mechanics before a reward-hacking twis...

AI policy careers in the EU

Makes sense. I agree that the base value of becoming an MEP seems really good.