Max Nadeau

121 karma · Joined May 2022


Best of luck with your new gig; excited to hear about it! Also, I really appreciate the honesty and specificity in this post.

From the post: "We plan to have some researchers arrive early, with some people starting as soon as possible. The majority of researchers will likely participate during the months of December and/or January."

Answer by Max Nadeau, Oct 28, 2022

Artir Kel (aka José Luis Ricón Fernández de la Puente) at Nintil wrote an essay broadly sympathetic to AI risk scenarios but doubtful of a particular step in the power-seeking stories Cotra, Gwern, and others have told. In particular, he has a hard time believing that a scaled-up version of present systems (e.g. Gato) would learn facts about itself (e.g. that it is an AI in a training process, what its trainers' motivations would be, etc.) and incorporate those facts into its planning (Cotra calls this "situational awareness"). Some AI safety researchers I've spoken to personally agree with Kel's skepticism on this point.

Since incorporating this sort of self-knowledge into one's plans is necessary for breaking out of training, initiating deception, etc., this seems like a pretty important disagreement. In fact, Kel claims that if he came around on this point, he would agree almost entirely with Cotra's analysis.

Can she describe in more detail what situational awareness means? Could it be demonstrated with current/near-term models? Why does she think that Kel (and others) think it's so unlikely?

It is possible but unlikely that such a person would be a TA. Someone with little prior ML experience would be a better fit as a participant.

We intended that sentence to be read as: "In addition to people who plan on doing technical alignment, MLAB can be valuable to other sorts of people (e.g. theoretical researchers)".