I'd love to hear his thoughts on defensive measures for "fuzzier" threats from advanced AI, e.g. manipulation, persuasion, "distortion of epistemics", etc. Since it seems difficult to delineate when these sorts of harms are occuring (as opposed to benign forms of advertising/rhetoric/expression), it seems hard to construct defenses.
This is a related concept mechanisms for collective epistemics like prediction markets or community notes, which Vitalik praises here. But the harms from manipulation are broader, and could route through "superstimuli", addictive platforms, etc. beyond just the spread of falsehoods. See manipulation section here for related thoughts.
Disclaimer: I joined OP two weeks ago in the Program Associate role on the Technical AI Safety team. I'm leaving some comments describing questions I wanted to know to assess whether I should take the job (which, obviously, I ended up doing).
What sorts of personal/career development does the PA role provide? What are the pros and cons of this path over e.g. technical research (which has relatively clear professional development in the form of published papers, academic degrees, high-status job titles that bring public credibility)?
How inclined are you/would the OP grantmaking strategy be towards technical research with theories of impact that aren’t “researcher discovers technique that makes the AI internally pursue human values” -> “labs adopt this technique”. Some examples of other theories of change that technical research might have:
How much do the roles on the TAIS team involve engagement with technical topics? How do the depth and breadth of “keeping up with” AI safety research compare to being an AI safety researcher?
What does OP’s TAIS funding go to? Don’t professors’ salaries already get paid by their universities? Can (or can't) PhD students in AI get no-strings-attached funding (at least, can PhD students at prestigious universities)?
Is it way easier for researchers to do AI safety research within AI scaling labs (due to: more capable/diverse AI models, easier access to them (i.e. no rate limits/usage caps), better infra for running experiments, maybe some network effects from the other researchers at those labs, not having to deal with all the logistical hassle that comes from being a professor/independent researcher)?
Does this imply that the research ecosystem OP is funding (which is ~all external to these labs) isn't that important/cutting-edge for AI safety?
Sampled from my areas of personal interest, and not intended to be at all thorough or comprehensive:AI researchers (in no particular order):
Economists who have written (esp. but not only deflationary arguments contra Davidson) on AI’s economic impact:
The three I would personally be most excited to listen to: Toby Shevlane, Matt Clancy, Iason Gabriel.
Best of luck with your new gig; excited to hear about it! Also, I really appreciate the honesty and specificity in this post.
From the post: "We plan to have some researchers arrive early, with some people starting as soon as possible. The majority of researchers will likely participate during the months of December and/or January."
Artir Kel (aka José Luis Ricón Fernández de la Puente) at Nintil wrote an essay broadly sympathetic to AI risk scenarios but doubtful of a particular step in the power-seeking stories Cotra, Gwern, and others have told. In particular, he has a hard time believing that a scaled-up version of present systems (e.g. Gato) would learn facts about itself (e.g. that it is an AI in a training process, what its trainers motivations would be, etc) and incorporate those facts into its planning (Cotra calls this "situational awareness"). Some AI safety researchers I've spoken to personally agree with Kel's skepticism on this point.
Since incorporating this sort of self-knowledge into one's plans is necessary for breaking out of training, initiating deception, etc, this seems like a pretty important disagreement. In fact, Kel claims that if he came around on this point, he would agree almost entirely with Cotra's analysis.
Can she describe in more detail what situational awareness means? Could it be demonstrated with current/nearterm models? Why does she think that Kel (and others) think it's so unlikely?