Max Nadeau

189 karma · Joined May 2022

Comments (13)

I'd love to hear his thoughts on defensive measures against "fuzzier" threats from advanced AI, e.g. manipulation, persuasion, "distortion of epistemics", etc. Since it seems difficult to tell when these sorts of harms are occurring (as opposed to benign forms of advertising, rhetoric, or expression), it seems hard to construct defenses.

This is related to mechanisms for collective epistemics like prediction markets or community notes, which Vitalik praises here. But the harms from manipulation are broader: they could route through "superstimuli", addictive platforms, etc., not just through the spread of falsehoods. See the manipulation section here for related thoughts.

Disclaimer: I joined OP two weeks ago in the Program Associate role on the Technical AI Safety team. I'm leaving some comments describing questions I wanted answered to assess whether I should take the job (which, obviously, I ended up doing).

What sorts of personal/career development does the PA role provide? What are the pros and cons of this path over e.g. technical research (which has relatively clear professional development in the form of published papers, academic degrees, high-status job titles that bring public credibility)?

How inclined are you (and how inclined would OP's grantmaking strategy be) toward technical research whose theory of impact isn't “researcher discovers technique that makes the AI internally pursue human values” -> “labs adopt this technique”? Some examples of other theories of change that technical research might have:

  • Providing evidence for the dangerous capabilities of current/future models (should such capabilities emerge) that can more accurately inform countermeasures/policy/scaling decisions.
  • Detecting/demonstrating emergent misalignment from normal training procedures. This evidence would also serve to more accurately inform countermeasures/policy/scaling decisions.
  • Reducing the ease of malicious misuse of AIs by humans.
  • Limiting the reach/capability of models instead of ensuring their alignment.

How much do the roles on the TAIS team involve engagement with technical topics? How do the depth and breadth of “keeping up with” AI safety research compare to being an AI safety researcher?

What does OP’s TAIS funding go to? Don’t professors’ salaries already get paid by their universities? Can PhD students in AI already get no-strings-attached funding (at least at prestigious universities)?

Is it way easier for researchers to do AI safety research within AI scaling labs (due to: more capable/diverse AI models, easier access to them (i.e. no rate limits/usage caps), better infra for running experiments, maybe some network effects from the other researchers at those labs, not having to deal with all the logistical hassle that comes from being a professor/independent researcher)? 

Does this imply that the research ecosystem OP is funding (which is ~all external to these labs) isn't that important/cutting-edge for AI safety?

Sampled from my areas of personal interest, and not intended to be at all thorough or comprehensive:

AI researchers (in no particular order):

  • Prof. Jacob Steinhardt: author of multiple fascinating pieces on forecasting AI progress and contributor/research lead on numerous AI safety-relevant papers.
  • Dan Hendrycks: director of the multi-faceted and hard-to-summarize research and field-building non-profit CAIS.
  • Prof. Sam Bowman: has worked on many varieties of AI safety research at Anthropic and NYU
  • Ethan Perez: researcher doing fascinating work to display and address misalignments in today’s AIs.
  • Toby Shevlane: lead author of "Model evaluation for extreme risks"
  • Jess Whittlestone: head of AI policy at the Centre for Long-Term Resilience; much of her research is here
  • Plenty of others: Jade Leung (AI governance and evaluations at OpenAI), Prof. David Krueger (varied AI safety research), Prof. Percy Liang (evaluating models), Prof. Roger Grosse (influence functions for interpretability), many others listed here

Economists who have written on AI’s economic impact (especially, but not only, deflationary arguments contra Davidson):

  • Chad Jones (see here)
  • Ben Jones (see e.g. this, but also all his research)
  • Matt Clancy (see this debate, though an episode with him should also address his non-AI work!)
  • Daron Acemoglu (see Power and Progress)
  • Maybe other reviewers here?

Ethicists:

The three I would personally be most excited to listen to: Toby Shevlane, Matt Clancy, Iason Gabriel.

Best of luck with your new gig; excited to hear about it! Also, I really appreciate the honesty and specificity in this post.

From the post: "We plan to have some researchers arrive early, with some people starting as soon as possible. The majority of researchers will likely participate during the months of December and/or January."

Artir Kel (aka José Luis Ricón Fernández de la Puente) at Nintil wrote an essay that is broadly sympathetic to AI risk scenarios but doubtful of one particular step in the power-seeking stories that Cotra, Gwern, and others have told. Specifically, he has a hard time believing that a scaled-up version of present systems (e.g. Gato) would learn facts about itself (e.g. that it is an AI in a training process, what its trainers' motivations would be, etc.) and incorporate those facts into its planning (Cotra calls this "situational awareness"). Some AI safety researchers I've personally spoken to agree with Kel's skepticism on this point.

Since incorporating this sort of self-knowledge into one's plans is necessary for breaking out of training, initiating deception, etc., this seems like a pretty important disagreement. In fact, Kel claims that if he came around on this point, he would agree almost entirely with Cotra's analysis.

Can she describe in more detail what situational awareness means? Could it be demonstrated with current or near-term models? Why does she think Kel (and others) find it so unlikely?