[[THIRD EDIT: Thanks so much for all of the questions and comments! There are still a few more I'd like to respond to, so I may circle back to them a bit later, but, due to time constraints, I'm otherwise finished up for now. Any further comments or replies to anything I've written are also still appreciated!]]
Hi!
I'm Ben Garfinkel, a researcher at the Future of Humanity Institute. I've worked on a mixture of topics in AI governance and in the somewhat nebulous area FHI calls "macrostrategy", including: the long-termist case for prioritizing work on AI, plausible near-term security issues associated with AI, surveillance and privacy issues, the balance between offense and defense, and the obvious impossibility of building machines that are larger than humans.
80,000 Hours recently released a long interview I recorded with Howie Lempel, about a year ago, where we walked through various long-termist arguments for prioritizing work on AI safety and AI governance relative to other cause areas. The longest and probably most interesting stretch explains why I no longer find the central argument in Superintelligence, and in related writing, very compelling. At the same time, I do continue to regard AI safety and AI governance as high-priority research areas.
(These two slide decks, which were linked in the show notes, give more condensed versions of my views: "Potential Existential Risks from Artificial Intelligence" and "Unpacking Classic Arguments for AI Risk." This piece of draft writing instead gives a less condensed version of my views on classic "fast takeoff" arguments.)
Although I'm most interested in questions related to AI risk and cause prioritization, feel free to ask me anything. I'm likely to eventually answer most questions that people post this week, on an as-yet-unspecified schedule. You should also feel free just to use this post as a place to talk about the podcast episode: there was a thread a few days ago suggesting this might be useful.
Hi Ofer,
Thanks for the comment!
I actually do think that the instrumental convergence thesis, specifically, can be mapped over fine, since it's a fairly abstract principle. For example, this recent paper formalizes the thesis within a standard reinforcement learning framework. I just think that the thesis at most weakly suggests existential doom, unless we add in some other substantive theses. I have some short comments on the paper, explaining my thoughts, here.
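To give a rough sense of what that kind of formalization looks like, here's a toy sketch of my own (not code from the paper, and the tiny MDP, state layout, and sampling scheme are all invented for illustration): in a small deterministic MDP, optimal policies for most randomly drawn reward functions tend to move toward a "hub" state that keeps more options open rather than toward a dead end.

```python
# Toy illustration (assumptions: invented 6-state deterministic MDP, rewards on
# states, uniform random reward functions). Shows that optimal policies for most
# sampled rewards prefer the option-rich "hub" over the absorbing dead end.
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 6
successors = {
    0: [1, 2],               # start: choose the hub or the dead end
    1: [3, 4, 5],            # hub: three reachable successors
    2: [2],                  # dead end: absorbing
    3: [3], 4: [4], 5: [5],  # simple self-loops
}
GAMMA = 0.9
N_SAMPLES = 1000

def optimal_first_action(reward):
    """Value-iterate, then return state 0's successor under an optimal policy."""
    V = np.zeros(N_STATES)
    for _ in range(200):
        V = np.array([max(reward[s2] + GAMMA * V[s2] for s2 in successors[s])
                      for s in range(N_STATES)])
    return max(successors[0], key=lambda s2: reward[s2] + GAMMA * V[s2])

hub_count = sum(optimal_first_action(rng.uniform(size=N_STATES)) == 1
                for _ in range(N_SAMPLES))
print(f"Fraction of sampled rewards whose optimal policy picks the hub: "
      f"{hub_count / N_SAMPLES:.2f}")
```

The fraction comes out well above one half, which is the abstract point: states that preserve options are instrumentally favored under most goals. But nothing in that observation, by itself, tells you whether a given training process will actually produce a system that pursues options in dangerous ways.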
Beyond the instrumental convergence thesis, though, I do think that some bits of the classic arguments are awkward to fit onto concrete and plausible ML-based development scenarios: for example, the focus on recursive self-improvement, and the use of thought experiments in which natural language commands, when interpreted literally and single-mindedly, lead to unforeseen bad behaviors. I think that Reframing Superintelligence does a good job of pointing out some of the tensions between classic ways of thinking and talking about AI risk and current/plausible ML engineering practices.
This may not be what you have in mind, but: I would be surprised if the FB newsfeed selection algorithm became existentially damaging (e.g. omnicidal), even in the limit of tremendous amounts of training data and compute. I don't know how the algorithm actually works, but as a simplification: let's imagine that it produces an ordered list of posts to show a user, from the set of recent posts by their friends, and that it's trained using something like the length of the user's FB browsing session as the reward. I think that, if you kept training it, nothing too weird would happen. It might produce some unintended social harms (like addiction and polarization), but the system wouldn't, in any meaningful sense, have long-run objectives (due to the shortness of sessions). It also probably wouldn't have the ability or inclination to manipulate the external world in the pursuit of complex schemes. Figuring out how to manipulate the external world in precise ways would require a huge amount of very weird exploration, deep in a section of the space of possible policies where most of the policies are terrible at maximizing reward; in the unlikely event that the necessary exploration happened, and the policy started moving in this direction, I think it would become conspicuous well before the newsfeed selection algorithm did something like kill everyone to prevent ongoing FB sessions from ending (if this is indeed possible given the system's limited space of possible actions).
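To make the simplified setup I'm imagining concrete, here's a minimal sketch (again, my own invention, not how the real system works; the features, the simulated "session length", and the learning rate are all made up, and to keep it short the policy picks a single top post rather than a full ordering). The point is structural: each episode is one short session, the action space is just "which post to surface", and the reward signal never references anything outside the session.

```python
# Minimal sketch of an episodic, short-horizon recommendation policy trained
# with REINFORCE on a simulated "session length" reward. All details invented
# for illustration.
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES = 8                        # hypothetical per-post features
theta = np.zeros(N_FEATURES)          # linear scoring weights (the "policy")
true_w = rng.normal(size=N_FEATURES)  # hidden engagement weights, unknown to the policy
LR, baseline = 0.05, 0.0

for step in range(20_000):
    posts = rng.normal(size=(20, N_FEATURES))    # candidate posts from friends
    logits = posts @ theta
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    choice = rng.choice(len(posts), p=probs)     # post shown at the top
    # Simulated reward: session length driven by how engaging the shown post is.
    reward = posts[choice] @ true_w + rng.normal(scale=0.1)
    # REINFORCE update: grad log pi(choice) = x_choice - E_pi[x]
    grad_logp = posts[choice] - probs @ posts
    theta += LR * (reward - baseline) * grad_logp
    baseline += 0.01 * (reward - baseline)       # running reward baseline
```

Nothing in this loop gives the policy a channel to act on the world beyond ranking posts, or an incentive that extends past the current session; that's the sense in which I'd expect scaling it up to produce better-optimized recommendations (with their attendant social harms) rather than long-horizon scheming.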