[[THIRD EDIT: Thanks so much for all of the questions and comments! There are still a few more I'd like to respond to, so I may circle back to them a bit later, but, due to time constraints, I'm otherwise finished up for now. Any further comments or replies to anything I've written are also still appreciated!]]
Hi!
I'm Ben Garfinkel, a researcher at the Future of Humanity Institute. I've worked on a mixture of topics in AI governance and in the somewhat nebulous area FHI calls "macrostrategy", including: the long-termist case for prioritizing work on AI, plausible near-term security issues associated with AI, surveillance and privacy issues, the balance between offense and defense, and the obvious impossibility of building machines that are larger than humans.
80,000 Hours recently released a long interview I recorded with Howie Lempel about a year ago, in which we walked through various long-termist arguments for prioritizing work on AI safety and AI governance relative to other cause areas. The longest and probably most interesting stretch explains why I no longer find the central argument in Superintelligence, and in related writing, very compelling. At the same time, I do continue to regard AI safety and AI governance as high-priority research areas.
(These two slide decks, which were linked in the show notes, give more condensed versions of my views: "Potential Existential Risks from Artificial Intelligence" and "Unpacking Classic Arguments for AI Risk." This piece of draft writing instead gives a less condensed version of my views on classic "fast takeoff" arguments.)
Although I'm most interested in questions related to AI risk and cause prioritization, feel free to ask me anything. I'm likely to eventually answer most questions that people post this week, on an as-yet-unspecified schedule. You should also feel free just to use this post as a place to talk about the podcast episode: there was a thread a few days ago suggesting this might be useful.
Hi Ben. I just read the transcript of your 80,000 Hours interview and am curious how you'd respond to the following:
Analogy to agriculture, industry
You say that it would be hard for a single person (or group?) acting far before the agricultural revolution or industrial revolution to impact how those things turned out, so we should be skeptical that we can have much effect now on how an AI revolution turns out.
Do you agree that the goodness of this analogy is roughly proportional to how slow our AI takeoff is? For instance if the first AGI ever created becomes more powerful than the rest of the world, then it seems that anyone who influenced the properties of this AGI would have a huge impact on the future.
Brain-in-a-box
You argue that if we transition more smoothly to AGI, via super powerful narrow AIs that slowly expand in generality, we'll be less caught off guard / better prepared.
It seems that even in a relatively slow takeoff, you wouldn't need that big of a discontinuity to result in a singleton AI scenario. If the first AGI that's significantly more generally intelligent than a human is created in a world where lots of powerful narrow AIs exist, wouldn't having a super smart thing at the center of control of a bunch of narrow AI tools plausibly be way more powerful than having human brains at the center of that control?
It seems plausible that in a "smooth" scenario, the gap between the first group creating AGI and the second group creating an equally powerful one could be months. Do you think a months-long discontinuity is not enough for an AGI to pull sufficiently ahead?
Even if multiple groups create AGIs within a short time, isn't having a bunch of unaligned AGIs all trying to get power at the same time also an existential risk? It doesn't seem clear that they'd automatically keep each other in check. One might simply be better at growing or better at sabotaging other AIs. Or if they reach a stalemate they might start cooperating with each other to achieve unaligned goals as a compromise.
Maybe narrow AIs will work better
You say that since today's AIs are narrow, and since there's often benefit in specialization, maybe in the future specialized AIs will continue to dominate. You say "maybe the optimal level of generality actually isn’t that high."
My model is: if you have a central control unit (a human brain, or group of human brains) that is deciding how to use a bunch of narrow AIs, then if you replace that central control unit with one that is more intelligent / faster-acting, the whole system will be more effective.
The only way I can think of where that wouldn't be true would be if the general AI required so many computational resources that the narrow AIs that were acting as tools of the AGI were crippled by lack of resources. Is that what you're imagining?
Deadline model of AI progress
You say you disagree with the idea that the day when we create AGI acts as a sort of 'deadline', and if we don't figure out alignment before then we're screwed.
A lot of your argument is about how increasing AI capability and alignment are intertwined processes, so that as we increase an AI's capabilities we're also increasing its alignment. You discuss how it's not like we're going to create a super powerful AI and then give it a module with its goals at the end of the process.
I agree with that, but I don't see it as substantially affecting the Bostrom/Yudkowsky arguments.
Isn't the idea that we would have something that seemed aligned as we were training it (based on this continuous feedback we were giving it), but then only when it became extremely powerful we'd realize it wasn't actually aligned?
This seems to be a disagreement about "how hard is AI alignment?". I think Yudkowsky would say that it's so hard that your AI can look perfectly aligned while it's less powerful than you, but you'll have gotten something slightly wrong that only manifests itself once it has taken over. Do you agree that's a crux?
You talk about how AIs can behave very differently in different environments. Isn't the environment of an AI that happens to be the most powerful agent on earth fundamentally different from any environment we could provide when training an AI (in terms of resources at its disposal, strategies it might be aware of, etc.)?
Instrumental convergence
You talk about how even if almost all goals would result in instrumental convergence, we're free to pick any goals we like, so we can pick from a very small subset of all goals which don't result in instrumental convergence.
It seems like there's a tradeoff between AI capability and not exhibiting instrumental convergence, since to avoid instrumental convergence you basically need to tell the AI "You're not allowed to do anything in this broad class of things that will help you achieve your goals." An AI that amasses power and is willing to kill to achieve its goals is by definition more powerful than one that eschews becoming powerful and killing.
In a situation where there may be many groups trying to create an AGI, doesn't this imply that the first AGI that does exhibit instrumental convergence will have a huge advantage over any others?
Thanks to Ben for doing this AMA, and to Elliot for this interesting set of questions!
Just wanted to mention two links that readers might find interesting in this context. Firstly, Tomasik's Will Future Civilization Eventually Achieve Goal Preservation? Here's the summary:
...