Hi Ben. I just read the transcript of your 80,000 Hours interview and am curious how you'd respond to the following:
Analogy to agriculture, industry
You say that it would have been hard for a single person (or group?) acting long before the agricultural or industrial revolution to affect how those revolutions turned out, so we should be skeptical that we can do much now to affect how an AI revolution turns out.
Do you agree that the strength of this analogy is roughly proportional to how slow our AI takeoff is? For instance, if the first AGI ever created becomes more powerful than the rest of the world, then it seems that anyone who influenced the properties of this AGI would have a huge impact on the future.
You argue that if we transition smoothly to AGI, starting from super powerful narrow AIs that slowly expand in generality, we'll be less caught off guard and better prepared.
It seems that even in a relatively slow takeoff, you wouldn't need a very large discontinuity to produce a singleton AI scenario. If the first AGI that's significantly more generally intelligent than a human is created in a world where lots of powerful narrow AIs exist, wouldn't a super smart system at the center of control of a bunch of narrow AI tools plausibly be far more powerful than human brains at the center of that control?
It seems plausible that in a "smooth" scenario, the first group to create AGI and the second group to create an equally powerful one could be months apart. Do you think a months-long gap is not enough for the first AGI to pull sufficiently far ahead?
Even if multiple groups create AGIs within a short time, isn't having a bunch of unaligned AGIs all trying to gain power at the same time also an existential risk? It isn't clear that they'd automatically keep each other in check. One might simply be better at growing, or better at sabotaging the other AIs. Or, if they reach a stalemate, they might start cooperating with each other to pursue a compromise among their unaligned goals.
Maybe narrow AIs will work better
You say that since today's AIs are narrow, and since there's often a benefit to specialization, specialized AIs may continue to dominate in the future. You say "maybe the optimal level of generality actually isn’t that high."
My model is: if you have a central control unit (a human brain, or a group of human brains) that decides how to use a bunch of narrow AIs, and you replace that central control unit with one that is more intelligent or faster-acting, the whole system will be more effective.
The only way I can think of that this wouldn't be true is if the general AI required so many computational resources that the narrow AIs acting as its tools were crippled by a lack of resources. Is that what you're imagining?
Deadline model of AI progress
You say you disagree with the idea that the day we create AGI acts as a sort of 'deadline', such that if we haven't figured out alignment by then, we're screwed.
Much of your argument is that increasing AI capability and increasing alignment are intertwined processes, so that as we increase an AI's capabilities we're also increasing its alignment. You point out that we aren't going to create a super powerful AI and then give it a module containing its goals at the end of the process.
I agree with that, but I don't see it as substantially affecting the Bostrom/Yudkowsky arguments.
Isn't the idea that we would have something that seemed aligned while we were training it (based on the continuous feedback we were giving it), but that only once it became extremely powerful would we realize it wasn't actually aligned?
This seems to be a disagreement about "how hard is AI alignment?". I think Yudkowsky would say that it's so hard that your AI can look perfectly aligned while it's less powerful than you, yet you've gotten something slightly wrong that only manifests itself once it has taken over. Do you agree that's a crux?
You talk about how AIs can behave very differently in different environments. Isn't the environment of an AI that happens to be the most powerful agent on Earth fundamentally different from any environment we could provide when training an AI (in terms of the resources at its disposal, the strategies it might be aware of, etc.)?
You argue that even if almost all goals would result in instrumental convergence, we're free to pick whatever goals we like, so we can choose from the very small subset of goals that don't result in instrumental convergence.
It seems like there's a tradeoff between an AI's capability and its avoiding instrumental convergence, since to avoid instrumental convergence you basically need to tell the AI, "You're not allowed to do anything in this broad class of things that would help you achieve your goals." An AI that amasses power and is willing to kill to achieve its goals is by definition more powerful than one that refrains from gaining power and killing.
In a situation where there may be many groups trying to create an AGI, doesn't this imply that the first AGI that does exhibit instrumental convergence will have a huge advantage over any others?