In this article I will provide a brief critique of the way an ‘AI takeover scenario’ is typically presented in EA discourse. I originally wanted this piece to be more comprehensive, but owing to time constraints I have focused on three recent articles on the EA forum which discuss this issue, authored respectively by Ajeya, Yudkowsky, and Karnofsky. Each of these posts provides arguments for why a highly competent Artificial General Intelligence (AGI) poses a grave risk of accomplishing a ‘hostile takeover’ of Earth’s resources. This corresponds to what Nick Bostrom in Superintelligence calls a ‘decisive strategic advantage’, meaning that humans lack the ability to prevent the AGI from doing as it wishes. The plausibility of such a takeover scenario is critical to the argument for why AI alignment is such an important cause area, yet it seems to have received less attention than other issues, such as the likely timescale for developing AGI, or questions about how to value potential future lives.
My contention in this piece is that much EA discussion about AI takeover scenarios is highly superficial, and fails to seriously consider numerous practical questions of relevance for assessing the plausibility of such scenarios. I frame my discussion around three such key issues: the plausibility that scientific research can be rapidly accelerated by increasing cognitive resources, the difficulty of developing and constructing novel weapons technologies, and the ability of an AGI to successfully manipulate large groups of humans. I argue that greater attention must be paid to these issues, and serious consideration given to how and when an AGI would be able to overcome these obstacles. Note that for simplicity I will use the generic term ‘AGI’ to refer to a highly cognitively sophisticated agent; see the appendix for quotes presenting definitions of the level of intelligence assumed by each article.
Accelerating scientific research
It is often argued that AGI would be able to expand its capabilities by rapidly conducting scientific research, thereby increasing its knowledge of the world. This would directly increase its capabilities through increased knowledge, and more importantly would serve as the basis for further technological development. Karnofsky presents an idea he calls “PASTA”, a Process for Automating Scientific and Technological Advancement, which would be an AI system that could “automate all of the human activities needed to speed up scientific and technological advancement”. Such a speedup of scientific and technological progress is typically argued to be one of the primary mechanisms by which an AGI would gain a decisive strategic advantage over humans. Some representative quotes making similar points:
“The AIs can conduct massive amounts of research on how to use computing power more efficiently, which could mean still greater numbers of AIs run using the same hardware. This in turn could lead to a feedback loop and explosive growth in the number of AIs.”, Karnofsky
“The copies of Alex could do ML research on how they can improve their own knowledge, think more efficiently, etc... This would lead to a dynamic of explosive scientific and technological advancement: the various copies of Alex do R&D work, and that R&D work increases the number and intellectual capabilities of these copies, and that in turn leads to even more R&D work.”, Ajeya
“Once research is being carried out by copies of the AI, it would progress much faster than it would if similar tasks were done by human scientists, because (like existing ML models) each copy is capable of processing information many times faster than a human. For example, an AI would be able to churn out hundreds of lines of code, or read thousands of pages of information, in just a few minutes.”, Ajeya
“Alpha Zero blew past all accumulated human knowledge about Go after a day or so of self-play, with no reliance on human playbooks or sample games... AGI will not be upper-bounded by human ability or human learning speed. Things much smarter than human would be able to learn from less evidence than humans require”, Yudkowsky
“Today's computers and AIs aren't able to do all of the things required to have new ideas and get themselves copied more efficiently. They play a role in innovation, but innovation is ultimately bottlenecked by humans, whose population is only growing so fast. This is what PASTA would change.”, (Forecasting Transformative AI)
These passages seem to imply that the rate of scientific progress is primarily limited by the number and intelligence level of those working on scientific research. It is not clear, however, that the evidence supports this. The number of academic articles published each year has increased more than one hundred times in the past seventy years or so, and yet it is not clear that the rate of scientific or economic progress has changed much at all (indeed by some metrics it has slowed down). Of course, research is about far more than simply producing papers, though funding and the number of researchers have also increased substantially without any obvious acceleration in progress. Historically, larger civilizational centres have likewise not consistently exhibited faster scientific growth than smaller ones. For instance, the end of the Islamic Golden Age and the beginning of the Scientific Revolution in Europe were not primarily caused by major changes in population. Rather, it appears that rates of scientific progress are primarily determined by social, political, and economic institutions and forces that are poorly understood. As such, it is unclear whether an AGI could significantly accelerate its accumulation of scientific and technical knowledge simply by devoting more cognitive resources or computational power to research purposes.
Comparisons such as Alpha Zero learning the game of Go within the space of a day are highly misleading. First, though the actual model training was rapid, the entire process of developing Alpha Zero was far more protracted. Focusing on the day of training presents a highly misleading picture of the actual rate of progress of this particular example. Second, Go is a fully-observable, discrete-time, zero-sum, two-player board game. It is a very complex and intricate game, but in the scope of possible environments and problems, it is very simple and well-constrained. By contrast, scientific research involves many unknowns, no clear parameters for success, and an effectively unbounded number of evidential constraints to consider. While Alpha Zero was able to improve its performance rapidly by playing games against itself, it does not seem possible for an artificial scientist to ‘research against itself’ in order to rapidly increase its ability to solve scientific problems. Owing to these considerations, I believe cases such as Alpha Zero do not provide a relevant comparison for assessing the plausibility of an AGI rapidly accelerating its accumulation of scientific knowledge.
There is also reason to doubt the assumption that most forms of scientific progress are primarily limited by human intelligence or the number of available researchers. The mere fact that human intelligence and manpower are necessary for scientific and technical progress doesn’t mean that these are the major limiting factors on the rate of progress. In many fields, it seems that technological limitations on the quantity and quality of data that can be collected are key limiting factors in scientific progress. Conceptual breakthroughs also play a major role in science, but here too there is reason to think these require the appropriate conceptual prerequisites, and will be produced only when these are present. This appears to be supported by the frequency with which key breakthroughs are made independently by multiple researchers around the same time. It appears, then, that when the requisite intellectual framework is present, sufficient researchers are available for multiple people to simultaneously come up with a particular idea. Of course much remains to be learned about the process by which scientific discoveries are made, but I believe there is reason to think that many factors beyond the number and intelligence of researchers play a crucial role in determining the rate of scientific progress. If this is true, then an AGI may struggle to gain a decisive technical or scientific lead over the rest of humanity in a short space of time, even if it is able to devote considerably more cognitive resources to research. Such an agent may achieve some acceleration of scientific progress, but I believe that doing so would take considerable time, and require a reshaping of human societies and economies that would itself demand extensive control over human affairs.
Developing novel weapons technologies
Many discussions of AI takeover present scenarios in which AGIs would be able to fairly rapidly develop weapons technologies which would enable them to either kill all of humanity, or at least to defeat the world’s militaries, enabling them to take decisive control over the usage of Earth’s resources. Some illustrative quotes regarding this are shown below.
“My lower-bound model of how a sufficiently powerful intelligence would kill everyone… is that it gets access to the Internet, emails some DNA sequences to any of the many online firms that will take a DNA sequence in the email and ship you back proteins, and bribes/persuades some human who has no idea they're dealing with an AGI to mix proteins in a beaker, which then form a first-stage nanofactory which can build the actual nanomachinery… The nanomachinery builds diamondoid bacteria, that replicate with solar power and atmospheric CHON, maybe aggregate into some miniature rockets or jets so they can ride the jetstream to spread across the Earth's atmosphere, get into human bloodstreams and hide, strike on a timer.”, Yudkowsky
“Before the scientist model is deployed, the pace of scientific and economic progress in the world is roughly similar to what it is today; after it’s deployed, the effective supply of top-tier talent is increased by orders of magnitude. This is enough to pack decades of innovation into months, bringing the world into a radically unfamiliar future -- one with digital people, atomically precise manufacturing, Dyson spheres, self-replicating space probes, and other powerful technology that enables a vast and long-lasting galaxy-scale civilization -- within a few years.”, Ajeya
“A relatively modest amount of property safe from shutdown could be sufficient for housing a huge population of AI systems that are recruiting further human allies, making money (via e.g. quantitative finance), researching, and developing advanced weaponry (e.g., bioweapons), setting up manufacturing robots to construct military equipment, thoroughly infiltrating computer systems worldwide to the point where they can disable or control most others' equipment, etc. Through these and other methods, a large enough population of AIs could develop enough military technology and equipment to overpower civilization.”, Karnofsky
I do not find these sorts of scenarios very credible. They appear to ignore the fact that research and development of novel technology requires much more than general-purpose cognitive capabilities. It also requires precision manufacturing facilities, specific raw materials, prolonged testing and iterative development cycles, adequate experimental data, and highly specific technical know-how that is acquired through learning-by-doing. These all take time to develop and put into place, which is why the development of novel technologies takes a long time. For example, the Lockheed Martin F-35 took about fifteen years from initial design to scale production. The Gerald R. Ford aircraft carrier took about ten years to build and fit out. Semiconductor fabrication plants cost billions of dollars, and the entire process from the design of a chip to manufacturing takes years. Given such examples, it seems reasonable to expect that even a nascent AGI would require years to design and build a functioning nanofactory. Doing so in secret or without outside interference would be even more difficult given all the specialised equipment, raw materials, and human talent that would be needed. A bunch of humans hired online cannot simply construct a nanofactory from nothing in a few months, regardless of how advanced the AGI overseeing the process is.
Furthermore, it must be asked how an AGI would even know how to design and manufacture an artificial bacterium capable of killing most of humanity, let alone a Dyson sphere or other such science-fiction technologies. The scientific and technical knowledge necessary for these projects simply does not exist, and acquiring it would take years of laboratory research, and (in the case of any bioweapon) extensive testing on various organisms, including humans. New drugs take years of development and rounds of study to establish their effectiveness, and even if an AGI isn’t interested in safety, it would still want to ensure that its killer bacterium was effective the first time it was used. Even a superintelligent AGI will not be able to rely on simulations for everything (both due to limited data and limited compute power), and will therefore need to conduct time-consuming in vitro and in vivo experiments to acquire the necessary data. Given these considerations, I find it difficult to believe that such development and manufacturing timelines can be dramatically accelerated by improved intelligence and planning. Certainly this could help, but at the end of the day construction, transportation, experiments, and learning-by-doing take considerable time and resources beyond mere cognitive power.
Manipulation of humans
Another important capability of an AGI would be its ability to influence or manipulate humans to carry out tasks necessary to achieve its purposes. This is especially important given that producing bodies capable of interacting with the world in as versatile and delicate a manner as humans do does not seem likely in the near future, as fine motor control has proven an extremely difficult problem. As such, many writers considering AGI takeover scenarios assume that AGIs will recruit humans to perform many tasks on their behalf. Some illustrative quotes to this effect follow.
“(AGIs) could recruit human allies through many different methods - manipulation, deception, blackmail and other threats, genuine promises along the lines of ‘We're probably going to end up in charge somehow, and we'll treat you better when we do’.”, Karnofsky
“Human allies could be… asked to rent their own servers and acquire their own property where an AI headquarters can be set up. Since the AI headquarters would officially be human property, it could be very hard for authorities to detect and respond to the danger. Via threats, AIs might be able to get key humans to cooperate with them - such as political leaders, or the CEOs of companies running lots of AIs. This would open up further strategies.”, Karnofsky
“My lower-bound model of how a sufficiently powerful intelligence would kill everyone… is that it gets access to the Internet, emails some DNA sequences to any of the many online firms that will take a DNA sequence in the email and ship you back proteins, and bribes/persuades some human who has no idea they're dealing with an AGI to mix proteins in a beaker…”, Yudkowsky
Many of the most influential humans in history have been highly successful at persuading others to follow them. Leaders such as Napoleon, Genghis Khan, and Hitler were able to shape so much of the world in line with their goals not primarily because of their personal cognitive capacity or intelligence (though obviously this played a role), but largely because they were able to persuade large numbers of people to follow their orders. They were able to do this because of their personal presence, oratory and interpersonal skills, and their prowess and accomplishments. It is much less clear how an AGI communicating via text or video would be able to generate such loyalty and conviction. Relying on a ‘front man’ to serve as the face of the AGI would be highly dangerous, as the AGI would become dependent on this person for ensuring the loyalty of its followers. Of course one might argue that a combination of bribery and threats could be sufficient, but this is not the primary means by which successful leaders in history have obtained obedience and popularity, so an AGI limited to these tools would be at a significant disadvantage. Furthermore, an AGI reliant on control over money is susceptible to intervention by government authorities to freeze assets and hamper the transfer of funds. This would not be an issue if the AGI had control over its own territory, but then it would be subject to blockade and economic sanctions. For instance, it would take an AGI considerable effort to acquire the power of Vladimir Putin, and yet he is still facing significant practical difficulties in exerting his will on his own (and neighbouring) populations without the intervention of the rest of the world. While none of these problems are necessarily insuperable, I believe they are significant issues that must be considered in an assessment of the plausibility of various AI takeover scenarios.
It is also unclear how an AGI would gain the skills needed to manipulate and manage large numbers of humans in the first place. It is by no means evident why an AGI would be constructed with this capability, or how it would even be trained for this task, which does not seem very amenable to traditional reinforcement learning approaches. In many discussions, an AGI is simply defined as having such abilities, but it is not explained why such skills would be expected to accompany general problem-solving or planning skills. Even if a generally competent AGI had instrumental reasons to develop such skills, would it have the capability of doing so? Humans learn social skills through years of interaction with other humans, and even then, many otherwise intelligent and wealthy humans possess such skills only to a minimal degree. Unless a credible explanation can be given as to how such an AI would acquire such skills, or why they should necessarily follow from broader capabilities, I do not think it is reasonable to simply define an AGI as possessing them, and then assume this as part of a broader takeover narrative. This presents a major issue for takeover scenarios which rely on an AGI engaging large numbers of humans in its employment for the development of weapons or novel technologies.
Conclusion
In this article I have raised several issues which I believe existing proposed scenarios for an AI takeover do not adequately address. Whether an AGI would be able to rapidly accelerate its accumulation of scientific knowledge, quickly develop weapons capable of gaining a decisive advantage over humanity, or manipulate large groups of humans to do its will are critical questions which need to be explored in more detail. Section 6.3.1 of the excellent paper ‘Is Power-Seeking AI an Existential Risk?’ raises these and other questions pertinent to AI takeover scenarios, and I encourage interested readers to consult this reference for further discussion. A better understanding of the plausible mechanisms for AI takeover will allow for more accurate assessments of the risks, while also providing critical insights into the likely mechanisms and plausible trajectories of such scenarios. As such, I believe it is vital for the EA community to move beyond simplistic sketches and begin detailed research into these and other vital questions pertinent to understanding possible AI takeover scenarios.
Appendix: Defining AGI
Here I present the definitions of AGI as presented in the sources consulted for this article. Although each author describes the form of AI they are focused on differently, I do not believe the precise details are vital for my purpose in this article. Nonetheless, these are provided for reference and completeness.
AI Could Defeat All Of Us Combined, Holden Karnofsky
“I don't think the danger relies on the idea of cognitive superpowers or superintelligence - both of which refer to capabilities vastly beyond those of humans. I think we still have a problem even if we assume that AIs will basically have similar capabilities to humans, and not be fundamentally or drastically more intelligent or capable.”
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover, Ajeya
“In this hypothetical scenario I’m imagining: A Process for Automating Scientific and Technological Advancement (“PASTA”) is developed in the form of a single unified transformative model (a “scientist model”) which has flexible general-purpose research skills.”
AGI Ruin: A List of Lethalities, Eliezer Yudkowsky
“I am using awkward constructions like 'high cognitive power' because standard English terms like 'smart' or 'intelligent' appear to me to function largely as status synonyms. 'Superintelligence' sounds to most people like 'something above the top of the status hierarchy that went to double college', and they don't understand why that would be all that dangerous? Earthlings have no word and indeed no standard native concept that means 'actually useful cognitive power'.”
Is Power-Seeking AI an Existential Risk?, Joseph Carlsmith
“The ease with which an AI system can access such resources (via hacking, buying, renting, manufacturing, etc) therefore seems important—and note that compute may be both prevalent and in high demand in an increasingly AI-driven economy, and that the manufacturing process may require significant time and/or resources (current semiconductor fabs, for example, cost billions of dollars).”
Superintelligence: Paths, Dangers, Strategies, Nick Bostrom
“We can tentatively define a superintelligence as any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest.”
Thanks for this critique! I agree this is an important subject that is relatively understudied compared to other aspects of the problem. As far as I can tell there just isn't a science of takeover; there's military science and there's the science of how to win elections in a democracy and there's a bit of research and a few books on the topic of how to seize power in a dictatorship... but for such an important subject, it's unfortunate that there isn't a general study of how agents in multi-agent environments accumulate influence and achieve large-scale goals over long time periods.
I'm going to give my reactions below as I read:
I mean it's clearly more than JUST the number and intelligence of the people involved, but surely those are major factors! Piece of evidence: Across many industries performance on important metrics (e.g. price) seems to predictably improve exponentially with investment/effort (this is called experience curve effect). Another piece of evidence: AlphaFold 2.
Later you mention the gradual accumulation of ideas and cite the common occurrence of repeated independent discoveries. I think this is quite plausible. But note that a society of AIs would be thinking and communicating much faster than a society of humans, so the process of ideas gradually accumulating in their society would also be sped up.
Sure, and similarly if AI R&D ability is like AI Go ability, there'll be a series of better and better AIs over the course of many years that gradually get better at various aspects of R&D, until one day an AI is trained that is better than the most brilliant genius scientists. I actually expect things to be slower and more smoothed out than this, probably, because training will take more like a year. This is all part of the standard picture of AI takeover, not an objection to it.
I agree that the real world is more complex etc. and that just doing the same sort of self-play won't work. There may be more sophisticated forms of self-play that work though. Also you don't need self-play to be superhuman at something, e.g. you could use decision transformers + imitation learning.
I'd be interested to hear your thoughts on this post which details a combination of "near-future" military technologies. Perhaps you'll agree that the technologies on this list could be built in a few months or years by a developed nation with the help of superintelligent AI? Then the crux would be whether this tech would allow that nation to take over the world. I personally think that military takeover scenarios are unlikely because there are much easier and safer methods, but I still think military takeover is at least on the table -- crazier things have happened in history.
That said, I don't concede the point -- You are right that it would take modern humans many years to build nanofactories etc. but I don't think this is strong evidence that a superintelligence would also take many years. Consider video games and speedrunning. Even if speedrunners don't allow themselves to use bugs/exploits, they still usually go significantly faster than reasonably good players. Consider also human engineers building something that is well-understood already how to build vs. building something for the first time ever. The point is, if you are really smart and know what you are doing, you can do stuff much faster. You said that a lot of experimentation and experience is necessary -- well, maybe it's not. In general there's a tradeoff between smarts and experimentation/experience; if you have more of one you need less of the other to reach the same level of performance. Maybe if you crank up smarts to superintelligence level -- so intelligent that the best human geniuses seem a rounding error away from the average -- you can get away with orders of magnitude less experimentation/experience. Not for everything perhaps, but for some things. Suppose there are N crazy sci-fi technologies that an AI could use to get a huge advantage: nanofactories, fusion, quantum shenanigans, bioengineering ... All it takes is for one of them to be such that you can mostly substitute superintelligence for experimentation. And also you can still do experimentation, and you can do it much faster than humans do it too because you know what you are doing. Instead of toying around until hypotheses gradually coalesce in your brain, you can begin with a million carefully crafted hypotheses consistent with all the evidence you've seen so far and an experiment regime designed to optimally search through the space of hypotheses as fast as possible.
I expect it to take somewhere between a day and five years to go from what you might call human-level AI to nanobot swarms. Perhaps this isn't that different from what you think? (Maybe you'd say something like 3 to 10 years?)
History has many examples of people ruling from behind the throne, so to speak. Often they have no official title whatsoever, but the people with the official titles are all loyal to them. Sometimes the people with the official titles do rebel and stop listening to the power behind the throne, and then said power behind the throne loses power. Other times, this doesn't happen.
AGI need not rule from behind the scenes though. If it's charismatic enough it can rule over a group of Blake Lemoines. Have you seen the movie Her? Did you find the behavior of the humans super implausible in that movie -- no way they would form personal relationships with an AI, no way they would trust it?
It currently looks like most future AIs, and in particular AGIs, will have been trained on reading the whole internet & chatting to millions of humans over the course of several months. So, that's how they'll gain those skills.
(But also, if you are really good at generalizing to new tasks/situations, maybe manipulation of humans is one of the things you can generalize to. And if you aren't really good at generalizing to new tasks/situations, maybe you don't count as AGI.)
So far all I've done is critique your arguments but hopefully one day I'll have assembled some writing laying out my own arguments on this subject.
Anyhow, thanks again for writing this! I strongly disagree with your conclusions but I'm glad to see this topic getting serious & thoughtful attention.
Noting that the passage you quote in your appendix from my report isn't my definition of the type of AI I'm focused on. I'm focused on AI systems with the following properties:
See section 2.1 in the report for more in-depth description.
Just to mention that with sufficiently good simulation technology, experimental data may not be necessary, and if experimental data sets your timescale then things could happen a lot faster than you're estimating. We don't have that tech now, but in at least some domains it has the shape of something that could be solved with lots of cognitive resources thrown at the problem.
I'm thinking specifically about simulating systems of large (but still microscopic) numbers of atoms, where we know the relevant physical laws and mostly struggle to approximate them in realistic ways.
My intuition here is rough, but I think the core factors driving it are:
I agree about the difficulty of developing major new technologies in secret. But you seem to be mostly overstating the problems with accelerating science. E.g.: