Cross-posted from my new blog.
The last several years have witnessed a strong rise in activity on the topic of AI safety. Institutional and academic support has vindicated several elements of the embryonic Friendly AI research program. However, I believe that the degree of attention AI safety has received is disproportionate when compared to other aspects of artificial intelligence and the far future. The dynamic resembles an “availability cascade”, defined by Wikipedia as follows:
An availability cascade is a self-reinforcing cycle that explains the development of certain kinds of collective beliefs. A novel idea or insight, usually one that seems to explain a complex process in a simple or straightforward manner, gains rapid currency in the popular discourse by its very simplicity and by its apparent insightfulness. Its rising popularity triggers a chain reaction within the social network: individuals adopt the new insight because other people within the network have adopted it, and on its face it seems plausible. The reason for this increased use and popularity of the new idea involves both the availability of the previously obscure term or idea, and the need of individuals using the term or idea to appear to be current with the stated beliefs and ideas of others, regardless of whether they in fact fully believe in the idea that they are expressing. Their need for social acceptance, and the apparent sophistication of the new insight, overwhelm their critical thinking.
In this post I’m going to argue for a different approach which should bring more balance to the futurist ecosystem. There are significant potential problems which are related to AI development but are not instances of value alignment and control, and I think that they are more deserving of additional effort at the margin.
The prospects for a single superintelligence
Bostrom (2016) says that a recursively self-improving artificial general intelligence with a sufficient lead over competitors would have a decisive strategic advantage that is likely to ensure that it controls the world. While this is plausible, it is not inevitable and may not be the most likely scenario.
Little argument has been given that this scenario should be our default expectation rather than merely a plausible one. Yudkowsky (2013) argues that the history of human cognitive evolution indicates that an exponential takeoff in intelligence should be expected, though the argument has yet to be presented in a rigorous form. Computer scientists frequently point to complexity theory, which suggests that getting better at problem solving becomes very difficult very quickly as one approaches asymptotic limits. In broader economic strokes, Bloom et al (2017) argue that there is a general trend of diminishing returns to research. Both of these points suggest that acquiring a decisive strategic advantage in cognition would either take an agent a very long time or not happen at all.
It seems to me, intuitively, that if superintelligence is the sort of thing one agent cannot obtain rapidly enough to outcompete all other agents, then it is also the sort of thing a small subset of agents, say three or four of them, cannot obtain rapidly enough either. Either it will be widespread, or it will not be obtained at all, leaving billions of humans or other agents at the top of the hierarchy. So while I don’t think a true multi-agent scenario (with scores or more agents, as the term is typically used in game theory) is inevitable in the event that there is no single superintelligence, I think it is conditionally probable.
The importance of multi-agent analysis: three scenarios
Whole brain emulation and economic competition
Robin Hanson (2016) writes that the future of human civilization will be a fast-growing economy dominated by whole brain emulations. The future looks broadly good in this scenario given approximately utilitarian values and the assumption that ems are conscious, with a large and growing population of minds optimized for satisfaction and productivity, free of disease. Needless to say, if either of those premises fails, the em scenario looks very problematic. But other aspects of it could lead to suboptimal utility: social hierarchy, wealth inequality and economic competition. Also, while Hanson gives a very specific picture of the type of society which “ems” will inhabit, he notes that the conjunction of all his claims is extremely unlikely, so there is room for unforeseen issues to arise. It is plausible to me that the value of an em society is heavily contingent upon how ems are built, implemented and regulated.
However, the idea of whole brain emulation as a path to general artificial intelligence has been criticized and is a minority view. Bostrom (2016) argues that there seem to be greater technological hurdles to em development than to other kinds of progress in intelligence. The best current AI is far more capable than the best current emulation (OpenWorm). Industry and academia seem to be placing much more effort into even the very speculative strains of AI research than into emulation.
The future of evolution
If humans are not superseded by a monolithic race of ems, then trends in technological progress and evolution might have harmful effects upon the composition of the population. Bostrom (2009) writes that “freewheeling evolutionary developments, while continuing to produce complex and intelligent forms of organization, lead to the gradual elimination of all forms of being that we care about.” With the relaxation of contemporary human social and biological constraints, two possibilities are plausible: a Malthusian catastrophe, where the population expands until welfare standards are neutral or negative, and the evolution of agents which outperform existing ones but lack the same faculties of consciousness. Either of these scenarios would entail the loss of most or all of what we find valuable.
Andres Gomez Emilsson also writes on his blog that this is a possibility, saying:
I will define a pure replicator, in the context of agents and minds, to be an intelligence that is indifferent towards the valence of its conscious states and those of others. A pure replicator invests all of its energy and resources into surviving and reproducing, even at the cost of continuous suffering to themselves or others. Its main evolutionary advantage is that it does not need to spend any resources making the world a better place.
Bostrom does not believe that the problem is unavoidable; he suggests that a ‘singleton’ could combat this process. By singleton he means not just a superintelligence but any global governing body, or even a set of moral codes, with the right properties. He writes that such an institution should implement “a coordinated policy to prevent internal developments from ushering it onto an evolutionary trajectory that ends up toppling its constitutional agreement, and doing this would presumably involve modifying the fitness function for its internal ecology of agents.”
Augmented intelligence and military competition
Daniel McIntosh (2010) writes that the near-inevitable adoption of transhuman technologies poses a significant security dilemma due to the political, economic, and battlefield advantages provided by agents with augmented cognitive and physical capabilities. Critics who argue for restraint “tend to deemphasize the competitive and hedonic pressures encouraging the adoption of these products.” Not only is this a problem on its own, but I see no reason to think that the conditions described above wouldn’t apply in scenarios where AI agents, rather than transhumans or posthumans, turn out to be the primary actors and decision-makers.
Whatever the type of agent, arms races in future technologies would lead to opportunity costs in military expenditures and would interfere with the project of improving welfare. It seems likely that agents designed for security purposes would have preferences and characteristics which fail to optimize for the welfare of themselves and their neighbors. It’s also possible that an arms race would destabilize international systems and act as a catalyst for warfare.
These trends might continue indefinitely with technological progress. McIntosh rejects the assumption that a post-singularity world would be peaceful:
In a post-singularity, fifth-generation world, there would always be the possibility that the economic collapse or natural disaster was not the result of chance, but of design. There would always be the possibility that internal social changes are being manipulated by an adversary who can plan several moves ahead, using your own systems against you. The systems themselves, in the form of intelligences more advanced than we can match, could be the enemy. Or it might be nothing more than paranoid fantasies. The greatest problem that individuals and authorities might have to deal with may be that one will never be sure that war is not already under way. Just as some intelligence analysts cited the rule that “nothing is found that is successfully hidden” – leading to reports of missile gaps and Iraqi WMD – a successful fifth generation war would [be] one that an opponent never even realized he lost.
Almost by definition, we cannot precisely predict what will happen in a post-singularity world or develop policies and tools that will be directly applicable in such a world. But this possibility highlights the importance of building robust cooperative systems from the ground up, rather than assuming that technological changes will somehow remove these problems. A superintelligent agent with a sufficient advantage over other agents would presumably be able to control a post-singularity world sufficiently to avoid this, but as has been noted, it’s not clear that this is the most likely scenario.
Multi-agent systems are neglected
The initiatives and independent individuals close to the EA sphere who are working towards developing reliable, friendly AI include the Machine Intelligence Research Institute, the Future of Humanity Institute, Berkeley’s Center for Human-Compatible AI, Roman Yampolskiy, and, as far as I can tell, all the effective altruists who are students of AI. There is much less attention on multi-agent outcomes: Robin Hanson, Nick Bostrom and Andres Gomez Emilsson seem to be the only ones who have done research on them (and Bostrom seems to be focused on superintelligence), while the Foundational Research Institute has given a general nod in this direction with its concerns over AI suffering, cooperation, and multipolar takeoffs.
The disparity persists as you look farther afield. Pragmatic industry-oriented initiatives to make individual AI systems safe, ethical and reliable include the Partnership on AI among the six major tech companies, some attention from the White House on the subject, and a notable amount of academic work at universities. The work in universities and industry on multi-agent systems and game theory seems to be entirely focused on pragmatic problems like distributed computational systems and traffic networks; only a few researchers have indicated the need for analyzing multi-agent systems of the future, let alone actually done so. Finally, in popular culture, Bostrom’s Superintelligence has received 319 Amazon reviews to Age of Em’s 30, despite the books being published around the same time, and the disparity in general media and journalism coverage of the two topics seems comparably large.
I do not expect this to change in the future. Multi-agent outcomes are varied and complex, while superintelligence is highly available and catchy. My conclusion is that the former is significantly more neglected than the latter.
Is working on multi-agent systems of the future a tractable project?
The main point of Scott Alexander’s “Meditations on Moloch” is essentially that “the only way to avoid having all human values gradually ground down by optimization-competition is to install a Gardener over the entire universe who optimizes for human values.” In other words, given the problems which have been described above, the only way to actually achieve a really valuable society is to have a singleton which has the right preferences and keeps everyone in line.
This is not different from what Bostrom argues. But remember that the singleton need not be a superintelligence with a decisive strategic advantage. This is fortunate, since it is plausible that computational difficulties will prevent such an entity from ever existing. Instead, the Gardener of the universe might be a much more complex set of agents and institutions. For instance, Peter Railton and Steve Petersen are (I believe) both working on arguments that agents will be linked via a teleological thread where they accurately represent the value functions of their ancestors. We’ll need to think more carefully about how to implement this sort of thing in a way that reliably maximizes welfare.
This is why analysis in multi-agent game theory and mechanism design is important. The whole idea behind game theory is that you can reach useful conclusions by abstracting away from the details of a situation and looking at players only as abstract entities with basic preferences and strategies. This means that analyses and institutions are likely to be pertinent to a wide range of scenarios of technological progress.
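To make the point concrete, here is a minimal toy sketch of that kind of abstraction (my own illustration, not taken from any of the cited work; the strategy names, payoff numbers and the `pure_nash_equilibria` helper are purely hypothetical): a technology race between two developers modeled as a 2x2 game, where the only pure-strategy equilibrium is mutual racing even though mutual restraint is better for both. Whether the players are states, firms, ems or AI agents is abstracted away, which is exactly why this style of analysis can carry over to many different scenarios.

```python
# Toy example: a technology race as a 2x2 normal-form game.
# The strategies and payoffs are illustrative assumptions, not empirical claims.
from itertools import product

STRATEGIES = ["restrain", "race"]

# payoffs[(row, col)] = (row player's utility, column player's utility)
payoffs = {
    ("restrain", "restrain"): (3, 3),  # both invest in safety; moderate progress
    ("restrain", "race"):     (0, 4),  # the restrained player falls behind
    ("race",     "restrain"): (4, 0),
    ("race",     "race"):     (1, 1),  # both cut corners; risky for everyone
}

def pure_nash_equilibria(payoffs):
    """Profiles where neither player can gain by unilaterally switching strategy."""
    equilibria = []
    for row, col in product(STRATEGIES, repeat=2):
        u_row, u_col = payoffs[(row, col)]
        row_best = all(payoffs[(alt, col)][0] <= u_row for alt in STRATEGIES)
        col_best = all(payoffs[(row, alt)][1] <= u_col for alt in STRATEGIES)
        if row_best and col_best:
            equilibria.append((row, col))
    return equilibria

print(pure_nash_equilibria(payoffs))  # -> [('race', 'race')]
```

Mechanism design then asks how to change the rules of the game (treaties, monitoring, side payments) so that the equilibrium of the modified game is the outcome we actually want.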
While preventing harmful evolution, economic competition and arms races sounds extremely difficult, there is some historical precedent for human institutions establishing robust regulations and international agreements on this type of issue. Admittedly, none of it has been on nearly the scale that would be required to solve the problems described above. But given the preliminary stage of this line of research, I think that additional research, or a literature review at minimum, is needed at least to investigate the various possibilities which we might pursue. Also, ordinary AI safety faces a similar cooperation problem anyway (Armstrong et al 2016).
Conclusion and proposal
I believe I have shown that recent interest in AI and the future of humanity has disproportionately neglected the idea of working on a broader range of futures in which society is not controlled by a single agent. There is still value in AI safety work insofar as alignment and control would help us with building the right agents in multi-agent scenarios, but there are other parts of the picture which need to be explored.
First, there are specific questions which should be answered. How likely are the various scenarios described above, and how can we ensure that they turn out well? Should we prefer that society is governed by a superintelligence with a decisive strategic advantage, and if so, then how much of a priority is it?
Second, there are specific avenues where practical work now can uncover the proper procedures and mindsets for increasing the probability of a positive future. Aside from setting precedents for international cooperation on technical issues, we can start steering the course of machine ethics as it is implemented in modern-day systems. Better systems of machine ethics which don’t require superintelligence to be implemented (as coherent extrapolated volition does) are likely to be valuable for mitigating potential problems involved with AI progress, although they won’t be sufficient (Brundage 2014). Generally speaking, we can apply tools of game theory, multi-agent systems and mechanism design to issues of artificial intelligence, value theory and consciousness.
Given the multiplicity of the issues and the long timeline from here to the arrival of superhuman intelligence, I would like to call for a broader, multifaceted approach to the long-term future of AI and civilization. Rather than a single-minded focus on averting one particular failure mode, it should be a more ambitious project aimed at fostering a pattern of positive, self-reinforcing interactions between social institutions and intelligent systems, supported by a greater amount of human and financial capital.
References
Armstrong, Stuart et al (2016). Racing to the Precipice: a Model of Artificial Intelligence Development. AI & Society.
Bloom, Nicholas et al (2017). Are Ideas Getting Harder To Find?
Bostrom, Nick (2009). The Future of Human Evolution. Bedeutung.
Bostrom, Nick (2016). Superintelligence. Oxford University Press.
Brundage, Miles (2014). Limitations and Risks of Machine Ethics. Journal of Experimental & Theoretical Artificial Intelligence.
Hanson, Robin (2016). Age of Em. Oxford University Press.
McIntosh, Daniel (2010). The Transhuman Security Dilemma. Journal of Evolution and Technology.
Yudkowsky, Eliezer (2013). Intelligence Explosion Microeconomics.
It's great to see people thinking about these topics and I agree with many of the sentiments in this post. Now I'm going to write a long comment focusing on those aspects I disagree with. (I think I probably agree with more of this sentiment than most of the people working on alignment, and so I may be unusually happy to shrug off these criticisms.)
Contrasting "multi-agent outcomes" and "superintelligence" seems extremely strange. I think the default expectation is a world full of many superintelligent systems. I'm going to read your use of "superintelligence" as "the emergence of a singleton concurrently with the development of superintelligence."
I don't consider the "single superintelligence" scenario likely, but I don't think that has much effect on the importance of AI alignment research or on the validity of the standard arguments. I do think that the world will gradually move towards being increasingly well-coordinated (and so talking about the world as a single entity will become increasingly reasonable), but I think that we will probably build superintelligent systems long before that process runs its course.
On total utilitarian values, the actual experiences of brain emulations (including whether they have any experiences) don't seem very important. What matters are the preferences according to which emulations shape future generations (which will be many orders of magnitude larger).
Evolution doesn't really select against what we value, it just selects for agents that want to acquire resources and are patient. This may cut away some of our selfish values, but mostly leaves unchanged our preferences about distant generations.
(Evolution might select for particular values, e.g. if it's impossible to reliably delegate or if it's very expensive to build systems with stable values. But (a) I'd bet against this, and (b) understanding this phenomenon is precisely the alignment problem!)
(I discuss several of these issues here, Carl discusses evolution here.)
It seems like you are paraphrasing a standard argument for working on AI alignment rather than arguing against it. If there weren't competitive pressure / selection pressure to adopt future AI systems, then alignment would be much less urgent since we could just take our time.
There may be other interventions that improve coordination/peace more broadly, or which improve coordination/peace in particular possible worlds etc., and those should be considered on their merits. It seems totally plausible that some of those projects will be more effective than work on alignment. I'm especially sympathetic to your first suggestion of addressing key questions about what will/could/should happen.
Over time it seems likely that society will improve our ability to make and enforce deals, to arrive at consensus about the likely consequences of conflict, to understand each others' situations, or to understand what we would believe if we viewed others' private information.
More generally, we would like to avoid destructive conflict and are continuously developing new tools for getting what we want / becoming smarter and better-informed / etc.
And on top of all that, the historical trend seems to basically point to lower and lower levels of violent conflict, though this is in a race with greater and greater technological capacity to destroy stuff.
I would be more than happy to bet that the intensity of conflict declines over the long run. I think the question is just how much we should prioritize pushing it down in the short run.
I disagree with this. See my earlier claim that evolution only favors patience.
I do agree that some kinds of coordination problems need to be solved, for example we must avoid blowing up the world. These are similar in kind to the coordination problems we confront today though they will continue to get harder and we will have to be able to solve them better over time---we can't have a cold war each century with increasingly powerful technology.
This conclusion seems safe, but it would be safe even if you thought that early AI systems will precipitate a singleton (since one still cares a great deal about the dynamics of that transition).
By "don't require superintelligence to be implemented," do you mean systems of machine ethics that will work even while machines are broadly human level? That will work even if we need to solve alignment prior long before the emergence of a singleton? I'd endorse both of those desiderata.
I think the main difference in alignment work for unipolar vs. multipolar scenarios is how high we draw the bar for "aligned AI," and in particular how closely competitive it must be with unaligned AI. I probably agree with your implicit claim, that they either must be closely competitive or we need new institutional arrangements to avoid trouble.
I think the mandate of AI alignment easily covers the failure modes you have in mind here. I think most of the disagreement is about what kinds of considerations will shape the values of future civilizations.
At this level of abstraction I don't see how this differs from alignment. I suspect the details differ a lot, in that the alignment community is very focused on the engineering problem of actually building systems that faithfully pursue particular values (and in general I've found that terms like "teleological thread" tend to be linked with persistently low levels of precision).
Thanks for the comments.
Evolution favors replication. But patience and resource acquisition aren't obviously correlated with any sort of value; if anything, better resource-acquirers are destructive and competitive. The claim isn't that evolution is intrinsically "against" any particular value; it's that it is extremely unlikely to optimize for any particular value, and failing to do so nearly perfectly is catastrophic. Furthermore, competitive dynamics lead to systematic failures. See the citation.
Shulman's post assumes that once somewhere is settled, it's permanently inhabited by the same tribe. But I don't buy that. Agents can still spread through violence or through mimicry (remember the quote on fifth-generation warfare).
All I am saying is that the argument applies to this issue as well.
The point you are quoting is not about just any conflict, but the security dilemma and arms races. These do not significantly change with complete information about the consequences of conflict. Better technology yields better monitoring, but also better hiding - which is easier, monitoring ICBMs in the 1970s or monitoring cyberweapons today?
One of the most critical pieces of information in these cases is intentions, which are easy to keep secret and will probably remain so for a long time.
Yes, or even implementable in current systems.
The failure modes here arise in a different context, one where the existing research is often less relevant or not relevant at all. Whatever you put under the umbrella of alignment, there is a difference between looking at a particular system under the assumption that it will rebuild the universe in accordance with its value function, and looking at how systems interact in varying numbers. If you drop the assumption that the agent will be all-powerful and far beyond human intelligence, then a lot of AI safety work is no longer very applicable, while it increasingly needs to pay attention to multi-agent dynamics. Figuring out how to optimize large systems of agents is absolutely not a simple matter of figuring out how to build one good agent and then replicating it as much as possible.
I don't think this is true in very many interesting cases. Do you have examples of what you have in mind? (I might be pulling a no-true-scotsman here, and I could imagine responding to your examples with "well that research was silly anyway.")
Whether or not your system is rebuilding the universe, you want it to be doing what you want it to be doing. Which "multi-agent dynamics" do you think change the technical situation?
If evolution isn't optimizing for anything, then you are left with the agents' optimization, which is precisely what we wanted. I thought you were telling a story about why a community of agents would fail to get what they collectively want. (For example, a failure to solve AI alignment is such a story, as is a situation where "anyone who wants to destroy the world has the option," as is the security dilemma, and so forth.)
We are probably on the same page here. We should figure out how to build AI systems so that they do what we want, and we should start implementing those ideas ASAP (and they should be the kind of ideas for which that makes sense). When trying to figure out whether a system will "do what we want" we should imagine it operating in a world filled with massive numbers of interacting AI systems all built by people with different interests (much like the world is today, but more).
You're right.
Unsurprisingly, I have a similar view about the security dilemma (e.g. think about automated arms inspections and treaty enforcement, I don't think the effects of technological progress are at all symmetrical in general). But if someone has a proposed intervention to improve international relations, I'm all for evaluating it on its merits. So maybe we are in agreement here.
The parenthesis is probably true of, e.g., most of MIRI's traditional agenda. If agents don't quickly gain decisive strategic advantages, then you don't have to get AI design right the first time; you can make many agents and weed out the bad ones. So the basic design desiderata are probably important, but it's just not very useful to do research on them now. I'm not familiar enough with your line of work to comment on it, but just think about the degree to which a problem would no longer be a problem if you could build, test and interact with many prototype human-level and smarter-than-human agents.
Aside from the ability to prototype as described above, there are the same dynamics which plague human society: multiple factions with good intentions end up fighting due to security concerns or tragedies of the commons, or multiple agents with different priors interpret every new piece of evidence they see differently and so go down intractably separate paths of disagreement. FAI can solve all the problems of class, politics, economics, etc. by telling everyone what to do, for better or for worse. But multi-agent systems will only be stable with strong institutions, unless they have some other kind of cooperative architecture (such as universal agreement in value functions, in which case you now have the problem of controlling everybody's AIs, but without the benefit of having an FAI to rule the world). Building these institutions and cooperative structures may have to be done right the first time, since they are effectively singletons, and they may be less corrigible or require different kinds of mechanisms to ensure corrigibility. And the dynamics of multi-agent systems mean you cannot accurately predict the long-term future merely on the basis of value alignment, which you would (at least naively) be able to do with a single FAI.
Well it leads to agents which are optimal replicators in their given environments. That's not (necessarily) what we want.
That too!
There's a lot of value in having an AI safety orthodoxy for coordination purposes; there's also a lot of value in this sort of heterodox criticism of the orthodoxy. Thanks for posting.
One additional area of orthodoxy that I think could use more critique is the community's views on consciousness. A few thoughts here (+comments): http://effective-altruism.com/ea/14t/principia_qualia_blueprint_for_a_new_cause_area/
Also: nobody seems to be really looking into the state of AI safety & x-risk memes inside of China. Whether they're developing a different 'availability cascade' seems hugely important and under-studied.
Optimizing for a narrower set of criteria allows more optimization power to be put behind each member of the set. I think it is plausible that those who wish to do the most good should put their optimization power behind a single criterion, as that gives it some chance of actually succeeding. The best candidate afaik is right to exit, as it eliminates the largest possible number of failure modes in the minimum-complexity memetic payload. Interested in arguments why this might be wrong.
Only if you assume that there are high thresholds for achievements.
I do not understand what you are saying.
Edit: do you mean, the option to get rid of technological developments and start from scratch? I don't think there's any likelihood of that, it runs directly counter to all the pressures described in my post.
right to exit means right to suicide, right to exit geographically, right to not participate in a process politically etc.
Isn't Elon Musk's OpenAI basically operating under this assumption? His main thing seems to be to make sure AGI is distributed broadly so no one group with evil intentions controls it. Bostrom responded that might be a bad idea, since AGI could be quite dangerous, and we similarly don't want to give nukes to everyone so that they're "democratized."
Multi-agent outcomes seem like a possibility to me, but I think the alignment problem is still quite important. If none of the AGI have human values, I'd assume we're very likely screwed, while we might not be if some do have human values.
For WBE I'd assume the most important things for its "friendliness" is that we upload people who are virtuous and our ability and willingness to find "brain tweaks" that increase things like compassion. If you're interested, here's a paper I published where I argued that we will probably create WBE by around 2060 if we don't get AGI through other means first: https://www.degruyter.com/view/j/jagi.2013.4.issue-3/jagi-2013-0008/jagi-2013-0008.xml
"Industry and academia seem to be placing much more effort into even the very speculative strains of AI research than into emulation." Actually, I'm gonna somewhat disagree with that statement. Very little research is done on advancing AI towards AGI, while a large portion of neuroscience research and also a decent amount of nanotechnology research (billions of dollars per year between the two) are clearly pushing us towards the ability to do WBE, even if that's not the reason that research is conducting right now.
Yes, but I mean they're not trying to figure out how to do it safely and ethically. The ethics/safety worries are 90% focused around what we have today, and 10% focused on superintelligence.
Great to see a nuanced, different perspective. I'd be interested in how work on existing multi-agent problems can be translated into improving the value alignment of a potential singleton (reducing the risk of theoretical abstraction uncoupling from reality).
Amateur question: would it help to also include back-of-the-envelope calculations to make your arguments more concrete?
Don't think so. It's too broad and speculative with ill-defined values. It just boils down to (a) whether my scenarios are more likely than the AI-Foom scenario, and (b) whether my scenarios are more neglected. There's not many other factors that a complicated calculation could add.
Just a comment on growth functions: I think a common prior here is that once we switch to computer consciousnesses, progress will follow Moore's law, which is exponential with a doubling time of roughly 18 months (Ray Kurzweil says it is actually slow exponential growth in the exponent). Hanson sees the transition to a much shorter doubling time, something around one month. Others have noted that if the computer consciousnesses are making the progress and they are getting faster with Moore's law, you actually get a hyperbolic shape which goes to infinity in finite time (around three years). Then you get to recursive self-improvement of AI, which could have a doubling time of days or weeks, and I think this is roughly the Yudkowsky position (though he does recognize that progress could get harder). I think this is the most difficult to manage. Going the other direction from the Moore's law prior, many economists see continued exponential growth with a doubling time of decades. Then there are historical economists who think the economic growth rate will go back to zero. Next you have the resource (or climate) doomsters who think there will be slow negative economic growth. Further down, you have faster catastrophes, which we might recover from. Finally, you have sudden catastrophes with no recovery. Quite the diversity of opinion: it would be an interesting project (or paper?) to try to plot this out.
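For readers who want the exponential vs. hyperbolic distinction spelled out, here is a minimal sketch of the standard toy math behind it (my own gloss on the comment above; the variable x, the constant k and the quadratic form are illustrative assumptions, not established facts). If research capacity x improves at a rate set by a fixed external clock, growth is exponential; if the improvers themselves speed up in proportion to x, the doubling time shrinks as x grows and the trajectory diverges in finite time:

```latex
\[
\dot{x} = kx \;\Rightarrow\; x(t) = x_0 e^{kt}
\qquad \text{(constant doubling time } \tfrac{\ln 2}{k}\text{)}
\]
\[
\dot{x} = kx^2 \;\Rightarrow\; x(t) = \frac{x_0}{1 - k x_0 t}
\qquad \text{(diverges at the finite time } t^* = \tfrac{1}{k x_0}\text{)}
\]
```

Which equation, if either, actually describes post-transition progress is of course exactly the disagreement being catalogued in the comment above.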
One implication of a multi-agent scenario is that there would likely be enormous variety in the types of minds that exist, as each mind design could be optimised for a different niche. So in such a scenario, it seems quite plausible that each feature of our minds would turn out to be a good solution in at least a few situations, and so would be reimplemented in minds designed for those particular niches.