This is part 5 of a 5-part sequence:
Part 1: summary of Bostrom's argument
Part 2: arguments against a fast takeoff
Part 3: cosmic expansion and AI motivation
Part 4: tractability of AI alignment
Part 5: expected value arguments
Premises 6-7: The high expected value of AI research
Essential to the argument that we (society at large or the EA community specifically) should devote considerable resources to solving the AI alignment problem is the claim that even if the probability of actually solving the problem is very low, the size of the outcome in question (according to Bostrom, the entire cosmic endowment) is so large that its expected value still dominates most other possible causes. This also provides a ready riposte to all of my foregoing rebuttals of Bostrom’s argument – namely that even if each premise of Bostrom’s argument is very improbable, and even if as a result the conclusion is most implausible indeed, nevertheless the AI Doom Scenario outcome is so catastrophically terrible that in expectation it might still be worthwhile to focus much of our attention on trying to prevent it. Of course, at one level this is entirely an argument about the relative size of the numbers – just how implausible are the premises, and just how large would the cosmic endowment have to be in order to offset this? I do not believe it is possible to provide any non-question-begging answers to this question, and so I will not attempt to provide any numbers here. I will simply note that even if we accept the logic of the expected value argument, it is still necessary to actually establish with some plausibility that the expected value is in fact very large, and not merely assume that it must be large because the hypothetical outcome is large. There are, however, more fundamental conceptual problems with the application of expected value reasoning to problems of this sort, problems which I believe weigh heavily against the validity of applying such reasoning to this issue.
First is a problem which is sometimes called Pascal’s mugging. It is based upon Blaise Pascal’s argument that, crudely put, one should convert to Christianity even if it is unlikely that Christianity is true. The reason is that if God exists, then being a Christian will yield an arbitrarily large reward in heaven, while if God does not exist, there is no great downside to being a Christian. On the other hand, if God does exist, then not being a Christian will yield an arbitrarily large negative reward in hell. On the basis of the extreme magnitude of the possible outcomes, therefore, it is rational to become a Christian even if the probability of God existing is small. Whatever one thinks of this as a philosophical argument for belief in God, the problem with this line of argument is that it can be readily applied to a very wide range of possible claims. For instance, a similar case can be made for different religions, and even different forms of Christianity. A fringe apocalyptic cult member could claim that Cthulhu is about to awaken and will torture a trillion trillion souls for all eternity unless you donate your life savings to their cult, which will help to placate him. Clearly this person is not to be taken seriously, but unless we can assign exactly zero probability to his statement being true, there will always be some negative outcome bad enough to make taking the action the rational thing to do.
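To make the arithmetic of the mugging concrete, here is a minimal Python sketch. All the numbers are hypothetical and chosen only for illustration: the best ordinary use of the money is assumed to be worth 1000 units of value, and the mugger's story is assigned credences from 10^-6 down to 10^-100. The helper names `expected_value` and `mugger_payoff` are my own. Exact rational arithmetic (`fractions.Fraction`) is used so the results are exact rather than subject to floating-point rounding.

```python
from fractions import Fraction


def expected_value(probability: Fraction, payoff: Fraction) -> Fraction:
    # Single-outcome expected value: probability times the size of the outcome.
    return probability * payoff


def mugger_payoff(probability: Fraction, rival_ev: Fraction) -> Fraction:
    # Any claimed payoff above rival_ev / probability beats the rival on EV;
    # this hypothetical mugger simply names ten times that threshold.
    return 10 * rival_ev / probability


rival_ev = Fraction(1000)  # hypothetical EV of the best ordinary use of the money

for exponent in (6, 12, 30, 100):
    p = Fraction(1, 10**exponent)  # ever smaller credence in the mugger's story
    ev_of_paying = expected_value(p, mugger_payoff(p, rival_ev))
    print("p = 1e-%d: EV of paying = %s, EV of rival = %s" % (exponent, ev_of_paying, rival_ev))
```

Every line reports an expected value of 10000 for paying against 1000 for the rival: however small the credence, naming a large enough outcome restores the mugger's advantage.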
The same argument could be applied in more plausible cases to argue that, for example, some environmental or social cause has the highest expected value, since if we do not act now to shape outcomes in the right way, Earth will become completely uninhabitable and mankind thus unable to spread throughout the galaxy. Or perhaps some neo-Fascist, Islamic fundamentalist, Communist revolutionary, anarcho-primitivist, or other such ideology could establish a hegemonic social and political system that locks humanity into a downward spiral that forever precludes cosmic expansion, unless we undertake appropriate political or social reforms to prevent this. Again, the point is not how plausible such scenarios are – though doubtless with sufficient time and imagination they could be made to sound somewhat plausible to those people with the right ideological predilections. Rather, the point is that in line with the idea of Pascal’s mugging, if the outcome is sufficiently bad, then the expected value of preventing the outcome could still be high in spite of a very low probability of the outcome occurring. If we accept this line of reasoning, we therefore find ourselves vulnerable to being ‘mugged’ by any kind of argument which posits an absurdly implausible speculative scenario, so long as it has a sufficiently large outcome. This possibility effectively constitutes a reductio ad absurdum for this type of very-low-probability, very-high-impact argument.
The second major problem with applying expected value reasoning to this sort of problem is that it is not clear that the conceptual apparatus is properly aligned to the nature of human beliefs. Expected value theory holds that human beliefs can be assigned a probability which fully describes the degree of credence with which we hold that belief. Many philosophers have argued, however, that human beliefs cannot be adequately described this way. In particular, it is not clear that we can identify a single specific number that precisely describes our degree of credence in such amorphous, abstract propositions as those concerning the nature and likely trajectory of artificial intelligence. The possibilities of incomplete preferences, incomparable outcomes, and suspension of judgement are also very difficult to incorporate into standard expected value theory, which assumes complete preferences and that all outcomes are comparable. Finally, it is particularly unclear why we should expect or require that our degrees of credence should adhere to the axioms of standard probability theory. So-called 'Dutch book arguments' are sometimes used to demonstrate that sets of beliefs that do not accord with the axioms of probability theory are susceptible to betting strategies whereby the person in question would be guaranteed to lose money. Such arguments, however, only seem relevant to beliefs which are liable to be the subject of bets. For example, of what relevance is it whether one’s beliefs about the behaviour of a hypothetical superintelligent agent in the distant future are susceptible to Dutch book arguments, when the events in question are so far in the future that it is impossible that any enforceable bet could actually be made concerning them? Perhaps beliefs which violate the axioms of probability, though useless for betting, are valuable or justifiable for other purposes or in other domains. 
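For readers unfamiliar with Dutch book arguments, the following minimal Python sketch shows the simplest case. It assumes an agent whose credences in a proposition A and in its negation sum to something other than 1, and who regards a bet paying 1 unit if X is true as fairly priced at their credence in X, and so is willing to buy or sell it at that price. The function name `dutch_book_net` and the example credences are hypothetical illustrations.

```python
from fractions import Fraction


def dutch_book_net(credence_a: Fraction, credence_not_a: Fraction) -> dict:
    """Agent's net gain in each possible world when a bookie trades two unit
    bets (one on A, one on not-A) at prices equal to the agent's credences."""
    total_price = credence_a + credence_not_a
    if total_price > 1:
        direction = 1    # bookie sells both bets to the agent
    elif total_price < 1:
        direction = -1   # bookie buys both bets from the agent
    else:
        direction = 0    # additive credences: this particular book is unavailable
    net = {}
    for a_is_true in (True, False):
        # Bet on A pays 1 if A is true; bet on not-A pays 1 if A is false.
        payout = (1 if a_is_true else 0) + (0 if a_is_true else 1)
        world = "A" if a_is_true else "not A"
        net[world] = direction * (payout - total_price)
    return net


print(dutch_book_net(Fraction(3, 5), Fraction(3, 5)))
# {'A': Fraction(-1, 5), 'not A': Fraction(-1, 5)}
```

With credences of 3/5 in both A and not-A, the agent loses 1/5 whichever way A turns out. Note that this sure loss only materialises if someone can actually offer and settle both bets, which is precisely what is in doubt for beliefs about events no enforceable bet could cover.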
Much more has been written about these issues (see, for example, the Stanford Encyclopedia of Philosophy article on Imprecise Probabilities); for our purposes, however, it is sufficient to establish that powerful objections can be and have been raised concerning the adequacy of expected value arguments, particularly in applications involving low probabilities and high potential impact. These issues require careful consideration before premises 6 and 7 of the argument can be justified.
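As a rough illustration of how imprecise probabilities can represent suspension of judgement, here is a minimal Python sketch. It assumes that a credence is represented by an interval rather than a single number, and it uses "interval dominance" as the decision rule; this is only one of several rules discussed in the literature (others include maximality and Γ-maximin). The function names and all numbers are hypothetical.

```python
from fractions import Fraction
from typing import Tuple

Interval = Tuple[Fraction, Fraction]


def ev_interval(p_low: Fraction, p_high: Fraction,
                payoff_if_true: Fraction, payoff_if_false: Fraction) -> Interval:
    """Range of expected values as the credence ranges over [p_low, p_high].
    EV is linear in the credence, so its extremes occur at the endpoints."""
    ends = [p * payoff_if_true + (1 - p) * payoff_if_false for p in (p_low, p_high)]
    return (min(ends), max(ends))


def compare(a: Interval, b: Interval) -> str:
    """Interval dominance: prefer one option only if its worst case beats the
    other's best case; otherwise leave the ranking undetermined."""
    if a[0] > b[1]:
        return "prefer first"
    if b[0] > a[1]:
        return "prefer second"
    return "undetermined"


# Hypothetical speculative cause: huge payoff, credence anywhere in [1e-12, 1e-3].
speculative = ev_interval(Fraction(1, 10**12), Fraction(1, 10**3), Fraction(10**9), Fraction(0))
# Hypothetical ordinary cause: modest payoff, credence in [0.8, 0.9].
ordinary = ev_interval(Fraction(4, 5), Fraction(9, 10), Fraction(1000), Fraction(0))
print(compare(speculative, ordinary))  # → undetermined
```

Because the speculative cause's range of expected values (1/1000 to 1,000,000) spans the ordinary cause's (800 to 900), the rule declines to rank them, whereas committing to a single credence anywhere in the first interval would yield a definite ranking that depends entirely on which point was chosen.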
In concluding, I would like to say a final word about the form in which I believe AI is likely to pose the greatest danger in the future. On the basis of the arguments I have presented above, I believe that the most dangerous AI risk scenario is not that of the paperclip maximiser or some out-of-control AI with a very simplistic goal. Such examples feature very prominently in Bostrom’s argument, but as I have said I do not find them very plausible. Rather, in my view the most dangerous scenario is one in which a much more sophisticated, broadly intelligent AI comes into being which, after some time interacting with the world, acquires a set of goals and motivations which we might broadly describe as those of a psychopath. Perhaps it would have little or no regard for human wellbeing, instead becoming obsessed with particular notions of ecological harmony, or cosmic order, or some abstracted notion of purity, or something else beyond our understanding. Whatever the details, the AI need not have an aversion to changing its ‘final goals’ (or indeed have any such things at all). Nor need it pursue a simple goal single-mindedly, without stopping to reflect or being open to persuasion through conversation with other intelligent agents. Nor need such an AI experience a very rapid ‘takeoff’, since I believe its goals and values could very plausibly alter considerably after its initial creation. Essentially all that is required would be a set of values substantially at odds with those of most or all of humanity. If it were sufficiently intelligent and capable, such an entity could cause considerable harm and disruption. In my view, therefore, AI safety research should focus not only on how to solve the problem of value learning or on how to promote differential technological development.
It should also focus on how the motivations of artificial agents develop, how these motivations interact with beliefs, and how they can change over time as a result of both internal and external forces. The manner in which an artificial agent would interact with existing human society is also an area which, in my view, warrants considerable further study, since the manner in which such interactions proceed plays a central role in many of Bostrom’s arguments.
Bostrom’s book has much to offer those interested in this topic, and although my critique has been almost exclusively negative, I do not wish to come across as implying that I think Bostrom's book is not worth reading or presents no important ideas. My key contention is simply that Bostrom fails to provide compelling reasons to accept the key premises in the argument that he develops over the course of his book. It does not, of course, follow that the conclusion of his argument (that AI constitutes a major existential threat worthy of considerable effort and attention) is false, only that Bostrom has failed to establish its plausibility. That is, even if Bostrom’s argument is fallacious, it does not follow that AI safety is a completely spurious issue that should be ignored. On the contrary, I believe it is an important issue that deserves more attention in mainstream society and policy. At the same time, I also believe that relative to other issues, AI safety receives too much attention in EA circles. Fully defending this view would require additional arguments beyond the scope of this article. Nevertheless, I hope this piece contributes to the debate surrounding AI and its likely impact in the near future.
The Pascal's Mugging thing has been discussed a lot around here. There isn't an equivalence between all causes and muggings because the probabilities and outcomes are distinct and still matter. It's not the case that every religion and every cause and every technology has the same tiny probability of the same large consequences, and you cannot satisfy every one of them because they have major opportunity costs. If you apply EV reasoning to cases like this then you just end up with a strong focus on one or a few of the highest impact issues (like AGI) at heavy short term cost. Unusual, but not a reductio ad absurdum.
There is no philosophical or formal system that properly describes human beliefs, because human beliefs are messy, fuzzy neurophysiological phenomena. But we may choose to have a rational system for modeling our beliefs more consistently, and if we do, then we may as well go with something that doesn't give us obviously wrong implications in Dutch book cases, because a belief system that has wrong implications does not fit our picture of 'rational' (whether we encounter those cases or not).
Thanks for the comment!
I agree that the probabilities matter, but then it comes to a question of how these are assessed and weighed against each other. On this basis, I don't think it has been established that AGI safety research has strong claims to higher overall EV than other such potential mugging causes.
Regarding the Dutch book issue, I don't really agree with the argument that 'we may as well go with' EV because it avoids these cases. Many people would argue that the limitations of the EV approach, such as having to give a precise probability for every belief and not being able to suspend judgement, also do not fit with our picture of 'rational'. It's not obvious why better behaviour in hypothetical betting cases is more important than these considerations. I am not pretending to resolve this argument; I am just trying to raise the issue as relevant for assessing high-impact, low-probability events. EV is potentially problematic in such cases and we need to talk about this seriously.