Our forthcoming AI Safety book

by len.hoang.lnh · 11 min read · 30th Aug 2019 · 12 comments



Dear all,

We, El Mahdi El Mhamdi and Lê Nguyên Hoang, are about to release a book in French on AI safety. The deadline for the final version of the manuscript is September 30. After that, we will work on the English translation of the book. But before all that, we would love to get some feedback. Hence this article.

The book defends three main theses and provides an understandable explanation of key technical aspects of AI Safety for a wide audience.

Thesis 1. Making AIs beneficial is urgent.
Thesis 2. Making AIs beneficial is a huge challenge.

Combining these two theses, we then conclude with the main thesis.

Thesis 3. All sorts of talents should be put in the best conditions to help make AIs beneficial.

Defending Thesis 3 is really the goal of the book. We want to convince readers that much more attention and investment should go toward encouraging and enabling all kinds of talents, technical and nontechnical, to contribute actively and efficiently to making AIs beneficial.

To this end, we then detail Theses 1 and 2 with further arguments.

Making AIs beneficial is urgent

Chapter 2 stresses that AIs are already widespread and hugely influential. We present several reasons for this. Most notably, AIs process huge amounts of data at minimal cost, with performance that is now reasonable and sometimes superhuman. In particular, we stress that today's most powerful AIs are probably recommender systems (such as Facebook's and YouTube's), which influence billions of people every day. For instance, there are now more views on YouTube than searches on Google. They add up to over 1 billion hours of watch time per day! This is an average of half an hour for each of the 2 billion users. And 70% of these views result from recommendations by YouTube's AI.

Chapter 3 discusses the large-scale negative side effects of recommender systems, like loss of privacy, biases, filter bubbles, polarization, addiction to social media, mental disorders, junk news, pandemics of anger, and so on. In particular, based on the spread of anti-vaccination views on the one hand, and the spread of stigmatization of minorities in Myanmar on the other, we argue that AI (already) kills. Not "intentionally", but as a side effect of its attention maximization.

Chapter 4 takes a step back to consider the notion of information. We argue that the critical role of information, in science and civilizations, is neglected. In fact, recent advances in physics, biology or economics are often related to a greater focus on information. Meanwhile, some of the greatest breakthroughs in history, like the invention of language, writing or printing, consist of a greater mastery of information. Finally, these days, it seems that nearly all high-paying jobs are mostly information processing jobs. Given that AIs are information processing tools, they seem bound to completely upset our societies. It seems urgent to be aware of the upcoming upheavals and to direct them towards good.

Chapter 5 then argues against the possibility of greatly slowing down the pace of AI progress, mostly because of financial and political incentives, but also, we argue, because of (perceived) moral incentives. From healthcare to energy, from research to activism, we argue that there are enormous benefits to developing AIs. In fact, if we are concerned about AIs, we argue, our focus should probably not be on slowing down the progress of any particular AI. It should rather be on making AIs, especially influential AIs, beneficial, as argued for example by OpenAI.

Chapter 6 addresses the possibility of human-level AI. We first stress that such an AI's side effects should be expected to be much larger than those of today's AIs. Next, based on surveys of experts, the track record of expert predictions and further, more theoretical considerations, we argue (very conservatively) that human-level AI by 2025 should be given a probability of at least 1%. We argue that this is definitely sufficient to be extremely worried about human-level AI (though, to avoid controversies with human-level AI skeptics, we do make the point that this claim is not necessary to defend Thesis 1). This concludes our defense of Thesis 1. We then move on to Thesis 2.

Making AIs beneficial is a huge challenge

Chapter 7 discusses the risk of an AI race, and the constraints that this race implies. We argue that, because of this inevitable race, we cannot demand overly restrictive constraints on AIs. Put differently, there are limits on which constraints can be required to make AIs beneficial. In particular, to determine which constraints should be required, it seems essential to have an in-depth technical understanding of AIs. We also discuss ways to mitigate the AI race, including the advantages of monopolies in terms of long-term planning and safety incentives, among other things. We also make the distinction between making beneficial AIs and making AIs beneficial. The latter seems much harder.

Chapter 8 addresses the control problem. We stress that this is much harder than one might naively think, by focusing on the example of the YouTube recommender system. You would probably need to convince a huge fraction of YouTube's board to have YouTube's AI interrupted. This cannot be done easily. We even argue that designing a control switch may itself be a flaw for AI safety. Indeed, any switch might then be activated by some tired, drunk or blackmailed engineer, or even by some malicious hacker. In software security, such a "backdoor" is often regarded as a potential breach to be avoided. In fact, we argue that AIs should be made safe despite humans.

Chapter 9 (finally!) introduces machine learning. We argue that today's and tomorrow's most influential AIs will likely rely on (something similar to) the reinforcement learning framework. Essentially, each AI will be associated with a reward mechanism, and will choose actions that seem to maximize discounted expected future rewards. We explain the basics of this in (hopefully) understandable language. We also discuss exploitation versus exploration, and unsafe exploration.
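To make "maximizing discounted future expected rewards" concrete, here is a minimal sketch in Python. It is not taken from the book: the function names, the two candidate actions and the forecast reward values are all hypothetical, chosen only to show how the discount factor `gamma` trades off short-term against long-term rewards.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, where the reward t steps ahead is weighted by gamma ** t."""
    total = 0.0
    # Walk backwards so each earlier step applies one more discount factor.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def best_action(forecasts, gamma=0.9):
    """Return the action whose forecast reward sequence has the highest discounted return."""
    return max(forecasts, key=lambda action: discounted_return(forecasts[action], gamma))


# Hypothetical forecasts: expected reward at each future step, per action.
forecasts = {
    "short_term": [1.0, 0.0, 0.0],  # small reward right away
    "long_term": [0.0, 0.0, 2.0],   # larger reward, but two steps later
}

patient_choice = best_action(forecasts, gamma=0.9)
impatient_choice = best_action(forecasts, gamma=0.5)
```

Under these assumed numbers, a patient agent (`gamma=0.9`) prefers the delayed reward, while a strongly discounting agent (`gamma=0.5`) prefers the immediate one. A real reinforcement learner would also have to estimate these forecasts from experience, which is where exploration comes in.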

Chapter 10 stresses the importance of the objective function (= reward mechanism) in reinforcement learning. We stress the difficulty of designing adequate rewards, by illustrating Goodhart's law and reward hacking. We also present the orthogonality thesis, instrumental goals and instrumental convergence. We finally argue that AI alignment, i.e. making sure the AI's objective reflects humans' values, is nearly a necessary and sufficient condition for making powerful AIs robustly beneficial. We then move on to ideas and challenges to implement robust AI alignment.

A roadmap towards beneficial AIs

Chapter 11 stresses the importance of quality data and quality inference from the data to do good (and avoid doing harm). We highlight numerous difficulties, like privacy and adversarial machine learning. We also argue for the importance of modelling uncertainty, and for more research in this direction.

Chapter 12 then moves on to the difficulty of agreeing on what objective to assign to AIs. It stresses the relevance of voting systems and so-called social choice theories to achieve agreement in a limited amount of time. Based on the Moral Machine experiment, we present several additional difficulties, like biased voter demographics and limited computational power. We discuss ideas to mitigate these issues, like inverse reinforcement learning and heuristics. Again, more research in this direction seems desirable.
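As a small illustration of why the choice of voting system matters, here is a sketch in Python comparing two classic rules from social choice theory, plurality and the Borda count. The ballots and the options "A", "B" and "C" are hypothetical and not taken from the book or from the Moral Machine data.

```python
from collections import Counter


def plurality_winner(ballots):
    """Winner is the option ranked first on the most ballots."""
    counts = Counter(ballot[0] for ballot in ballots)
    return counts.most_common(1)[0][0]


def borda_winner(ballots):
    """Winner has the most points, where a ballot of n options gives n-1, n-2, ..., 0 points."""
    scores = Counter()
    for ballot in ballots:
        n = len(ballot)
        for position, option in enumerate(ballot):
            scores[option] += n - 1 - position
    return scores.most_common(1)[0][0]


# Hypothetical ballots: each lists the options from most to least preferred.
ballots = (
    3 * [["A", "B", "C"]]
    + 2 * [["B", "C", "A"]]
    + 2 * [["C", "B", "A"]]
)

plurality_result = plurality_winner(ballots)
borda_result = borda_winner(ballots)
```

With these assumed ballots, plurality elects "A" (the most first places), while the Borda count elects "B" (broadly acceptable to everyone). The same preferences can thus yield different collective choices depending on the aggregation rule.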

Chapter 13 then argues that we should not design AIs' goals based on humans' (declared) preferences. In particular, we stress the presence of numerous cognitive biases and inconsistencies in our moral judgments. Instead, we argue for Yudkowsky's coherent extrapolated volition. We also argue that the Moral Machine experiment could be regarded as a primitive form of coherent extrapolated volition, though still unsatisfactory and limited to a very restricted kind of moral dilemma. We also discuss normative moral uncertainty at length, arguing for instance that we should be aware of how hard it is to distinguish preferences (what we want) from volitions (what we would want to want). Unfortunately, there seems to be little research on this problem.

Chapter 14 discusses the risk of wireheading, i.e. the AI hacking its own reward mechanism. Because of this, we argue that an AI should not be directly given the rewards we want it to maximize. Instead, we argue that the rewards should be such that the AI will want to protect, and even enhance, the reward mechanism. We draw parallels with how this applies to humans as well, who might hijack their reward mechanism by, say, taking drugs. In fact, we argue that designing incentive-compatible rewards is perhaps the most critical (and most neglected) challenge in making AIs robustly beneficial. We believe that a lot more attention should be given to this problem.

Chapter 15 adds another challenge: decentralization. If all computations were made on a single machine, then a crash of this machine would bring down the AI. To avoid this, today's large-scale AIs (like the YouTube recommender system) are already widely distributed among a large number of different machines. We argue that future AIs will likely be similar, which raises additional difficulties, like Byzantine fault tolerance or specialized reward mechanism design. Research on distributed machine learning has begun to flourish recently. But it seems that many of the questions we raise have not been sufficiently addressed yet. This concludes the defense of Thesis 2.
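To give a feel for Byzantine fault tolerance in distributed learning, here is a toy sketch in Python. It assumes a setting where several workers each send an update vector to be combined, and it compares plain averaging with the coordinate-wise median, one well-known robust aggregation rule. The rule, the numbers and the names are our illustrative assumptions, not a description of the book's content or of any production system.

```python
from statistics import mean, median


def aggregate(updates, rule):
    """Combine per-worker update vectors coordinate by coordinate using `rule`."""
    return [rule(coordinate) for coordinate in zip(*updates)]


# Hypothetical updates: three honest workers roughly agree with each other.
honest = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1]]
# One faulty or malicious (Byzantine) worker sends an arbitrary vector.
byzantine = [[1000.0, -1000.0]]
updates = honest + byzantine

mean_update = aggregate(updates, mean)      # dragged far away by the single outlier
median_update = aggregate(updates, median)  # stays close to the honest workers
```

In this toy case, a single bad worker is enough to ruin the average, while the median stays near the honest values. Robustness of this kind only holds when honest workers form a majority, which is one reason the problem remains hard at scale.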

Remarks and conclusion

The book ends with two somewhat different chapters. Chapter 16 promotes what we call "computational moral philosophy". It essentially consists of combining moral philosophy with computer science. In particular, we stress the importance of a data-driven approach to moral philosophy. We also use complexity theory to discuss additional pragmatic constraints on moral philosophy. This chapter attempts a response to the need for a "philosophy on a deadline" that technology creates.

Finally, Chapter 17 concludes by inviting readers to reflect further on AIs and on the theses of the book (which we were told to do more within the book as well). We also suggest a wide range of possible actions to contribute to making AIs beneficial, like funding research, raising awareness (with care!) especially among talents, valuing ethics and safety, learning and teaching machine learning, organizing discussion groups and following content creators like 80,000 Hours or the EA forum :)

In particular, we stress the need to have people with all sorts of expertise working on AI alignment, like social scientists, psychologists, economists, mathematicians, legislators, jurists, investors, entrepreneurs, managers, directors and so on.


While we both strongly believe that the book is very likely to be overall greatly beneficial, concerns have been brought to our attention about the possibility that our book may not be sufficiently robustly beneficial. It may have undesirable side effects. The main concern that we agree with is the risk of turning AI Safety into a political debacle, where subtle and nuanced ideas are torn apart. In particular, our current preferred choice of title has been hotly debated and questioned, especially by other EAs.

Right now, it is "AI kills - The fabulous enterprise to make it robustly beneficial". We will be making sure that the subtitle will be written in a particularly large font on the cover. Moreover, in much of the book, especially in the last chapter, we have done our best to show excitement and enthusiasm about this "fabulous enterprise". In fact, we believe that making AIs robustly beneficial may be the greatest and most intellectually stimulating enterprise that mankind will ever have to tackle!

Evidently, the "AI kills" part is designed to be clickbait. We do hope to reach a wide audience, which we regard as desirable to accelerate the spread of AI ethics in all sorts of corporations. Evidently, we expect the title to be misused and abused by many people. But we are confident that a 3-minute discussion with a calm person is sufficient to convince them of the relevance of the title (see Chapter 3). Moreover, we believe that focusing on the example of YouTube's AI will be helpful to move beyond the Terminator cliché, and to stress the problem of side effects. Nevertheless, we more than welcome your feedback on this hotly debated topic. In particular, we are still seeking better alternatives. So far though, it seems to us that the current title serves quite well the goal of convincing a sufficiently large audience of Thesis 3.

EDIT: Thanks to your useful feedback, we decided to change the title to something like "The fabulous enterprise to make artificial intelligence robustly beneficial". In French: "Le fabuleux chantier pour rendre l'intelligence artificielle robustement bénéfique".

Evidently, we also welcome any feedback on other aspects of the book. In particular, if you feel that we may have missed some important topic relevant to AI alignment, or if you think that there is a point we might be skipping over too quickly, please let us know (though, as you can guess, a lot more details are present in the book).

Unfortunately, so far, the book is only in French. But if you can read French and would like to (p)review our book, please feel free to get in touch with us (our emails can easily be found, for instance on the EPFL website).

Thank you so much for your attention!

Mahdi and Lê.


12 comments

I appreciate you seeking feedback here. A book targeted at the general public seems very well placed to shape the discussion in many unintended ways.

Evidently, the "AI kills" part is designed to be clickbait. We do hope to reach a wide audience, which we regard as desirable to accelerate the spread of AI ethics in all sorts of corporations. Evidently, we expect the title to be misused and abused by many people. But we are confident that a 3-minute discussion with a calm person is sufficient to convince them of the relevancy of the title (see Chapter 3).

I wonder why you think the sensationalist title is worth it. A less sensationalist title would probably mean

  • less unproductive debate
  • a lower risk of unintentionally pushing the discussion around AI down a confused and politicized path
  • fewer people reading it
  • but those who do read it probably being better informed and more interested in a sober discussion

Can you explain why you think that AI ethics being discussed in all sorts of corporations is very useful? My impression is that AI Safety mostly needs more academic talent directed to research, rather than becoming another hot topic in corporations that don't even do AI research.

This is a good point. The book focuses a lot on research questions indeed.

We do see value in many corporations discussing AI ethics. In particular, there seems to be a rise of ethical discussions within the big tech companies, which we hope to encourage. In fact, in Chapter 7, we urge AI companies like Google and Facebook not only to take part in AI ethics discussion and research, but to actively motivate, organize and coordinate it, typically by sharing their AI ethics dilemmas and perhaps parts of their AI code. In a sense, they have already started to do so.

Another point is that, given our perceived urgency of AI Safety, it seems that it may be useful to reach out to academic talents in many different ways. Targeted discussions do improve the quality of the discussions. But we fear that they may not "scale" sufficiently. We feel that some academics might be quite receptive to reflecting on the public discussion. But we may be underestimating the difficulty of making this discussion productive...

(I have given a large number of public talks, and found it quite easy to raise the concerns of the book with all sorts of audiences, including start-ups / tech companies, but I do greatly fear what could happen with the media...)

I should add that the book goes to great lengths to encourage calm thinking and fruitful discussions on the topic. We even added a section in Chapter 1, where we apologize for the title and clarify the purpose of the book. We also ask readers to be pedagogical and benevolent themselves when criticizing or defending the theses of the book. But clearly, such content will only have an impact on those who actually read the book.

Anyways, thanks for your comment. We're definitely pondering it!

Just registering that I'm not convinced this justifies the title.

Well, you were more than right to do so! You (and others) have convinced us. We changed the title of the book :)

Dear all, just an announcement: we have settled the title dilemma. The final title is:

"Le fabuleux chantier pour rendre l'intelligence artificielle robustement bénéfique"

which in the upcoming English version will be something like "The fabulous endeavour to make AI robustly beneficial".

Please keep providing feedback, as we are still doing the last round of edits on the manuscript.

Thanks a lot.

Lê and Mahdi

Thank you for changing the title; I think this is significantly better.

Chapter 7 discusses the risk of AI race, and the constraints that this AI race implies. We argue that, because of this inevitable race, we cannot demand too constraining constraints on AIs.

I've heard arguments that one wants to be careful when talking about AI races, as quotes like yours arguing about an inevitable race might fuel race dynamics further. Race dynamics are worrying and we should discuss the risk, of course. But a book for a general audience, which I expect will jump on issues like the associated political conflicts because they are much easier to understand than technical challenges, might not be the best place for this discussion.

Can you say any more about the circumstances under which your book is being published? What kinds of books does your publisher normally release, if you are working with one? What audience do you plan to target?

Also, for other users' reference, "EPFL" refers to a bilingual French/English university in Switzerland.

The book will be published by EDP Sciences. They focus a lot on textbooks. But they also work on outreach books. I published my first book with them on Bayesianism.

We hope to reach all sorts of people who are intrigued by AI but do not have any background in the topic. We also hope that more technical readers will be interested in the book as an overview of AI Safety.

I should point out that I run a YouTube channel, whose audience will likely be the base audience of the book too.

Hello Lê,

I think it's not the point of your book, but you speak of AI "killing", and reading your article I don't see anything showing how AI has proved positive and beneficial as of today. As your book is aimed at a wide audience, you should show both sides (even if I understand that the point of your book is to emphasize the challenges), to debunk both AI hype and AI fear. Moreover, I think you personify "AI" too much (as you sometimes did in your (great) series of videos). When you say 'AI kills', I find it important to remind readers that an AI is only a function which makes a prediction from input variables, nothing more (even if it has big consequences, as you rightly say).

Hope you find this constructive!


This is a fair point. We do not discuss much the global improvement of the world. I guess that we try to avoid upsetting those who have a negative vision of AI so far.

However, Chapter 5 does greatly insist on the opportunities of (aligned) AIs, in a very large number of fields. In fact, we argue that there is a compelling argument to say that fighting AI progress is morally wrong (though, of course, there is the equally compelling flip-side of the argument if one is concerned about powerful AIs...).

We should probably add something about the personification of AI. This indeed has negative side effects. But if pondered adequately, especially for reinforcement learning AIs, it is a very useful way to think about AIs and to anticipate their actions.

Thanks for the comment, Paul!

Why do you think people with a negative vision of AI would be upset by you mentioning positive applications?