Short-Term AI Alignment as a Priority Cause

by len.hoang.lnh, 11th Feb 2020



In this post, I will argue that short-term AI alignment should be viewed as today's greatest priority cause, whether you are concerned by long-term AGI risks or not.

To do so, I will first stress that AIs are automating information collection, storage, analysis and dissemination, and that they now do much of this far better than humans. Yet many of the priority cause areas in EA depend strongly on collecting, storing, analyzing and disseminating quality information. As of today, an aligned large-scale AI would thus be a formidable ally for EA.

In this post, I will particularly focus on the cases of public health, animal suffering, critical thinking and existential risks, since these are leading priority causes in EA.

The Power of Information

I already discussed this on LessWrong. But I fear that AI discussions are too narrowly focused on futuristic robots, even within the EA community. In contrast, in this post, I propose to stress the growing role of algorithms and information processing in our societies.

It's indeed noteworthy that the greatest disruptions in human history have arguably been the inventions of new information technologies. Language enabled coordination, writing enabled long-term information storage, the printing press enabled scalable information dissemination, and computing machines enabled ultra-fast information processing. We also now have all sorts of sensors and cameras to scale data collection, and worldwide fiber optics for highly reliable global communication.

Information technologies powered new sorts of economies, organizations, scientific discoveries, industrial revolutions, agricultural practices and product customization. They also moved our societies towards information societies. These days, essentially all jobs are information processing jobs, from the CEO of the largest company down to the babysitter. Scientists, journalists, managers, software developers, lawyers, doctors, workers, teachers, drivers, regulators, and even effective altruists (or me writing this post) all spend their days mostly processing information: they collect, store, analyze and communicate it.

When Yuval Noah Harari came to EPFL, his talk focused heavily on information. "Those who control the flow of data in the world control the future, not only of humanity, but also perhaps of life itself", he said. This is because, according to Harari, "humans are hackable". As psychology keeps showing, the information we are exposed to radically biases our beliefs, preferences and habits, with both short-term and long-term effects.

Now ask yourself: today, who is most in control of the flow of information? Which entity holds, more than any other and according to Harari, "the future of life"?

I would argue that, by far, algorithms have taken control of the flow of information. Well, not all algorithms. Arguably, a handful of algorithms control the flow of information more than all humans combined; and arguably, the YouTube algorithm is more in control of information than any other algorithm, with 1 billion hours of watch time per day across 2 billion users, 70% of which results from recommendations.
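To get a feel for these orders of magnitude, here is a back-of-the-envelope sketch in Python. It uses only the three figures quoted above; the variable names are mine, reading the 70% as a share of daily watch time is an assumption, and the per-user average treats all 2 billion users as daily viewers, which is a simplification.

```python
# Back-of-the-envelope sketch using only the figures quoted above.
# Assumption: the 70% share applies to daily watch time.
WATCH_HOURS_PER_DAY = 1_000_000_000
USERS = 2_000_000_000
RECOMMENDED_SHARE_PERCENT = 70

# Hours of viewing per day that result from recommendations.
recommended_hours_per_day = WATCH_HOURS_PER_DAY * RECOMMENDED_SHARE_PERCENT // 100

# Average viewing time per user per day, in minutes (simplified average).
avg_minutes_per_user = WATCH_HOURS_PER_DAY * 60 / USERS

# The same recommended viewing expressed in years of attention per day.
HOURS_PER_YEAR = 24 * 365.25
recommended_years_per_day = recommended_hours_per_day / HOURS_PER_YEAR

print(f"{recommended_hours_per_day:,} hours per day come from recommendations")
print(f"{avg_minutes_per_user:.0f} minutes per user per day on average")
print(f"about {recommended_years_per_day:,.0f} years of attention steered per day")
```

Under these assumptions, recommendations account for 700 million hours of viewing per day, which is on the order of 80,000 years of human attention steered by a single algorithm every day.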

And as algorithms become better and better at complex information processing, economic incentives seem bound to give them more and more control over the information that powers our information societies. It seems critical that they be aligned with what we truly want them to do.

How short-term alignment can help all EA causes

In what follows, I will focus on the global impacts that the alignment of large-scale algorithms, like the YouTube algorithm, could have on some of the main EA causes.

Impact on public health

Much of healthcare is an information processing challenge. In particular, early diagnosis is often critical to effective treatment. Anomaly detection with non-intrusive sensors, like a mere picture taken with a phone, could greatly improve public health, especially if accompanied by adequate medical recommendations (which may be as simple as "you should see a doctor"). While this is exciting, and while there are major AI Safety challenges in this regard, I will not dwell on them since alignment is arguably not the bottleneck here.

On the other hand, much of public health has to do with daily habits, which are strongly influenced by recommender systems like the YouTube algorithm. Unfortunately, as long as they are unaligned, these recommender systems might encourage poor habits, like fast food consumption, taking the car for transport or binge-watching videos for hours without exercising.

More aligned recommender systems might instead encourage good habits, for instance regarding hygiene, quality food and regular exercise. By adequately customizing video recommendations, it might even be possible to motivate users to cook healthy food or to practice sports that they are more likely to enjoy.

A more tractable beneficial recommendation could be the promotion of evidence-based medicine with large effect sizes, like vaccination. The World Health Organization reported 140,000 deaths from measles in 2018, a disease for which a vaccine exists. Unfortunately, anti-vaccination propaganda seems to have slowed down the systematic vaccination of children. Even if only 10% of these deaths could have been avoided by exposure to better information, this still represents more than ten thousand lives per year that could be saved by more aligned recommendation algorithms, for measles alone.
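The estimate above is simple arithmetic, and it is worth seeing how sensitive it is to the assumed avoidable fraction. The following Python sketch uses the 140,000 figure quoted in the text; the function name and the alternative fractions (1%, 5%, 20%) are hypothetical scenarios of mine, not measured effects.

```python
# Sensitivity sketch for the measles estimate above.
# The 140,000 figure is the one quoted in the text; the avoidable fractions
# are hypothetical scenarios, not measured effects.
MEASLES_DEATHS_2018 = 140_000


def lives_saved(deaths: int, avoidable_percent: int) -> int:
    """Deaths avoided if `avoidable_percent` % of them were preventable."""
    return deaths * avoidable_percent // 100


# Map each hypothetical avoidable percentage to the number of lives saved.
scenarios = {pct: lives_saved(MEASLES_DEATHS_2018, pct) for pct in (1, 5, 10, 20)}

for pct, saved in scenarios.items():
    print(f"{pct:>2}% avoidable -> {saved:,} lives per year")
```

The 10% scenario from the text corresponds to 14,000 deaths avoided in 2018; even the most pessimistic scenario here, 1%, would still mean 1,400 lives.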

As a more EA example, we can consider the case of the Malaria Consortium (or other GiveWell top charities). Much of philanthropy could become a lot more effective if donors were better informed. An aligned recommender could stress this fact, and recommend effective charities, as opposed to appealing but ineffective ones. Thousands, if not hundreds of thousands, of lives could probably be saved by exposing potential donors to better quality information.

To conclude this section, I would like to stress the growing challenges of mental health. This will arguably be the ultimate frontier of healthcare, and a major cause for utilitarians. Unfortunately, fighting addiction, loneliness, depression and suicide seems nearly intractable through conventional channels. But data from social media may provide radically new means to diagnose these mental health conditions, as a Facebook study suggests. Interestingly, by aligning recommender algorithms, social media could also provide means to treat such conditions, for instance by recommending effective therapeutic content. Indeed, studies showed that mere exposure to the principles of cognitive behavioral therapy improved patients' conditions. Alternatively, algorithms could simply recommend content that encourages viewers in need to see a psychiatrist.

Impact on animal suffering

Another important cause in EA is animal suffering. Here, again, it seems that information is critical. Most people seem to simply be unaware of the horror and scale of industrial farming. They also seem to neglect the impact of their daily consumption on the incentive structure that motivates industrial farming.

But this is not all. Our food consumption habits arguably depend strongly on our social and informational environments. By fostering communities that, for instance, like to try different meat substitutes, it seems more likely that a larger population could be convinced to at least try such substitutes, which could significantly reduce our impact on animal suffering and on the environment.

(I once pointed this out to Ed Winters, a vegan YouTuber activist, who acknowledged that the number of views of his videos seems mostly controlled by the YouTube algorithm. Our discussion was recorded, and I guess it will be on his YouTube channel soon...)

It may also be possible to nudge biologists and aspiring biologists towards research on, say, meat substitutes. This seems critical to accelerate the development of such substitutes, but also to bring down their price, which could then have a strong impact on animal suffering.

Finally, one of the great challenges of cultivated meat may be its social acceptance. There may be a lot of skepticism merely due to a misunderstanding, either of the nature of cultivated meat, or of the "naturalness" of conventional meat.

Impact on critical thinking

This leads us to what may be one of the most impactful consequences of aligned recommender systems. It might be possible to promote critical thinking much more effectively, at least within intellectual communities. Improving the way a large population thinks may be one of the most effective ways to do a lot of good in a robust manner.

As a convinced Bayesian (with an upcoming book on the topic!), I feel that the scientific community (and others) would gain a lot by pondering their epistemology at much greater length, that is, how they came to believe what they believe, and what they ought to do to acquire more reliable beliefs. Unfortunately, most scientists seem to neglect the importance of thinking in bets. While they usually acknowledge that they are weak in probability theory, they mostly seem fine with their inability to think probabilistically. When it comes to preparing ourselves for an uncertain future, this shortcoming seems very concerning. Arguably, this is why AI researchers are not sufficiently concerned about AGI risks.

An aligned algorithm could promote content that stresses the importance of thinking probabilistically, the fundamental principles for doing so, and useful exercises to train our intuitions of probability, like the Bayes-up application.

Perhaps more importantly still, an aligned algorithm could be critical to promote intellectual honesty. Studies suggest that what's lacking in people's reasoning is often not information itself, but the ability to process information in an effective and unbiased manner. Typically, more informed Republicans are also more likely to deny climate change. One hypothesis is that this is because they also gain the ability to better lie to themselves.

In this video (and probably her upcoming book), Julia Galef argues that the most effective way to combat our habit of lying to ourselves is to design incentives that reward intellectual honesty, changing our own minds, providing clear arguments, dismissing our own bullshit, and so on. While many such rewards may be designed internally (by and for our own brains), because we are social creatures, most will likely need to come from our environments. Today, much of this environment and of the social rewards we receive come from social media; and unfortunately, most people usually receive greater rewards (likes and retweets) for being offensive, sarcastic and caricatural.

An aligned algorithm could align our own rewards with what motivates intellectual honesty, by favoring connections with communities that value intellectual honesty, modesty and a growth mindset. Thereby, the aligned algorithm may be effective in aligning ourselves with what we truly desire, not with our bullshit.

Impact on existential risks

What may be most counter-intuitive is that short-term alignment may be extremely useful for long-term AGI alignment (and other existential risks). In fact, to be honest, this is why I care so much about short-term alignment. I care about short-term alignment because I see this as the most effective way to increase the probability of achieving long-term AGI alignment.

An aligned recommender algorithm could typically promote video content on long-term concerns. This would be critical to nudge people towards longer-term perspectives, and to combat our familiarity bias. It also seems crucial to defend the respectability of long-term perspectives.

Perhaps more importantly, the great advantage of focusing on short-term alignment is that it makes it a lot easier to convince scientists and philosophers, but also engineers, managers and politicians, to invest time and money in alignment. Yet all such expertise (and more) seems critical for robust alignment. We will likely need the formidable interdisciplinary collaboration of thousands, if not hundreds of thousands, of scholars and professionals to significantly increase the probability of long-term AGI alignment. So let's start recruiting them, one after the other, using arguments that they will find more compelling.

But this is not all. Since short-term alignment is arguably not completely different from long-term alignment, this research may be excellent practice to better outline the cruxes and the pitfalls we will encounter in long-term alignment. In fact, some of the research on short-term alignment (see for instance this page on social choice) might be giving more reliable insights into long-term alignment than long-term alignment research itself, which can be argued to be sometimes too speculative.

Typically, it does not seem unlikely that long-term alignment will have to align algorithms of big (private or governmental) organizations, even though most people in these big organizations neglect the negative side effects of their algorithms.


I have sometimes faced a sort of contempt for short-term agendas within EA. I hope to have convinced you in this post that this contempt may have been highly counter-productive, because it might have led to the neglect of short-term AI alignment research. Yet, short-term AI alignment research seems critical to numerous EA causes, perhaps even including long-term AGI alignment.

To conclude, I would like to stress the fact that this post is the result of years of reflection by a few of us, mostly based in Lausanne, Switzerland. Our reflections culminated in the publication of a Robustly Beneficial AI book in French called Le Fabuleux Chantier, whose English translation is pending (feel free to contact us directly to see the current draft). But we have also explored other information dissemination formats, like the Robustly Beneficial Podcast (YouTube, iTunes, RSS) and the Robustly Beneficial Wiki.

In fact, after successfully initiating a research group at EPFL (with papers at ICML, NeurIPS,...), we are in the process of starting an AI Safety company, called Calicarpa, to exploit our published results and software (see for example this). Also, we have convinced a researcher in Morocco to tackle these questions; he is now building a team and looking for 3 postdocs to do so.