
by Ivan Vendrov and Jeremy Nixon

Disclaimer: views expressed here are solely our own and not those of our employers or any other organization.

Most recent conversations about the future focus on the point where technology surpasses human capability. But they overlook a much earlier point where technology exceeds human vulnerabilities.

The Problem, Center for Humane Technology.

The short-term, dopamine-driven feedback loops that we have created are destroying how society works.

Chamath Palihapitiya, former Vice President of user growth at Facebook.

The most popular recommender systems - the Facebook news feed, the YouTube homepage, Netflix, Twitter - are optimized for metrics that are easy to measure and improve, such as number of clicks, time spent, and number of daily active users, which are only weakly correlated with what users care about. One of the most powerful optimization processes in the world is being applied to increase these metrics, involving thousands of engineers, the most cutting-edge machine learning technology, and a significant fraction of global computing power.

The result is software that is extremely addictive, with a host of hard-to-measure side effects on users and society including harm to relationships, reduced cognitive capacity, and political radicalization.

Update 2021-10-18: As Rohin points out in a comment below, the evidence for concrete harms directly attributable to recommender systems is quite weak and speculative; the main argument of the post does not strongly depend on the last paragraph.

In this post we argue that improving the alignment of recommender systems with user values is one of the best cause areas available to effective altruists, particularly those with computer science or product design skills.

We’ll start by explaining what we mean by recommender systems and their alignment. Then we’ll detail the strongest argument in favor of working on this cause: the likelihood that working on aligned recommender systems will have positive flow-through effects on the broader problem of AGI alignment. We then conduct a (very speculative) cause prioritization analysis, and conclude with key points of remaining uncertainty as well as some concrete ways to contribute to the cause.

Cause Area Definition

Recommender Systems

By recommender systems we mean software that assists users in choosing between a large number of items, usually by narrowing the options down to a small set. Central examples include the Facebook news feed, the YouTube homepage, Netflix, Twitter, and Instagram. Less central examples are search engines, shopping sites, and personal assistant software which require more explicit user intent in the form of a query or constraints.

Aligning Recommender Systems

By aligning recommender systems we mean any work that leads widely used recommender systems to align better with user values. Central examples of better alignment would be recommender systems which

  • optimize more for the user’s extrapolated volition - not what users want to do in the moment, but what they would want to do if they had more information and more time to deliberate.
  • require less user effort to supervise for a given level of alignment. Recommender systems often have facilities for deep customization (for instance, it's possible to tell the Facebook News Feed to rank specific friends’ posts higher than others) but the cognitive overhead of creating and managing those preferences is high enough that almost nobody uses them.
  • reduce the risk of strong undesired effects on the user, such as seeing traumatizing or extremely psychologically manipulative content.

What interventions would best lead to these improvements? Prioritizing specific interventions is out of scope for this essay, but plausible candidates include:

  • Developing machine learning techniques that differentially make learning from higher-quality human feedback easier.
  • Designing user interfaces with higher bandwidth and fidelity of transmission for user preferences.
  • Increasing the incentives for tech companies to adopt algorithms, metrics, and interfaces that are more aligned. This could be done through individual choices (using more aligned systems, working for more aligned companies), or through media pressure or regulation.

Concrete Examples of How Recommender Systems Could be More Aligned

  • A recommender system that optimizes partly for a user’s desired emotional state, e.g. using affective computing to detect and filter out text that predictably generates anger.
  • A conversational recommender system that allows users to describe, in natural language, their long term goals such as “become more physically fit”, “get into college”, or “spend more time with friends”. It would then slightly adjust its recommendations to make achievement of the goal more likely, e.g. by showing more instructional or inspiring videos, or alerting more aggressively about good social events nearby.
  • Once a month, users are sent a summary of their usage patterns for the recommender system, such as the distribution of time they spent between politics, sports, entertainment, and educational content. Using a convenient interface, users are able to specify their ideal distribution of time, and the recommender system will guide the results to try to achieve that ideal.
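The last example, steering a feed toward a user's stated ideal distribution of time, can be sketched as a simple greedy re-ranker. This is only an illustration under our own assumptions: the function name, the (item_id, topic, relevance) tuple layout, and the `weight` parameter are all hypothetical and do not describe any deployed system.

```python
from collections import Counter


def rerank_toward_target(candidates, target_share, slots, weight=1.0):
    """Greedily fill `slots` positions, trading relevance against topic deficit.

    candidates:   list of (item_id, topic, relevance) tuples (hypothetical layout).
    target_share: dict mapping topic -> fraction of the feed the user wants.
    weight:       how strongly the user's stated distribution overrides relevance.
    """
    remaining = list(candidates)
    chosen = []
    counts = Counter()

    while remaining and len(chosen) < slots:

        def adjusted(item):
            _, topic, relevance = item
            # Share of the feed this topic already occupies.
            current_share = counts[topic] / max(len(chosen), 1)
            # Positive deficit means the topic is under-represented so far.
            deficit = target_share.get(topic, 0.0) - current_share
            return relevance + weight * deficit

        best = max(remaining, key=adjusted)
        remaining.remove(best)
        chosen.append(best)
        counts[best[1]] += 1

    return [item_id for item_id, _, _ in chosen]
```

With `weight=0` this reduces to ranking by relevance alone; larger values push the feed toward the distribution the user asked for. How to set that trade-off is exactly the kind of open design question the example raises.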

Connection with AGI Alignment

Risk from the development of artificial intelligence is widely considered one of the most pressing global problems, and positively shaping the development of AI is one of the most promising cause areas for effective altruists.

We argue that working on aligning modern recommender systems is likely to have large positive spillover effects on the bigger problem of AGI alignment. There are a number of common technical sub-problems whose solution seems likely to be helpful for both. But since recommender systems are so widely deployed, working on them will lead to much tighter feedback loops, allowing more rapid winnowing of the space of ideas and solutions, faster build-up of institutional knowledge and better-calibrated researcher intuitions. In addition, because of the massive economic and social benefits of increasing recommender system alignment, it’s reasonable to expect a snowball effect of increased funding and research interest after the first successes.

In the rest of this section we review these common technical sub-problems, and specific benefits from approaching them in the context of recommender systems. We then briefly consider ways in which working on recommender system alignment might actually hurt the cause of AGI alignment. But the most serious objection to our argument from an EA perspective is lack of neglectedness: recommender system alignment will happen anyway, so it’s differentially more important to work on other sub-problems of AGI alignment. We discuss this objection more below in the section on Cause Prioritization.

Overlapping Technical Subproblems

Robustness to Adversarial Manipulation

Robustness - ensuring ML systems never fail catastrophically even on unseen or adversarially selected inputs - is a critical subproblem of AGI safety. Many solutions have been proposed, including verification, adversarial training, and red teaming, but it’s unclear how to prioritize between these approaches.

Recommender systems like Facebook, Google Search, and Twitter are under constant adversarial attack by the most powerful organizations in the world, such as intelligence agencies trying to influence elections and companies doing SEO for their websites. These adversaries can conduct espionage, exploit zero-day vulnerabilities in hardware and software, and draw on resources far in excess of any realistic internal red team. There is no better test of robustness today than deploying an aligned recommender system at scale; trying to make such systems robust will yield a great deal of useful data and intuition for the larger problem of AGI robustness.

Understanding preferences and values from natural language

There are a few reasons to think that better natural language understanding differentially improves alignment for both recommender systems and AGI.

First, given how strongly the performance of deep learning systems scales with data size, it seems plausible that the sheer number of bits of human feedback ends up being a limiting factor in the alignment of most AI systems. Since language is the highest bandwidth supervisory signal (in bits/second) that individual humans can provide to an ML system, and linguistic ability is nearly universal, it is probably the cheapest and most plentiful form of human feedback.

More speculatively, natural language may have the advantage of quality as well as quantity - since humans seem to learn values at least partly through language in the form of stories, myths, holy books, and moral claims, natural language may be an unusually high-fidelity representation of human values and goals.

Semi-supervised learning from human feedback

Since it’s plausible that AGI alignment will be constrained by the amount of high-quality human feedback we can provide, a natural subproblem is making better use of the labels we get via semi-supervised or weakly supervised learning. Proposals along these lines include Paul Christiano’s Semi-supervised RL and what the authors of Concrete Problems in AI Safety call “Scalable Oversight”. One especially promising approach to the problem is active learning, where the AI helps select which examples need to be labelled.
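As a minimal sketch of the active learning idea, the snippet below uses uncertainty sampling: given a model's predicted probabilities for unlabeled examples, it asks humans to label the ones the model is least sure about. Uncertainty sampling is just one common selection rule that we picked for illustration; the function name and the binary-probability setup are our assumptions, not something taken from the proposals cited above.

```python
def select_for_labeling(predicted_probs, budget):
    """Return indices of the `budget` examples closest to a 0.5 prediction.

    predicted_probs: list of model probabilities that each unlabeled example
                     is positive (e.g. "the user would endorse this item").
    budget:          how many human labels we can afford to request.
    """
    ranked = sorted(
        range(len(predicted_probs)),
        key=lambda i: abs(predicted_probs[i] - 0.5),  # smaller = more uncertain
    )
    return ranked[:budget]
```

For example, with predictions `[0.95, 0.52, 0.10, 0.45, 0.70]` and a budget of 2, the examples at indices 1 and 3 would be sent for labeling, since the model is closest to undecided on them.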

What are the advantages of studying semi-supervised learning in the context of recommender systems? First, because these systems are used by millions of people, they have plentiful human feedback of varying quality, letting us test algorithms at much more realistic scales than gridworlds or MuJoCo. Second, because recommender systems are a large part of many people’s lives, we expect that the feedback we get would reflect more of the complexity of human values. It seems plausible that we will need qualitatively different approaches to achieve human goals like “become physically fit” or “spend more time with my friends” than for simple goals in deterministic environments.

Learning to communicate to humans

It seems very likely that both aligned recommender systems and aligned AGI require bidirectional communication between humans and AI systems, not just a one-way supervisory signal from humans to AI. In particular, safe AI systems may need to be interpretable - to provide accurate explanations of the choices they make. They may also need to be corrigible, which among other properties requires them to actively communicate with users to elicit and clarify their true preferences.

Recommender systems seem a fertile ground for exploring and evaluating different approaches for interpretability and bidirectional communication with humans, especially in the context of conversational search and recommenders.

Understanding Human Factors

In AI Safety Needs Social Scientists, Geoffrey Irving and Amanda Askell make the case that prioritizing technical approaches to AI safety requires deeper empirical understanding of human factors. The biases, weaknesses, strengths, introspection ability, information-processing and communication limitations of actual humans and human institutions seem critical to evaluating the most promising AGI alignment proposals such as debate, amplification, and recursive reward modeling.

We agree that running human studies is likely to be valuable for future AI safety research. But we think equally valuable information could be acquired by deploying and studying aligned recommender systems. Recommender systems maintain the largest datasets of actual real-world human decisions. They have billions of users, many of whom would be willing to use experimental new interfaces for fun or for the promise of better long-term outcomes. Recommender systems are also a fertile ground for testing new social and institutional schemes of human-AI collaboration. Just in the domain of reliably aggregating human judgments (likely a key subproblem for debate and amplification) they are constantly experimenting with new techniques, from collaborative filtering to various systems for eliciting and aggregating reviews, ratings, and votes. AI safety needs social scientists, definitely - but it also needs product designers, human-computer interaction researchers, and business development specialists.

Risks from Aligning Recommender Systems

In what ways could working on recommender system alignment make AI risks worse?

False Confidence

One plausible scenario is that widespread use of aligned recommender systems instills false confidence in the alignment of AI systems, increasing the likelihood and severity of a catastrophic treacherous turn, or a slow but unstoppable trend towards the elimination of human agency. Currently the public, media, and governments have a healthy skepticism towards AI systems, and there is a great deal of pushback against using AI systems even for fairly limited tasks like criminal sentencing, financial trading, and medical decisions. But if recommender systems remain the most influential AI systems on most people’s lives, and people come to view them as highly empathetic, transparent, robust, and beneficial, skepticism will wane and increasing decision-making power will be concentrated in AI hands. If the techniques developed for aligning recommender systems don’t scale - i.e. stop working after a certain threshold of AI capability - then we may have increased overall AI risk despite making great technical progress.

Dual Use

Aligned recommender systems may be a strongly dual-use technology, enabling companies to optimize more powerfully for objectives besides alignment, such as creating even more intensely addictive products. An optimization objective that allows you to turn down anger also allows you to turn up anger; ability to optimize for users’ long term goals implies ability to insinuate yourself deeply into users’ lives.

Greater control over these systems also creates dual use censorship concerns, where organizations could dampen the recommendation of content that is negative towards them.

Perils of Partial Alignment

Working on alignment of recommender systems might simply get us worse and harder-to-detect versions of misalignment. For example, many ideas can’t be effectively communicated without creating an emotion or negative side effect that a partially aligned system may look to suppress. Highly warranted emotional responses (e.g. anger at failures to plan for Hurricane Katrina, or in response to genocide) could be improperly dampened. Political positions that consistently create undesirable emotions would also be suppressed, which may or may not be better than the status quo of promoting political positions that generate outrage and fear.

Cause Prioritization Analysis

Predictions are hard, especially about the future, especially in the domain of economics and sociology. So we will describe a particular model of the world which we think is likely, and do our analysis assuming that model. It’s virtually certain that this model is wrong, and fairly likely (~30% confidence) that it is wrong in a way that dramatically undermines our analysis.

The key question any model of the problem needs to answer is - why aren’t recommender systems already aligned? There are a lot of possible contingent reasons, for instance that few people have thought about it, and the few who did were not in a position to work on it. But the efficient market hypothesis implies there isn’t a giant pool of economic value lying around for anyone to pick up. That means at least one of the following structural reasons is true:

  1. Aligned recommender systems aren’t very economically valuable.
  2. Aligning recommender systems is extremely difficult and expensive.
  3. A solution to the alignment problem is a public good in which we expect rational economic actors to underinvest.

Our model says it’s a combination of (2) and (3). Notice that Google didn’t invent or fund AlexNet, the breakthrough paper that popularized image classification with deep convolutional neural networks - but it was quick to invest immense resources once the breakthrough had been made. Similarly with Monsanto and CRISPR.

We think aligning recommender systems follows the same pattern - there are still research challenges that are too hard and risky for companies to invest significant resources in. The challenges seem interdisciplinary (involving insights from ML, human-computer interaction, product design, social science) which makes it harder to attract funding and academic interest. But there is a critical threshold at which the economic incentives towards wide adoption become overpowering. Once the evidence that aligned recommender systems are practical and profitable reaches a certain threshold, tech companies and venture capitalists will pour money and talent into the field.

If this model is roughly correct, aligned recommender systems are inevitable - the only question is, how much can we speed up their creation and wide adoption? More precisely, what is the relationship between additional resources invested now and the time it takes us to reach the critical threshold?

The most optimistic case we can imagine is analogous to AlexNet - a single good paper or prototype, representing about 1-3 person-years invested, manages a conceptual breakthrough and triggers a flood of interest that brings the time-to-threshold 5 years closer.

The most pessimistic case is that the time-to-threshold is not constrained at the margin by funding, talent or attention; perhaps sufficient resources are already invested across the various tech companies. In that case additional resources will be completely wasted.

Our median estimate is that a small research sub-field (involving ~10-30 people over 3-5 years) could bring the critical threshold 3 years closer.

Assuming this model is roughly right, we now apply the Scale-Neglectedness-Solvability framework for cause prioritization (also known as ITN - Importance, Tractability, Neglectedness) as described by 80000 Hours.


Scale

The easiest problem to quantify is the direct effect on quality of life while consuming content from recommender systems. In 2017 Facebook users spent about 1 billion hours / day on the site; YouTube also claims more than a billion hours a day in 2019. Netflix in 2017 counted 140 million hours per day. Not all of this time is powered by recommender systems, but 2.4 billion user hours / day = 100 million user years / year is a reasonably conservative order of magnitude estimate.

What is the difference in experienced wellbeing in time on current recommender systems vs aligned recommender systems? 1% seems conservative, leading to 1 million QALYs lost every year simply from time spent on unaligned recommender systems.
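The arithmetic behind this estimate can be laid out explicitly. The sketch below uses only the two figures quoted above (2.4 billion user hours per day and a 1% wellbeing gap); the variable names are ours, and the 1% gap is an assumption of the analysis rather than a measured quantity.

```python
HOURS_PER_DAY = 24

# Total time spent on recommender-driven products, in user-hours per day.
user_hours_per_day = 2.4e9

# Dividing by 24 gives user-days per day, which is the same ratio as
# user-years per year: the number of "full-time" user lives consumed.
user_years_per_year = user_hours_per_day / HOURS_PER_DAY

# Assumed difference in experienced wellbeing between current and
# aligned recommender systems (the 1% figure from the text).
wellbeing_gap = 0.01

# Quality-adjusted life years lost per year under that assumption.
qalys_lost_per_year = user_years_per_year * wellbeing_gap

print(f"{user_years_per_year:,.0f} user-years per year")
print(f"{qalys_lost_per_year:,.0f} QALYs lost per year")
```

Because the result is linear in both inputs, a wellbeing gap of 0.1% or 10% would move the estimate to roughly 100 thousand or 10 million QALYs per year respectively.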

It’s likely that the flow-through effects on the rest of users’ lives will be even greater, if the studies showing effects on mental health, cognitive function, and relationships hold up, and if aligned recommender systems are able to significantly assist users in achieving their long term goals. Even more speculatively, if recommender systems are able to align with users’ extrapolated volition this may also have flow-through effects on social stability, wisdom, and long-termist attitudes in a way that helps mitigate existential risk.

It’s much harder to quantify the scale of the AGI alignment problem, insofar as aligning recommender systems helps solve it; we will defer to 80000 Hours’ estimate of 3 billion QALYs per year.


Neglectedness

Culturally there’s a lot of awareness of the problems with unaligned recommender systems, so the amount of potential support to draw on seems high. Companies like Google and Facebook have announced initiatives around Digital Wellbeing and Time Well Spent, but it’s unclear how fundamental these changes are. There are some nonprofits like the Center for Humane Technology working on improving incentives for companies to adopt aligned recommenders, but none to our knowledge working on the technical problem itself.

How many full-time employees are dedicated to the problem? At the high end, we might count all ML, product, data analysis, and UI work on recommender systems as having some component of aligning with user values, in which case there are on the order of thousands of people working on the problem globally. We estimate that the number substantially engaging with the alignment problem (as opposed to improving user engagement) full-time is at least an order of magnitude lower, probably fewer than 100 people globally.


Solvability

The direct problem - unaligned recommender systems making their users worse off than they could be - seems very solvable. There are many seemingly tractable research problems to pursue, lots of interest from the media and wider culture, and clear economic incentives for powerful actors to throw money at a clear and convincing technical research agenda. It seems like a doubling of direct effort (~100 more people) would likely solve a large fraction of the problem, perhaps all of it, within a few years.

For the AGI alignment problem, 80000 Hours’ estimate (last updated in March 2017) is that doubling the effort, which they estimate as $10M annually, would reduce AI risk by about 1%. Given the large degree of technical overlap, it seems plausible that solving aligned recommender systems would solve 1-10% of the whole AGI alignment problem, so we estimate the flow-through reduction in AI risk at 0.01 - 0.1%.
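One way to read this estimate is as a simple product: the 1% risk reduction attributed to doubling AGI alignment effort, scaled by the 1-10% share of that problem that recommender alignment is assumed to cover. The sketch below spells out that reading; the multiplication is our interpretation of the text, and the variable names are ours.

```python
# Risk reduction from doubling AGI alignment effort (80000 Hours figure quoted above).
baseline_risk_reduction = 0.01

# Assumed share of the AGI alignment problem solved by aligning recommenders.
overlap_low, overlap_high = 0.01, 0.10

# Flow-through reduction in AI risk, as fractions (multiply by 100 for percent).
flow_through_low = baseline_risk_reduction * overlap_low
flow_through_high = baseline_risk_reduction * overlap_high

print(f"{flow_through_low:.2%} to {flow_through_high:.2%}")
```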

Overall Importance

Ivan's Note: I have very low confidence that these numbers mean anything. In the spirit of If It's Worth Doing, It's Worth Doing With Made-Up Statistics, I’m computing them anyway. May Taleb have mercy on my soul.

Converting all the numbers above into the 80000 Hours logarithmic scoring system for problem importance, we get the following overall problem scores. We use [x,y] to denote an interval of values.

Problem                      | Scale | Neglectedness | Solvability | Total
Unaligned Recommenders       | 8     | [6,8]         | [6,7]       | [20,23]
Risks from AI (flow-through) | 15    | [6,8]         | [2,3]       | [23,26]

The overall range is between 20 and 26, which is coincidentally about the range of the most urgent global issues as scored by 80000 Hours, with climate change at 20 and risks from artificial intelligence at 27.
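The totals are obtained by adding the three component scores, with intervals added endpoint by endpoint. A short sketch of that bookkeeping follows; the helper name and data layout are ours, and point scores such as 8 are written as the degenerate interval (8, 8).

```python
def add_intervals(*intervals):
    """Sum (low, high) intervals endpoint by endpoint."""
    low = sum(lo for lo, _ in intervals)
    high = sum(hi for _, hi in intervals)
    return (low, high)


# (scale, neglectedness, solvability) for each problem, taken from the table.
scores = {
    "Unaligned Recommenders": [(8, 8), (6, 8), (6, 7)],
    "Risks from AI (flow-through)": [(15, 15), (6, 8), (2, 3)],
}

totals = {name: add_intervals(*parts) for name, parts in scores.items()}

for name, (low, high) in totals.items():
    print(f"{name}: [{low},{high}]")
```

Adding endpoints this way treats the uncertainties as perfectly correlated, so the resulting ranges are the widest the component scores allow.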

Key Points of Uncertainty

A wise man once said to think of mathematical proofs not as a way to be confident in our theorems, but as a way to focus our doubts on the assumptions. In a similar spirit, we hope this essay serves to focus our uncertainties about this cause area on a few key questions:

  1. Could aligning weak AI systems such as recommenders be net harmful due to the false confidence it builds? Are there ways of mitigating this effect?
  2. When will aligned recommender systems emerge, if we don’t intervene? If the answer is “never”, why? Why might aligned recommender systems not emerge in our economic environment, despite their obvious utility for users?
  3. What fraction of the whole AGI alignment problem would robustly aligning recommender systems with roughly modern capabilities solve? We estimated 1-10%, but we can imagine worlds in which it’s 0.1% or 90%.
  4. What is the direct cost that unaligned recommender systems are imposing on people’s lives? With fairly conservative assumptions we estimated 1 million QALYs per year, but we could easily see it being two orders of magnitude more or less.

How You Can Contribute

Machine learning researchers, software engineers, data scientists, policymakers, and others can immediately contribute to the goal of aligning recommender systems.

  • Much of the research needed to enable effective control of recommenders has not been done. Researchers in academia and especially in industry are in a position to ask and answer questions like:
    • What side effects are our recommendation engines having?
    • How can we more effectively detect harmful side effects?
    • What effect do different optimization metrics (e.g. number of likes or comments, time spent) have on harmful side effects? Are some substantially more aligned with collective well-being than others?
    • Can we design optimization objectives that do what we want?
  • The implementation of research tends to be done by software engineers. Being a member of a team stewarding these recommender systems will give you a concrete understanding of how the system is implemented, what its limitations and knobs for adjustment are, and what ideas can practically be brought to bear on the system.
  • Data scientists can investigate questions like ‘how does this user’s behavior change as a result of having seen this recommendation?’ and ‘what trajectories in topic / video space exist, where we see large clusters of users undergoing the same transition in their watch patterns?’. This is an especially critical question for children and other vulnerable users.
  • Policymakers are currently considering taking dramatic steps to reduce the negative impact of technology on the population. Tools developed by researchers working on this cause area can help. Many of those tools will make it feasible to check what impact is being had on the population, and will introduce methods that guard against specific and quantifiable notions of excessive harm.

The result is software that is extremely addictive, with a host of hard-to-measure side effects on users and society including harm to relationships, reduced cognitive capacity, and political radicalization.

As far as I can tell, this is all the evidence given in this post that there is in fact a problem. Two of the four links are news articles, which I ignore on the principle that news articles are roughly uncorrelated with the truth. (On radicalization I've seen specific arguments arguing against the claim.) One seems to be a paper studying what users believe about the Facebook algorithm (I don't see any connection to "harm to relationships", if anything, the paper talks about how people use Facebook to maintain relationships). The last one is a paper whose abstract does in fact talk about phones reducing cognitive capacity, but (a) most papers are garbage, (b) beware the man of one study, and (c) why blame recommender systems for that, when it could just as easily be (say) email that's the problem?

Overall I feel pretty unconvinced that there even is a major problem with recommender systems. (I'm not convinced that there isn't a problem either.)

You could argue that since recommender systems have huge scale, any changes you make will be impactful, regardless of whether there is a problem or not. However, if there isn't a clear problem that you are trying to fix, I think you are going to have huge sign uncertainty on the impact of any given change, so the EV seems pretty low.


The main argument of this post seems to be that this cause area would have spillover effects into AGI alignment, so maybe I'm being unfair by focusing on whether or not there's a problem. But if that's your primary motivation, I think you should just do whatever seems best to address AGI alignment, which I expect won't be to work on recommender systems. (Note that the skills needed for recommender alignment are also needed for some flavors of AGI alignment research, so personal fit won't usually change the calculus much.)


Before you point me to Tristan Harris, I've engaged with (some of) those arguments too, see my thoughts here.

Have you considered developing these comments into a proper EA Forum post?

Unfortunately I don't really have the time to do this well, and I think it would be a pretty bad post if I wrote the version that would be ~2 hours of effort or less.

The next Alignment Newsletter will include two articles on recommender systems that mostly disagree with the "recommender systems are driving polarization" position; you might be interested in those. (In fact, I did this shallow dive because I wanted to make sure I wasn't neglecting arguments pointing in the opposite direction.)

EDIT: To be clear, I'd be excited for someone else to develop this into a post. The majority of my relevant thoughts are in the comments I already wrote, which anyone should feel free to use :)

Thanks for pointing out that the evidence for specific problems with recommender systems is quite weak and speculative; I've come around to this view in the last year, and in retrospect I should have labelled my uncertainty here better and featured it less prominently in the article since it's not really a crux of the cause prioritization analysis, as you noticed. Will update the post with this in mind.

If there isn't a clear problem you're going to have huge sign uncertainty on the impact of any given change

This is closer to a crux. I think there are a number of concrete changes like optimizing for the user's deliberative retrospective judgment, developing natural language interfaces or exposing recommender systems internals for researchers to study, which are likely to be hugely positive across most worlds including ones where there's no "problem" attributable to recommender systems per se. Positive both in direct effects and in flow-through effects in learning what kinds of human-AI interaction protocols lead to good outcomes.

From your Alignment Forum comment,

The core feature of AI alignment is that the AI system deliberately and intentionally does things, and creates plans in new situations that you hadn't seen before, which is not the case with recommender systems.

This seems like the real crux. I'm not sure how exactly you define "deliberately and intentionally" but recommenders trained with RL (a small, but increasing fraction) are definitely capable of generating and executing complex novel sequences of actions towards an objective. Moreover they are deployed in a dynamic world and so encounter new situations habitually (unlike the toy environments more commonly used for AI Alignment research).

I think there are a number of concrete changes like optimizing for the user's deliberative retrospective judgment, developing natural language interfaces or exposing recommender systems internals for researchers to study, which are likely to be hugely positive across most worlds including ones where there's no "problem" attributable to recommender systems per se.

Some illustrative hypotheticals of how these could go poorly:

  • To optimize for deliberative retrospective judgment, you collect thousands of examples of such judgments, the most that is financially feasible. You train a reward model based on these examples and use that as your RL reward signal. Unfortunately this wasn't enough data and your reward model places high reward on very negative things it hasn't seen training data on (e.g. perhaps it strongly recommends posts encouraging people to commit suicide if they want to because it thinks encouraging people to do things they want is good).
  • Same situation, except the problem is that the examples you collected weren't representative of everyone who uses the recommender system, and so now the recommender system is nearly unusable for such people (e.g. the recommender system pushes away from "mindless fun", hurting the people who wanted mindless fun)
  • Same situation, except people are really bad at deliberative retrospective judgments. E.g. they take out everything that was "unvirtuous fun", and due to the lack of fun people stop using the thing altogether. (Whether this is good or bad depends on whether the technology is net positive or net negative, but I tend to think this would be bad. Anyone I know who isn't hyper-focused on productivity, i.e. most of the people in the world, seems to either like or be neutral about these technologies.)
  • You create a natural language interface. People use it to search for evidence that the outgroup is terrible (not deliberately; they think "wow, X is so bad, they do Y, I bet I could find tons of examples of that" and then they do, never seeking evidence in the other direction). Polarization increases dramatically, much more so than with the previous recommendation algorithm.
  • You expose the internals of recommender systems. Lots of people find gender biases and so on and PR is terrible. Company is forced to ditch their recommender system and instead have nothing (since any algorithm will be biased according to some metric, see the impossibility theorems). Everyone suffers.

I'm not saying that it's impossible to do positive things. I'm more saying:

  • If you aren't trying to solve a specific problem, it's really hard and doesn't seem obviously high-EV, especially due to sign uncertainty
  • It's not clear why you should do better than the people at the companies -- why is altruism important? If there's a problem in the form of a deviation between a company's incentives and what is actually good that has actual consequences in the world, then I can see why altruism has an advantage, but in the absence of such a problem I don't see why altruists should expect to do better.

recommenders trained with RL (a small, but increasing fraction) are definitely capable of generating and executing complex novel sequences of actions towards an objective.

How do you know that? In most cases of RL I know of, it seems better to model them as repeating things that worked well in the past. Only the largest uses of RL (AlphaZero, OpenAI Five, AlphaStar) seem like they might be exceptions.

I'm curious if approaches like those I describe here (end of the article; building on this which uses mini-publics) for determining rec system policy  help address the concerns of your first 3 bullets. I should probably do a write-up or modification specifically for the EA audience (this is for a policy audience), but it ideally gets some of the point across re. how to do "deliberative retrospective judgment" in a way that is more likely to avoid problematic outcomes (I will also be publishing an expanded version that has much more sourcing).

These approaches could help! I don't have strong reason to believe that they will, nor do I have strong reason to believe that they won't, and I also don't have strong reason to believe that the existing system is particularly problematic. I am just generally very uncertain and am mostly saying that other people should also be uncertain (or should explain why they are more confident).

Re: deliberative retrospective judgments as a solution: I assume you are going to be predicting what the deliberative retrospective judgment is in most cases (otherwise it would be far too expensive); it is unclear how easy it will be to do these sorts of predictions. Bullet points 1 and 2 were possibilities where the prediction was hard; I didn't see on a quick skim why you think they wouldn't happen. I agree "bridging divides" probably avoids bullet point 3, but I could easily tell different just-so stories where "bridging divides" is a bad choice (e.g. current affairs / news / politics almost always leads to divides, and so is no longer recommended; the population becomes extremely ignorant as a result worsening political dynamics).

I work at Netflix on the recommender. It's interesting to read this abstract article about something that's very concrete for me.

For example, the article asks, "The key question any model of the problem needs to answer is - why aren’t recommender systems already aligned."

Despite working on a recommender system, I genuinely don't know what this means. How does one go about measuring how much a recommender is aligned with user interests? Like, I guarantee 100% that people would rather have the recommendations given by Netflix and YouTube than a uniform random distribution. So in that basic sense, I think we are already aligned. It's really not obvious to me that Netflix and YouTube are doing anything wrong. I'm not really sure how to go about measuring alignment, and without a measurement, I don't know how to tell whether we're making progress toward fixing it.

My two cents.

I'm not sure about users definitely preferring the existing recommendations to random ones - I actually have been trying to turn off YouTube recommendations because they make me spend more time on YouTube than I want. Meanwhile other recommendation systems send me news that is worse on average than the rest of the news I consume (from different channels). So in some cases at least, we could use a very minimal standard of: a system is aligned if the user is better off because the recommendation system exists at all.

This is a pretty blunt metric, and probably we want something more nuanced, but at least to start off with it'd be interesting to think about how to improve whichever recommender systems are currently not aligned.

Thanks for sharing your perspective. I find it really helpful to hear reactions from practitioners.

I just added a comment above which aims to provide a potential answer to this question—that you can use "approaches like those I describe here (end of the article; building on this which uses mini-publics)".  This may not directly get you something to measure, but it may be able to elicit the values needed for defining an objective function.

You provide the example of this very low bar:

I guarantee 100% that people would rather have the recommendations given by Netflix and YouTube than a uniform random distribution. So in that basic sense, I think we are already aligned.

The goal here would be to scope out what a much higher bar might look like. 

Thanks for raising this. I appreciate specification is hard, but I think there's a broader lens on 'user interests' with more acknowledgement for the behavioural side.

What users want in one moment isn't always the same as what they might endorse when in a less slippery behavioural setting or upon reflection. You might say this is a human, not a technical, problem. True, but we can design systems that help us optimize for our long-term goals, and that is a different task to optimizing for what we click on in a given moment. Sure it's much harder to specify, but I think user research can be done. Thinking about the user more holistically could open up new innovations too. Imagine a person has watched several videos in a row about weight loss and, rather than keeping them on the couch longer, the recommender learns to respond with good nudges: it prompts them to get up and go for a run, reminds them of their personal goals for the day (because it has such integrations), messages their running buddy, closes itself (and has nice configurable settings with good defaults), or advertises joining a local running group (right now the local running group could not afford the advert, but in a world where recommenders weight ad quality to somehow include long-term preferences of the user, that might be different).

I understand the measurement frustration issue, the task is harder than just optimising for views and clicks though (not just technically, also to align to the company's bottom line). However, I do think little steps towards better specification can help, and I'd love to read future user research on it at Netflix.

Sorry I think I didn't address the measurement issue very well, and assumed your notion of user interests meant simply optimizing for views, when maybe it isn't. I still think through user research you can learn to develop good measures. For example: surveys, cohort tests (e.g. if you discount ratings over time within a viewing session, to down weight lower agency views, do you see changes such as users searching more instead of just letting autoplay), is there a relationship between how much a user feels netflix is improving their life (in a survey) and how much they are sucked in by autoplay? Learning these higher order behavioural indicators can help give users a better long-term experience, if that's what the company optimizes for.
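To make the "discount ratings over time within a viewing session" idea a bit more concrete, here is a minimal sketch. It assumes a geometric discount on the position of a view within a session; the `View` record, the `gamma` parameter and the function names are all hypothetical and not anything Netflix is known to use.

```python
from dataclasses import dataclass


@dataclass
class View:
    title: str
    position: int   # 0 = first item watched in the session (assumed most deliberate)
    rating: float   # implicit or explicit rating signal in [0, 1]


def discounted_weight(position: int, gamma: float = 0.8) -> float:
    """Weight that shrinks geometrically the deeper a view is in the session."""
    return gamma ** position


def weighted_ratings(views, gamma: float = 0.8) -> dict:
    """Down-weight ratings from later (assumed lower-agency) views."""
    return {v.title: v.rating * discounted_weight(v.position, gamma) for v in views}


session = [View("searched-for film", 0, 1.0), View("third autoplay episode", 2, 1.0)]
print(weighted_ratings(session, gamma=0.5))
```

A cohort test would then compare users served with `gamma < 1` against a control group with `gamma = 1`, looking at indicators like search rate versus autoplay rate. Whether position in a session is a good proxy for agency is itself an assumption that user research would need to check.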

Absolutely. A few comments:

  • Stated preference (uplifting documentaries) and revealed preference (reality TV crime shows) are different
  • Asking people for their preference is quite difficult - only a small fraction of Netflix users give star ratings or thumb ratings. In general, users like using software to achieve their immediate goals. It's tough to get them to invest time and skill into making it better in the future. For most people, each app is a tiny tiny slice of their day and they don't want to do work to optimize anything. Customization and user controls often fail because no one uses them.
  • If serving recommendations according to stated preferences causes people to unsubscribe more, how should we interpret that? That their true preference is to not be subscribed to Netflix? It's unclear.
  • In any case, Netflix is financially incentivized to optimize for subscriptions, not viewing. So if people pay for what they want, then Netflix ought to be aligned with what they want. Netflix is only misaligned with what people want if people's own spending is not aligned with what they want (theoretically).
The short-term, dopamine-driven feedback loops that we have created are destroying how society works.

That's a very strong statement, and I don't think it's warranted.

My understanding is that research suggests that the link between digital technology use and reduced well-being is exaggerated. See, e.g. this paper:

The association we find between digital technology use and adolescent well-being is negative but small, explaining at most 0.4% of the variation in well-being. Taking the broader context of the data into account suggests that these effects are too small to warrant policy change.

Similarly, it's been suggested that social media use drives polarization, but the evidence for that is unclear (with some studies finding evidence against).

"Recommender systems" is an extremely broad category, and I think this discussion would benefit from being more concrete, and maybe also from narrowing it down. It's not obvious to me how strong the link between improving on standard recommender systems and AGI alignment is, for instance. It may be better to choose one of these tasks as the primary focus initially.

With regards to standard recommender systems, many of those aren't directly focused on increasing well-being, but rather on, e.g. increasing epistemic standards, preventing fraud, etc. Those things may of course indirectly increase well-being, but I think it may be better to think in terms of proximate aims.

There's been quite a lot written on better recommender or reputation systems, and people have had high hopes (see, e.g. the book The Reputation Society). While some recommendation systems are very successful (e.g. Google) it also seems to me that many of these hopes haven't materialized.

A new article (referring to this new paper) claims that New York Times' claims about algorithmic radicalization are flawed (the OP links to a NYT article on such issues):

By looking at recommendation flows between various political orientations and subcultures, we show how YouTube’s late 2019 algorithm is not a radicalization pipeline, but in fact:

  • Removes almost all recommendations for conspiracy theorists, provocateurs and white Identitarians
  • Benefits mainstream partisan channels such as Fox News and Last Week Tonight
  • Disadvantages almost everyone else

This is fantastic. I don't have high confidence in the numbers you've put forth (for example, it's hard to compare QALYs from "more entertainment"/"better articles" to QALYs from "no malaria"), but I love the way this post was put together:

  • Lots of citations (to a stunning variety of sources; it feels like you've been thinking about these questions for a long time)
  • Careful analysis of what could go wrong
  • Willingness to use numbers, even if they are made up

Even putting aside flow-through effects on alignment, I think that "microtime" is important. Even saving people a few minutes of wasted time each day can be hugely beneficial at scale (especially if that time is replaced with something that fits a user's extrapolated volition). Our lives are made up of the way we spend each hour, and we could certainly be having better hours.

In a world where this is not a promising cause area, even if the risks turn out not to be a concern, I think the most likely cause of "failure" would be something like regulatory capture, where people enter large tech companies hoping to better their algorithms but get swept up by existing incentives. I'd guess that many people who already work at FANG companies entered with the goal of improving users' lives and slowly drifted away -- or came to believe that metrics companies now use are in fact improving users' lives to a "sufficient" extent.

(If you spend all day at Netflix, and come to think of TV as a golden wonderland of possibility, why not work to get people spending as much time as possible watching TV?)

It's possible that these employees still generally feel bad about optimizing for bad metrics, but however they feel, it hasn't yet added up to deliberative anti-addictive properties for any of the biggest tech companies (as far as I'm aware). It would be nice to see evidence that people have successfully advocated for these changes from the inside (Mark Zuckerberg has recently made some noises about trying to improve the situation on Facebook, but I'm not sure how much of that is due to pressure from inside Facebook vs. external pressure or his own feelings).

...including harm to relationships, reduced cognitive capacity, and political radicalization.

The first two links are identical; was that your intention?

Recommender systems often have facilities for deep customization (for instance, it's possible to tell the Facebook News Feed to rank specific friends’ posts higher than others) but the cognitive overhead of creating and managing those preferences is high enough that almost nobody uses them.

In addition to work on improved automated recommendation systems, it seems like there should be valuable projects out there that focus on getting more people to exercise their existing control over present-day systems (e.g. an app that gamifies changing your newsfeed settings, apps that let you more easily set limits for how you'll spend time online).


  • FB Purity claims to have over 450,000 users; even if only 100,000 are currently blocking their own newsfeeds, that probably represents ~10,000,000 hours each year spent somewhere other than Facebook.
  • StayFocusd has saved me, personally, thousands of hours on things my extrapolated volition would have regretted.
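Unpacking the FB Purity estimate above: the two figures in the bullet imply a per-user saving, which the short calculation below makes explicit. Both inputs are the rough guesses from the bullet, not measured data, and the variable names are just for illustration.

```python
# Rough guesses taken from the bullet above, not measurements.
users_blocking_feed = 100_000
hours_saved_per_year = 10_000_000

hours_per_user_per_year = hours_saved_per_year / users_blocking_feed
minutes_per_user_per_day = hours_per_user_per_year * 60 / 365

print(f"{hours_per_user_per_year:.0f} hours per user per year")
print(f"{minutes_per_user_per_day:.1f} minutes per user per day")
```

So the estimate amounts to assuming each of those users avoids roughly a quarter of an hour of Facebook per day.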

I think you're underrating the risk of capabilities acceleration.

The first two links are identical; was that your intention?

Thanks for the catch - fixed.

If we want to maximize flow-through effects to AI Alignment, we might want to deliberately steer the approach adopted for aligned recommender systems to one that is also designed to scale to more difficult problems/more advanced AI systems (like Iterated Amplification). Having an idea become standard in the world of recommender systems could significantly increase the amount of non-safety researcher effort put towards that idea. Solving the problem a bit earlier with a less scalable approach could close off this opportunity.

I suspect that principal–agent problems are the biggest single obstacle to alignment. That leads me to suspect it's less tractable than you indicate.

I'm interested in what happened with Netflix. Ten years ago their recommendation system seemed focused almost exclusively on maximizing user ratings of movies. That dramatically improved my ability to find good movies.

Yet I didn't notice many people paying attention to those benefits. Netflix has since then shifted toward less aligned metrics. I'm less satisfied with Netflix now, but I'm unclear what other users think of the changes.

I reviewed this post four months ago, and I continue to stand by that review.

Five months later, do you see increased attention with EAs on this cause area?

re: How You Can Contribute

Center for Humane Technology is hiring for 5 positions: Managing Director, Head of Humane Design Programs, Manager of Culture & Talent, Head of Policy, Research Intelligence Manager.

Great post! It's very nice to see this problem being put forward. Here are a few remarks.

It seems to me that the scale of the problem may be underestimated by the post. Two statistics that suggest this are the fact that there are now more views on YouTube than searches on Google, and that 70% of those views come from YouTube recommendations. Meanwhile, psychology stresses biases like availability bias or mere exposure effects that suggest that YouTube strongly influences what people think, want and do. Here are a few links about this:




Also, I would argue that the neglectedness of the problem may be underestimated by the post. I have personally talked to many people from different areas: social sciences, healthcare, education, environmentalists, media, YouTubers and AI Safety researchers. After ~30-minute discussions, essentially all of them acknowledged that they had overlooked the importance of aligning recommender systems. For instance, one problem is known as "mute news", i.e. the fact that important problems are overshadowed by what's put forward by recommender systems. I'd argue that the problem of mute news is neglected.

Having said this, it seems to me that the tractability of the problem may be overestimated. For one thing, aligning recommender systems is particularly hard because they act in so-called "Byzantine" environments. Namely, any small modification of recommender systems is systematically followed by SEO-optimization-like strategies from content creators. This is discussed in the following excellent series of videos with interviews of Facebook and Twitter employees:


I would argue that aligning recommender systems may even be harder than aligning AGI, because we need to get the objective function right, even though we do not have AGI to help us do so. But as such, I'd argue that this is a perfect practice playground for alignment research, advocacy and policing. In particular, I'd argue that we too often view AGI as that system that *we* get to design. But what seems just as hard is to get leading AI companies to agree to align it.

I discussed this in a bit more length in this conference here (https://www.youtube.com/watch?v=sivsXJ1L1pg), and in this paper: https://arxiv.org/abs/1809.01036.

I discussed this in a bit more length in this conference here (https://www.youtube.com/watch?v=sivsXJ1L1pg), and in this paper: https://arxiv.org/abs/1809.01036.


The two links in this paragraph are broken; I'm interested in taking a look, are the resources still available?

If you remove the parentheses and comma from the first link, and the final period from the second, they work.

This is not a research problem, it's a coordination / political problem. The algorithms are already doing what the creators intended, which is to maximise engagement.

While fully understanding a user's preferences and values requires more research, it seems like there are simpler things that could be done by the existing recommender systems that would be a win for users, e.g. Facebook having a "turn off inflammatory political news" switch (or a list of 5-10 similar switches), where current knowledge would suffice to train a classification system.

It could be the case that this is bottlenecked by the incentives of current companies, in that there isn't a good revenue model for recommender systems other than advertising, and advertising creates the perverse incentive to keep users on your system as long as possible. Or it might be the case that most recommender systems are effectively monopolies on their respective content, and users will choose an aligned system over an unaligned one if options are available, but otherwise a monopoly faces no pressure to align their system.

In these cases, the bottleneck might be "start and scale one or more new organizations that do aligned recommender systems using current knowledge" rather than "do more research on how to produce more aligned recommender systems".

My mental model of why Facebook doesn't have "turn off inflammatory political news" and similar switches is that 99% of their users never toggle any such switches, so the feature won't affect any of the metrics they track, so no engineer or product manager has an incentive to add it. Why won't users toggle the switches? Part of it is laziness; but mostly I think users don't trust that the system will faithfully give them what they want based on a single short description like "inflammatory political news": what if they miss out on an important national story? What if a close friend shares a story with them and they don't see it? What if their favorite comedian gets classified as inflammatory and filtered out?

As additional evidence that we're more bottlenecked by research than by incentives, consider Twitter's call for research to measure the "health" of Twitter conversations, and Facebook's decision to demote news content. I believe if you gave most companies a robust and well-validated metric (analogous to differential privacy) for alignment with user value, they would start optimizing for it even at the cost of some short term growth/revenue.

The monopoly point is interesting. I don't think existing recommender systems are well modelled as monopolies; they certainly behave as if they are in a life-and-death struggle with each other, probably because their fundamental product is "ways to occupy your time" and that market is extremely competitive. But a monopoly might actually be better because it wouldn't have the current race to the bottom in pursuit of monetisable eyeballs.

Appreciate that point that they are competing for time (as I was only thinking of monopolies over content).

If the reason it isn't used is that users don't "trust that the system will give what they want given a single short description", then part of the research agenda for aligned recommender systems is not just producing systems that are aligned, but systems where their users have a greater degree of justified trust that they are aligned (placing more emphasis on the user's experience of interacting with the system). Some of this research could potentially take place with existing classification-based filters.

Agreed that's an important distinction. I just assumed that if you make an aligned system, it will become trusted by users, but that's not at all obvious.

Good post!

I have a hunch that a big part of the issue here is institutional momentum around maximizing key performance indicators such as daily active users, time spent on platform, etc. Perhaps it will be important to persuade decisionmakers that although optimizing for these metrics helps the bottom line in the short run, in the long run optimizing these to the exclusion of all else hurts the brand, increases the probability of regulatory action or negative "black swan" type events, and risks having the users abandon the product. (I understand that the longer a culture gets exposed to alcohol, the greater the degree it develops "cultural antibodies" to the negative effects of alcohol which allow it to mitigate the harms... decisionmakers should worry that if users don't endorse the time they spend with the product, this hurts the long-term viability of the platform; imagine the formation of a group like Alcoholics Anonymous but for social media, for instance.) I think it'd be good if decisionmakers also started optimizing for key performance indicators like whether users think the product is a benefit to their life personally, whether the product makes society healthier/better off, etc. Or even more specific stuff, like whether users who engage in disagreements tend to come to a consensus vs walking away even angrier than when they started.

With regard to risks, here are some thoughts of mine related to scenarios in which users self-select in their use of these tools. I think maybe what I describe in this comment has already happened though.

The author, commenters and readers of this post may be interested in this new paper by the CMA, 'Algorithms: How they can reduce competition and harm consumers'. The programme of work being launched includes analyses of recommender systems.

From Optimizing Engagement to Measuring Value is interesting and somewhat related:

Most recommendation engines today are based on predicting user engagement, e.g. predicting whether a user will click on an item or not. However, there is potentially a large gap between engagement signals and a desired notion of "value" that is worth optimizing for. We use the framework of measurement theory to (a) confront the designer with a normative question about what the designer values, (b) provide a general latent variable model approach that can be used to operationalize the target construct and directly optimize for it, and (c) guide the designer in evaluating and revising their operationalization. We implement our approach on the Twitter platform on millions of users. In line with established approaches to assessing the validity of measurements, we perform a qualitative evaluation of how well our model captures a desired notion of "value".

Could you say a little bit about how this approach compares to Christiano's Iterated Amplification?

To my mind they are fully complementary: Iterated Amplification is a general scheme for AI alignment, whereas this post describes an application area where we could use and learn more about various alignment schemes. I personally think using amplification for aligning recommender systems is very much worth trying. It would have great direct positive effects if it worked, and the experiment would shed light on the viability of the scheme as a whole.

Thanks. I guess I'm fuzzy on what your actual research proposal is.

Are you proposing to implement an Iterated Amplification approach on existing recommender systems?

Or are you more agnostic about specific implementations? ("Hey, better alignment of recommender systems seems important, but we don't yet know what to do about that specifically.")

Definitely the latter. Though I would frame it more optimistically as "better alignment of recommender systems seems important, there's a lot of plausible solutions out there, let's prioritize them and try out the few most promising ones". Actually doing that prioritization was out of scope for this post but definitely something we want to do - and are looking for collaborators on.

It’s likely that the flow-through effects on the rest of users’ lives will be even greater, if the studies showing effects on mental health, cognitive function, relationships hold out, and if aligned recommender systems are able to significantly assist users in achieving their long term goals. Even more speculatively, if recommender systems are able to align with users’ extrapolated volition this may also have flow-through effects on social stability, wisdom, and long-termist attitudes in a way that helps mitigate existential risk.

I am very interested in these sorts of positive effects of aligned recommender systems. In addition to improving people's effectiveness at large, I think they can be a valuable tool for improving individual/organizational decision making and personal productivity which are EA focus areas.

I think that building a collaborative search engine is a tractable starting point and has the potential to improve information discovery within EA and in general—if anyone is interested in collaborating on this, please get in touch!

This post was awarded an EA Forum Prize; see the prize announcement for more details.

My notes on what I liked about the post, from the announcement:

Every cause area starts somewhere. And while I’m not sure whether improving YouTube recommendations or fixing the News Feed will become a major focus of EA research, I commend Ivan Vendrov and Jeremy Nixon for crafting a coherent vision for how we might approach the problem of “aligning recommender systems.”

Alongside a straightforward discussion of the scale of these systems' influence (they shape hours of daily experience for hundreds of millions of people), the authors present a fascinating argument that certain features of these commercial products map onto longstanding problems in AI alignment. This broad scope seems appropriate for an introduction to a new cause — I’m happy to see authors make the most comprehensive case they can, since further research can always moderate their conclusions.

(It helps that Vendrov and Nixon freely admit the low confidence levels around their specific numbers and discuss the risks behind this work — they want to inform, not just persuade.)

Finally, I appreciated the next-to-last section (“Key points of uncertainty”), which leaves a set of open questions for other authors to tackle and creates convenient cruxes for debate.

There's a growing area of research on fair ranking algorithms. Where the problem you've scoped out focuses on the utility of end users, fairness in ranking aims to align recommender systems with the "utility" of the items being recommended (e.g. job applicants) and the long-term viability of the platform.

Seemingly Useful Viewpoints

The expert DiResta said (in the YouTube video of interviews with Twitter and Facebook employees that Misha posted) that overcoming the division that is created by online bad actors will require us to address our own natures, because online bad actors will never be eliminated but merely managed. This struck me as important and it is applicable to the problems that recommender algorithms may exacerbate. If I remember correctly, in the audiobook The Alignment Problem, Brian Christian's way of looking at it was that the biases that AI systems spit out can hopefully cause us to look introspectively at ourselves and how we have committed so many injustices throughout history.

Neil deGrasse Tyson once remarked that a recommender algorithm can prevent him from exploring content that he would have explored naturally. His remark seems to hint at or point somewhere in the direction of a dangerous slope recommender algorithms could potentially bring us down.

The Metrics for Recommender Algorithms

Somewhat along the lines of what Neil said, a recommender algorithm might deprive us of some important internal quality while building out empty, superficial qualities. The recommender algorithms that I am most familiar with (like the one on Netflix and for feeds on Google and Twitter) are based on maximizing the use of our eyes on the screen and clicks. While our eyes are important, neuroscience tells us that sight is not a perfect representation of reality, and even ancient philosophers took what they saw with a grain of salt. As for our clicks, to me they seem to be mostly associated with our curiosity to explore, to see what is in the next article, video, etc.


Ted Bundy said that pornography made him become who he was. I have no opinion on whether this is true. However, if it is true, it means that a recommender algorithm (when applied to pornography) could potentially make a person become a serial killer faster than they would have otherwise, or create the opportunity (for those who are slightly vulnerable to becoming one but have self control) for them to become one at all by exploiting their vulnerability.


A recommender algorithm can shut off periodically. The person can be notified when it is shut off and when it is on. When it is off, maybe things will appear based on how recent they are or something. This way a person can see the difference in their quality of life and content consumption with and without the recommender algorithm and decide whether the algorithm has any benefit. It is possible over time that the person will view the algorithm as a lens into possibly their own bad habits or into the dark side of human history. It is possible that having the algorithm on sometimes, and off at others, can reduce the capacity of the algorithm to become insidious in the person's life and make the interaction with the algorithm a more conscious interaction on behalf of the person; the algorithm may have some dark aspects and results, but the person can constantly be aware of these results and perhaps see it as a reflection of humanity's own faults.
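As a rough illustration of the on/off idea, here is a minimal sketch. It assumes the recommender alternates in fixed blocks of days and falls back to most-recent-first ordering when off; the function names, the `posted` field and the seven-day period are all hypothetical choices made for the example.

```python
from datetime import date


def recommender_enabled(day: date, period_days: int = 7) -> bool:
    """Alternate on/off in blocks of period_days, keyed on the calendar day."""
    return (day.toordinal() // period_days) % 2 == 0


def build_feed(items, day: date, score, period_days: int = 7):
    """Return (ordered items, notice shown to the user about the current mode)."""
    if recommender_enabled(day, period_days):
        # Recommender on: order by the (hypothetical) relevance score.
        return sorted(items, key=score, reverse=True), "recommender on"
    # Recommender off: fall back to most recent first.
    ordered = sorted(items, key=lambda item: item["posted"], reverse=True)
    return ordered, "recommender off (most recent first)"
```

The notice string is the important part of the design: the comparison only works if the person knows which mode they are in at any given time.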

Somewhat related: Prediction markets for content curation DAOs ( https://ethresear.ch/t/prediction-markets-for-content-curation-daos/1312/4 )

Thank you for writing this; the forum is richer for having people investigate areas of work and analyse them in this way.

I don't believe that replaceability has been sufficiently considered here (I read this quickly, so sorry if I missed it). By encouraging people with the relevant skills to work on this, what do we achieve? Would they replace someone who otherwise wouldn't get that there is a problem with recommender systems not being aligned with what society wants?

If anything, it seems that this issue has had a huge amount of attention, and it's likely that all those who are already working in this area are very conscious of this issue.

I would change my mind if I heard that someone surveyed several people working in this area and many of them said that they were ignorant of or overly pollyanna-ish about the risks to wellbeing from these systems.
