by Jacy, 30th Jun 2022

Brief Thoughts on the Prioritization of Quality Risks

This is a brief shortform post to accompany "The Future Might Not Be So Great." These are just some scattered thoughts on the prioritization of quality risks not quite relevant enough to go in the post itself. Thanks to those who gave feedback on the draft of that post, particularly on this section.

People ask me to predict the future, when all I want to do is prevent it. Better yet, build it. Predicting the future is much too easy, anyway. You look at the people around you, the street you stand on, the visible air you breathe, and predict more of the same. To hell with more. I want better. ⸻ Ray Bradbury (1979)

I present a more detailed argument for the prioritization of quality risks (particularly moral circle expansion) over extinction risk reduction (particularly through certain sorts of AI research) in Anthis (2018), but here I will briefly note some thoughts on importance, tractability, and neglectedness. Two related EA Forum posts are “Cause Prioritization for Downside-Focused Value Systems” (Gloor 2018) and “Reducing Long-Term Risks from Malevolent Actors” (Althaus and Baumann 2020). Additionally, at this early stage of the longtermist movement, the top priorities for population and quality risk may largely intersect. Both issues suggest foundational research on topics such as the nature of AI control and likely trajectories of the long-term future, community-building among thoughtful do-gooders, and field-building of institutional infrastructure to use for steering the long-term future.

Importance

One important application of the expected value (EV) of human expansion is to the “importance” of population and quality risks. Importance can be operationalized as the good done if the entire cause succeeded in solving its corresponding problem, such as the good done by eliminating or substantially reducing extinction risk, which is effectively zero if the EV of human expansion is zero and effectively negative if the EV of human expansion is negative.

The importance of quality risk reduction is clearer, in the sense that the difference in quality between possible futures is clearer than the difference between extinction and non-extinction. It is also larger, in the sense that population risk entails only the zero-to-positive difference between human extinction and non-extinction (or between zero population and some positive number of individuals), while quality risk entails the difference between the best quality humans could engender and the worst, across all possible population sizes. This is arguably a weakness of the framework because we could categorize the quality risk cause area as smaller in importance (say, an increase of 1 trillion utils, i.e., units of goodness), and it would tend to become more tractable as we narrow the category.

Tractability

The tractability difference between population and quality risk seems the least clear of the three criteria. My general approach is thinking through the most likely “theories of change” or paths to impact and assessing them step-by-step. For example, one commonly discussed extinction risk reduction path to impact is “agent foundations,” building mathematical frameworks and formally proving claims about the behavior of intelligent agents, which would then allow us to build advanced AI systems more likely to do what we tell them to do, and then using these frameworks to build AGI or persuading the builders of AGI to use them. Quality-risk-focused AI safety strategies may be more focused on the outer alignment problem, ensuring that an AI’s objective is aligned with the right values, rather than just the inner alignment problem, ensuring that all actions of the AI are aligned with the objective.[1] Also, we can influence quality by steering the “direction” or “speed” of the long-term future, approaches with potentially very different impact, hinging on factors such as the distribution of likely futures across value and likelihood (e.g., Anthis 2018c; Anthis and Paez 2021).

One argument that I often hear on the tractability of trajectory changes is that changes need to “stick” or “persist” over long periods. It is true that there needs to be a persistent change in the expected value (i.e., the random variable or time series regime of value in the future), but I frequently hear the claim that there needs to be a persistent change in the realization of that value. For example, if we successfully broker a peace deal between great powers, neither the peace deal itself nor any other particular change in the world has to persist in order for this to have high long-term impact. The series of values itself can have arbitrarily large variance, such as it being very likely that the peace deal is broken within a decade.
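To make the distinction between a persistent change in expected value and a persistent change in its realization concrete, here is a minimal sketch under purely illustrative assumptions. The function name and every number in it (the yearly chance that the deal holds, the yearly catastrophe risks, the time horizon) are hypothetical and not drawn from any of the cited work. It computes the probability of avoiding a catastrophe with and without a peace deal that is very likely to be broken within a decade.

```python
def survival_probability(years, deal_hold_prob, risk_with_deal,
                         risk_without_deal, deal_at_start):
    """Probability that no catastrophe occurs within `years` (toy model)."""
    # Probability mass of "no catastrophe so far", split by deal status.
    holding = 1.0 if deal_at_start else 0.0
    broken = 1.0 - holding
    for _ in range(years):
        # Survive this year's catastrophe risk in each state.
        holding_survive = holding * (1 - risk_with_deal)
        broken_survive = broken * (1 - risk_without_deal)
        # The deal may then break; once broken, it stays broken.
        holding = holding_survive * deal_hold_prob
        broken = broken_survive + holding_survive * (1 - deal_hold_prob)
    return holding + broken


# Illustrative, made-up parameters (assumptions, not estimates).
HOLD = 0.8            # yearly probability that the deal survives
RISK_DEAL = 0.005     # yearly catastrophe risk while the deal holds
RISK_NO_DEAL = 0.01   # yearly catastrophe risk otherwise

p_broken_within_decade = 1 - HOLD ** 10
baseline = survival_probability(100, HOLD, RISK_DEAL, RISK_NO_DEAL, False)
with_deal = survival_probability(100, HOLD, RISK_DEAL, RISK_NO_DEAL, True)

print(f"P(deal broken within a decade): {p_broken_within_decade:.2f}")
print(f"P(no catastrophe in 100 years), no deal:   {baseline:.4f}")
print(f"P(no catastrophe in 100 years), with deal: {with_deal:.4f}")
```

In this toy model the deal itself almost certainly does not persist, yet the probability of reaching year 100 without catastrophe stays higher in the world where the deal was brokered. The change in expected value persists even though the realized peace deal does not.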

For a sort of change to be intractable, it needs to not just lack persistence, but to rubber band (i.e., create opposite-sign effects) back to its counterfactual. For example, if brokering a peace deal causes an equal and opposite reaction of anti-peace efforts, then that trajectory change is intractable. Moreover, we should consider not only rubber banding but also dominoing (i.e., creating same-sign effects), such as when this peace deal inspires other great powers to follow suit even if this particular deal is broken. There is much of this potential energy in the world waiting to be unlocked by thoughtful actors.

The tractability of trajectory change has been the subject of research at Sentience Institute, including our historical case studies and Harris’ (2019) “How Tractable Is Changing the Course of History?”

Neglectedness

The neglectedness difference between population and quality risk seems the clearest. There are far more EAs and longtermists working explicitly on population risks than on quality risks (i.e., risks to the moral value of individuals in the long-term future). There are two nuances to this claim. First, it may not be true for other relevant comparisons. For example, many people in the world are trying to change social institutions, such as different sides of the political spectrum trying to pull public opinion towards their end of the spectrum. This group seems much larger than the group of people focused explicitly on extinction risks, and there are many other relevant reference classes. Second, it is not entirely clear whether extinction risk reduction and quality risk reduction face higher or lower returns to being less neglected (i.e., more crowded). It may be that so few people are focused on quality risks that marginal returns are actually lower than they would be if there were more people working on them (i.e., increasing returns).


  1. In my opinion, there are many different values involved in developing and deploying an AI system, so the distinction between inner and outer alignment is rarely precise in practice. Much of identifying and aligning with “good” or “correct” values can be described as outer alignment. In general, I think of AI value alignment as a long series of mechanisms from the causal factors that create human values (which themselves can be thought of as objective functions) to a tangled web of objectives in each human brain (e.g., values, desires, preferences) to a tangled web of social objectives aggregated across humans (e.g., voting, debates, parliaments, marketplaces) to a tangled web of objectives communicated from humans to machines (e.g., material values in game-playing AI, training data, training labels, architectures) to a tangled web of emergent objectives in the machines (e.g., parametric architectures in the neural net, (smoothed) sets of possible actions in domain, (smoothed) sets of possible actions out of domain) and finally to the machine actions (i.e., what it actually does in the world). We can reasonably refer to the alignment of any of these objects with any of the other objects in this long, tangled continuum of values. Two examples of outer alignment work that I have in mind here are Askell et al. (2021) “A General Language Assistant as a Laboratory for Alignment” and Hobbhahn et al. (2022) “Reflection Mechanisms as an Alignment Target: A Survey.”

The collapse of FTX may be a reason for you to update towards pessimism about the long-term future.

I see a lot of people's worldviews updating this week based on the collapse of FTX. One view I think people in EA may be neglecting to update towards is pessimism about the expected value of the long-term future. Doing good is hard. As Tolstoy wrote, "All happy families are alike; each unhappy family is unhappy in its own way." There is also Yudkowsky: "Value is fragile"; Burns: "The best-laid plans of mice and men often go awry"; von Moltke: "No plan survives contact with the enemy"; etc. The point is, there are many ways for unforeseen problems to arise and suffering to occur, usually many more ways than for unforeseen successes. I think this is an underlying reason why charity cost-effectiveness estimates from GiveWell and ACE went down over the years, as the optimistic case was clear but reasons to doubt took time to appreciate.

I think this update should be particularly strong if you think EA, or more generally the presence of capable value-optimizers (e.g., post-AGI stakeholders who will work hard to seed the universe with utopia), is one of the main reasons for optimism.

I think the strongest rebuttal to this claim is that the context of doing good in the long-term future may be very different from today's context, such that self-interest, myopia, cutting corners, etc. would either be solved (e.g., an AGI would notice and remove such biases) or merely lead to a reduction in the creation of positive value rather than an increase in negative value as occurred with the collapse of FTX (e.g., because a utopia-seeding expedition may collapse, but this is unlikely to involve substantial harm to current people like cryptocurrency investors). I don't think this rebuttal is much of a reduction in the strength of the evidence because long-term trajectories may depend a lot on initial values, and I think such problems could easily persist in superintelligent systems, and because there will be many routes to s-risks (e.g., because the failure of a utopia-seeding expedition may lead to dystopia-seeding rather than failing to spread at all).

Of course, if you were already disillusioned with EA or if this sort of moral catastrophe was already in line with your expectations, you may also not need to update in this direction.