Why I prioritize moral circle expansion over artificial intelligence alignment

This blog post is written for a very specific audience: people involved in the effective altruism community who are familiar with cause prioritization and arguments for the overwhelming importance of the far future. It might read as strange and confusing to people without that domain knowledge. Please consider reading the articles linked in the Context section to get your bearings. This post is also very long, but the sections are fairly independent as each covers a fairly distinct consideration.

Many thanks for helpful feedback to Jo Anderson, Tobias Baumann, Jesse Clifton, Max Daniel, Michael Dickens, Persis Eskander, Daniel Filan, Kieran Greig, Zach Groff, Amy Halpern-Laff, Jamie Harris, Josh Jacobson, Gregory Lewis, Caspar Oesterheld, Carl Shulman, Gina Stuessy, Brian Tomasik, Johannes Treutlein, Magnus Vinding, Ben West, and Kelly Witwicki. I also forwarded Ben Todd and Rob Wiblin a small section of the draft that discusses an 80,000 Hours article.


When people in the effective altruism (EA) community have worked to affect the far future, they’ve typically focused on reducing extinction risk, especially risks associated with superintelligence or general artificial intelligence alignment (AIA). I agree with the arguments for the far future being extremely important in our EA decisions, but I tentatively favor improving the quality of the far future by expanding humanity’s moral circle more than increasing the likelihood of the far future or humanity’s continued existence by reducing AIA-based extinction risk because: (1) the far future seems to not be very good in expectation, and there’s a significant likelihood of it being very bad, and (2) moral circle expansion seems highly neglected both in EA and in society at large. Also, I think considerations of bias are very important here, given how necessarily intuitive and subjective judgment calls make up the bulk of differences in opinion on far future cause prioritization. I find the argument in favor of AIA that technical research might be more tractable than social change to be the most compelling counterargument to my position.


This post largely aggregates existing content on the topic, rather than making original arguments. I offer my views, mostly intuitions, on the various arguments, but of course I remain highly uncertain given the limited amount of empirical evidence we have on far future cause prioritization.


Many in the effective altruism (EA) community think the far future is a very important consideration when working to do the most good. The basic argument is that humanity could continue to exist for a very long time and could expand its civilization to the stars, creating a very large amount of moral value. The main narrative has been that this civilization could be a very good one, and that in the coming decades, we face sizable risks of extinction that could prevent us from obtaining this “cosmic endowment.” The argument goes that these risks also seem like they can be reduced with a fairly small amount of additional resources (e.g. time, money), and therefore extinction risk reduction is one of the most important projects of humanity and the EA community.


(This argument also depends on a moral view that bringing about the existence of sentient beings can be a morally good and important action, comparable to helping sentient beings who currently exist live better lives. This is a contentious view in academic philosophy. See, for example, “'Making People Happy, Not Making Happy People': A Defense of the Asymmetry Intuition in Population Ethics.”)


However, one can accept the first part of this argument — that there is a very large amount of expected moral value in the far future and it’s relatively easy to make a difference in that value — without deciding that extinction risk is the most important project. In slightly different terms, one can decide not to work on reducing population risks, risks that could reduce the number of morally relevant individuals in the far future (of course, these are only risks of harm if one believes more individuals is a good thing), and instead work on reducing quality risks, risks that could reduce the quality of morally relevant individuals’ existence. One specific type of quality risk often discussed is a risk of astronomical suffering (s-risk), defined as “events that would bring about suffering on an astronomical scale, vastly exceeding all suffering that has existed on Earth so far.”

This blog post makes the case for focusing on quality risks over population risks. More specifically, though also more tentatively, it makes the case for focusing on reducing quality risk through moral circle expansion (MCE), the strategy of impacting the far future through increasing humanity’s concern for sentient beings who currently receive little consideration (i.e. widening our moral circle so it includes them), over AI alignment (AIA), the strategy of impacting the far future through increasing the likelihood that humanity creates an artificial general intelligence (AGI) that behaves as its designers want it to (known as the alignment problem).[1][2]


The basic case for MCE is very similar to the case for AIA. Humanity could continue to exist for a very long time and could expand its civilization to the stars, creating a very large number of sentient beings. The sort of civilization we create, however, seems highly dependent on our moral values and moral behavior. In particular, it’s uncertain whether many of those sentient beings will receive the moral consideration they deserve based on their sentience, i.e. whether they will be in our “moral circle” or not, like the many sentient beings who have suffered intensely over the course of human history (e.g. from torture, genocide, oppression, war). It seems the moral circle can be expanded with a fairly small amount of additional resources (e.g. time, money), and therefore MCE is one of the most important projects of humanity and the EA community.


Note that MCE is a specific kind of values spreading, the parent category of MCE that describes any effort to shift the values and moral behavior of humanity and its descendants (e.g. intelligent machines) in a positive direction to benefit the far future. (Of course, some people attempt to spread values in order to benefit the near future, but in this post we’re only considering far future impact.)


I’m specifically comparing MCE and AIA because AIA is probably the most favored method of reducing extinction risk in the EA community. AIA seems to be the default cause area to favor if one wants to have an impact on the far future, and I’ve been asked several times why I favor MCE instead.


This discussion risks conflating AIA with reducing extinction risk. These are two separate ideas, since an unaligned AGI could still lead to a large number of sentient beings, and an aligned AGI could still potentially cause extinction or population stagnation (e.g. if according to the designers’ values, even the best civilization the AGI could help build is still worse than nonexistence). However, most EAs focused on AIA seem to believe that the main risk is something quite like extinction, such as the textbook example of an AI that seeks to maximize the number of paperclips in the universe. I’ll note when the distinction between AIA and reducing extinction risk is relevant. Similarly, there are sometimes important prioritization differences between MCE and other types of values spreading, and those will be noted when they matter. (This paragraph is an important qualification for the whole post. The possibility of unaligned AGI that involves a civilization (and, less so because it seems quite unlikely, the possibility of an aligned AGI that causes extinction) is important to consider for far future cause prioritization. Unfortunately, elaborating on this would make this post far more complicated and far less readable, and would not change many of the conclusions. Perhaps I’ll be able to make a second post that adds this discussion at some point.)

It’s also important to note that I’m discussing specifically AIA here, not all AI safety work in general. AI safety, which just means increasing the likelihood of beneficial AI outcomes, could be interpreted as including MCE, since MCE plausibly makes it more likely that an AI would be built with good values. However, MCE doesn’t seem like a very plausible route to increasing the likelihood that AI is simply aligned with the intentions of its designers, so I think MCE and AIA are fairly distinct cause areas.


AI safety can also include work on reducing s-risks, such as specifically reducing the likelihood of an unaligned AI that causes astronomical suffering, rather than reducing the likelihood of all unaligned AI. I think this is an interesting cause area, though I am unsure about its tractability and am not considering it in the scope of this blog post.


The post’s publication was supported by Greg Lewis, who was interested in this topic and donated $1,000 to Sentience Institute, the think tank I co-founded which researches effective strategies to expand humanity’s moral circle, conditional on this post being published to the Effective Altruism Forum. Lewis doesn’t necessarily agree with any of its content. He decided on the conditional donation before the post was written. I asked him to review the post prior to publication, and it was edited based on his feedback.

The expected value of the far future

Whether we prioritize reducing extinction risk partly depends on how good or bad we expect human civilization to be in the far future, given it continues to exist. In my opinion, the assumption that it will be very good is tragically unexamined in the EA community.

What if it’s close to zero?

If we think the far future is very good, that clearly makes reducing extinction risk more promising. And if we think the far future is very bad, that makes reducing extinction risk not just unpromising, but actively very harmful. But what if it’s near the middle, i.e. close to zero?[3] Responding to the view that reducing extinction risk is not an EA priority on the basis of the expected moral value of the far future, 80,000 Hours wrote:


...even if you’re not sure how good the future will be, or suspect it will be bad, you may want civilisation to survive and keep its options open. People in the future will have much more time to study whether it’s desirable for civilisation to expand, stay the same size, or shrink. If you think there’s a good chance we will be able to act on those moral concerns, that’s a good reason to leave any final decisions to the wisdom of future generations. Overall, we’re highly uncertain about these big-picture questions, but that generally makes us more concerned to avoid making any irreversible commitments...


This reasoning seems mistaken to me because wanting “civilisation to survive and keep its options open” depends on optimism that civilization will do research, make good[4] decisions based on that research, and be capable of implementing those decisions.[5] In other words, while preventing extinction keeps options open for good things to happen, it also keeps options open for bad things to happen, and desiring this option value depends on an optimism that the good things are more likely. The reasoning thus assumes the optimism (thinking the far future is good, or at least that humans will make good decisions and be able to implement them[6]) that is also its conclusion.


Having that optimism makes sense in many decisions, which is why keeping options open is often a good heuristic. In EA, for example, people tend to do good things with their careers, which means career option value is a useful thing. This doesn’t readily translate to decisions where it’s not clear whether the actors involved will have a positive or negative impact. (Note 80,000 Hours isn’t making this comparison. I’m just making it to explain my own view here.)


There’s also a sense in which preventing extinction decreases option value because if humanity progresses past certain civilizational milestones that make extinction less likely (say, the rise of AGI or expansion beyond our own solar system), it might become harder or even impossible to press the “off switch” (ending civilization). However, I think most would agree that there’s more overall option value in a civilization that has gotten past these milestones because there’s a much wider variety of non-extinct civilizations than extinct civilizations.[7]


If you think that the expected moral value of the far future is close to zero, even if you think it’s slightly positive, then reducing extinction risk is a less promising EA strategy than if you think it’s very positive.
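
To make the scaling explicit, here is a minimal sketch. The function name, the units, and every number below are made up for illustration and are not estimates from this post. It assumes extinction has a value of zero, so the value of reducing extinction risk is the reduction in extinction probability multiplied by the expected value of the far future conditional on survival.

```python
def value_of_extinction_risk_reduction(risk_reduction, expected_value_if_survival):
    """Expected moral value gained by lowering the probability of extinction.

    Assumes extinction is worth zero, so the gain is the probability mass
    shifted toward survival times the expected value of the surviving future.
    """
    return risk_reduction * expected_value_if_survival


# Hypothetical input: one percentage point less extinction risk.
risk_reduction = 0.01

# Hypothetical expected values of the far future, in arbitrary units.
very_positive = value_of_extinction_risk_reduction(risk_reduction, 100.0)
slightly_positive = value_of_extinction_risk_reduction(risk_reduction, 1.0)
negative = value_of_extinction_risk_reduction(risk_reduction, -50.0)

print("very positive future:    ", very_positive)
print("slightly positive future:", slightly_positive)
print("negative future:         ", negative)
```

Under these assumptions, the same reduction in risk is worth proportionally less as the expected value approaches zero, and it becomes harmful if the expected value is negative.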

Key considerations

I think the considerations on this topic are best represented as questions where people’s beliefs (mostly just intuitions) vary on a long spectrum. I’ll list these in order of where I would guess I have the strongest disagreement with people who believe the far future is highly positive in expected value (shortened as HPEV-EAs), and I’ll note where I don’t think I would disagree or might even have a more positive-leaning belief than the average such person.


  1. I think there’s a significant[8] chance that the moral circle will fail to expand to reach all sentient beings, such as artificial/small/weird minds (e.g. a sophisticated computer program used to mine asteroids, but one that doesn’t have the normal features of sentient minds like facial expressions). In other words, I think there’s a significant chance that powerful beings in the far future will have low willingness to pay for the welfare of many of the small/weird minds in the future.[9]

  2. I think it’s likely that the powerful beings in the far future (analogous to humans as the powerful beings on Earth in 2018) will use large numbers of less powerful sentient beings, such as for recreation (e.g. safaris, war games), a labor force (e.g. colonists to distant parts of the galaxy, construction workers), scientific experiments, threats (e.g. threatening to create and torture beings that a rival cares about), revenge, justice, religion, or even pure sadism.[10] I believe this because there have been less powerful sentient beings for all of humanity’s existence and well before (e.g. predation), many of whom are exploited and harmed by humans and other animals, and there seems to be little reason to think such power dynamics won’t continue to exist.

    Alternative uses of resources include simply working to increase one’s own happiness directly (e.g. changing one’s neurophysiology to be extremely happy all the time) and constructing large non-sentient projects like a work of art, though each of these types of projects could still include sentient beings, such as for experimentation or a labor force.

    With the exception of threats and sadism, the less powerful minds seem like they could suffer intensely because their intense suffering could be instrumentally useful. For example, if the recreation is nostalgic, or human psychology persists in some form, we could see powerful beings causing intense suffering in order to see good triumph over evil or in order to satisfy curiosity about situations that involve intense suffering (of course, the powerful beings might not acknowledge the suffering as suffering, instead conceiving of it as simulated but not actually experienced by the simulated entities). For another example, with a sentient labor force, punishment could be a stronger motivator than reward, as indicated by the history of evolution on Earth.[11][12]

  3. I place significant moral value on artificial/small/weird minds.

  4. I think it’s quite unlikely that human descendants will find the correct morality (in the sense of moral realism, finding these mind-independent moral facts), and I don’t think I would care much about that correct morality even if it existed. For example, I don’t think I would be compelled to create suffering if the correct morality said this is what I should do. Of course, such moral facts are very difficult to imagine, so I’m quite uncertain about what my reaction to them would be.[13]

  5. I’m skeptical about the view that technology and efficiency will remove the need for powerless, high-suffering, instrumental moral patients. An example of this predicted trend is that factory farmed animals seem unlikely to be necessary in the far future because of their inefficiency at producing animal products. Therefore, I’m not particularly concerned about the factory farming of biological animals continuing into the far future. I am, however, concerned about similar but less inefficient systems.

    An example of how technology might not render sentient labor forces and other instrumental sentient beings obsolete is how humans seem motivated to have power and control over the world, and in particular seem more satisfied by having power over other sentient beings than by having power over non-sentient things like barren landscapes.

    I do still believe there’s a strong tendency towards efficiency and that this has the potential to render much suffering obsolete; I just have more skepticism about it than I think is often assumed by HPEV-EAs.[14]

  6. I’m skeptical about the view that human descendants will optimize their resources for happiness (i.e. create hedonium) relative to optimizing for suffering (i.e. create dolorium).[15] Humans currently seem more deliberately driven to create hedonium, but creating dolorium might be more instrumentally useful (e.g. as a threat to rivals[16]).

    On this topic, I similarly do still believe there’s a higher likelihood of creating hedonium; I just have more skepticism about it than I think is often assumed by EAs.

  7. I’m largely in agreement with the average HPEV-EA in my moral exchange rate between happiness and suffering. However, I think those EAs tend to greatly underestimate how much the empirical tendency towards suffering over happiness (e.g. wild animals seem to endure much more suffering than happiness) is evidence of a future empirical asymmetry.

    My view here is partly informed by the capacities for happiness and suffering that have evolved in humans and other animals, the capacities that seem to be driven by cultural forces (e.g. corporations seem to care more about downsides than upsides, perhaps because it’s easier in general to destroy and harm things than to create and grow them), and speculation about what could be done in more advanced civilizations, such as my best guess on what a planet optimized for happiness and a planet optimized for suffering would look like. For example, I think a given amount of dolorium/dystopia (say, the amount that can be created with 100 joules of energy) is far larger in absolute moral expected value than hedonium/utopia made with the same resources.

  8. I’m unsure of how much I would disagree with HPEV-EAs about the argument that we should be highly uncertain about the likelihood of different far future scenarios because of how highly speculative our evidence is, which pushes my estimate of the expected value of the far future towards the middle of the possible range, i.e. towards zero.

  9. I’m unsure of how much I would disagree with HPEV-EAs about the persistence of evolutionary forces into the future (i.e. how much future beings will be determined by fitness, rather than characteristics we might hope for like altruism and happiness).[17]

  10. From the historical perspective, it worries me that many historical humans seem like they would be quite unhappy with the way human morality changed after them, such as the way Western countries are less concerned about previously-considered-immoral behavior like homosexuality and gluttony than their ancestors were in 500 CE. (Of course, one might think historical humans would agree with modern humans upon reflection, or think that much of humanity’s moral changes have been due to improved empirical understanding of the world.)[18]

  11. I’m largely in agreement with HPEV-EAs that humanity’s moral circle has a track record of expansion and seems likely to continue expanding. For example, I think it’s quite likely that powerful beings in the far future will care a lot about charismatic biological animals like elephants or chimpanzees, or whatever beings have a similar relationship to those powerful beings as humanity has to elephants and chimpanzees. (As mentioned above, my pessimism about the continued expansion is largely due to concern about the magnitude of bad-but-unlikely outcomes and the harms that could occur due to MCE stagnation.)


Unfortunately, we don’t have much empirical data or solid theoretical arguments on these topics, so the disagreements I’ve had with HPEV-EAs have mostly just come down to differences in intuition. This is a common theme for prioritization among far future efforts. We can outline the relevant factors and a little empirical data, but the crucial factors seem to be left to speculation and intuition.


Most of these considerations are about how society will develop and utilize new technologies, which suggests we can develop relevant intuitions and speculative capacity by studying social and technological change. So even though these judgments are intuitive, we could potentially improve them with more study of big-picture social and technological change, such as Sentience Institute’s MCE research or Robin Hanson’s book The Age of Em, which analyzes what a future of brain emulations would look like. (This sort of empirical research is what I see as the most promising future research avenue for far future cause prioritization. I worry EAs overemphasize armchair research (like most of this post, actually) for various reasons.[19])


I’d personally be quite interested in a survey of people with expertise in the relevant fields of social, technological, and philosophical research, in which they’re asked about each of the considerations above, though it might be hard to get a decent sample size, and I think it would be quite difficult to debias the respondents (see the Bias section of this post).


I’m also interested in quantitative analyses of these considerations: calculations including all of these potential outcomes and associated likelihoods. As far as I know, this kind of analysis has only been attempted so far by Michael Dickens in “A Complete Quantitative Model for Cause Selection,” in which Dickens notes that “Values spreading may be better than existential risk reduction.” While this quantification might seem hopelessly speculative, I think it’s highly useful even in such situations. Of course, rigorous debiasing is also very important here.


Overall, I think the far future is close to zero in expected moral value, meaning it’s not nearly as good as is commonly assumed, implicitly or explicitly, in the EA community.


Range of outcomes

It’s difficult to compare the scale of far future impacts since they are all astronomical, and I find the consideration of scale here to overall not be very useful.

Technically, it seems like MCE involves a larger range of potential outcomes than reducing extinction risk through AIA because, at least from a classical consequentialist perspective (giving weight to both negative and positive outcomes), it could make the difference between some of the worst far futures imaginable and the best far futures. Reducing extinction risk through AIA only makes the difference between nonexistence (a far future of zero value) and whatever world comes to exist. If one believes the far future is highly positive, this could still be a very large range, but it would still be less than the potential change from MCE.

How much less depends on one’s views of how bad the worst future is relative to the best future. If the worst future is as bad as the best future is good (i.e. they have the same absolute value), then MCE has a range twice as large as extinction risk reduction.
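
The two-fold figure is simple arithmetic, which the following sketch spells out. The values assigned to the best and worst futures are arbitrary placeholders chosen for illustration, and the function name is hypothetical. Nonexistence is assigned a value of zero, as in the text, and extinction risk reduction is given its most favorable case: the difference between nonexistence and the best future.

```python
def outcome_ranges(best_future, worst_future):
    """Return (MCE range, extinction risk reduction range).

    MCE is modeled as potentially spanning the worst to the best future.
    Extinction risk reduction is modeled as spanning nonexistence (zero)
    to the best future, which is its most favorable case.
    """
    nonexistence = 0.0
    mce_range = best_future - worst_future
    extinction_range = best_future - nonexistence
    return mce_range, extinction_range


# Symmetric case: the worst future is exactly as bad as the best is good.
mce_sym, ext_sym = outcome_ranges(best_future=1.0, worst_future=-1.0)
ratio_symmetric = mce_sym / ext_sym

# Asymmetric case (placeholder): the worst future is three times as bad.
mce_asym, ext_asym = outcome_ranges(best_future=1.0, worst_future=-3.0)
ratio_asymmetric = mce_asym / ext_asym

print("symmetric ratio: ", ratio_symmetric)
print("asymmetric ratio:", ratio_asymmetric)
```

The ratio grows with how bad one believes the worst future to be relative to the best, which is why the comparison turns on that judgment.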


As mentioned in the Context section above, the change in the far future that AIA could achieve might not exactly be extinction versus non-extinction. While an aligned AI would probably not involve the extinction of all sentient beings, since that would require the values of its creators to prefer extinction over all other options, an unaligned AI might not necessarily involve extinction. To use the canonical AIA example of a “paperclip maximizer” (used to illustrate how an AI could easily have a harmful goal without any malicious intention), the rogue AI might create sentient beings as a labor force to implement its goal of maximizing the number of paperclips in the universe, or create sentient beings for some other goal.[20]


This means that the range of AIA is the difference between the potential universes with aligned AI and unaligned AI, which could be very good futures contrasted with very bad futures, rather than just very good futures contrasted with nonexistence.


Brian Tomasik has written out a thoughtful (though necessarily speculative and highly uncertain) breakdown of the risks of suffering in both aligned and unaligned AI scenarios, which weakly suggests that an aligned AI would lead to more suffering in expectation.


All things considered, it seems that the range of quality risk reduction (including MCE) is larger than that of extinction risk reduction (including AIA, depending on one’s view of what difference AI alignment makes), but this seems like a fairly weak consideration to me because (i) it’s a difference of roughly two-fold, which is quite small relative to the differences of ten-times, a thousand-times, etc. that we frequently see in cause prioritization, and (ii) there are numerous fairly arbitrary judgment calls (like considering reducing extinction risk from AI versus AIA versus AI safety) that lead to different results.[21]

Likelihood of different far future scenarios[22][23]

MCE is relevant for many far future scenarios where AI doesn’t undergo the sort of “intelligence explosion” or similar progression that makes AIA important; for example, if AGI is developed by an institution like a foreign country that has little interest in AIA, or if AI is never developed, or if it’s developed slowly in a way that makes safety adjustments quite easy as that development occurs. In each of these scenarios, the way society treats sentient beings, especially those currently outside the moral circle, seems like it could still be affected by MCE. As mentioned earlier, I think there is a significant chance that the moral circle will fail to expand to reach all sentient beings, and I think a small moral circle could very easily lead to suboptimal or dystopian far future outcomes.


On the other hand, some possible far future civilizations might not involve moral circles, such as if there is an egalitarian society where each individual is able to fully represent their own interests in decision-making and this societal structure was not reached through MCE because these beings are all equally powerful for technological reasons (and no other beings exist and they have no interest in creating additional beings). Some AI outcomes might not be affected by MCE, such as an unaligned AI that does something like maximizing the number of paperclips for reasons other than human values (such as a programming error) or one whose designers create its value function without regard for humanity’s current moral views (“coherent extrapolated volition” could be an example of this, though I agree with Brian Tomasik that current moral views will likely be important in this scenario).


Given my current, highly uncertain estimates of the likelihood of various far future scenarios, I would guess that MCE is applicable in somewhat more cases than AIA, suggesting it’s easier to make a difference to the far future through MCE. (This is analogous to saying the risk of MCE-failure seems greater than the risk of AIA-failure, though I’m trying to avoid simplifying these into binary outcomes.)


How much of an impact can we expect our marginal resources to have on the probability of extinction, or on the moral circle of the far future?

Social change versus technical research

One may believe changing people’s attitudes and behavior is quite difficult, and direct work on AIA involves a lot less of that. While AIA likely involves influencing some people (e.g. policymakers, researchers, and corporate executives), MCE is almost entirely influencing people’s attitudes and behavior.[24]


However, one could instead believe that technical research is more difficult in general, pointing to potential evidence such as the large amount of money spent on technical research (e.g. by Silicon Valley) with often very little to show for it, while huge social change seems to sometimes be effected by small groups of advocates with relatively little money (e.g. organizers of revolutions in Egypt, Serbia, and Turkey). (I don’t mean this as a very strong or persuasive argument, just as a possibility. There are plenty of examples of tech done with few resources and social change done with many.)


It’s hard to speak so generally, but I would guess that technical research tends to be easier than causing social change. And this seems like the strongest argument in favor of working on AIA over working on MCE.

Track record

In terms of EA work explicitly focused on the goals of AIA and MCE, AIA has a much better track record. The past few years have seen significant technical research output from organizations like MIRI and FHI, as documented by user Larks on the EA Forum for 2016 and 2017. I’d refer readers to those posts, but as a brief example, MIRI had an acclaimed paper on “Logical Induction,” which used a financial market process to estimate the likelihood of logical facts (e.g. mathematical propositions like the Riemann hypothesis) that we aren’t yet sure of. This is analogous to how we use probability theory to estimate the likelihood of empirical facts (e.g. a dice roll). In the bigger picture of AIA, this research could help lay the technical foundation for building an aligned AGI. See Larks’ post for a discussion of more papers like this, as well as non-technical work done by AI-focused organizations such as the Future of Life Institute’s open letter on AI safety signed by leading AI researchers and cited by the White House’s “Report on the Future of Artificial Intelligence.”


Using an analogous definition for MCE, EA work explicitly focused on MCE (meaning expanding the moral circle in order to improve the far future) basically only started in 2017 with the founding of Sentience Institute (SI), though there were various blog posts and articles discussing it before then. SI has basically finished four research projects: (1) Foundational Question Summaries that summarize evidence we have on important effective animal advocacy (EAA) questions, including a survey of EAA researchers, (2) a case study of the British antislavery movement to better understand how they achieved one of the first major moral circle expansions in modern history, (3) a case study of nuclear power to better understand how some countries (e.g. France) enthusiastically adopted this new technology, but others (e.g. the US) didn’t, and (4) a nationally representative poll of US attitudes towards animal farming and animal-free food.

With a broader definition of MCE that includes activities that people prioritizing MCE tend to think are quite indirectly effective (see the Neglectedness section for discussion of definitions), we’ve seen EA achieve quite a lot more, such as the work done by The Humane League, Mercy For Animals, Animal Equality, and other organizations on corporate welfare reforms to animal farming practices, and the work done by The Good Food Institute and others on supporting a shift away from animal farming, especially through supporting new technologies like so-called “clean meat.”

Since I favor the narrower definition, I think AIA outperforms MCE on track record, but the difference in track record seems largely explained by the greater resources spent on AIA, which makes it a less important consideration. (Also, when I personally decided to focus on MCE, SI did not yet exist, so the lack of track record was an even stronger consideration in favor of AIA (though MCE was also more neglected at that time).)

To be clear, the track records of all far future projects tend to be weaker than near-term projects where we can directly see the results.


If one values robustness, meaning a higher certainty that one is having a positive impact, either for instrumental or intrinsic reasons, then AIA might be more promising because once we develop an aligned AI (that continues to be aligned over time), the work of AIA is done and won’t need to be redone in the future. With MCE, assuming the advent of AI or similar developments won’t fix society’s values in place (known as “value lock-in”), then MCE progress could more easily be undone, especially if one believes there’s a social setpoint that humanity drifts back towards when moral progress is made.[25]


I think the assumptions of this argument make it quite weak: I’d guess an “intelligence explosion” has a significant chance of value lock-in,[26][27] and I don’t think there’s a setpoint in the sense that positive moral change increases the risk of negative moral change. I also don’t value robustness intrinsically at all or instrumentally very much; I think that there is so much uncertainty in all of these strategies and such weak prior beliefs[28] that differences in certainty of impact matter relatively little.


Work on either cause area runs the risk of backfiring. The main risk for AIA seems to be that the technical research done to better understand how to build an aligned AI will increase AI capabilities generally, meaning it’s also easier for humanity to produce an unaligned AI. The main risk for MCE seems to be that certain advocacy strategies will end up having the opposite of their intended effect, such as a confrontational protest for animal rights that ends up putting people off of the cause.


It’s unclear which project has better near-term proxies and feedback loops to assess and increase long-term impact. AIA has technical problems with solutions that can be mathematically proven, but these might end up having little bearing on final AIA outcomes, such as if an AGI isn’t developed using the method that was advised or if technical solutions aren’t implemented by policy-makers. MCE has metrics like public attitudes and practices. My weak intuition here, and the weak intuition of other reasonable people I’ve discussed this with, is that MCE has better near-term proxies.


It’s unclear which project has more historical evidence that EAs can learn from to be more effective. AIA has previous scientific, mathematical, and philosophical research and technological successes and failures, while MCE has previous psychological, social, political, and economic research and advocacy successes and failures.


Finally, I do think that we learn a lot about tractability just by working directly on an issue. Given how little effort has gone into MCE itself (see Neglectedness below), I think we could resolve a significant amount of uncertainty with more work in the field.

Overall, considering only direct tractability (i.e. ignoring information value due to neglectedness, which would help other EAs with their cause prioritization), I’d guess AIA is a little more tractable.


With neglectedness, we also face a challenge of how broadly to define the cause area. In this case, we have a fairly clear goal with our definition: to best assess how much low-hanging fruit is available. To me, it seems like there are two simple definitions that meet this goal: (i) organizations or individuals working explicitly on the cause area, (ii) organizations or individuals working on the strategies that are seen as top-tier by people focused on the cause area. How much one favors (i) versus (ii) depends largely on whether one thinks the top-tier strategies are fairly well-established and thus (ii) makes sense, or whether they will change over time such that one should favor (i) because those organizations and individuals will be better able to adjust.[29]


With the explicit focus definitions of AIA and MCE (recall this includes having a far future focus), it seems that MCE is much more neglected and has more low-hanging fruit.[30] For example, there is only one organization that I know of explicitly committed to MCE in the EA community (SI), while numerous organizations (MIRI, CHAI, part of FHI, part of CSER, even parts of AI capabilities organizations like Montreal Institute for Learning Algorithms, DeepMind, and OpenAI, etc.) are explicitly committed to AIA. Because MCE seems more neglected, we could learn a lot about MCE through SI’s initial work, such as how easily advocates have achieved MCE throughout history.


If we include those working on the cause area without an explicit focus, then that seems to widen the definition of MCE to include some of the top strategies being used to expand the moral circle in the near-term, such as farmed animal work done by Animal Charity Evaluators and its top-recommended charities, which had a combined budget of around $7.5 million in 2016. The combined budget of top-tier AIA work is harder to estimate, but the Centre for Effective Altruism estimates all AIA work in 2016 was around $6.6 million. The AIA budgets seem to be increasing more quickly than the MCE budgets, especially given the grant-making of the Open Philanthropy Project. We could also include EA movement-building organizations that place a strong focus on reducing extinction risk, and even AIA specifically, such as 80,000 Hours. The categorization for MCE seems to have more room to broaden, perhaps all the way to mainstream animal advocacy strategies like the work of People for the Ethical Treatment of Animals (PETA), which might make AIA more neglected. (It could potentially go even farther, such as advocating for human sweatshop laborers, but that seems too far removed and I don’t know any MCE advocates who think it’s plausibly top-tier.)


I think there’s a difference in aptitude that suggests MCE is more neglected. Moral advocacy seems like a field in which, although it is quite crowded, it is relatively easy for deliberate, thoughtful people to vastly outperform the average advocate,[31] which can lead to surprisingly large impact (e.g. EAs have already had far more success in publishing their writing, such as books and op-eds, than most writers hope for).[32] Additionally, despite centuries of advocacy, very little quality research has been done to critically examine what advocacy is effective and what’s not, while the fields of math, computer science, and machine learning involve substantial self-reflection and are largely worked on by academics who seem to use more critical thinking than the average activist (e.g. there’s far more skepticism in these academic communities, a demand for rigor and experimentation that’s rarely seen among advocates). In general, I think the aptitude of the average social change advocate is much lower than that of the average technological researcher, suggesting MCE is more neglected, though of course other factors also count.


The relative neglectedness of MCE also seems likely to continue, given the greater self-interest humanity has in AIA relative to MCE and, in my opinion, the net biases towards AIA described in the Biases section of this blog post. (This self-interest argument is a particularly important consideration for prioritizing MCE over AIA in my view.[33])


However, while neglectedness is typically thought to make a project more tractable, it seems that existing work in the extinction risk space has made marginal contributions more impactful in some ways. For example, talented AI researchers can find work relatively easily at an organization dedicated to AIA, while the path for talented MCE researchers is far less clear and easy. This alludes to the difference in tractability that might exist between labor resources and funding resources, as it currently seems like MCE is much more funding-constrained[34] while AIA is largely talent-constrained.


As another example, there are already solid inroads between the AIA community and the AI decision-makers, and AI decision-makers have already expressed interest in AIA, suggesting that influencing them with research results will be fairly easy once those research results are in hand. This means both that our estimation of AIA’s neglectedness should decrease, and that our estimation of its non-neglectedness tractability should increase, in the sense that neglectedness is a part of tractability. (The definitions in this framework vary.)

All things considered, I find MCE to be more compelling from a neglectedness perspective, particularly due to the current EA resource allocation and the self-interest humanity has, and will most likely continue to have, in AIA. When I decided to focus on MCE, there was an even stronger case for neglectedness because no organization committed to that goal existed yet (SI was founded in 2017), though there was also a greater downside to MCE: its even more limited track record.


Values spreading as a far future intervention has been criticized on the following grounds: People have very different values, so trying to promote your values and change other people’s could be seen as uncooperative. Cooperation seems to be useful both directly (e.g. how willing are other people to help us out if we’re fighting them?) and in a broader sense because of superrationality, an argument that one should help others even when there’s no causal mechanism for reciprocation.[35]


I think this is certainly a good consideration against some forms of values spreading. For example, I don’t think it’d be wise for an MCE-focused EA to disrupt the Effective Altruism Global conferences (e.g. yell on stage and try to keep the conference from continuing) if they have an insufficient focus on MCE. This seems highly ineffective because of how uncooperative it is, given the EA space is supposed to be one for having challenging discussions and solving problems, not merely advocating one’s positions like a political rally.


However, I don’t think it holds much weight against MCE in particular for two reasons: First, because I don’t think MCE is particularly uncooperative. For example, I never bring up MCE with someone and hear, “But I like to keep my moral circle small!” I think this is because there are many different components of our attitudes and worldview that we refer to as values and morals. People have some deeply-held values that seem strongly resistant to change, such as their religion or the welfare of their immediate family, but very few people seem to have small moral circles as a deeply-held value. Instead, the small moral circle seems to mostly be a superficial, casual value (though it’s often connected to the deeper values) that people are okay with — or even happy about — changing.[36]


Second, insofar as MCE is uncooperative, I think a large number of other EA interventions, including AIA, are similarly uncooperative. Many people even in the EA community are concerned with, or even opposed to, AIA. This could be the case, for example, if one believes an aligned AI would create a worse far future than an unaligned AI, or if one thinks AIA is harmfully distracting from more important issues and gives EA a bad name. This isn’t to say I think AIA is bad because it’s uncooperative; on the contrary, this seems like a level of uncooperativeness that’s often necessary for dedicated EAs. (In a trivial way, basically all action involves uncooperativeness because it’s always about changing the status quo or preventing the status quo from changing.[37] Even inaction can involve uncooperativeness if it means not working to help someone who would like your help.)

I do think it’s more important to be cooperative in some other situations, such as if one has a very different value system than some of their colleagues, as might be the case for the Foundational Research Institute, which advocates strongly for cooperation with other EAs.

Cooperation with future do-gooders

Another argument against values spreading goes something like, “We can worry about values after we’ve safely developed AGI. Our tradeoff isn’t, ‘Should we work on values or AI?’ but instead ‘Should we work on AI now and values later, or values now and maybe AI later if there’s time?’”


I agree with one interpretation of the first part of this argument, that urgency is an important factor and AIA does seem like a time-sensitive cause area. However, I think MCE is similarly time-sensitive because of risks of value lock-in where our descendants’ morality becomes much harder to change, such as if AI designers choose to fix the values of an AGI, or at least to make them independent of other people’s opinions (they could still be amenable to self-reflection of the designer and new empirical data about the universe other than people’s opinions)[38]; if humanity sends out colonization vessels across the universe that are traveling too fast for us to adjust based on our changing moral views; or if society just becomes too wide and disparate to have effective social change mechanisms like we do today on Earth.


I disagree with the stronger interpretation, that we can count on some sort of cooperation with or control over future people. There might be some extent to which we can do this, such as via superrationality, but that seems like a fairly weak effect. Instead, I think we’re largely on our own, deciding what we do in the next few years (or perhaps in our whole career), and just making our best guess of what future people will do. It sounds very difficult to strike a deal with them that will ensure they work on MCE in exchange for us working on AIA.


I’m always cautious about bringing considerations of bias into an important discussion like this. Considerations easily turn into messy, personal attacks, and often you can fling roughly-equal considerations of counter-biases when accusations of bias are hurled at you. However, I think we should give them serious consideration in this case. First, I want to be exhaustive in this blog post, and that means throwing every consideration on the table, even messy ones. Second, my own cause prioritization “journey” led me first to AIA and other non-MCE/non-animal-advocacy EA priorities (mainly EA movement-building), and it was considerations of bias that allowed me to look at the object-level arguments with fresh eyes and decide that I had been way off in my previous assessment.


Third and most importantly, people’s views on this topic are inevitably driven mostly by intuitive, subjective judgment calls. One could easily read everything I’ve written in this post and say they lean in the MCE direction on every topic, or the AIA direction, and there would be little object-level criticism one could make against that if they just based their view on a different intuitive synthesis of the considerations. This subjectivity is dangerous, but it is also humbling. It requires us to take an honest look at our own thought processes in order to avoid the subtle, irrational effects that might push us in either direction. It also requires caution when evaluating “expert” judgment, given how much experts could be affected by personal and social biases themselves.


The best way I know of to think about bias in this case is to consider the biases and other factors that favor either cause area and see which case seems more powerful, or which particular biases might be affecting our own views. The following lists are presumably not exhaustive but lay out what I think are some common key parts of people’s journeys to AIA or MCE. Of course, these factors are not entirely deterministic and probably not all will apply to you, nor do they necessarily mean that you are wrong in your cause prioritization. Based on the circumstances that apply more to you, consider taking a more skeptical look at the project you favor and your current views on the object-level arguments for it.

One might be biased towards AIA if...

  1. They eat animal products, and thus assign lower moral value and fewer mental faculties to animals.

  2. They haven’t accounted for the bias of speciesism.

  3. They lack personal connections to animals, such as growing up with pets.

  4. They are or have been a fan of science fiction and fantasy literature and media, especially if they dreamed of being the hero.

  5. They have a tendency towards technical research over social projects.

  6. They lack social skills.

  7. They are inclined towards philosophy and mathematics.

  8. They have a negative perception of activists, perhaps seeing them as hippies, irrational, idealistic, “social justice warriors,” or overly emotion-driven.

  9. They are a part of the EA community, and therefore drift towards the status quo of EA leaders and peers. (The views of EA leaders can of course be genuine evidence of the correct cause prioritization, but they can also lead to bias.)

  10. The idea of “saving the world” appeals to them.

  11. They take pride in their intelligence, and would love if they could save the world just by doing brilliant technical research.

  12. They are competitive, and like the feeling/mindset of doing astronomically more good than the average do-gooder, or even the average EA. (I’ve argued in this post that MCE has this astronomical impact, but it lacks the feeling of literally “saving the world” or otherwise having a clear impact that makes a good hero’s journey climax, and it’s closely tied to lesser, near-term impacts.)

  13. They have little personal experience of extreme suffering, the sort that makes one pessimistic about the far future, especially regarding s-risks. (Personal experience could be one’s own experience or the experiences of close friends and family.)

  14. They have little personal experience of oppression, such as due to their gender, race, disabilities, etc.

  15. They are generally a happy person.

  16. They are generally optimistic, or at least averse to thinking about bad outcomes like how humanity could cause astronomical suffering. (Though some pessimism is required for AIA in the sense that they don’t count on AI capabilities researchers ending up with an aligned AI without their help.)

One might be biased towards MCE if...

  1. They are vegan, especially if they went vegan for non-animal or non-far-future reasons, such as for better personal health.

  2. Their gut reaction when they hear about extinction risk or AI risk is to judge it nonsensical.

  3. They have personal connections to animals, such as growing up with pets.

  4. They are or have been a fan of social movement/activism literature and media, especially if they dreamed of being a movement leader.

  5. They have a tendency towards social projects over technical research.

  6. They have benefitted from above-average social skills.

  7. They are inclined towards social science.

  8. They have a positive perception of activists, perhaps seeing them as the true leaders of history.

  9. They have social ties to vegans and animal advocates. (The views of these people can of course be genuine evidence of the correct cause prioritization, but they can also lead to bias.)

  10. The idea of “helping the worst off” appeals to them.

  11. They take pride in their social skills, and would love if they could help the worst off just by being socially savvy.

  12. They are not competitive, and like the thought of being a part of a friendly social movement.

  13. They have a lot of personal experience of extreme suffering, the sort that makes one pessimistic about the far future, especially regarding s-risks. (Personal experience could be one’s own experience or the experiences of close friends and family.)

  14. They have a lot of personal experience of oppression, such as due to their gender, race, disabilities, etc.

  15. They are generally an unhappy person.

  16. They are generally pessimistic, or at least don’t like thinking about good outcomes. (Though some optimism is required for MCE in the sense that they believe work on MCE can make a large positive difference in social attitudes and behavior.)

  17. They care a lot about directly seeing the impact of their work, even if the bulk of their impact is hard to see. (E.g. seeing improvements in the conditions of farmed animals, which can be seen as a proxy for helping farmed-animal-like beings in the far future.)


I personally found myself far more compelled towards AIA in my early involvement with EA before I had thought in detail about the issues discussed in this post. I think the items in the AIA list apply to me much more strongly than those in the MCE list. When I considered these biases, in particular speciesism and my desire to follow the status quo of my EA friends, a fresh look at the object-level arguments changed my mind.


From my reading and conversations in EA, I think the biases in favor of AIA are also quite a bit stronger in the community, though of course some EAs — mainly those already working on animal issues for near-term reasons — probably feel a stronger pull in the other direction.

How you think about these bias considerations also depends on how biased you think the average EA is. If you, for example, think EAs tend to be quite biased in another way like “measurement bias” or “quantifiability bias” (a tendency to focus too much on easily-quantifiable, low-risk interventions), then considerations of biases on this topic should probably be more compelling to you than they will be to people who think EAs are less biased.


[1] This post attempts to compare these cause areas overall, but since that’s sometimes too vague, I specifically mean the strategies within each cause area that seem most promising. I think this is basically equal to “what EAs working on MCE most strongly prioritize” and “what EAs working on AIA most strongly prioritize.”

[2] There’s a sense in which AIA is a form of MCE simply because AIA will tend to lead to certain values. I’m excluding that AIA approach of MCE from my analysis here to avoid overlap between these two cause areas.

[3] Depending on how close we’re talking about, this could be quite unlikely. If we’re discussing the range of outcomes from dystopia across the universe to utopia across the universe, then a range like “between modern earth and the opposite value of modern earth” seems like a very tiny fraction of the total possible range.

[4] I mean “good” in a “positive impact” sense here, so it includes not just rationality according to the decision-maker but also value alignment, luck, being empirically well-informed, being capable of doing good things, etc.

[5] One reason for optimism is that you might think most extinction risk is in the next few years, such that you and other EAs you know today will still be around to do this research yourselves and make good decisions after those risks are avoided.

[6] Technically one could believe the far future is negative but also that humans will make good decisions about extinction, such as if one believes the far future (given non-extinction) will be bad only due to nonhuman forces, such as aliens or evolutionary trends, but has optimism about human decision-making, including both that humans will make good decisions about extinction and that they will be logistically able to make those decisions. I think this is an unlikely view to settle on, but it would make option value a good thing in a “close to zero” scenario.
[7] Non-extinct civilizations could be maximized for happiness, maximized for interestingness, set up like Star Wars or another sci-fi scenario, etc. while extinct civilizations would all be devoid of sentient beings, perhaps with some variation in physical structure like different planets or remnant structures of human civilization.
[8] My views on this are currently largely qualitative, but if I had to put a number on the word “significant” in this context, it’d be somewhere around 5-30%. This is a very intuitive estimate, and I’m not prepared to justify it.
[9] Paul Christiano made a general argument in favor of humanity reaching good values in the long run due to reflection in his post “Against Moral Advocacy” (see the “Optimism about reflection” section), though he doesn’t specifically address concern for all sentient beings as a potential outcome, which might be less likely than other good values that are more driven by cooperation.
[10] Nick Bostrom has considered some of these risks of artificial suffering using the term “mind crime,” which specifically refers to harming sentient beings created inside a superintelligence. See his book, Superintelligence.
[11] The Foundational Research Institute has written about risks of astronomical suffering in “Reducing Risks of Astronomical Suffering: A Neglected Priority.” The TV series Black Mirror is an interesting dramatic exploration of how the far future could involve vast amounts of suffering, such as the episodes “White Christmas” and “USS Callister.” Of course, the details of these situations often veer towards entertainment over realism, but their exploration of the potential for dystopias in which people abuse sentient digital entities is thought-provoking.
[12] I’m highly uncertain about what sort of motivations (like happiness and suffering in humans) future digital sentient beings will have. For example, is punishment being a stronger motivator in earth-originating life just an evolutionary fluke that we can expect to dissipate in artificial beings? Could they be just as motivated to attain reward as we are to avoid punishment? I think this is a promising avenue for future research, and I’m glad it’s being discussed by some EAs.
[13] Brian Tomasik discusses this in his essay on “Values Spreading is Often More Important than Extinction Risk,” suggesting that, “there's not an obvious similar mechanism pushing organisms toward the things that I care about.” However, Paul Christiano notes in “Against Moral Advocacy” that he expects “[c]onvergence of values” because “the space of all human values is not very broad,” though this seems quite dependent on how one defines the possible space of values.
[14] This efficiency argument is also discussed in Ben West’s article on “An Argument for Why the Future May Be Good.”
[15] The term “resources” is intentionally quite broad. This means whatever the limitations are on the ability to produce happiness and suffering, such as energy or computation.
[16] One can also create hedonium as a promise to get things from rivals, but promises seem less common than threats because threats tend to be more motivating and easier to implement (e.g. it’s easier to destroy than create). However, some social norms encourage promises over threats because promises are better for society as a whole. Additionally, threats against powerful beings (e.g. other citizens in the same country) do less than threats against less powerful or more distant beings, and the latter category might be increasingly common in the future. Finally, threats and promises matter less when one considers that they are often unfulfilled because the other party doesn’t do the action that was the subject of the threat or promise.
[17] Paul Christiano’s blog post on “Why might the future be good?” argues that “the future will be characterized by much higher influence for altruistic values [than self-interest],” though he seems to just be discussing the potential of altruism and self-interest to create positive value, rather than their potential to create negative value.

Brian Tomasik discusses Christiano’s argument and others in “The Future of Darwinism” and concludes, “Whether the future will be determined by Darwinism or the deliberate decisions of a unified governing structure remains unclear.”
[18] One discussion of changes in morality on a large scale is Robin Hanson’s blog post, “Forager, Farmer Morals.”

[19] Armchair research is relatively easy, in the sense that all it requires is writing and thinking rather than also digging through historical texts, running scientific studies, or engaging in substantial conversation with advocates, researchers, and/or other stakeholders. It’s also more similar to the mathematical and philosophical work that most EAs are used to doing. And it’s more attractive as a demonstration of personal prowess to think your way into a crucial consideration than to arrive at one through the tedious work of research. (These reasons are similar to the reasons I feel most far-future-focused EAs are biased towards AIA over MCE.)
[20] These sentient beings probably won’t be the biological animals we know today, but instead digital beings who can more efficiently achieve the AI’s goals.
[21] The neglectedness heuristic involves a similar messiness of definitions, but the choices seem less arbitrary to me, and the different definitions lead to more similar results.
[22] Arguably this consideration should be under Tractability rather than Scale.
[23] There’s a related framing here of “leverage,” with the basic argument being that AIA seems more compelling than MCE because AIA is specifically targeted at an important, narrow far future factor (the development of AGI) while MCE is not as specifically targeted. This also suggests that we should consider specific MCE tactics focused on important, narrow far future factors, such as ensuring the AI decision-makers have wide moral circles even if the rest of society lags behind. I find this argument fairly compelling, including the implication that MCE advocates should focus more on advocating for digital sentience and advocating in the EA community than they would otherwise.
[24] Though plausibly MCE involves only influencing a few decision-makers, such as the designers of an AGI.
[25] Brian Tomasik discusses this in, “Values Spreading is Often More Important than Extinction Risk,” arguing that, “Very likely our values will be lost to entropy or Darwinian forces beyond our control. However, there's some chance that we'll create a singleton in the next few centuries that includes goal-preservation mechanisms allowing our values to be "locked in" indefinitely. Even absent a singleton, as long as the vastness of space allows for distinct regions to execute on their own values without take-over by other powers, then we don't even need a singleton; we just need goal-preservation mechanisms.”
[26] Brian Tomasik discusses the likelihood of value lock-in in his essay, “Will Future Civilization Eventually Achieve Goal Preservation?”

[27] The advent of AGI seems like it will have similar effects on the lock-in of values and alignment, so if you think AI timelines are shorter (i.e. advanced AI will be developed sooner), then that increases the urgency of both cause areas. If you think timelines are so short that we will struggle to successfully reach AI alignment, then that decreases the tractability of AIA, but MCE seems like it could more easily have a partial effect on AI outcomes than AIA could.
[28] In the case of near-term, direct interventions, one might believe that “most social programmes don’t work,” which suggests that we should have low, strong priors for intervention effectiveness that we need robustness to overcome.
[29] Caspar Oesterheld discusses the ambiguity of neglectedness definitions in his blog post, "Complications in evaluating neglectedness." Other EAs have also raised concern about this commonly-used heuristic, and I almost included this content in this post under the “Tractability” section for this reason.
[30] This is a fairly intuitive sense of the word “matched.” I’m taking the topic of ways to affect the far future, dividing it into population risk and quality risk categories, then treating AIA and MCE as subcategories of each. I’m also thinking in terms of each project (AIA and MCE) being in the category of “cause areas with at least pretty good arguments in their favor,” and I think “put decent resources into all such projects until the arguments are rebutted” is a good approach for the EA community.

[31] I mean “advocate” quite broadly here, just anyone working to effect social change, such as people submitting op-eds to newspapers or trying to get pedestrians to look at their protest or take their leaflets.
It’s unclear what the explanation is for this. It could just be demographic differences such as high IQ, going to elite universities, etc. but it could also be exceptional “rationality skills” like finding loopholes in the publishing system.
In Brian Tomasik’s essay on “Values Spreading is Often More Important than Extinction Risk,” he argues that “[m]ost people want to prevent extinction” while, “In contrast, you may have particular things that you value that aren't widely shared. These things might be easy to create, and the intuition that they matter is probably not too hard to spread. Thus, it seems likely that you would have higher leverage in spreading your own values than in working on safety measures against extinction.”
This is just my personal impression from working in MCE, especially with my organization Sentience Institute. With indirect work, The Good Food Institute is a potential exception since they have struggled to quickly hire talented people after receiving large amounts of funding.
See “Superrationality” in “Reasons to Be Nice to Other Value Systems” for an EA introduction to the idea. See “In favor of ‘being nice’” in “Against Moral Advocacy” as an example of cooperation as an argument against values spreading. In “Multiverse-wide Cooperation via Correlated Decision Making,” Caspar Oesterheld argues that superrational cooperation makes MCE more important.
This discussion is complicated by the widely varying degrees of MCE. While, for example, most US residents seem perfectly okay with expanding concern to vertebrates, there would be more opposition to expanding to insects, and even more to some simple computer programs that some argue should fit into the edges of our moral circles. I do think the farthest expansions are much less cooperative in this sense, though if the message is just framed as, “expand our moral circle to all sentient beings,” I still expect strong agreement.
One exception is a situation where everyone wants a change to happen, but nobody else wants it badly enough to put the work into changing the status quo.
My impression is that the AI safety community currently wants to avoid fixing these values, though they might still be trying to make them resistant to advocacy from other people, and in general I think many people today would prefer to fix the values of an AGI when they consider that they might not agree with potential future values.

Thank you for writing this post. An evergreen difficulty that applies to discussing topics of such a broad scope is the large number of matters that are relevant, difficult to judge, and where one's judgement (whatever it may be) can be reasonably challenged. I hope to offer a crisper summary of why I am not persuaded.

I understand from this the primary motivation of MCE is avoiding AI-based dystopias, with the implied causal chain being along the lines of, “If we ensure the humans generating the AI have a broader circle of moral concern, the resulting post-human civilization is less likely to include dystopic scenarios involving great multitudes of suffering sentiences.”

There are two considerations that speak against this being a greater priority than AI alignment research: 1) Back-chaining from AI dystopias leaves relatively few occasions where MCE would make a crucial difference. 2) The current portfolio of ‘EA-based’ MCE is poorly addressed to averting AI-based dystopias.

Re. 1): MCE may prove neither necessary nor sufficient for ensuring AI goes well. On one hand, AI designers, even if speciesist themselves, might nonetheless provide the right apparatus for value learning such that resulting AI will not propagate the moral mistakes of its creators. On the other, even if the AI-designers have the desired broad moral circle, they may have other crucial moral faults (maybe parochial in other respects, maybe selfish, maybe insufficiently reflective, maybe some mistaken particular moral judgements, maybe naive approaches to cooperation or population ethics, and so on) - even if they do not, there are manifold ways in the wider environment (e.g. arms races), or in terms of technical implementation, that may incur disaster.

It seems clear to me that, pro tanto, the less speciesist the AI-designer, the better the AI. Yet for this issue to be of such fundamental importance as to be comparable to AI safety research generally, the implication is an implausible doctrine of ‘AI immaculate conception’: only by ensuring we ourselves are free from sin can we conceive an AI which will not err in a morally important way.

Re 2): As Plant notes, MCE does not arise from animal causes alone: global poverty and climate change also act to extend moral circles, as well as propagating other valuable moral norms. Looking at things the other way, one should expect the animal causes found most valuable from the perspective of avoiding AI-based dystopia to diverge considerably from those picked on face-value animal welfare. Companion animal causes are far inferior from the latter perspective, but unclear on the former if this is a good way of fostering concern for animals; if the crucial thing is for AI-creators not to be speciesist over the general population, targeted interventions like ‘Start a petting zoo at Deepmind’ look better than broader ones, like the abolition of factory farming.

The upshot is that, even if there are some particularly high yield interventions in animal welfare from the far future perspective, this should be fairly far removed from typical EAA activity directed towards having the greatest near-term impact on animals. If this post heralds a pivot of Sentience Institute to directions pretty orthogonal to the principal component of effective animal advocacy, this would be welcome indeed.

Notwithstanding the above, the approach outlined above has a role to play in some ideal ‘far future portfolio’, and it may be reasonable for some people to prioritise work on this area, if only for reasons of comparative advantage. Yet I aver it should remain a fairly junior member of this portfolio compared to AI-safety work.

Those considerations make sense. I don't have much more to add for/against than what I said in the post.

On the comparison between different MCE strategies, I'm pretty uncertain which are best. The main reasons I currently favor farmed animal advocacy over your examples (global poverty, environmentalism, and companion animals) are that (1) farmed animal advocacy is far more neglected, (2) farmed animal advocacy is far more similar to potential far future dystopias, mainly just because it involves vast numbers of sentient beings who are largely ignored by most of society. I'm relatively unworried about, for example, far future dystopias where dog-and-cat-like-beings (e.g. small, entertaining AIs kept around for companionship) are suffering in vast numbers. And environmentalism is typically advocating for non-sentient beings, which I think is quite different than MCE for sentient beings.

I think the better competitors to farmed animal advocacy are advocating broadly for antispeciesism/fundamental rights (e.g. Nonhuman Rights Project) and advocating specifically for digital sentience (e.g. a larger, more sophisticated version of People for the Ethical Treatment of Reinforcement Learners). There are good arguments against these, however, such as that it would be quite difficult for an eager EA to get much traction with a new digital sentience nonprofit. (We considered founding Sentience Institute with a focus on digital sentience. This was a big reason we didn't.) Whereas given the current excitement in the farmed animal space (e.g. the coming release of "clean meat," real meat grown without animal slaughter), the farmed animal space seems like a fantastic place for gaining traction.

I'm currently not very excited about "Start a petting zoo at Deepmind" (or similar direct outreach strategies) because it seems like it would produce a ton of backlash because it seems too adversarial and aggressive. There are additional considerations for/against (e.g. I worry that it'd be difficult to push a niche demographic like AI researchers very far away from the rest of society, at least the rest of their social circles; I also have the same traction concern I have with advocating for digital sentience), but this one just seems quite damning.

The upshot is that, even if there are some particularly high yield interventions in animal welfare from the far future perspective, this should be fairly far removed from typical EAA activity directed towards having the greatest near-term impact on animals. If this post heralds a pivot of Sentience Institute to directions pretty orthogonal to the principal component of effective animal advocacy, this would be welcome indeed.

I agree this is a valid argument, but given the other arguments (e.g. those above), I still think it's usually right for EAAs to focus on farmed animal advocacy, including Sentience Institute at least for the next year or two.

(FYI for readers, Gregory and I also discussed these things before the post was published when he gave feedback on the draft. So our comments might seem a little rehearsed.)

The main reasons I currently favor farmed animal advocacy over your examples (global poverty, environmentalism, and companion animals) are that (1) farmed animal advocacy is far more neglected, (2) farmed animal advocacy is far more similar to potential far future dystopias, mainly just because it involves vast numbers of sentient beings who are largely ignored by most of society.

Wild animal advocacy is far more neglected than farmed animal advocacy, and it involves even larger numbers of sentient beings ignored by most of society. If the superiority of farmed animal advocacy over global poverty along these two dimensions is a sufficient reason for not working on global poverty, why isn't the superiority of wild animal advocacy over farmed animal advocacy along those same dimensions also a sufficient reason for not working on farmed animal advocacy?

I personally don't think WAS is as similar to the most plausible far future dystopias, so I've been prioritizing it less even over just the past couple of years. I don't expect far future dystopias to involve as much naturogenic (nature-caused) suffering, though of course it's possible (e.g. if humans create large numbers of sentient beings in a simulation, but then let the simulation run on its own for a while, then the simulation could come to be viewed as naturogenic-ish and those attitudes could become more relevant).

I think if one wants something very neglected, digital sentience advocacy is basically across-the-board better than WAS advocacy.

That being said, I'm highly uncertain here and these reasons aren't overwhelming (e.g. WAS advocacy pushes on more than just the "care about naturogenic suffering" lever), so I think WAS advocacy is still, in Gregory's words, an important part of the 'far future portfolio.' And often one can work on it while working on other things, e.g. I think Animal Charity Evaluators' WAS content (e.g. [guest blog post by Oscar Horta](https://animalcharityevaluators.org/blog/why-the-situation-of-animals-in-the-wild-should-concern-us/)) has helped them be more well-rounded as an organization, and didn't directly trade off with their farmed animal content.

But humanity/AI is likely to expand to other planets. Won't those planets need to have complex ecosystems that could involve a lot of suffering? Or do you think it will all be done with some fancy tech that'll be too different from today's wildlife for it to be relevant? It's true that those ecosystems would (mostly?) be non-naturogenic, but I'm not that sure that people would care about them; it'd still be animals, diseases, hunger, etc. hurting animals. Maybe it'd be easier to engineer an ecosystem without predation and diseases, but that is a non-trivial assumption and suffering could then arise in other ways.

Also, some humans want to spread life to other planets for its own sake and relatively few people need to want that to cause a lot of suffering if no one works on preventing it.

This could be less relevant if you think that most of the expected value comes from simulations that won't involve ecosystems.

Yes, terraforming is a big way in which close-to-WAS scenarios could arise. I do think it's smaller in expectation than digital environments that develop on their own and thus are close-to-WAS.

I don't think terraforming would be done very differently than today's wildlife, e.g. done without predation and diseases.

Ultimately I still think the digital, not-close-to-WAS scenarios seem much larger in expectation.

Thanks for funding this research. Notes:

  • Ostensibly it seems like much of Sentience Institute's (SI) current research is focused on identifying those MCE strategies which historically have turned out to be more effective among the strategies which have been tried. I think SI as an organization is based on the experience of EA as a movement in having significant success with MCE in a relatively short period of time. Successfully spreading the meme of effective giving; increasing concern for the far future in notable ways; and corporate animal welfare campaigns are all dramatic achievements for a young social movement like EA. While these aren't on the scale of shaping MCE over the course of the far future, these achievements make it seem more possible that EA and allied movements can have an outsized impact by pursuing neglected strategies for values-spreading.

  • On terminology, to say the focus is on non-human animals, or even moral patients which typically come to mind when describing 'animal-like' minds, i.e., familiar vertebrates, is inaccurate. "Sentient being", "moral patient" or "non-human agents/beings" are terms which are inclusive of non-human animals and of the other types of potential moral patients that have been posited. Admittedly these aren't catchy terms.

In Stuart Russell's Human Compatible (2019), he advocates for AGI to follow preference utilitarianism, maximally satisfying the values of humans. As for animal interests, he seems to think that they are sufficiently represented since he writes that they will be valued by the AI insofar as humans care about them. Reading this from Stuart Russell shifted me toward thinking that moral circle expansion probably does matter for the long-term future. It seems quite plausible (likely?) that AGI will follow this kind of value function which does not directly care about animals rather than broadly anti-speciesist values, since AI researchers are not generally anti-speciesists. In this case, moral circle expansion across the general population would be essential.

(Another factor is that Russell's reward modeling depends on receiving feedback occasionally from humans to learn their preferences, which is much more difficult to do with animals. Thus, under an approach similar to reward modeling, AGI developers probably won't bother to directly include animal preferences, when that involves all the extra work of figuring out how to get the AI to discern animal preferences. And how many AI researchers want to risk, say, mosquito interests overwhelming human interests?)

In comparison, if an AGI was planned to only care about the interests of people in, say, Western countries, that would instantly be widely decried as racist (at least in today's Western societies) and likely not be developed. So while moral circle expansion encompasses caring about people in other countries, I'm less concerned that large groups of humans will not have their interests represented in the AGI's values than I am about nonhuman animals.

It may be more cost-effective to have a targeted approach of increasing anti-speciesism among AI researchers and doing anti-speciesist AI alignment philosophy/research (e.g., more details on how AI following preference utilitarianism can also intrinsically care about animal preferences, accounting for preferences of digital sentience given the problem that they can easily replicate and dominate preference calculations), but anti-speciesism among the general population still seems to be an important component of reducing the risk of a bad far future.

AI designers, even if speciesist themselves, might nonetheless provide the right apparatus for value learning such that resulting AI will not propagate the moral mistakes of its creators

This is something I also struggle with in understanding the post. It seems like we need:

  1. AI creators can be convinced to expand their moral circle
  2. Despite (1), they do not wish to be convinced to expand their moral circle
  3. The AI follows this second desire to not be convinced to expand their moral circle

I imagine this happening with certain religious things; e.g. I could imagine someone saying "I wish to think the Bible is true even if I could be convinced that the Bible is false".

But it seems relatively implausible with regards to MCE?

Particularly given that AI safety talks a lot about things like CEV, it is unclear to me whether there is really a strong trade-off between MCE and AIA.

(Note: Jacy and I discussed this via email and didn't really come to a consensus, so there's a good chance I am just misunderstanding his argument.)

Hm, yeah, I don't think I fully understand you here either, and this seems somewhat different than what we discussed via email.

My concern is with (2) in your list. "[T]hey do not wish to be convinced to expand their moral circle" is extremely ambiguous to me. Presumably you mean they -- without MCE advocacy being done -- wouldn't put in wide-MC* values or values that lead to wide-MC into an aligned AI. But I think it's being conflated with, "they actively oppose" or "they would answer 'no' if asked, 'Do you think your values are wrong when it comes to which moral beings deserve moral consideration?'"

I think they don't actively oppose it, they would mostly answer "no" to that question, and it's very uncertain if they will put the wide-MC-leading values into an aligned AI. I don't think CEV or similar reflection processes reliably lead to wide moral circles. I think they can still be heavily influenced by their initial set-up (e.g. what the values of humanity are when reflection begins).

This leads me to think that you only need (2) to be true in a very weak sense for MCE to matter. I think it's quite plausible that this is the case.

*Wide-MC meaning an extremely wide moral circle, e.g. includes insects, small/weird digital minds.

I don't think CEV or similar reflection processes reliably lead to wide moral circles. I think they can still be heavily influenced by their initial set-up (e.g. what the values of humanity are when reflection begins).

Why do you think this is the case? Do you think there is an alternative reflection process (either implemented by an AI, by a human society, or combination of both) that could be defined that would reliably lead to wide moral circles? Do you have any thoughts on what would it look like?

If we go through some kind of reflection process to determine our values, I would much rather have a reflection process that wasn't dependent on whether or not MCE occurred beforehand, and I think not leading to a wide moral circle should be considered a serious bug in any definition of a reflection process. It seems to me that working on producing this would be a plausible alternative or at least parallel path to directly performing MCE.

I think that there's an inevitable tradeoff between wanting a reflection process to have certain properties and worries about this violating goal preservation for at least some people. This blogpost is not about MCE directly, but if you think of "BAAN thought experiment" as "we do moral reflection and the outcome is such a wide circle that most people think it is extremely counterintuitive" then the reasoning in large parts of the blogpost should apply perfectly to the discussion here.

That is not to say that trying to fine tune reflection processes is pointless: I think it's very important to think about what our desiderata should be for a CEV-like reflection process. I'm just saying that there will be tradeoffs between certain commonly mentioned desiderata that people don't realize are there because they think there is such a thing as "genuinely free and open-ended deliberation."

Thanks for commenting, Lukas. I think Lukas, Brian Tomasik, and others affiliated with FRI have thought more about this, and I basically defer to their views here, especially because I haven't heard any reasonable people disagree with this particular point. Namely, I agree with Lukas that there seems to be an inevitable tradeoff here.

I tend to think of moral values as being pretty contingent and pretty arbitrary, such that what values you start with makes a big difference to what values you end up with even on reflection. People may "imprint" on the values they receive from their culture to a greater or lesser degree.

I'm also skeptical that sophisticated philosophical-type reflection will have significant influence over posthuman values compared with more ordinary political/economic forces. I suppose philosophers have sometimes had big influences on human politics (religions, Marxism, the Enlightenment), though not necessarily in a clean "carefully consider lots of philosophical arguments and pick the best ones" kind of way.

I'd qualify this by adding that the philosophical-type reflection seems to lead in expectation to more moral value (positive or negative, e.g. hedonium or dolorium) than other forces, despite overall having less influence than those other forces.

I thought this piece was good. I agree that MCE work is likely quite high impact - perhaps around the same level as X-risk work - and that it has been generally ignored by EAs. I also agree that it would be good for there to be more MCE work going forward. Here's my 2 cents:

You seem to be saying that AIA is a technical problem and MCE is a social problem. While I think there is something to this, I think there are very important technical and social sides to both of these. Much of the work related to AIA so far has been about raising awareness about the problem (eg the book Superintelligence), and this is more a social solution than a technical one. Also, avoiding a technological race for AGI seems important for AIA, and this also is more a social problem than a technical one.

For MCE, the 2 best things I can imagine (that I think are plausible) are both technical in nature. First, I expect clean meat will lead to the moral circle expanding more to animals. I really don't see any vegan social movement succeeding in ending factory farming anywhere near as much as I expect clean meat to. Second, I'd imagine that a mature science of consciousness would increase MCE significantly. Many people don't think animals are conscious, and almost no one thinks anything besides animals can be conscious. How would we even know if an AI was conscious, and if so, if it was experiencing joy or suffering? The only way would be if we develop theories of consciousness that we have high confidence in. But right now we're very limited in studying consciousness, because our tools at interfacing with the brain are crude. Advanced neurotechnologies could change that - they could allow us to potentially test hypotheses about consciousness. Again, developing these technologies would be a technical problem.

Of course, these are just the first ideas that come into my mind, and there very well may be social solutions that could do more than the technical solutions I mentioned, but I don't think we should rule out the potential role of technical solutions, either.

Thanks for the comment! A few of my thoughts on this:

Presumably we want some people working on both of these problems, some people have skills more suited to one than the other, and some people are just going to be more passionate about one than the other.

If one is convinced non-extinction civilization is net positive, this seems true and important. Sorry if I framed the post too much as one or the other for the whole community.

Much of the work related to AIA so far has been about raising awareness about the problem (eg the book Superintelligence), and this is more a social solution than a technical one.

Maybe. My impression from people working on AIA is that they see it as mostly technical, and indeed they think much of the social work has been net negative. Perhaps not Superintelligence, but at least the work that's been done to get media coverage and widespread attention without the technical attention to detail of Bostrom's book.

I think the more important social work (from a pro-AIA perspective) is about convincing AI decision-makers to use the technical results of AIA research, but my impression is that AIA proponents still think getting those technical results is probably the more important project.

There's also social work in coordinating the AIA community.

First, I expect clean meat will lead to the moral circle expanding more to animals. I really don't see any vegan social movement succeeding in ending factory farming anywhere near as much as I expect clean meat to.

Sure, though one big issue with technology is that it seems like we can do far less to steer its direction than we can do with social change. Clean meat tech research probably just helps us get clean meat sooner instead of making the tech progress happen when it wouldn't otherwise. The direction of the far future (e.g. whether clean meat is ever adopted, whether the moral circle expands to artificial sentience) probably matters a lot more than the speed at which it arrives.

Of course, this gets very complicated very quickly, as we consider things like value lock-in. Sentience Institute has a bit of basic sketching on the topic on this page.

Second, I'd imagine that a mature science of consciousness would increase MCE significantly. Many people don't think animals are conscious, and almost no one thinks anything besides animals can be conscious

I disagree that "many people don't think animals are conscious." I almost exclusively hear that view from the rationalist/LessWrong community. A recent survey suggested that 87.3% of US adults agree with the statement, "Farmed animals have roughly the same ability to feel pain and discomfort as humans," and presumably even more think they have at least some ability.

Advanced neurotechnologies could change that - they could allow us to potentially test hypotheses about consciousness.

I'm fairly skeptical of this personally, partly because I don't think there's a fact of the matter when it comes to whether a being is conscious. I think Brian Tomasik has written eloquently on this. (I know this is an unfortunate view for an animal advocate like me, but it seems to have the best evidence favoring it.)

I'm fairly skeptical of this personally, partly because I don't think there's a fact of the matter when it comes to whether a being is conscious.

I would guess that increasing understanding of cognitive science would generally increase people's moral circles if only because people would think more about these kinds of questions. Of course, understanding cognitive science is no guarantee that you'll conclude that animals matter, as we can see from people like Dennett, Yudkowsky, Peter Carruthers, etc.

Second, I'd imagine that a mature science of consciousness would increase MCE significantly. Many people don't think animals are conscious, and almost no one thinks anything besides animals can be conscious. How would we even know if an AI was conscious, and if so, if it was experiencing joy or suffering? The only way would be if we develop theories of consciousness that we have high confidence in. But right now we're very limited in studying consciousness, because our tools at interfacing with the brain are crude. Advanced neurotechnologies could change that - they could allow us to potentially test hypotheses about consciousness. Again, developing these technologies would be a technical problem.

I think that's right. Specifically, I would advocate consciousness research as a foundation for principled moral circle expansion. I.e., if we do consciousness research correctly, the equations themselves will tell us how conscious insects are, whether algorithms can suffer, how much moral weight we should give animals, and so on.

On the other hand, if there is no fact of the matter as to what is conscious, we're headed toward a very weird, very contentious future of conflicting/incompatible moral circles, with no 'ground truth' or shared principles to arbitrate disputes.

Edit: I'd also like to thank Jacy for posting this- I find it a notable contribution to the space, and clearly a product of a lot of hard work and deep thought.

Thanks for writing this, I thought it was a good article. And thanks to Greg for funding it.

My pushback would be on the cooperation and coordination point. It seems that a lot of other people, with other moral values, could make a very similar argument: that they need to promote their values now, as the stakes are very high with possible upcoming value lock-in. To people with those values, these arguments should seem roughly as important as the above argument is to you.

  • Christians could argue that, if the singularity is approaching, it is vitally important that we ensure the universe won't be filled with sinners who will go to hell.
  • Egalitarians could argue that, if the singularity is approaching, it is vitally important that we ensure the universe won't be filled with wider and wider diversities of wealth.
  • Libertarians could argue that, if the singularity is approaching, it is vitally important that we ensure the universe won't be filled with property rights violations.
  • Naturalists could argue that, if the singularity is approaching, it is vitally important that we ensure the beauty of nature won't be despoiled all over the universe.
  • Nationalists could argue that, if the singularity is approaching, it is vitally important that we ensure the universe will be filled with people who respect the flag.

But it seems that it would be very bad if everyone took this advice literally. We would all end up spending a lot of time and effort on propaganda, which would probably be great for advertising companies but not much else, as so much of it is zero sum. Even though it might make sense, by their values, for expanding-moral-circle people and pro-abortion people to have a big propaganda war over whether foetuses deserve moral consideration, it seems plausible we'd be better off if they both decided to spend the money on anti-malaria bednets.

In contrast, preventing the extinction of humanity seems to occupy a privileged position - not exactly comparable with the above agendas, though I can't exactly cash out why it seems this way to me. Perhaps to devout Confucians a pre-occupation with preventing extinction seems to be just another distraction from the important task of expressing filial piety – though I doubt this.

(Moral Realists, of course, could argue that the situation is not really symmetric, because promoting the true values is distinctly different from promoting any other values.)

Yeah, I think that's basically right. I think moral circle expansion (MCE) is closer to your list items than extinction risk reduction (ERR) is because MCE mostly competes in the values space, while ERR mostly competes in the technology space.

However, MCE is competing in a narrower space than just values. It's in the MC space, which is just the space of advocacy on what our moral circle should look like. So I think it's fairly distinct from the list items in that sense, though you could still say they're in the same space because all advocacy competes for news coverage, ad buys, recruiting advocacy-oriented people, etc. (Technology projects could also compete for these things, though there are separations, e.g. journalists with a social beat versus journalists with a tech beat.)

I think the comparably narrow space of ERR is ER, which also includes people who don't want extinction risk reduced (or even want it increased), such as some hardcore environmentalists, antinatalists, and negative utilitarians.

I think these are legitimate cooperation/coordination perspectives, and it's not really clear to me how they add up. But in general, I think this matters mostly in situations where you actually can coordinate. For example, in a US general election, Democrats and Republicans could come together and agree not to give to their respective campaigns (each in exchange for their counterpart also not doing so). Or there could be anti-MCE EAs with whom MCE EAs could coordinate (which I think is basically what you're saying with "we'd be better off if they both decided to spend the money on anti-malaria bednets").

As EA has grown as a movement, the community appears to have converged on a rationalization process whereby most of us have realized that what is centrally morally important is the well-being experienced by a relatively wide breadth of moral patients, with relatively equal moral weight assigned to the well-being of each moral patient. The difference between SI and those who focus on AIA is primarily in their estimates of the expected value of the far future in terms of average or total well-being. Among the examples you provided, it seems some worldviews are more amenable to the rationalization process which lends itself to consequentialism and EA. Many community members were egalitarians and libertarians who find common cause now in trying to figure out whether to focus on AIA or MCE. I think your point is important in that ultimately advocating for this type of values spreading could be bad. However, what appears to be an extreme amount of diversity could end up looking less fraught in a competition among values as divergent worldviews converge on similar goals.

This is because different types of worldviews, such as any amenable to aggregate consequentialist frameworks, can coalesce around a single goal like MCE. The relevance of your point, then, would hinge upon how universal MCE really is or can be across worldviews, relative to other types of values, such that it wouldn't clash with many worldviews in a values-spreading contest. This is a matter of debate I haven't thought about. It seems an important way to frame solutions to the challenge to Jacy's point you raise.

But it seems that it would be very bad if everyone took this advice literally.

Fortunately, not everyone does take this advice literally :).

This is very similar to the tragedy of the commons. If everyone acts out of their own self-motivated interests, then everyone will be worse off. However, the situation as you described does not fully reflect reality because none of the groups you mentioned are actually trying to influence AI researchers at the moment. Therefore, MCE has a decisive advantage. Of course, this is always subject to change.

In contrast, preventing the extinction of humanity seems to occupy a privileged position

I find that it is often the case that people will dismiss any specific moral recommendation for AI except this one. Personally I don't see a reason to think that there are certain universal principles of minimal alignment. You may argue that human extinction is something that almost everyone agrees is bad -- but now the principle of minimal alignment has shifted to "have the AI prevent things that almost everyone agrees are bad" which is another privileged moral judgement that I see no intrinsic reason to hold.

In truth, I see no neutral assumptions to ground AI alignment theory in. I think this is made even more difficult because even relatively small differences in moral theory from the point of view of information theoretic descriptions of moral values can lead to drastically different outcomes. However, I do find hope in moral compromise.

Thanks for this post. Some scattered thoughts:

The main risk for AIA seems to be that the technical research done to better understand how to build an aligned AI will increase AI capabilities generally, meaning it’s also easier for humanity to produce an unaligned AI.

This doesn't seem like a big consideration to me. Even if unfriendly AI comes sooner by an entire decade, this matters little on a cosmic timescale. An argument I find more compelling: If we plot the expected utility of an AGI as a function of the amount of effort put into aligning it, there might be a "valley of bad alignment" that is worse than no attempt at alignment at all. (A paperclip maximizer will quickly kill us and not generate much long-term suffering, whereas an AI that understands the importance of human survival but doesn't understand any other values will imprison us for all eternity. Something like that.)

I'd like to know more about why people think that our moral circles have expanded. I suspect activism plays a smaller role than you think. Steven Pinker talks about possible reasons for declining violence in his book The Better Angels of Our Nature. I'm guessing this is highly related to moral circle expansion.

One theory I haven't seen elsewhere is that self-interest plays a big role in moral circle expansion. Consider the example of slavery. The BBC writes:

It becomes clear that humanitarianism and imperial muscling were able bedfellows...

One can be certain that the high ideals of abolition and the promotion of legitimate trade were equally matched by economic and territorial ambitions, impulses which brought forward partition and colonial rule in Africa in the late 19th century.

You'll note that the villains of the slave story are the slavers--people with an interest in slavery. The heroes seem to have been Britons who would not lose much if slavery was outlawed (though I guess boycotting sugar would go against their self-interest?) Similarly, I think I remember reading that poor northern whites were motivated to fight in the US Civil War because they were worried their labor would be displaced by slave labor.

According to this story, the expanding circle is a side effect of the world growing wealthier. As lower levels of Maslow's hierarchy are met, people care more about humanitarian issues. (I'm assuming that genetic relatedness predicts where on the hierarchy another being falls.) Conquest is less common now because it's more profitable to control a multinational company than control lots of territory. Slavery is less common because unskilled laborers are less of an asset & more of a liability, and it's hard to coerce skilled labor. Violence has declined because sub-replacement fertility means we're no longer in a zero-sum competition for resources. (Note that the bloodiest war in recent memory happened in the Democratic Republic of Congo, a country where women average six children each--source. Congo has a lot of mineral wealth, which seems to incentivize conflict. Probably this wealth doesn't diminish in the presence of conflict as much as e.g. manufacturing wealth would.)

I suppose a quick test for the Maslow's hierarchy story is to check whether wealthy people are more likely to be vegan (controlling for meat calories being as expensive as non-meat calories).

I don't think everyone is completely self-interested all the time, but I think people are self-interested enough that it makes sense for activists to apply leverage strategically.

Re: a computer program used to mine asteroids, I'd expect certain AI alignment work to be useful here. If we understand AI algorithms more deeply, an asteroid miner can be simpler and less likely sentient. Contrast with the scenario where AI progress is slow, brain emulations come before AGI, and the asteroid miner is piloted using an emulation of someone's brain.

I'm not comfortable relying on innate human goodness to deal with moral dilemmas. I'd rather eliminate incentives for immoral behavior. In the presence of bad incentives, I worry about activism backfiring as people come up with rationalizations for their immoral behavior. See e.g. biblical justifications for slavery in the antebellum south. Instead of seeing the EA movement as something that will sweep the globe and make everyone altruistic, I'm more inclined to see it as a team of special forces working to adjust the incentives that everyone else operates under in order to create good outcomes as a side effect of everyone else working towards their incentives.

Another moral circle expansion story involves improved hygiene. See also.

Singer and Pinker talk a lot about the importance of reason and empathy to the expanding moral circle. This might be achieved through better online discussion platforms, widespread adoption of meditation, etc.

Anyway, I think that if we take a broad view of moral circle expansion, the best way to achieve it might be some unexpected thing: improving the happiness of voters who control nuclear weapons, helping workers deal with technological job displacement, and so on. IMO, more EAs should work on world peace.

This post is extremely valuable - thank you! You have caused me to reexamine my views about the expected value of the far future.

What do you think are the best levers for expanding the moral circle, besides donating to SI? Is there anything else outside of conventional EAA?

Thanks! That's very kind of you.

I'm pretty uncertain about the best levers, and I think research can help a lot with that. Tentatively, I do think that MCE ends up aligning fairly well with conventional EAA (perhaps it should be unsurprising that the most important levers to push on for near-term values are also most important for long-term values, though it depends on how narrowly you're drawing the lines).

A few exceptions to that:

  • Digital sentience probably matters the most in the long run. There are good reasons to be skeptical we should be advocating for this now (e.g. it's quite outside of the mainstream so it might be hard to actually get attention and change minds; it'd probably be hard to get funding for this sort of advocacy (indeed that's one big reason SI started with farmed animal advocacy)), but I'm pretty compelled by the general claim, "If you think X value is what matters most in the long-term, your default approach should be working on X directly." Advocating for digital sentience is of course neglected territory, but Sentience Institute, the Nonhuman Rights Project, and Animal Ethics have all worked on it. People for the Ethical Treatment of Reinforcement Learners has been the only dedicated organization AFAIK, and I'm not sure what their status is or if they've ever paid full-time or part-time staff.

  • I think views on value lock-in matter a lot because of how they affect food tech (e.g. supporting The Good Food Institute). I place significant weight on this and a few other things (see this section of an SI page) that make me think GFI is actually a pretty good bet, despite my concern that technology progresses monotonically.

  • Because what might matter most is society's general concern for weird/small minds, we should be more sympathetic to indirect antispeciesism work like that done by Animal Ethics and the fundamental rights work of the Nonhuman Rights Project. From a near-term perspective, I don't think these look very good because I don't think we'll see fundamental rights be a big reducer of factory farm suffering.

  • This is a less-refined view of mine, but I'm less focused than I used to be on wild animal suffering. It just seems to cost a lot of weirdness points, and naturogenic suffering doesn't seem nearly as important as anthropogenic suffering in the far future. Factory farm suffering seems a lot more similar to far future dystopias than does wild animal suffering, despite WAS dominating utility calculations for the next, say, 50 years.

I could talk more about this if you'd like, especially if you're facing specific decisions like where exactly to donate in 2018 or what sort of job you're looking for with your skillset.

I thought this was very interesting, thanks for writing up. Two comments

  1. It was useful to have a list of reasons why you think the EV of the future could be around zero, but I still found it quite vague/hard to imagine - why exactly would more powerful minds be mistreating less powerful minds? etc. - so I would have liked to see that sketched in slightly more depth.

  2. It's not obvious to me it's correct/charitable to draw the neglectedness of MCE so narrowly. Can't we conceive of a huge amount of moral philosophy, as well as social activism, both new and old, as MCE? Isn't all EA outreach an indirect form of MCE?

I'm sympathetic to both of those points personally.

1) I considered that, and in addition to time constraints, I know others haven't written on this because there's a big concern of talking about it making it more likely to happen. I err more towards sharing it despite this concern, but I'm pretty uncertain. Even the detail of this post was more than several people wanted me to include.

But mostly, I'm just limited on time.

2) That's reasonable. I think all of these boundaries are fairly arbitrary; we just need to try to use the same standards across cause areas, e.g. considering only work with this as its explicit focus. Theoretically, since Neglectedness is basically just a heuristic to estimate how much low-hanging fruit there is, we're aiming at "The space of work that might take such low-hanging fruit away." In this sense, Neglectedness could vary widely. E.g. there's limited room for advocating (e.g. passing out leaflets, giving lectures) directly to AI researchers, but this isn't affected much by advocacy towards the general population.

I do think moral philosophy that leads to expanding moral circles (e.g. writing papers supportive of utilitarianism), moral-circle-focused social activism (e.g. anti-racism, not as much something like campaigning for increased arts funding that seems fairly orthogonal to MCE), and EA outreach (in the sense that the A of EA means a wide moral circle) are MCE in the broadest somewhat-useful definition.

Caspar's blog post is a pretty good read on the nuances of defining/utilizing Neglectedness.

I think there’s a significant[8] chance that the moral circle will fail to expand to reach all sentient beings, such as artificial/small/weird minds (e.g. a sophisticated computer program used to mine asteroids, but one that doesn’t have the normal features of sentient minds like facial expressions). In other words, I think there’s a significant chance that powerful beings in the far future will have low willingness to pay for the welfare of many of the small/weird minds in the future.[9]

I think it’s likely that the powerful beings in the far future (analogous to humans as the powerful beings on Earth in 2018) will use large numbers of less powerful sentient beings

So I'm curious for your thoughts. I see this concern about "incidental suffering of worker-agents" stated frequently, which may be likely in many future scenarios. However, it doesn't seem to be a crucial consideration, specifically because I care about small/weird minds with non-complex experiences (your first consideration).

Caring about small minds seems to imply that "Opportunity Cost/Lost Risks" are the dominant consideration - if small minds have moral value comparable to large minds, then the largest-EV risk is not optimizing for small minds and wasting resources thrown at large minds with complex/expensive experiences (or thrown at something even less efficient, like biological beings, any non-total-consequentialist view, etc). This would lose you many orders of magnitude of optimized happiness, and this loss would be worse than the other scenarios' aggregate incidental suffering. Even if this inefficient moral position merely reduced optimized happiness by 10% - far less than an order of magnitude - this would dominate incidental suffering, even if the incidental suffering scenarios were significantly more probable. And even if you very heavily weight suffering compared to happiness, my math still suggests this conclusion survives by a significant margin.
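The comparison in this argument can be made explicit with a toy expected-value calculation. The following is a minimal sketch, not the commenter's actual math: every number (the probabilities, the 10% loss, the relative scale of incidental suffering, and the suffering weight) is a hypothetical placeholder, and the conclusion depends heavily on how small incidental suffering is assumed to be relative to a fully optimized future.

```python
# Toy expected-value comparison. All inputs are hypothetical placeholders,
# expressed as fractions of the value of a fully optimized future (= 1.0).

def expected_opportunity_cost(p_inefficient, fraction_lost, optimized_value=1.0):
    """Expected value lost when the future is optimized less efficiently."""
    return p_inefficient * fraction_lost * optimized_value


def expected_incidental_suffering(p_suffering, suffering_scale, suffering_weight,
                                  optimized_value=1.0):
    """Expected disvalue of incidental suffering.

    suffering_scale: size of the suffering relative to the optimized future.
    suffering_weight: how much more a unit of suffering counts than a unit
    of happiness.
    """
    return p_suffering * suffering_weight * suffering_scale * optimized_value


# Assumed inputs: the inefficient scenario is five times LESS likely than the
# incidental-suffering scenario, and suffering is weighted 100x.
p_inefficient, fraction_lost = 0.1, 0.10
p_suffering, suffering_weight = 0.5, 100.0
suffering_scale = 1e-4  # assumption: unoptimized by-products are tiny

opportunity = expected_opportunity_cost(p_inefficient, fraction_lost)
suffering = expected_incidental_suffering(p_suffering, suffering_scale,
                                          suffering_weight)

# Scale of incidental suffering at which the two terms are equal.
break_even_scale = (p_inefficient * fraction_lost) / (p_suffering * suffering_weight)

print("expected opportunity cost:    ", opportunity)
print("expected incidental suffering:", suffering)
print("break-even suffering scale:   ", break_even_scale)
```

With these placeholder inputs the opportunity cost is the larger term, but raising `suffering_scale` above `break_even_scale` flips the comparison, so the argument rests on that parameter being small.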

Also note that Moral Circle Expansion is relevant conditional on solving the alignment problem, so we're in the set of worlds where the alignment problem was actually solved in some way (humanity's values are somewhat intact). So, the risk is that whatever-we're-optimizing-the-future-for is far less efficient than ideal hedonium could have been, because we're wasting it on complex minds, experiences that require lots of material input, or other not-efficiently-value-creating things. "Oh, what might have been", etc. Note this still says values spreading might be very important, but I think this version has a slightly different flavor that implies somewhat different actions. Thoughts?

A very interesting and engaging article indeed.

I agree that people often underestimate the value of strategic value spreading. Oftentimes, proposed moral models that AI agents will follow have some lingering narrowness to them, even when they attempt to apply the broadest of moral principles. For instance, in Chapter 14 of Superintelligence, Bostrom highlights his common good principle:

Superintelligence should be developed only for the benefit of all of humanity and in the service of widely shared ethical ideals.

Clearly, even something as broad as that can be controversial. Specifically, it doesn't speak at all about any non-human interests except insofar as humans express widely held beliefs to protect them.

I think one thing to add is that AIA researchers who hold more traditional moral beliefs (as opposed to wide moral circles and transhumanist beliefs) are probably less likely to believe that moral value spreading is worth much. The reason for this is obvious: if everyone around you holds, more or less, the same values that you do, then why change anyone's mind? This may explain why many people dismiss the activity you proposed.

I think one thing to add is that AIA researchers who hold more traditional moral beliefs (as opposed to wide moral circles and transhumanist beliefs) are probably less likely to believe that moral value spreading is worth much.

Historically it doesn't seem to be true. As AIA becomes more mainstream, it'll be attracting a wider diversity of people, which may induce a form of common grounding and normalization of the values in the community. We should be looking for opportunities to collect data on this in the future to see how attitudes within AIA change. Of course this could lead to attempts to directly influence the proportionate representation of different values within EA. That'd be prone to all the hazards of an internal tug of war pointed out in other comments on this post. Because the vast majority of the EA movement focused on the impact of advanced AI on the far future is relatively coordinated and has sufficiently similar goals, there isn't much risk of internal friction in the near future. I think organizations from MIRI to FRI are also averse to growing AIA in ways which drive the trajectory of the field away from what EA currently values.

Next to the counterpoints mentioned by Gregory Lewis, I think there is an additional reason why MCE seems less effective than more targeted interventions to improve the quality of the long-term future: Gains from trade between humans with different values become easier to implement as the reach of technology increases. As long as a non-trivial fraction of humans end up caring about animal wellbeing or digital minds, it seems likely it would be cheap for other coalitions to offer trades. So whether 10% of future people end up with an expanded moral circle or 100% may not make much of a difference to the outcome: It will be reasonably good either way if people reap the gains from trade.

One might object that it is unlikely that humans would be able to cooperate efficiently, given that we don't see this type of cooperation happening today. However, I think it's reasonable to assume that staying in control of technological progress beyond the AGI transition requires a degree of wisdom and foresight that is very far away from where most societal groups are at today. And if humans do stay in control, then finding a good solution for value disagreements may be the easier problem, or at worst similarly hard. So it feels to me that most likely, we either get a future that goes badly for reasons related to lack of coordination and sophistication in the pre-AGI stage, or we get a future where humans set things up wisely enough to actually design an outcome that is nice (or at least not amongst the 10% of worst outcomes) by the lights of nearly everyone.

Brian Tomasik made the point that conditional on human values staying in control, we may be very unlikely to get something like broad moral reflection. Instead, values could be determined by a very small group of individuals who happened to be in power by the time AGI arrives (as opposed to individuals ending up there because they were unusually foresighted and also morally motivated). This feels possible too, but it seems to not be the likely default to me because I suspect that you'd need to necessarily increase your philosophical sophistication in order to stay in control of AGI, and that probably gives you more pleasant outcomes (correlational claim). Iterated amplification for instance, as an approach to AI alignment, has several uses for humans: Humans are not only where the resulting values come from, but they're also in charge of keeping the bootstrapping process on track and corrigible. And as this post on factored cognition illustrates, this requires sophistication to set up. So if that's the bar that AGI creators need to pass before they can determine how "human values" are to be extrapolated, maybe we shouldn't be too pessimistic about the outcome. It seems kind of unlikely that someone would go through all of that only to be like "I'm going to implement my personal best guess about what matters to me, with little further reflection, and no other humans get a say here." Similarly, it also feels unlikely that people would go through with all that and not find a way to make subparts of the population reasonably content about how sentient subroutines are going to be used.

Now, I feel a bit confused about the feasibility of AI alignment if you were to do it somewhat sloppily and with lower standards. I think that there's a spectrum from "it just wouldn't work at all and not be competitive" (and then people would have to try some other approach) to "it would produce a capable AGI but it would be vulnerable to failure modes like adversarial exploits or optimization daemons, and so it would end up with not human values". These failure modes, to the very small degree I currently understand them, sound like they would not be sensitive to whether the human whose approval you tried to approximate had an expanded moral circle or not. I might be wrong about that. If people mostly want sophisticated alignment procedures because they care about preserving the option for philosophical reflection, rather than because they also think that you simply run into large failure modes otherwise, then it seems like (conditional on some kind of value alignment) whether we get an outcome with broad moral reflection is not so clear. If it's technically easier to build value-aligned AI with very parochial values, then MCE could make a relevant difference to these non-reflection outcomes.

But all in all my argument is that it's somewhat strange to assume that a group of people could succeed at building an AGI optimized for its creators' values, without having to put in so much thinking about how to get this outcome right that they almost can't help but become reasonably philosophically sophisticated in the process. And sure, philosophically sophisticated people can still have fairly strange values by your own lights, but it seems like there's more convergence. Plus I'd at least be optimistic about their propensity to strive towards positive-sum outcomes, given how little scarcity you'd have if the transition does go well.

Of course, maybe value-alignment is going to work very differently from what people currently think. The main way I'd criticize my above points is that they're based on heavy-handed inside-view thinking about how difficult I (and others I'm updating towards) expect the AGI transition to be. If AGI will be more like the Industrial Revolution rather than something that is even more difficult to stay remotely in control of, or if some other technology proves to be more consequential than AGI, then my argument has less force. I mainly see this as yet another reason to caveat that the ex ante plausible-seeming position that MCE can have a strong impact on AGI outcomes starts to feel more and more conjunctive the more you zoom in and try to identify concrete pathways.

Interesting points. :) I think there could be substantial differences in policy between 10% support and 100% support for MCE depending on the costs of appeasing this faction and how passionate it is. Or between 1% and 10% support for MCE applied to more fringe entities.

philosophically sophisticated people can still have fairly strange values by your own lights, but it seems like there's more convergence.

I'm not sure if sophistication increases convergence. :) If anything, people who think more about philosophy tend to diverge more and more from commonsense moral assumptions.

Yudkowsky and I seem to share the same metaphysics of consciousness and have both thought about the topic in depth, yet we occupy almost antipodal positions on the question of how many entities we consider moral patients. I tend to assume that one's starting points matter a lot for what views one ends up with.

I agree with this. It seems like the world where Moral Circle Expansion is useful is the world where:

  1. The creators of AI are philosophically sophisticated (or persuadable) enough to expand their moral circle if they are exposed to the right arguments or work is put into persuading them.

  2. They are not philosophically sophisticated enough to realize the arguments for expanding the moral circle on their own (seems plausible).

  3. They are not philosophically sophisticated enough to realize that they might want to consider a distribution of arguments that they could have faced and could have persuaded them about what is morally right, and design AI with this in mind (i.e. CEV), or with the goal of achieving a period of reflection where they can sort out the sort of arguments that they would want to consider.

I think I'd prefer pushing on point 3, as it also encompasses a bunch of other potential philosophical mistakes that AI creators could make.

On this topic, I similarly do still believe there’s a higher likelihood of creating hedonium; I just have more skepticism about it than I think is often assumed by EAs.

This is the main reason I think the far future is high EV. I think we should be focusing on p(Hedonium) and p(Dolorium) more than anything else. I'm skeptical that, from a hedonistic utilitarian perspective, byproducts of civilization could come close to matching the expected value from deliberately tiling the universe (potentially multiverse) with consciousness optimized for pleasure or pain. If p(H)>p(D), the future of humanity is very likely positive EV.

My current position is that the amount of pleasure/suffering that conscious entities will experience in a far-future technological civilization will not be well-defined. Some arguments for this:

  1. Generally utility functions or reward functions are invariant under affine transformations (with suitable rescaling for the learning rate for reward functions). Therefore they cannot be compared between different intelligent agents as a measure of pleasure.

  2. The clean separation of our civilization into many different individuals is an artifact of how evolution operates. I don't expect far future civilization to have a similar division of its internal processes into agents. Therefore the method of counting conscious entities with different levels of pleasure is inapplicable.

  3. Theoretical computer science gives many ways to embed one computational process within another so that it is unclear whether or how many times the inner process "occurs", such as running identical copies of the same program, using a quantum computer to run the same program with many inputs in superposition, and homomorphic encryption. Similar methods we don't know about will likely be discovered in the future.

  4. Our notions of pleasure and suffering are mostly defined extensionally with examples from the present and the past. I see no reason for such an extensionally derived concept to have a natural definition that applies to extremely different situations. Uncharitably, it seems like the main reason people assume this is a sort of wishful thinking due to their normal moral reasoning breaking down if they allow pleasure/suffering to be undefined.
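Argument 1 above can be illustrated with a small sketch. It assumes a standard expected-utility maximizer; the outcomes, probabilities, and utility numbers are hypothetical. Applying a positive affine transformation (u' = scale * u + shift, with scale > 0) to the agent's utility function leaves its choices unchanged, even though the raw numbers change sign and magnitude, which is why those numbers alone do not pin down an "amount" of pleasure that could be compared across agents.

```python
from typing import Dict

Lottery = Dict[str, float]   # outcome name -> probability
Utility = Dict[str, float]   # outcome name -> utility


def expected_utility(lottery: Lottery, utility: Utility) -> float:
    """Probability-weighted average utility of a lottery over outcomes."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())


def best_action(actions: Dict[str, Lottery], utility: Utility) -> str:
    """Name of the action whose lottery has the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name], utility))


def affine_transform(utility: Utility, scale: float, shift: float) -> Utility:
    """Return scale * u + shift for every outcome; scale must be positive."""
    if scale <= 0:
        raise ValueError("scale must be positive to preserve preferences")
    return {outcome: scale * u + shift for outcome, u in utility.items()}


# Hypothetical agent: three outcomes and two available actions.
utility = {"good": 1.0, "neutral": 0.0, "bad": -2.0}
actions = {
    "cautious": {"good": 0.2, "neutral": 0.8},
    "risky": {"good": 0.7, "bad": 0.3},
}

rescaled = affine_transform(utility, scale=1000.0, shift=-500.0)

choice_original = best_action(actions, utility)
choice_rescaled = best_action(actions, rescaled)

# Same behaviour, very different "amounts" of utility.
print(choice_original, expected_utility(actions[choice_original], utility))
print(choice_rescaled, expected_utility(actions[choice_rescaled], rescaled))
```

Both utility functions describe the same behaviour, yet the chosen action's expected utility is positive under one and negative under the other.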

I'm currently uncertain about how to make decisions relating to the far future in light of the above arguments. My current favorite position is to try to understand the far future well enough until I find something I have strong moral intuitions about.

Possibly the biggest unknown in ethics is whether bits matter, or whether atoms matter.

If you assume bits matter, then I think this naturally leads into a concept cluster where speaking about utility functions, preference satisfaction, complexity of value, etc, makes sense. You also get a lot of weird unresolved thought-experiments like homomorphic encryption.

If you assume atoms matter, I think this subtly but unavoidably leads to a very different concept cluster-- qualia turns out to be a natural kind instead of a leaky reification, for instance. Talking about the 'unity of value thesis' makes more sense than talking about the 'complexity of value thesis'.

TL;DR: I think you're right that if we assume computationalism/functionalism is true, then pleasure and suffering are inherently ill-defined, not crisp. They do seem well-definable if we assume physicalism is true, though.

Thanks for reminding me that I was implicitly assuming computationalism. Nonetheless, I don't think physicalism substantially affects the situation. My arguments #2 and #4 stand unaffected; you have not backed up your claim that qualia is a natural kind under physicalism. While it's true that physicalism gives clear answers for the value of two identical systems or a system simulated with homomorphic encryption, it may still be possible to have quantum computations involving physically instantiated conscious beings, by isolating the physical environment of this being and running the CPT reversal of this physical system after an output has been extracted to maintain coherence. Finally, physicalism adds its own questions, namely, given a bunch of physical systems that all appear to have behavior that appears to be conscious, which ones are actually conscious and which are not. If I understood you correctly, physicalism as a statement about consciousness is primarily a negative statement, "the computational behavior of a system is not sufficient to determine what sort of conscious activity occurs there", which doesn't by itself tell you what sort of conscious activity occurs.

It seems to me your #2 and #4 still imply computationalism and/or are speaking about a straw man version of physicalism. Different physical theories will address your CPT reversal objection differently, but it seems pretty trivial to me.

If I understood you correctly, physicalism as a statement about consciousness is primarily a negative statement, "the computational behavior of a system is not sufficient to determine what sort of conscious activity occurs there", which doesn't by itself tell you what sort of conscious activity occurs.

I would generally agree, but would personally phrase this differently; rather, as noted here, there is no objective fact-of-the-matter as to what the 'computational behavior' of a system is. I.e., no way to objectively derive what computations a physical system is performing. In terms of a positive statement about physicalism & qualia, I'm assuming something on the order of dual-aspect monism / neutral monism. And yes insofar as a formal theory of consciousness which has broad predictive power would depart from folk intuition, I'd definitely go with the formal theory.

Thanks for the link. I didn't think to look at what other posts you have published and now I understand your position better.

As I now see it, there are two critical questions for distinguishing the different positions on the table:

  1. Does our intuitive notion of pleasure/suffering have an objective, precisely defined fundamental concept underlying it?
  2. In practice, is it a useful approach to look for computational structures exhibiting pleasure/suffering in the distant future as a means to judge possible outcomes?

Brian Tomasik answers these questions "No/Yes", and a supporter of the Sentience Institute would probably answer "Yes" to the second question. Your answers are "Yes/No", and so you prefer to work on finding the underlying theory for pleasure/suffering. My answers are "No/No", and I am at a loss.

I see two reasons why a person might think that pleasure/pain of conscious entities is a solid enough concept to answer "Yes" to either of these questions (not counting conservative opinions over what futures are possible for question 2). The first is a confusion caused by subtle implicit assumptions in the way we talk about consciousness, which makes a sort of conscious experience that includes pleasure and pain seem more ontologically basic than it really is. I won't elaborate on this in this comment, but for now you can round me off as an eliminativist.

The second is what I was calling "a sort of wishful thinking" in argument #4: These people have moral intuitions that tell them to care about others' pleasure and pain, which implies not fooling themselves about how much pleasure and pain others experience. On the other hand, there are many situations where their intuition does not give them a clear answer, but also tells them that picking an answer arbitrarily is like fooling themselves. They resolve this tension by telling themselves, "there is a 'correct answer' to this dilemma, but I don't know what it is. I should act to best approximate this 'correct answer' with the information I have." People then treat these "correct answers" like other things they are ignorant about, and in particular imagine that a scientific theory might be able to answer these questions in the same way science answered other things we used to be ignorant about.

However, this expectation infers something external, the existence of a certain kind of scientific theory, from evidence that is internal, their own cognitive tensions. This seems fallacious to me.

Thanks, this is helpful. My general position on your two questions is indeed "Yes/No".

The question of 'what are reality's natural kinds?' is admittedly complex and there's always room for skepticism. That said, I'd suggest the following alternatives to your framing:

  • Whether the existence of qualia itself is 'crisp' seems prior to whether pain/pleasure are. I call this the 'real problem' of consciousness.

  • I'm generally a little uneasy with discussing pain/pleasure in technically precise contexts- I prefer 'emotional valence'.

  • Another reframe to consider is to disregard talk about pain/pleasure, and instead focus on whether value is well-defined on physical systems (i.e. the subject of Tegmark's worry here). Conflation of emotional valence & moral value can then be split off as a subargument.

Generally speaking, I think if one accepts that it's possible in principle to talk about qualia in a way that 'carves reality at the joints', it's not much of a stretch to assume that emotional valence is one such natural kind (arguably the 'c. elegans of qualia'). I don't think we're logically forced to assume this, but I think it's prima facie plausible, and paired with some of our other work it gives us a handhold for approaching qualia in a scientific/predictive/falsifiable way.

Essentially, QRI has used this approach to bootstrap the world's first method for quantifying emotional valence in humans from first principles, based on fMRI scans. (It also should work for most non-human animals; it's just harder to validate in that case.) We haven't yet done the legwork on connecting future empirical results here back to the computationalism vs physicalism debate, but it's on our list.

TL;DR: If consciousness is a 'crisp' thing with discoverable structure, we should be able to build/predict useful things with this that cannot be built/predicted otherwise, similar to how discovering the structure of electromagnetism let us build/predict useful things we could not have otherwise. This is probably the best route to solve these metaphysical disagreements.

It wasn't clear to me from your comment, but based on your link I am presuming that by "crisp" you mean "amenable to generalizable scientific theories" (rather than "ontologically basic"). I was using "pleasure/pain" as a catch-all term and would not mind substituting "emotional valence".

It's worth emphasizing that just because a particular feature is crisp does not imply that it generalizes to any particular domain in any particular way. For example, a single ice crystal has a set of directions in which the molecular bonds are oriented which is the same throughout the crystal, and this surely qualifies as a "crisp" feature. Nonetheless, when the ice melts, this feature becomes undefined -- no direction is distinguished from any other direction in water. When figuring out whether a concept from one domain extends to a new domain, to posit that there's a crisp theory describing the concept does not answer this question without any information on what that theory looks like.

So even if there existed a theory describing qualia and emotional valence as it exists on Earth, it need not extend to being able to describe every physically possible arrangement of matter, and I see no reason to expect it to. Since a far future civilization will be likely to approach the physical limits of matter in many ways, we should not assume that it is not one such arrangement of matter where the notion of qualia is inapplicable.

This is an important point and seems to hinge on the notion of reference, or the question of how language works in different contexts. The following may or may not be new to you, but trying to be explicit here helps me think through the argument.

Mostly, words gain meaning from contextual embedding- i.e. they’re meaningful as nodes in a larger network. Wittgenstein observed that often, philosophical confusion stems from taking a perfectly good word and trying to use it outside its natural remit. His famous example is the question, “what time is it on the sun?”. As you note, maybe notions about emotional valence are similar- trying to ‘universalize’ valence may be like trying to universalize time-zones, an improper move.

But there’s another notable theory of meaning, where parts of language gain meaning through deep structural correspondence with reality. Much of physics fits this description, for instance, and it’s not a type error to universalize the notion of the electromagnetic force (or electroweak force, or whatever the fundamental unification turns out to be). I am essentially asserting that qualia is like this- that we can find universal principles for qualia that are equally and exactly true in humans, dogs, dinosaurs, aliens, conscious AIs, etc. When I note I’m a physicalist, I intend to inherit many of the semantic properties of physics, how meaning in physics ‘works’.

I suspect all conscious experiences have an emotional valence, in much the same way all particles have a charge or spin. I.e. it’s well-defined across all physical possibilities.

Do you think we should move the conversation to private messages? I don't want to clutter a discussion thread that's mostly on a different topic, and I'm not sure whether the average reader of the comments benefits or is distracted by long conversations on a narrow subtopic.

Your comment appears to be just reframing the point I just made in your own words, and then affirming that you believe that the notion of qualia generalizes to all possible arrangements of matter. This doesn't answer the question, why do you believe this?

By the way, although there is no evidence for this, it is commonly speculated by physicists that the laws of physics allow multiple metastable vacuum states, and the observable universe only occupies one such vacuum, and near different vacua there are different fields and forces. If this is true then the electromagnetic field and other parts of the Standard Model are not much different from my earlier example of the alignment of an ice crystal. One reason this view is considered plausible is simply the fact that it's possible: It's not considered so unusual for a quantum field theory to have multiple vacuum states, and if the entire observable universe is close to one vacuum then none of our experiments give us any evidence on what other vacuum states are like or whether they exist.

This example is meant to illustrate a broader point: I think that making a binary distinction between contextual concepts and universal concepts is oversimplified. Rather, here's how I would put it: Many phenomena generalize beyond the context in which they were originally observed. Taking advantage of this, physicists deliberately seek out the phenomena that generalize as far as possible, and over history have broadened their grasp very far. Nonetheless, they avoid thinking about any concept as "universal", and often when they do think a concept generalizes they have a specific explanation for why it should, while if there's a clear alternative to the concept generalizing they keep an open mind.

So again: Why do you think that qualia and emotional valence generalize to all possible arrangements of matter?

EA forum threads auto-hide so I’m not too worried about clutter.

I don’t think you’re fully accounting for the difference in my two models of meaning. And, I think the objections you raise to consciousness being well-defined would also apply to physics being well-defined, so your arguments seem to prove too much.

To attempt to address your specific question, I find the hypothesis that ‘qualia (and emotional valence) are well-defined across all arrangements of matter’ convincing because (1) it seems to me the alternative is not coherent (as I noted in the piece on computationalism I linked for you) and (2) it seems generative and to lead to novel and plausible predictions I think will be proven true (as noted in the linked piece on quantifying bliss and also in Principia Qualia).

All the details and sub arguments can be found in those links.

Will be traveling until Tuesday; probably with spotty internet access until then.

I haven't responded to you for so long firstly because I felt like we got to the point in the discussion where it's difficult to get across anything new and I wanted to be attentive to what I say, and then because after a while without writing anything I became disinclined to continue. The conversation may close soon.

Some quick points:

  • My whole point in my previous comment is that the conceptual structure of physics is not what you make it out to be, and so your analogy to physics is invalid. If you want to say that my arguments against consciousness apply equally well to physics you will need to explain the analogy.

  • My views on consciousness that I mentioned earlier but did not elaborate on are becoming more relevant. It would be a good idea for me to explain them in more detail.

  • I read your linked piece on quantifying bliss and I am unimpressed. I concur with the last paragraph of this comment.

Thank you for providing an abstract for your article. I found it very helpful.

(and I wish more authors here would do so as well)

Random thought: (factory farm) animal welfare issues will likely eventually be solved by cultured (lab grown) meat when it becomes cheaper than growing actual animals. This may take a few decades, but social change might take even longer. The article even suggests technical issues may be easier to solve, so why not focus more on that (rather than on MCE)?

I just took it as an assumption in this post that we're focusing on the far future, since I think basically all the theoretical arguments for/against that have been made elsewhere. Here's a good article on it. I personally mostly focus on the far future, though not overwhelmingly so. I'm at something like 80% far future, 20% near-term considerations for my cause prioritization decisions.

This may take a few decades, but social change might take even longer.

To clarify, the post isn't talking about ending factory farming. And I don't think anyone in the EA community thinks we should try to end factory farming without technology as an important component. Though I think there are good reasons for EAs to focus on the social change component, e.g. there is less for-profit interest in that component (most of the tech money is from for-profit companies, so it's less neglected in this sense).

Thank you for this piece. I enjoyed reading it and I'm glad that we're seeing more people being explicit about their cause-prioritization decisions and opening up discussion on this crucially important issue.

I know that it's a weak consideration, but I hadn't, before I read this, considered the argument for the scale of values spreading being larger than the scale of AI alignment (perhaps because, as you pointed out, the numbers involved in both are huge) so thanks for bringing that up.

I'm in agreement with Michael_S that hedonium and dolorium should be the most important considerations when we're estimating the value of the far-future, and from my perspective the higher probability of hedonium likely does make the far-future robustly positive, despite the valid points you bring up. This doesn't necessarily mean that we should focus on AIA over MCE (I don't), but it does make it more likely that we should.

Another useful contribution, though others may disagree, was the biases section: the biases that could potentially favour AIA did resonate with me, and they are useful to keep in mind.

That makes sense. If I were convinced hedonium/dolorium dominated to a very large degree, and that hedonium was as good as dolorium is bad, I would probably think the far future was at least moderately +EV.

Isn't hedonium inherently as good as dolorium is bad? If it's not, can't we just normalize and then treat them as the same? I don't understand the point of saying there will be more hedonium than dolorium in the future, but the dolorium will matter more. They're vague and made-up quantities, so can't we just set it so that "more hedonium than dolorium" implies "more good than bad"?

He defines hedonium/dolorium as the maximum positive/negative utility you can generate with a certain amount of energy:

"For example, I think a given amount of dolorium/dystopia (say, the amount that can be created with 100 joules of energy) is far larger in absolute moral expected value than hedonium/utopia made with the same resources."

Exactly. Let me know if this doesn't resolve things, zdgroff.

Thanks very much for writing this, and thanks to Greg for funding it! I think this is a really important discussion. Some slightly rambling thoughts below.

We can think about 3 ways of improving the EV of the far future:

1: Changing incentive structures experienced by powerful agents in the future (e.g. avoiding arms races, power struggles, selection pressures)

2: a) Changing the moral compass of powerful agents in the future in specific directions (e.g. MCE).

b) Indirect ways to improve the moral compass of powerful agents in the future (e.g. philosophy research, education, intelligence/empathy enhancement)

All of these are influenced both by strategies such as activism, improving institutions, and improving education, as well as by AIA. I am inclined to think of AIA as a particularly high-leverage point at which we can have influence on these.

However, these issues are widely encountered. Consider 2b: we have to decide how to educate the next generation of humans, and they may well end up with ethical beliefs that are different from ours, so we must judge how much to try and influence or constrain them, and how much to accept that the changes are actually progress. This is similar to the problem of defining CEV: we have some vague idea of the direction in which better values lie (more empathy, more wisdom, more knowledge), but we can't say exactly what the values should be. For this intervention, working on AIA may be more important than activism because it has more leverage - it is likely to be more tractable and have greater influence on the future than the more diffuse ways in which we can push on education and intergenerational moral progress.

This framework also suggests that MCE is just one example of a collection of similar interventions. MCE involves pushing for a fairly specific belief and behaviour change on a principle that's fairly uncontroversial. You could also imagine similar interventions - for instance, helping people reduce unwanted aggressive or sadistic behaviour. We could call this something like 'uncontroversial moral progress': helping individuals and civilisation to live by their values more. (on a side note: sometimes I think of this as the minimal core of EA: trying to live according to your best guess of what’s right)

The choice between working on 2a and 2b depends, among other things, on your level of moral uncertainty.

I am inclined to think that AIA is the best way to work on 1 and 2b, as it is a particularly high-leverage intervention point to shape the power structures and moral beliefs that exist in the future. It gives us more of a clean slate to design a good system, rather than having to work within a faulty system.

I would really like to see more work on MCE and other examples of 'uncontroversial moral progress'. Historical case studies of value changes seem like a good starting point, as well as actually testing the tractability of changing people's behaviour.

I also really appreciated your perspective on different transformative AI scenarios, as I’m worried I’m thinking about it in an overly narrow way.

Impressive article - I especially liked the biases section. I would recommend doing a quantitative model of cost effectiveness comparing to AIA, as I have done for global agricultural catastrophes, especially because neglectedness is hard to define in your case.

You say

I think a given amount of dolorium/dystopia (say, the amount that can be created with 100 joules of energy) is far larger in absolute moral expected value than hedonium/utopia made with the same resources

Could you elaborate more on why this is the case? I would tend to think that a prior would be that they're equal, and then you update on the fact that they seem to be asymmetrical, and try to work out why that is the case, and whether those factors will apply in future. They could be fundamentally asymmetrical, or evolutionary pressures may tend to create minds with these asymmetries. The arguments I've heard for why are:

  • The worst thing that can happen to an animal, in terms of genetic success, is much worse than the best thing.

This isn't entirely clear to me: I can imagine a large genetic win such as securing a large harem could be comparable to the genetic loss of dying, and many animals will in fact risk death for this. This seems particularly true considering that dying leaving no offspring doesn't make your contribution to the gene pool zero, just that it's only via your relatives.

  • There is selection against strong positive experiences in a way that there isn't against strong negative experiences.

The argument here is, I think, that strong positive experiences will likely result in the animal sticking in the blissful state, and neglecting to feed, sleep, etc, whereas strong negative experiences will just result in the animal avoiding a particular state, which is less maladaptive. This argument seems stronger to me but still not entirely satisfying - it seems to be quite sensitive to how you define states.

Thanks for writing it.

Here are my reasons for the belief that the wild animal/small minds/... suffering agenda is based mostly on errors and uncertainties. Some of the uncertainties should warrant research effort, but I do not believe the current state of knowledge justifies prioritization of any kind of advocacy or value spreading.

1] The endeavour seems to be based on extrapolating intuitive models far outside the scope for which we have data. The whole suffering calculus is based on extrapolating the concept of suffering far away from the domain for which we have data from human experience.

2] Big part of it seems arbitrary. When expanding the moral circle toward small computational processes and simple systems, why not expand it toward large computational processes and complex systems? E.g. we can think about the DNA based evolution as about large computational/optimization process - suddenly "wild animal suffering" has a purpose and traditional environment and biodiversity protection efforts make sense.

(Similarly we could argue much "human utility" is in the larger system structure above individual humans)

3] We do not know how to measure and aggregate utility of mind states. Like, really don't know. E.g. it seems to me completely plausible the utility of 10 people reaching some highly joyful mindstates is the dominant contribution over all human and animal minds.

4] Part of the reasoning usually seems contradictory. If the human cognitive processes are in the privileged position of creating meaning in this universe ... well, then they are in the privileged position, and there _is_ a categorical difference between humans and other minds. If they are not in the privileged position, how come humans should impose their ideas about meaning on other agents?

5] MCE efforts directed toward AI researchers with the intent of influencing values of some powerful AI may increase x-risk. E.g. if the AI is not "speciesist" and gives the same weight to satisfying preferences of all humans and all chickens, the chickens would outnumber humans.

You raise some good points. (The following reply doesn't necessarily reflect Jacy's views.)

I think the answers to a lot of these issues are somewhat arbitrary matters of moral intuition. (As you said, "Big part of it seems arbitrary.") However, in a sense, this makes MCE more important rather than less, because it means expanded moral circles are not an inevitable result of better understanding consciousness/etc. For example, Yudkowsky's stance on consciousness is a reasonable one that is not based on a mistaken understanding of present-day neuroscience (as far as I know), yet some feel that Yudkowsky's view about moral patienthood isn't wide enough for their moral tastes.

Another possible reply (that would sound better in a political speech than the previous reply) could be that MCE aims to spark discussion about these hard questions of what kinds of minds matter, without claiming to have all the answers. I personally maintain significant moral uncertainty regarding how much I care about what kinds of minds, and I'm happy to learn about other people's moral intuitions on these things because my own intuitions aren't settled.

E.g. we can think about the DNA based evolution as about large computational/optimization process - suddenly "wild animal suffering" has a purpose and traditional environment and biodiversity protection efforts make sense.

Or if we take a suffering-focused approach to these large systems, then this could provide a further argument against environmentalism. :)

If the human cognitive processes are in the privileged position of creating meaning in this universe ... well, then they are in the privileged position, and there is a categorical difference between humans and other minds.

I selfishly consider my moral viewpoint to be "privileged" (in the sense that I prefer it to other people's moral viewpoints), but this viewpoint can have in its content the desire to give substantial moral weight to non-human (and human-but-not-me) minds.

@Matthew_Barnett As a senior electrical engineering student, proficient in a variety of programming languages, I do think and believe that AI is important to think about and discuss. The theoretical threat of a malevolent strong AI would be immense. But that does not mean one has cause or a valid reason to support CS grad students financially.

A large, significant, asteroid collision with Earth would also be quite devastating. Yet, to fund and support aerospace grads does not follow. Perhaps I really mean this: AI safety is an Earning to Give non sequitur.

Lastly, again, there is no evidence or results. Effective Altruism is about being beneficent instead of merely benevolent (meaning well). In other words, making decisions off well researched initiatives (e.g., bed nets). Since strong AI does not exist, it does not make sense to support through E2G. (I'm not saying it will never exist; that is unknown.) Of course, there are medium-term (systematic change) with results that more or less rely on historical-type empiricism--but that's still some type of evidence. For poverty we have RCTs and developmental economics. For AI safety [something?]. For animal suffering we have proof that less miserable conditions can become a reality.

I don't think anyone here is suggesting supporting random CS grads financially. Although, they might endorse something like that indirectly by funding AI alignment research, which tends to attract CS grads.

I agree that simply because an asteroid collision would be devastating, it does not follow that we should necessarily focus on that work in particular. However, there are variables which I think you might be overlooking.

The reason why people are concerned with AI alignment is not only because of the scope of the issue, but also the urgency and tractability of the problem. The urgency of the problem comes from the idea that advanced AI will probably be developed this century. The tractability of the problem comes from the idea that there exists a set of goals that we could in theory put into an AI that are congruent with ours -- you might want to read up on the Orthogonality Thesis.

Furthermore, it is dangerous to assume that we should judge the effectiveness of certain activities merely based on prior evidence or results. There are some activities which are just infeasible to give post hoc judgements about -- and this issue is one of them. The inherent nature of the problem is that we will probably only get about one chance to develop superintelligence -- because if we fail, then we will all probably die or otherwise be permanently unable to alter its goals.

To give you an analogy, few would agree that because climate change is an unprecedented threat, it therefore follows that we should wait until after the damage has been done to assess the best ways of mitigating it. Unfortunately for issues that have global scope, it doesn't look like we get a redo if things start going badly.

If you want to learn more about the research, I recommend reading Superintelligence by Nick Bostrom. The vast majority of AI alignment researchers are not worried about malevolent AI despite your statement. I mean this in the kindest way possible, but if you really want to be sure that you're on the right side of a debate, it's worth understanding the best arguments against your position, not the worst.

Please, what AIA organizations? MIRI? And do not worry about offending me. I do not intend to offend. If I do/did through my tone or however, I am sorry.

That being said, I wish you would've examined the actual claims I presented. I did not claim AI researchers are worried about a malevolent AI. I am not against researchers; research in robotics, industrial PLCs, nanotech, whatever--are fields in their own right. It is donating my income, as an individual that I take offense. People can fund whatever they want: A new planetary wing at a museum, research in robotics, research in CS, research in CS philosophy.

Although, Earning to Give does not follow. Thinking about and discussing the risks of strong AI does make sense, and we both seem to agree it is important. The CS grad students being supported, however, what makes them different from a random CS grad? Just because they claim to be researching AIA? Following the money, there is not a clear answer on which CS grad students are receiving it. Low or zero transparency. MIRI or no? Am I missing some public information?

Second, what do you define as advanced AI? Before, I said strong AI. Is that what you mean? Is there some sort of AI in between? I'm not aware. This is crucially where I split with AI safety. The theory is an idea of a belief about the far future. To claim that we're close to developing strong AI is unfounded to me. What in this century is so close to strong AI? Neural networks do not seem to be (from my light research).

I do not believe it is as simple to define a "before" and "after" for climate change. Perhaps a large rogue solar flare or the Yellowstone supervolcano. Or perhaps even a time travel analogy would suffice ~ time travel safety research. There is no tractability/solvability. [Blank] cannot be defined because it doesn't exist; unfounded and unknown phenomena cannot be solved. Climate change exists. It is a very real reality. It has solvability. A belief in an idea about the future is a poor reason for claiming some sort of tractability for funding. Strong AI safety (singularity safety) has "solvability" for thinking about and discussing--but, again, it does not follow that one should give monetarily. I feel like I'm beating a dead horse with this point.

For the book recommendation, I looked into it. I'd rather read about morality/ethics directly or further delve into better learning Java, Python, Logix5000, LabVIEW, etc.

SE for

SE against

Please, what AIA organizations? MIRI?

Yes, MIRI is one. FHI is another.

That being said, I wish you would've examined the actual claims I presented. I did not claim AI researchers are worried about a malevolent AI.

You did, however, say "The theoretical threat of a malevolent strong AI would be immense. But that does not mean one has cause or a valid reason to support CS grad students financially." I assumed you meant that you believed someone was giving an argument along the lines of "since malevolent AI is possible, then we should support CS grads." If that is not what you meant, then I don't see the relevance of mentioning malevolent AI.

Since you also stated that you had an issue with me not being charitable, I would reciprocate likewise. I agree that we should be charitable to each other's opinions.

Having truthful views is not about winning debates. It's about making sure that you hold good beliefs for good reasons, end of story. I encourage you to imagine this conversation not as a way to convince me that I'm wrong -- but more of a case study about what the current arguments are, and whether they are valid. In the end, you don't get points for winning an argument. You get points for actually holding correct views.

Therefore, it's good to make sure that your beliefs actually hold weight under scrutiny. Not in a, "you can't find the flaw after 10 minutes of self-sabotaged thinking" sort of way, but in a very deep understanding sort of way.

It is to donating my income, as an individual, that I take offense. People can fund whatever they want: A new planetary wing at a museum, research in robotics, research in CS, research in CS philosophy.

I agree people can fund whatever they want. It's important to make a distinction between normative questions and factual ones. It's true that people can fund whatever project they like; however, it's also true that some projects have a high value from an impersonal utilitarian perspective. It is this latter category that I care about, which is why I want to find projects with particularly high value. I believe that existential risk mitigation and AI alignment are among these projects, although I fully admit that I may be mistaken.

Although, Earning to Give does not follow. Thinking about and discussing the risks of strong AI does make sense, and we both seem to agree it is important.

If you agree that thinking about something is valuable, why not also agree that funding that thing is valuable? It seems you think that the field should just get a certain threshold of funding that allows certain people to think about the problem just enough -- but not too much. I don't see a reason to believe that the field of AI alignment has reached that critical threshold. On the contrary, I believe the field is far from it at the moment.

Following the money, there is not a clear answer on which CS grad students are receiving it. Low or zero transparency. MIRI or no? Am I missing some public information?

I suppose when you make a donation to MIRI, it's true that you can't be certain about how they spend that money (although I might be wrong about this, I haven't actually donated to MIRI). Generally though, funding an organization is about whether you think that their mission is neglected, and whether you think that further money would make a marginal impact in their cause area. This is no different than any other charity that EA-aligned people endorse.

Second, what do you define as advanced AI? Before, I said strong AI. Is that what you mean? Is there some sort of AI in between? I'm not aware.

It might be confusing that there are all these terms for AI. To taboo the words "advanced AI", "strong AI", "AGI" or others -- what I am worried about is an information processing system that can achieve broad success in cognitive tasks in a way that rivals or surpasses humans. I hope that makes it clear.

This is crucially where I split with AI safety. The theory is an idea of a belief about the far future. To claim that we're close to developing strong AI is unfounded to me.

I'm not quite clear what you mean here. If you mean we are worried about AI in the far future, fine. But then in the next sentence you say that we're worried about being close to strong AI. How can we simultaneously believe both? If AI is near then I care about the near-term future. If AI is not near, then I care about the long-term future. I do not claim either, however. I think it is an important consideration even if it's a long way off.

Neural networks do not seem to be (from my light research).

This is what I'm referring to when I talk about how important it is to really, truly understand something before developing an informed opinion about it. If you admit that you have only done light research, how can you be confident that you are right? Doing a bit of research might give you an edge for debate purposes, but we are talking about the future of life on Earth here. We really need to know the answers to these questions.

Perhaps a large rogue solar flare or the Yellowstone supervolcano. Or perhaps even a time travel analogy would suffice ~ time travel safety research. There is no tractability/solvability.

Lumping all existential risks into a single category and then asserting that there's no tractability is an oversimplified approach. First, what we need is the probability of any given existential risk occurring. For instance, if scientists discovered that the Yellowstone supervolcano was probably about to erupt sometime in the next few centuries, I'd definitely agree we should do research in that area, and we should fund that research as well. In fact, some research is being done in that area and I'm happy that it's being done.

A belief in an idea about the future is a poor reason for claiming some sort of tractability for funding.

I'd agree with you if it was an idea asserted without evidence or reason. But there's a whole load of arguments about why it is a tractable field, and how we can do things now -- yes right now -- about making the future safer. Ignorance of these arguments does not mean they do not exist.

Remember, ask yourself first what is true. Then form your opinion. Do not go the other way.

I am not trying to "win" anything. I am stating why MIRI is not transparent, and does not deal in scalable issues. As an individual, Earning to Give, it does not follow to fund such things under the guise of Effective Altruism. Existential risk is important to think about and discuss as individuals. However, funding CS grad students does not make sense in the light of Effective Altruism.

Funding does not increase "thinking." The whole point of EA is to not give blindly. For example, giving food aid, although meaning well, can have a very negative effect (i.e., the crowding out effect on the local market). Nonmaleficence should be one's initial position in regards to funding.

Lastly, no, I rarely accept something as true first. I do not first accept the null hypothesis. "But there's a whole load of arguments about why it is a tractable field"--What are they? Again, none of the actual arguments were examined: How is MIRI going about tractable/solvable issues? Who at MIRI is getting the funds? How is time travel safety not as relevant as AI safety?

Thanks for this discussion, which I find quite interesting. I think the effectiveness and efficiency of funding research projects concerning risks of AI is a largely neglected topic. I've posted some concerns on this below an older thread on MIRI: http://effective-altruism.com/ea/14c/why_im_donating_to_miri_this_year/dce

  • the primary problem being the lack of transparency on the side of Open Phil. concerning the evaluative criteria used in their decision to award MIRI with an extremely huge grant.