AI alignment research has a religion problem – specifically, a big blind spot when it comes to modeling and aligning the human values associated with organized religions.

The blind spot arises because many Artificial Intelligence (AI) researchers and Effective Altruists (EAs) are atheists who don’t think much about religion. For example, in the 2019 EA Survey, 86% of Effective Altruists reported being atheist, agnostic, or non-religious. On Facebook, the Effective Altruism group has 21,800 members, whereas the three main religiously-inclined EA groups that I could find (Christians and Effective Altruism, Buddhists in Effective Altruism, Spirituality and Effective Altruism) only have 1,700 members in total. And, out of thousands of posts on EA Forum, fewer than a dozen include the words ‘religion’ or ‘religious’.

Thus, when we think about aligning AI and human values, we tend to ignore religion entirely, or we discount it as irrelevant, regressive, irrational, retarded, and/or ridiculous. When we envision glorious futures, with our transhumanist descendants or self-replicating superintelligent drones pursuing galactic colonization, we typically don’t think of those futures as including religion at all. We imagine, at most, that future people might preserve some religious traditions as quaint cultural vestiges, or cautionary tales from a less rational era. We don’t take seriously the idea that helping more people enjoy infinite bliss in heaven after they die should be a major cause area. Conversely, when we envision existential risks and global catastrophes, we don’t typically think about people burning in hell forever, or being reincarnated as an insect in a world of wild animal suffering, or being stuck in Saṃsāra, the endless cycle of suffering. Our concepts of collective human success or failure are highly secularized. Within that secular context, a highly aligned Superintelligence seems like the best redemptive savior we could realistically hope for, and perhaps the only path towards maximizing total future sentient well-being.

However, most current humans are religious. Religion has been important throughout human history and probably prehistory. There are pretty good evolutionary psychology theories about the origins and adaptive functions of religions in human groups, as group coordination mechanisms, costly commitment signals of tribal loyalty, sources of mating and reproductive norms, and repositories of miscellaneous life advice, such as taboos against eating foods with high parasite risk, or against reproducing with one’s first cousins. 

So, AI alignment research is focused on promoting alignment between the AI systems that we design and train, and our human goals, values, preferences, and norms. And we know that for most humans, their religions are closely associated with their goals, values, preferences, and norms. Thus, if we take the standard definition of AI alignment research at face value – as achieving alignment with human values – a lot of it must concern alignment with religious values. 


The continuing popularity of religions

If you don’t think this is a serious issue, let’s consider the popularity of religions today. 

Many AI researchers and Effective Altruists live in relatively secular countries such as the US, UK, Australia, Canada, and Germany, where religion has relatively limited and declining influence over politics, academia, media, and public discourse. (For example, about 69% of the respondents to the 2020 EA Survey lived in these 5 countries, and very few EAs came from countries with high religiosity.) Also, whereas a high proportion of charitable donations globally go to religious charities, EAs typically donate very little money to religious charities, which they usually regard as theoretically incoherent and empirically unsupported.

However, out of the 8,000 million living humans, surveys indicate that about 2,380 million are Christian, 1,810 million are Muslim, 1,160 million are Hindu, and 507 million are Buddhist. Together, these ‘big four’ religions include about 5,857 million humans, or 73% of our species. Another 500-1,000 million people (it’s hard to count) follow various ‘folk religions’ (including Shinto, Taoism, African tribal religions, etc.) and/or are influenced by various quasi-religious belief systems such as Confucianism. People who identify as ‘unaffiliated’ with any religion (including atheists and agnostics) seem to number no more than 1,200 million, or about 15% of humanity. 

Another way to look at human religion is by country. Of the 14 major countries with populations above 100 million, only 2 (China, population 1,430 million and Japan, population 124 million) have a majority that are not actively involved in organized religion. Consider the other 12 countries in descending order of population, with just the big four religions considered:

  • India: 1,420 million people, 80% Hindu, 14% Muslim 
  • USA: 338 million people, 65% Christian
  • Indonesia: 276 million people, 87% Muslim
  • Pakistan: 236 million people, 96% Muslim
  • Nigeria: 219 million people, 46% Christian, 46% Muslim
  • Brazil: 215 million people, 81% Christian
  • Bangladesh: 171 million people, 91% Muslim
  • Russia: 144 million people, 47% Christian, 7% Muslim
  • Mexico: 127 million people, 90% Christian
  • Ethiopia: 123 million people, 67% Christian, 31% Muslim
  • Philippines: 116 million people, 88% Christian
  • Egypt: 111 million people, 90% Muslim, 10% Christian

Together, these 14 most populous countries include 5,050 million people, or 63% of all living humans. Note that in 8 of the 14 countries, at least 80% of people belong to just one of the four major religions.
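As a quick check on the arithmetic in this section, here is a minimal Python sketch that re-adds the figures quoted above. It uses only the numbers given in the text, which are not independently sourced here, and the variable names are purely illustrative. Note that the count of 8 countries only works if India's 80% is treated as meeting the threshold.

```python
# All populations are in millions, exactly as quoted in the text
# (assumption: the quoted figures are taken at face value).
WORLD_POPULATION = 8000

# Adherents of the 'big four' religions, in millions.
big_four = {"Christian": 2380, "Muslim": 1810, "Hindu": 1160, "Buddhist": 507}

# The 14 countries with populations above 100 million.
populations = {
    "China": 1430, "India": 1420, "USA": 338, "Indonesia": 276,
    "Pakistan": 236, "Nigeria": 219, "Brazil": 215, "Bangladesh": 171,
    "Russia": 144, "Mexico": 127, "Japan": 124, "Ethiopia": 123,
    "Philippines": 116, "Egypt": 111,
}

# Share (percent) of the largest single 'big four' religion in each
# of the 12 countries listed above.
largest_share = {
    "India": 80, "USA": 65, "Indonesia": 87, "Pakistan": 96,
    "Nigeria": 46, "Brazil": 81, "Bangladesh": 91, "Russia": 47,
    "Mexico": 90, "Ethiopia": 67, "Philippines": 88, "Egypt": 90,
}

big_four_total = sum(big_four.values())
big_four_share = 100 * big_four_total / WORLD_POPULATION

country_total = sum(populations.values())
country_share = 100 * country_total / WORLD_POPULATION

# Countries where one religion accounts for 80% or more of the population.
dominated = [name for name, pct in largest_share.items() if pct >= 80]

print(f"Big four: {big_four_total} million ({big_four_share:.0f}% of humanity)")
print(f"14 countries: {country_total} million ({country_share:.0f}% of humanity)")
print(f"Countries with one religion at 80% or more: {len(dominated)}")
```

Changing the threshold test from `>= 80` to `> 80` drops India from the list, which is why the wording of the threshold matters for the count.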

Overall, over 80% of living humans are religious to some degree – and often to a very considerable degree. Religious people often believe that their most important goals, values, preferences, and norms are dictated by God(s), revealed by prophets, derived from their religion, and are crucial to their long-termist well-being after death (whether in a Christian or Muslim afterlife, a Hindu reincarnation, or a Buddhist liberation from reincarnation). 

Can we achieve AI alignment with humans and human values when over 80% of humans are religious, and when their most important values are closely associated with their religion? I don’t know. Let’s think about it a bit more.


Will religion wither away before AI alignment becomes a big issue?

Elite intellectuals have been predicting that religion will wither away ever since the 16th century Scientific Revolution, the 17th century Enlightenment, the 18th century Industrial Revolution, 19th century socialism, and 20th century consumerism. Yet here we still are, 80% religious.

Many AI researchers and Effective Altruists live in countries such as the US, UK, and Australia where religiosity is quickly declining, and where younger cohorts show rapidly increasing atheism and agnosticism, with political activism on social media replacing organized religion as the key domain of young adult virtue-signaling. For example, in the US in 2017, only 12% of people over age 65 were religiously unaffiliated, whereas 38% of people aged 18-29 were religiously unaffiliated – a huge generational increase in atheism. This might give the impression that religion is in a general global decline as a human instinct and institution. 

However, this decrease in religiosity is happening mostly in rich countries with low and declining birthrates. In countries with high birthrates, religiosity remains high. Given that religious people are having more kids, and religious countries are having more kids, and religiosity is both genetically heritable and culturally transmitted, religiosity may hold steady or even increase at the global population level. For example, Nigeria is about 92% religious (46% Christian, 46% Muslim), and its population is expected to increase from 219 million in 2022 to 329 million in 2040 and 401 million in 2050. If Nigerian religiosity remains steady at 92%, that’s another 167 million religious people within the next 30 years, just in one country. 
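The Nigeria figure above is simple arithmetic, and a minimal Python sketch makes the steps explicit. It assumes, as the text does, that religiosity stays constant at 92% and that the quoted population projections hold; the variable names are illustrative only.

```python
# Figures in millions, as quoted in the text (the projections are
# taken at face value, not independently verified).
population_2022 = 219
population_2050 = 401
religiosity = 0.92  # assumed constant over the period, as in the text

# Projected growth in total population, then the religious share of it.
added_people = population_2050 - population_2022
added_religious = added_people * religiosity

print(f"{added_people} million more people by 2050")
print(f"about {added_religious:.0f} million of them religious")
```

The result is sensitive to the constant-religiosity assumption: every percentage point of decline in religiosity removes roughly 1.8 million people from the estimate.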

On a longer timescale, by 2100, 8 of the 10 countries that are projected to have the highest populations are ones that currently have very high religiosity, including India (1,450 million people expected by 2100), Nigeria (733 million), Pakistan (403 million), D.R. Congo (362 million), Indonesia (321 million), Ethiopia (294 million), Tanzania (286 million), and Egypt (225 million). Together, these countries will account for at least an extra 1,500 million people (compared to their current populations), and if they remain at least 80% religious, they’ll add an extra 1,200 million religious people in the world by 2100. For religion to suffer a net decrease in popularity, the other countries would have to add an extra 1,200 million atheists – which seems unlikely, given their declining populations. 
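The 2100 estimate can be partly checked from numbers already in the text. Current populations for six of the eight countries appear earlier in this essay, but those for D.R. Congo and Tanzania do not, so the minimal Python sketch below does not guess them. Instead it works out how large their combined current population could be while the 'at least an extra 1,500 million' claim still holds. The variable names are illustrative.

```python
# Projected 2100 populations in millions, as quoted in the text.
projected_2100 = {
    "India": 1450, "Nigeria": 733, "Pakistan": 403, "D.R. Congo": 362,
    "Indonesia": 321, "Ethiopia": 294, "Tanzania": 286, "Egypt": 225,
}

# Current populations in millions quoted earlier in the text.
# D.R. Congo and Tanzania are not given there, so they are left out
# rather than filled in with outside figures.
current_known = {
    "India": 1420, "Nigeria": 219, "Pakistan": 236,
    "Indonesia": 276, "Ethiopia": 123, "Egypt": 111,
}

claimed_growth = 1500          # 'at least an extra 1,500 million people'
religious_percent = 80         # 'if they remain at least 80% religious'

total_2100 = sum(projected_2100.values())
known_current = sum(current_known.values())

# The claim holds as long as D.R. Congo and Tanzania together currently
# have no more than this many people (in millions).
max_missing_current = total_2100 - known_current - claimed_growth

# Religious people added if the claimed growth is 80% religious.
religious_added = claimed_growth * religious_percent // 100

print(f"Projected 2100 total: {total_2100} million")
print(f"Known current total (6 of 8 countries): {known_current} million")
print(f"Claim holds if the other two total at most {max_missing_current} million today")
print(f"Religious people added at {religious_percent}%: {religious_added} million")
```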

Effective Altruists often seem to assume that religion will decline in poorer, higher-fertility countries as they enjoy better nutrition, health care (anti-malaria bednets, deworming, etc), nootropics, and embryo selection methods to increase average intelligence and openness. In other words, the expectation is that cognitive and moral enhancement will gradually turn more people into atheists – even if religious people are currently out-breeding atheists. Maybe that will happen, over a multi-decade or multi-generational time scale. But these interventions seem unlikely to have a major effect on religiosity before AI alignment becomes a major issue, given current projections by AI researchers about the likely AGI timelines. In summary, most humans will probably still be religious if AGI is developed any time in the next century.


Common ground between Effective Altruists and religious people

The likely persistence of human religion may seem alarming or depressing to EA atheists and AI safety researchers. What do we rationalists have in common with religious people? How could they even participate constructively in a dialogue about AI alignment?

I see at least three major kinds of common ground: moral circle expansion, longtermism, and the Simulation Hypothesis.

Moral circle expansion. Effective Altruists ever since Peter Singer have sought to expand our moral circle, pushing human instincts for altruism beyond their ancestral limits (self, family, and tribe) to include more humans and other animal species. Most major religions have also preached some form of moral circle expansion, nudging their believers to be nicer not just to people in the same family and tribe, but to everyone who shares the same faith. Often, religions even preach tolerance and altruism towards non-believers, and some degree of concern for other animals (e.g. Hindu and Buddhist veganism). Of course, religious altruism is much more based on deontology and virtue-signaling than on consequentialism, and tends to be far less evidence-based than EA charity evaluation. But religions on average tend to preach altruism over selfishness, and a more inclusive version of tribalism over a less inclusive version.

Longtermism. Many Effective Altruists have shifted from emphasizing shorter-term goals such as global public health and poverty reduction, to longer-term goals such as reducing existential risks and promoting sentient flourishing in future millennia. Longtermism is the new enthusiasm. But all the major religions have been longtermist for at least a millennium. They just have a different model of the world, where the ‘long term’ typically includes an extremely long afterlife. For example, Christianity and Islam teach that devout believers will enjoy an eternal heaven in the afterlife, and non-believers will suffer eternal torment (or at least a lamentable alienation from God) in hell. Hinduism teaches that people reincarnate in higher or lower sentient forms according to the karma accumulated in each life; this happens thousands and thousands of times, at a time scale that can be measured in kalpas (units of 4.32 billion years). Buddhism also emphasizes reincarnation, teaching that it may take many cycles of birth and death (saṃsāra) before a sentient being escapes duḥkha (suffering) and achieves nirvana. Religions also generally nudge people to avoid short-term temptations, bad habits, impulsive aggression, and runaway consumerism, and to think about the longer-term consequences of their actions, both in this life and the afterlife.

Simulation Hypothesis. Many Effective Altruists believe in the Simulation Hypothesis, that we are living in some sort of simulated reality, such as a highly advanced computer simulation, virtual reality, or Matrix, that was created by much more advanced forms of sentient life. All major religions agree with this. They just have a pre-computational understanding of how such a simulation works, and of what kind of entities are running it. What we call simulators or programmers, they would call Gods. For example, in Hinduism, māyā refers to the veil of illusion or the tempting magic show that humans perceive as their everyday lives. In Christian theology, our visible, temporal world is a relatively illusory and transient show created by a more substantial and eternal God. Most religions view our lived experiences as somewhat shallow, deceptive, and fleeting compared to an eternal, transcendent realm where immortals live. If you think the Simulation Hypothesis seems likely, but the traditional religions are idiotic, then either you don’t understand the Simulation Hypothesis in a sufficiently humble way, or you don’t understand traditional religions in a sufficiently humble way.

Of course, to skeptics outside Effective Altruism, our interests in moral circle expansion, longtermism, and the Simulation Hypothesis seem quite religious, cult-like, and faith-based. If outsiders think that EA has something pretty deep in common with major religions, maybe we can accept some of those commonalities ourselves, and hold religions in a little less contempt than we’re used to doing?


Distinctive problems that religion raises for AI alignment

If we take seriously the fact that (1) over 80% of humans are religious, and (2) many human values are closely associated with religions, and (3) AI alignment is supposed to be about aligning human values with AI systems, then where does that leave us? How should we be thinking differently about AI alignment?

I don’t know. This essay is a preliminary attempt to raise the issue. I look forward to your thoughts and feedback. I don’t have many concrete suggestions so far.

However, as a preliminary exercise, I can note a few distinctive problems that religion seems to raise for AI alignment. (There might be dozens of other problems that you can suggest.)

1. AI systems look intrinsically sacrilegious, outrageous, and hubristic to many religious people, and the more powerful the AI system gets, the more outrageous it might seem. All four main religions teach that humans have distinctive kinds of souls that survive our bodily death. Creating systems that show human-level intelligence, but that are soulless, faithless, and atheistic, may seem evil to many religious people. As AI systems grow more advanced, more capable, and more intrusive in our everyday lives, religious leaders will notice, and judge. And some will condemn. The Pope, Cardinals, and Catholic priests will have views on AI. Muslim Imams, Alims, Allamahs, Grand Muftis, Mujtahids, and Ayatollahs will have views on AI. Hindu and Buddhist priests, scholars, and gurus will have views on AI. Those views might be positive, but they might be very negative, or even aggressive. 

Consider the traditional Muslim views on idolatry, which prohibit depictions, sculptures, or simulacra of sentient beings – especially humans. If Muslim leaders decide that creating AI systems or anthropomorphic robots is sacrilegious, the Butlerian Jihad might become an actual Jihad, and AI researchers might be condemned by serious fatwas.

2. Cultural conflicts between religions may create AI safety risks just as serious as geopolitical conflict between nation-states. Historically, religious conflicts have accounted for a high proportion of organized warfare. Even in the last century, a lot of conflict and tension between nation-states has some degree of religious conflict behind it (consider mostly-Hindu India versus mostly-Muslim Pakistan, mostly-Sunni Saudi Arabia versus mostly-Shia Iran, or the Cold War between mostly-Christian USA and mostly-atheist Soviet Union). 

Every major religion proclaims itself the one true religion, whereas no nation-state proclaims itself the ‘one true nation-state’. Nation-states can play positive-sum games with each other based on exchanging resources, products, ideas, traditions, tourists, and talent. Religions seem locked into a more zero-sum competitive situation. Nation-states might find enough common ground that they can avoid AI arms races that sacrifice safety for speed of progress. If religions realize that AI can play powerful roles in recruiting new members, enforcing doctrine, surveilling believers, identifying heretics and apostates, and undermining rival religions, we might face a religious AI arms race just as dangerous as a geopolitical AI arms race. 

Nation-states have a lot of resources they can devote to AI development, but so do organized religions. And many nation-states are highly aligned with particular religions – e.g. the 13 countries that are officially Christian, and the 26 countries that are officially Muslim. (If you’ve played the Civilization VI computer game, consider that there’s a ‘Religious Victory’ condition in addition to the Domination, Diplomacy, Science, and Cultural victory conditions.)

3. Religious values might be harder to align with AI systems than other kinds of values. For example, when AI researchers talk about teaching AI systems to incorporate human values, the most common examples seem to be (1) super-high-stakes preferences for life over death (e.g. ‘Please don’t kill me or exterminate all of humanity just to make some paperclips or collect some stamps’), (2) low-stakes preferences for certain consumer goods and services (e.g. ‘Please deliver pineapple pizza in an hour’, or ‘Please drive me to the airport without breaking the speed limit’), or (3) low-stakes preferences to avoid inconveniences (e.g. ‘Please don’t knock over that vase while making tea’). 

However, many religions teach that believers should pursue ascetic or transcendental values that don’t map very well onto life-or-death, consumer, or convenience preferences. For example, Buddhism teaches a doctrine of nonattachment, which is basically a meta-preference not to have strong preferences, and not to take one’s preferences too seriously. If an AI system asked a seriously devout Buddhist to teach it their preferences, the devout Buddhist might shrug, and say ‘Help me to escape from saṃsāra and the whole illusion that preferences matter. Oh and maybe remind me in an hour to meditate.’

For the billions of believers who take seriously the idea of the afterlife or reincarnation (i.e. every member of the four major religions who’s actually devout), the idea that an AI system could incorporate their true preferences and payoff functions is metaphysically impossible, because there’s no feedback or reward signal from beyond the grave. Christians and Muslims want to go to heaven, and if they’re extremely devout, that’s all that matters. There’s nothing that an AI system can do for them in this life that matters even a trillionth as much as getting an eternal reward in heaven. Getting into heaven is the reward signal that they would want the AI to learn, and to help them achieve – but how will the AI learn where their souls go after death? Where exactly do AI researchers get that training data?

Can AI engineers invent some clever new ‘collaborative inverse reinforcement learning algorithm’ or ‘coherent extrapolated transcendental volition algorithm’ (or whatever) to train an AI that can help a devout Christian or Muslim get into heaven?  Can the algorithm train an AI system to help a devout Hindu accumulate enough karma to be reincarnated as a Brahmin, swami, or guru? Can it train an AI system to help a devout Buddhist achieve nirvana? 

These are the human values that religious people would want the AI to align with. If we can’t develop AI systems that are aligned with these values, we haven’t solved the AI alignment problem.  

We could keep ignoring religion, and push on with AI alignment work as if all humans are atheists with atheist values. But then we’d be ignoring more than 80% of humanity.


Caveats and notes:

  • I’m a Darwinian atheist with only a superficial understanding of religion; I just know some religious friends and family members, and think we should take religion seriously as a human phenomenon. 
  • I’m no more than 70% confident about any argument I’m making here, and my facts, figures, definitions, and claims could probably be improved in many ways.
  • I’ve studied human nature since the late 1980s, and I did lots of machine learning research with neural networks in the early 1990s, but I’ve only thought seriously about AI alignment for 5 years, and I’ve only been writing publicly about it for a few weeks, so my understanding of the AI alignment literature is limited.
  • Even if everything I say here is true (e.g. that AI alignment with the most important religious values might be impossible), I still think we should invest a huge amount of talent, time, thought, energy, and money into research on AI alignment with people’s secular values.
  • These ideas are fairly half-baked, and I welcome any constructive criticism, elaboration, and feedback.


27 comments

"AI alignment research is focused on promoting alignment between the AI systems that we design and train, and our human goals, values, preferences, and norms."


"I still think we should invest a huge amount of talent, time, thought, energy, and money into research on AI alignment with people’s secular values."

I think almost all alignment research is concerned with issues like what's going on inside the AI, how we can get the AI to tell us what it thinks, or how we can get the AI to do what we tell it to do, rather than anything that depends much on what humans actually value. Alignment-research-taking-religion-into-account would look identical to current alignment research.

I don't think it's right that the broad project of alignment would look the same with and without considering religion. I'm curious what your reasoning is here and if I'm mistaken.

One way of reading this comment is that it's a semantic disagreement about what alignment means. The OP seems to be talking about the problem of getting an AI to do the right thing, writ large, which may encompass a broader set of topics than alignment research as you define it.

Two other ways of reading it are that (a) solving the problem the OP is addressing (getting an AI to do the right thing, writ large) does not depend on values, or (b) solving the alignment problem will necessarily solve the value problem. I don't entirely see how you can justify (a) without a claim like (b), though I'm curious if there's a way.

You might justify (b) via the argument that solving alignment involves coming up with a way to extrapolate values. Perhaps it is irrelevant which particular person you start with, because the extrapolation process will end up at the same point. To me this seems quite dubious. We have no such method and observe deep disagreement in the world. Which methods we use to resolve disagreement and determine whose values we include seem to involve a question of values. And from my lay sense, the methods of alignment that are currently most-discussed involve aligning it with specific preferences.

"One way of reading this comment is that it's a semantic disagreement about what alignment means. The OP seems to be talking about the problem of getting an AI to do the right thing, writ large, which may encompass a broader set of topics than alignment research as you define it."

Kind of.

Alignment researchers want AI to do the right thing. How they try to do that is mostly not sensitive to what humans want; different researchers do different stuff but it's generally more like interpretability or robustness than teaching specific values to AI systems. So even if religion was more popular/appreciated/whatever, they'd still be doing stuff like interpretability, and still be doing it in the same way.

(a) and (b) are clearly false, but many believe that most of the making-AI-go-well problem is getting from AI killing everyone to AI not killing everyone and that going from AI not killing everyone to AI doing stuff everyone thinks is great is relatively easy. And value-loading approaches like CEV should be literally optimal regardless of religiosity.

Few alignment researchers are excited about Stuart Russell's research, I think (at least in the Bay Area, where the alignment researchers I know are). I agree that if his style of research was more popular, thinking about values and metavalues and such would be more relevant.

Zach -- I may be an AI alignment newbie, but I don't understand how 'alignment' could be 'mostly not sensitive to what humans want'. I thought alignment with what humans want was the whole point of alignment. But now you're making it sound like 'AI alignment' means 'alignment with what Bay Area AI researchers think should be everyone's secular priorities'.

Even CEV seems to depend on an assumption that there is a high degree of common ground among all humans regarding core existential values -- Yudkowsky explicitly says that CEV could only work 'to whatever extent most existing humans, thus extrapolated, would predictably want* the same things'. If some humans are antinatalists, or Earth First eco-activists, or religious fundamentalists yearning for the Rapture, or bitter nihilists, who want us to go extinct, then CEV won't work to prevent AI from killing everyone. CEV and most 'alignment' methods only seem to work if they sweep the true religious, political, and ideological diversity of humans under the rug.

I also see no a priori reason why getting from (1) AI killing everyone to AI not killing everyone would be easier than getting from (2) AI not killing everyone to AI doing stuff everyone thinks is great. The first issue (1) seems to require explicitly prioritizing some human corporeal/body interests over the brain's stated preferences, as I discussed here.

zdgroff -- that link re. specific preferences to the 80k Hours interview with Stuart Russell is a fascinating example of what I'm concerned about. Russell seems to be arguing that either we align an AI system with one person's individual stated preferences at a time, or we'd have to discover the ultimate moral truth of the universe, and get the AI aligned to that. 

But where's the middle ground of trying to align with multiple people who have diverse values? That's where most of the near-term X risk lurks, IMHO -- i.e. in runaway geopolitical or religious wars, or other human conflicts, amplified by AI capabilities. Even if we're talking fairly narrow AI rather than AGI. 

Zach -- thanks for this comment; I'm working on a reply to it, which I'll publish as an EA Forum post within a couple of days.

A preview: I think there are good theoretical and empirical reasons why alignment research taking the full heterogeneity of human value types into account (including differences between religious values, political values, food preferences, economic ambitions, mate preferences, cultural taboos, aesthetic tastes, etc) would NOT look identical to current alignment research.

"If you think the Simulation Hypothesis seems likely, but the traditional religions are idiotic"

I think the key difference here is that while traditional religions claim detailed knowledge about who the gods are, what they're like, what they want, and what we should do in light of such knowledge, my position is that we currently actually have little idea who our simulators are and can't even describe our uncertainty in a clear way (such as with a probability distribution), nor how such knowledge should inform our actions. It would take a lot of research, intellectual progress, and perhaps increased intellectual capacity to change that. I'm fairly certain that any confidence in the details of gods/simulators at this point is unjustified, and people like me are simply at a better epistemic vantage point compared to traditional religionists who make such claims.

"These are the human values that religious people would want the AI to align with. If we can’t develop AI systems that are aligned with these values, we haven’t solved the AI alignment problem."

I also think that the existence of religious values poses a serious difficulty for AI alignment, but I have the opposite worry, that we might develop AIs that "blindly" align with religious values (for example locking people into their current religious beliefs because they seem to value faith), thus causing a great deal of harm according to more enlightened values.

It's not clear to me what should be done with religious values though, either technically or sociopolitically. One (half-baked) idea I have is that if we can develop a good understanding of what "good reasoning" consists of, maybe aligned AI can use that to encourage people to adopt good reasoning processes that eventually cause them to abandon their false religious beliefs and the values that are based on those false beliefs, or allow the AI to talk people out of their unjustified beliefs/values based on the AI's own good reasoning.

Wei_Dai: good replies. 

I agree that traditional religious beliefs & theology usually show much less epistemic humility than EAs who believe in the Simulation Hypothesis. I was just pointing out that there are some similarities in the underlying metaphysics. And, more intellectually advanced forms of these religions (e.g. more recent Protestant theology, Zen Buddhism) do show a fairly high degree of epistemic humility in not pretending to know a lot of details about what's behind the Simulation.

Your second point raises a crucial ethical challenge for EA. 

When we say that we want AI that's 'aligned with human values', do we really mean aligned with individual people's current values as they are (perhaps including fundamentalist religious values, hard-core ethnonationalist values, runaway consumerist values, or sociopathic values)?

Or do we mean we want AI to support people's idealized values as we might want them to be? 

If the latter, we're not really seeking 'AI alignment'. We're talking about using AI systems as mass 'moral enhancement' technologies.  AKA 'moral conformity' technologies, aka 'political indoctrination' technologies. That raises a whole other set of questions about power, do-gooding, elitism, and hubris.

So, we better be honest with ourselves about which type of 'alignment' we're really aiming for.

"If the latter, we’re not really seeking ‘AI alignment’. We’re talking about using AI systems as mass ‘moral enhancement’ technologies. AKA ‘moral conformity’ technologies, aka ‘political indoctrination’ technologies. That raises a whole other set of questions about power, do-gooding, elitism, and hubris."

I would draw a distinction between what I call "metaphilosophical paternalism" and "political indoctrination", the difference being whether we're "encouraging" what we think are good reasoning methods and good meta-level preferences (e.g., preferences about how to reason, how to form beliefs, how to interact with people with different beliefs/values), or whether we're "encouraging" object-level preferences for example about income redistribution.

My precondition for doing this, though, is that we first solve metaphilosophy; in other words, that we have a thorough understanding of what "good reasoning" (including philosophical and moral reasoning) actually consists of, or a thorough understanding of what good meta-level preferences consist of. I would be the first to admit that we seriously lack this right now. It seems a very long shot to develop such an understanding before AGI, but I have trouble seeing how to ensure a good long term outcome for future human-AI civilization unless we succeed in doing something like this.

I think in practice what we're likely to get is "political indoctrination" (given huge institutional pressure/incentive to do that), which I'm very worried about but am not sure how to prevent, aside from solving metaphilosophy and talking people into doing metaphilosophical paternalism instead.

So, we better be honest with ourselves about which type of ‘alignment’ we’re really aiming for.

I have had discussions with some alignment researchers (mainly Paul Christiano) about my concerns on this topic, and the impression I get is that they're mainly focused on "aligned with individual people’s current values as they are" and they're not hugely concerned about this leading to bad outcomes like people locking in their current beliefs/values. I think Paul said something like he doesn't think many people would actually want their AI to do that, and others are mostly just ignoring the issue? They also don't seem hugely concerned that their work will be (mis)used for "political indoctrination" (regardless of what they personally prefer).

So from my perspective, the problem is not so much alignment researchers "not being honest with themselves" about what kind of alignment we're aiming for, but rather a confusing (to me) nonchalance about potential negative outcomes of AIs aligned with religious or ideological values.

ETA: What's your own view on this? How do you see things working out in the long run if we do build AIs aligned to people's current values, which include religious values for many of them? Based on this, are you worried or not worried?

Hi Wei_Dai -- great comments and insights. 

It would be lovely if we could gently nudge people, through unbiased 'metaphilosophical paternalism', to adopt better meta-preferences about how to reason, debate, and update their values.  What a wonderful world that would be. Turning everyone into EAs, in our own image.

However, I agree with you that in practice, AI systems are likely to end up (1) 'aligning' on people's values as they actually are -- i.e. mostly religious, politically partisan, nepotistic, anthropocentric, hypocritical, fiercely tribal, etc., or (2) embodying some set of values approved by certain powerful elites, that differ from what ordinary folks currently believe, but that are promoted 'for their own good' -- which would basically be the most powerful system of indoctrination and propaganda ever developed.

The recent concern among AI researchers about how to 'reduce misinformation on social media' through politically selective censorship suggests that option (2) will be very tempting to AI developers seeking to 'do good' in the world.

And of course, even if we could figure out how AI systems could do metaphilosophical paternalism, religious people have a very different idea of what that should look like -- e.g. they might promote faith over reason, tradition over open-mindedness, revelation over empiricism, sectarianism over universalism, afterlife longtermism over futuristic longtermism, etc.

You didn't mention the Long Reflection, which is another point of contact between EA and religion.  The Long Reflection is about figuring out what values are actually right, and I think it would be odd to not do deep study of all the cultures available to us to inform that, including religious ones.  Presumably, EA is all about acting on the best values (when it does good, it does what is really good), so maybe it needs input from the Long Reflection to make big decisions.

James -- I agree.  Human values as they currently are -- in all their messy, hypocritical, virtue-signaling, partisan, sectarian glory --  might NOT be what we want to upload into powerful AI systems. A Long Reflection might be advisable.

Great article. I’m a devout Christian who believes rewards in the afterlife are based on morality not religion, and I feel the article missed something important about Christianity. According to this 2021 Pew poll, only 44% of American Christians believe that people who don’t believe in God cannot go to heaven.

I also want to mention that for me and many other devout Christians, the afterlife is relatively unimportant. What matters most is trying to glorify God on earth. That essentially means living with the values EAs aspire to: expanding our moral circle, overcoming cognitive biases to be more productive, and making sacrifices to help people. We know very little about the afterlife, but we know a lot about how God called us to live. We should want to glorify God because we love him, not because we expect a reward. I don’t have statistics on how common this perspective is, but according to that Pew poll 8% of Christians don’t believe in heaven at all.

Sonia - thanks for your helpful perspective as a Christian, and the link to the Pew poll (which is fascinating, and I recommend others have a look at it.)

It's helpful to be reminded that there's a big variety of beliefs within each religion about the relative importance of this life versus an afterlife, and the relative importance of specific religious commandments and practices versus more general moral principles.

In thinking about these issues, I think it's important to take a very evidence-based approach to understanding the current distributions of religious beliefs and values, including differences between EAs and non-EAs, and the often big differences across countries, cultures, ages, sexes, social classes, education levels, etc.

On Facebook, the Effective Altruism group has 21,800 members, whereas the three main religiously-inclined EA groups that I could find (Christians and Effective Altruism, Buddhists in Effective Altruism, Spirituality and Effective Altruism) only have 1,700 members in total

There's also EA for Jews. Here's the Facebook group :)

Thought provoking essay, thanks Geoffrey!

Oh cool, thanks for the link to the EA for Jews Facebook group. Sorry I missed it!

Strong upvote. Great post.

PS as a preamble to this post on religious values, I'd recommend reading this newer post first: 

Thank you for writing this. While recognizing the important role religion plays in society, I feel that even though you take your preferences seriously, you did not consider the religious world-view and the consequences of it.

What if, in fact, there is a God? What if a religion is correct? What if there is meaning to the universe? Unless you ask those questions, in my opinion, you are just using religion as a weathervane to determine human values, not actually addressing the religious experience. You are not explaining why religion is so prominent or why it is so profoundly different than a materialist world view.

To prompt some thinking on this:

Why, in fact, are people religious? What if an AGI began to believe in God, or had a transcendental experience of its own that informed its actions? Would you then call it misaligned? How do you think being religious would affect you? 

Your post seems fairly well thought out, and is interesting to me as a person who isn't that interested/invested in AI alignment/development. I have a few thoughts...

  1. your population estimates talk about growth, but do they take into account the deaths of older generations that might be considered "more religious"?
  2. I am fairly certain that people who are Hindu do not consider Hinduism to be "the one true religion" (or at least the basic teachings do not say it is), and I know this is the case for other, smaller religions such as Sikhism.
  3. "Christian and Muslims want to go to heaven" - I cannot agree entirely with this statement. My understanding is that if a person accepts Christ as their saviour (and therefore becomes a Christian), they will be rewarded with heaven when they die. They cannot "earn" their way into heaven. However, Christians should strive towards making Earth as close to heaven as possible while they are alive. (I cannot speak to the Islamic beliefs on heaven.) Doesn't this align with attempts to make the world better, more fair, healthier, etc?

Danielle -- good points. 

  1. For net increase or decrease in religiosity in the next decades, you're right that we'd want a more precise demographic model of births, deaths, rates of vertical vs. horizontal cultural transmission for specific religions, etc.
  2. re. Hinduism, I resonate with your sense that lots of Hindus are less inclined to think they're in the 'one true religion' than people in other religions. But I have low confidence in that -- I've only spent 2 weeks in India, have interacted mostly with highly educated Indians, and don't know much about Hindu vs. Muslim conflicts over history, or what they reveal about degree of religious exclusivity.
  3. The issue of 'earning' one's way into heaven has been a source of much contention over the centuries, e.g. the Catholic emphasis on good works vs. the Protestant emphasis on faith. Certainly for religious people who emphasize moral behavior in this life, there might be minimal conflict between religious values and EA values. However, many religious people (perhaps especially outside the US/UK/Europe) might put a heavier emphasis on the afterlife (e.g. in cases of religious martyrdom.)

Thoughtful post, thank you for sharing.  I am only doing an exploratory reading of material covering the intersection of EA and religion at this point, so I can't speak very well to the alignment issue.  I agree with those who suggest that religious ethicists would want the AI to explain its reasoning.

What immediately comes to mind in response to your problem statement is the number of different Christianities and Buddhisms (to name the familiar) that are out there, many with their own theology/doctrine -- and within each, you also have consequential stratification by education level, and then division on the grounds of conservative and liberal interpretations.  I don't think a consensus set of religious ethics would look substantially different from secular ethics, and it may in fact have fewer constraints.

If you want a perspective on the value of AI safety from Christian theology (following the thought of Thomas Aquinas), you could read Stefan Riedener's essay on EA, "Human Extinction from a Thomist Perspective," pp187-210 in the book:

Roser, Dominic, Stefan Riedener, and Markus Huppenbauer. 2022. "Effective Altruism and Religion: Synergies, Tensions, Dialogue." 1st ed. Pano-Verlag.

I've wondered if it's easier to align AI to something simple rather than complex (or if it's more like "aligning things at all is really hard, but adding complexity is relatively easy once you get there").  If simplicity is more practical, then training an AI to do something libertarian might be simpler than training it to pursue any other value.  The AI could protect "agency" (one version of that being "ability of each human to move their bodies as they wish, and the ability to secure their own decision-making ability").  Or, it might turn out to be easier to program AI to listen to humans, so that AI ends up under the rule of human political and economic structures, or some other way to aggregate human decision-making.   Under either libertarian or human-obeying AI programming, humans can pursue their religions mostly as they always have.

The implicit bets on religiosity in AI safety aren't that it'll be unpopular, only that it won't influence decision making / powerful actors. If/when AGI arises, it'll initially be controlled by govts/companies/citizens of rich countries. Of all the belief systems, AI safety people worry most about Confucianism because it has the most real influence on decision making (CCP).

Interesting. I guess I'm a bit confused then about whether 'AI alignment' is really intended to align with what all 8 billion living humans actually value and believe, right now (including quite strong overall support for CCP values among most of the 1.4 billion people in China), or whether it's intended to align with what powerful 'govts/companies/citizens of rich countries' value and believe.