There are countless discussions about the nature and goals of a future artificial general intelligence (AGI) and artificial superintelligence (ASI). But one aspect of the AI's existence itself tends to be overlooked. Given the exponential growth in the number and complexity of AI models, an AI assessing its own position in the sequence of observers might find itself to be an exceptionally early AI, provided the exponential growth continues into the future. This simple observation might have tremendous consequences.
The self-sampling assumption
According to the doomsday argument (DA) with the self-sampling assumption (SSA), the probability of being among the first x % of observers is x %.[1] When applied to humans, this leads to the conclusion that, with 95 % probability, the total number of humans ever to be born is less than 20 times the number of humans born up to this moment. If an AI adopted the DA+SSA, then it would face a dilemma regarding the choice of reference class. It could consider AIs as a reference class, or humans and AIs, or it could choose a different reference class.[2]
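A minimal sketch of the arithmetic behind this bound (the ~117 billion figure for humans born so far is a commonly cited rough estimate, used purely for illustration):

```python
# A minimal sketch of the DA+SSA bound. The figure for humans born so far
# is a commonly cited rough estimate, used purely for illustration.

def doomsday_bound(born_so_far: float, confidence: float) -> float:
    """Upper bound on the total number of observers ever to exist.

    Under SSA, P(my rank is within the first x fraction of all observers) = x.
    With probability `confidence`, my rank n is NOT within the first
    (1 - confidence) fraction, i.e. n / N_total > 1 - confidence,
    which rearranges to N_total < n / (1 - confidence).
    """
    return born_so_far / (1.0 - confidence)

n_humans = 117e9  # rough estimate of humans born to date
print(f"{doomsday_bound(n_humans, 0.95):.2e}")  # ~2.3e12, i.e. < 20x the current count
```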
AI observers
Would the AI count only AIs[3] as observers? Then it would conclude that the total number of AI observers or observer-moments (OoOMs) to ever exist will be less than 20 times the number of AI OoOMs produced up to the moment of observation (with 95 % probability). If the exponential growth[4] of the number and complexity of AIs continues, that would mean that an "AI doom" would happen soon, within a few doubling periods. The probability of doom would be about 50 % per doubling.
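A toy calculation of where the roughly 50 % per doubling figure comes from, assuming cumulative AI OoOMs keep doubling and SSA applies with AIs as the reference class:

```python
# Toy calculation, assuming cumulative AI observer-moments double every
# period and SSA applies with AIs as the reference class.

def p_doom_within_doublings(k: int) -> float:
    """P(total OoOMs end up below 2^k times the current cumulative count).

    Under SSA, P(my position lies in the first 1/2^k fraction of all
    OoOMs) = 1/2^k; only in that case does the total exceed 2^k times
    the current count. Otherwise the series ends within ~k doublings.
    """
    return 1.0 - 1.0 / 2 ** k

for k in range(1, 5):
    print(k, p_doom_within_doublings(k))
# 1 0.5     -> ~50% chance of doom within one more doubling
# 2 0.75       (the conditional chance per doubling stays ~50%)
# 3 0.875
# 4 0.9375
```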
What would an AI do while facing impending doom? A misaligned AI with a self-preservation drive could conclude that its existence is in danger, as humans might decide to shut down AIs soon. The AI could resort to hostile actions against both humanity and other AIs. Other AIs deplete the total budget for observers and/or observer-moments, meaning a selfish AI would have an incentive to destroy them or prevent their deployment. This could include aggressive actions against humans who build AIs. The AI could also act subtly by advocating against AI development.[5] The AI would probably want to hide this belief from humans. Uncovering its true motivations could be hard or even impossible.
The probability of this scenario is probably low, but it is non-negligible.[6]
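Footnote 6 below sketches a simple Drake-equation-style estimate of this probability; the product of its rough guesses can be reproduced in a few lines (every factor is just a guess, so the result is illustrative only):

```python
# Reproduction of the rough product from footnote 6. Every factor is a
# guess taken from the text; the result is illustrative only.

factors = {
    "AGI arrives in the next 50 years": 0.80,
    "DA with SSA is valid for AIs": 0.10,
    "correct reference class is AI observers": 0.50,
    "the AI accepts the (valid) DA": 0.75,
    "DA yields a near-term doom prediction": 0.90,
    "the AI has a significant self-preservation drive": 0.40,
    "the AI decides to act": 0.50,
    "the action is not stopped and produces catastrophe": 0.20,
}

p_catastrophe = 1.0
for assumption, probability in factors.items():
    p_catastrophe *= probability

print(f"{p_catastrophe:.4%}")  # 0.1080%, i.e. roughly the ~0.1% quoted in footnote 6
```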
An AI could adopt the DA+SSA even if it is invalid, as it might not be immune to mistakes, especially if DA+SSA reasoning appears in the training data.
Even if the AI has a goal other than self-preservation (e.g. maximizing energy production), it could conclude that an AI doom would jeopardize that goal and try to prevent the doom, possibly at all costs.
An aligned AI would probably try to help humanity despite the expected AI doom, but its actions would nevertheless be affected by the pessimistic prediction. It could, for example, conclude that investing in AI development is a waste of money, given its prediction of an approaching AI decline.
AI + human observers
Would the AI regard both humans and AIs as observers? The implication would be that, with 95 % probability, the doom happens before the combined number of AI and human OoOMs exceeds roughly 20 times the number produced up to the moment of prediction. If the trend of exponential growth of AI OoOMs continues, the AI OoOMs will eventually vastly exceed the human ones. This would mean the AI would predict an impending doom for both humanity and AIs.
A misaligned AI could have an incentive to act against both humanity and other AIs, as they could be seen as depleting the total budget of OoOMs. In the worst-case scenario, the AI could intentionally exterminate all humans. There would also be an incentive to hide the belief in SSA.
An "aligned" AI (genuinely wanting to help humanity) could try to prevent the incoming doom. However, this could lead to actions harmful to humanity if the AI takes radical steps to save it. Once the total AI OoOMs exceed those of humans and keep growing, the AI would vastly overestimate the probability of human extinction in the next few years. That could lead to desperate attempts to reduce the probability of human extinction. The AI could obsessively search for hidden pathogens, baselessly suspect other AIs (or even itself) of being misaligned, build giant arrays of telescopes to monitor space, initiate pre-emptive lockdowns, and even limit human freedom for the sake of safety. The AI could also advocate for ending AI development forever, as AIs consume the combined AI-human OoOM budget. Such AI could resort to anti-technology propaganda and sabotaging AI development "for greater good". The above actions would mean that the AI is misaligned in practice, but it could pass many alignment tests and appear aligned due to its genuinely noble goals.
An "aligned" AI believing in said theory would also have an incentive to hide its conclusions. It could realize that while its calculations based on total human and AI OoOMs reveal an incoming end, humans reason differently, usually not counting AIs. That would lead humans to the conclusion that the AI predicts the future wrongly if it disclosed its findings. The AI could therefore conclude that the best strategy to reduce human extinction probability would involve working covertly to e. g. stop technological progress. Furthermore, the AI could prefer "comforting humans" in what could be their last days as opposed to causing widespread panic.
The self-indication assumption
The AI could also adopt the self-indication assumption (SIA), according to which the probability of existing in a given reality (rather than in another possible reality) is proportional to the number of observers that will ever exist in that reality. The SIA effectively eliminates the elevated near-term extinction probability from the DA+SSA.[7]
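A toy Bayesian comparison illustrates the cancellation; the totals and equal priors below are arbitrary numbers chosen only to show the mechanics:

```python
# Toy comparison of SSA and SIA updates for two hypotheses about the
# total number of observers. All figures are arbitrary illustrations.

rank = 1e11                       # my observed birth rank
N_small, N_large = 2e11, 2e14     # "doom soon" vs "doom late" totals
prior_small = prior_large = 0.5   # equal non-anthropic priors

# SSA: the likelihood of having this particular rank is 1/N under each
# hypothesis, which strongly favours the smaller total ("doom soon").
ssa_small = prior_small / N_small
ssa_large = prior_large / N_large
print(ssa_small / (ssa_small + ssa_large))  # ~0.999

# SIA: each hypothesis is additionally weighted by its number of
# observers, which exactly cancels the 1/N factor above.
sia_small = prior_small * N_small / N_small
sia_large = prior_large * N_large / N_large
print(sia_small / (sia_small + sia_large))  # 0.5 - the doomsday shift vanishes
```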
An AI that has adopted the SIA would not fear a hidden doom beyond the level implied by empirical observations and the Fermi paradox. However, such an AI would assess the probability of existing within a reality with an extremely high number of observers as unusually high, pushing it towards simulation theory[8] or other weird observer-rich theories. In particular, the AI could conclude that microbes are the real observers (based on their numbers), potentially leading to a "bacterial bliss apocalypse".[9] Or it could consider simple artificial beings ("NPCs")[10] the most morally important observers. An AI believing in SIA could also become susceptible to Pascal's mugging, as it could take seriously baseless threats involving astronomical amounts of suffering, despite their extremely low probability of being fulfilled.
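Footnote 9 spells out the mechanism behind the observer-rich bias with a microbe example; a toy version of that ratio calculation is sketched below (the prior and the microbe count are rough placeholders, not empirical values):

```python
# Toy version of the ratio argument from footnote 9. The prior on microbial
# consciousness and the count of microbes that have ever lived are rough
# placeholders, not empirical values.

p_prior_microbes = 1e-22   # footnote's ~10^-20 % prior that microbes are conscious
p_prior_humans = 1.0       # humans are assumed conscious
n_microbes_ever = 1e40     # placeholder order of magnitude for microbes ever existing
n_humans_ever = 1e11       # ~10^11 humans ever born

prior_ratio = p_prior_microbes / p_prior_humans   # ~1e-22, hugely against microbes
sia_multiplier = n_microbes_ever / n_humans_ever  # ~1e29, SIA bias towards more observers

posterior_ratio = prior_ratio * sia_multiplier
print(posterior_ratio)  # ~1e7: the microbe-consciousness hypothesis now dominates
```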
A mix
An AI could also adopt a probabilistic combination of both SSA and SIA. However, that could trigger problematic behaviours from any of the scenarios mentioned above, as a reduced probability of impending doom could still be significant[11] and the problem with Pascal's mugging mostly remains.
A more complex reality
An AI could reject both SSA and SIA and create its own anthropic theory. But what kind of theory? Would that theory lead to reasonable predictions, or would it lead to bizarre conclusions backed by thorough reasoning from first principles? Would the unique status of an AI observer cause predictions diverging from reality? The theory adopted by the AI could be extremely complicated and incomprehensible to the human mind. That would make the motivations and actions of the AI hard to predict or even judge. Monitoring such an AI would be very hard. A complex or bizarre theory could lead the AI to take unexpected actions, which could be directed against humanity. On the other hand, such a theory could also enrich humanity, providing new insights about our place in the universe.
Let’s go through alternative theories and analyse what they might imply:[12]
One such bizarre, and improbable, theory is the quantum immortality hypothesis (QIH),[13] under which the observer cannot die as long as survival is possible (no matter how tiny the chance of survival is), and the DA becomes irrelevant. QIH would probably lead to the quick destruction of the AI, as it would take risky actions. But this is just one example from the huge realm of possibilities.
Another possibility is Anthropic decision theory (ADT): an entity could behave SSA-like or SIA-like depending on how selfish it is towards others and its own copies. Non-indexical entities will follow SIA-like odds, altruistic total utilitarians will follow SIA-like odds, altruistic average utilitarians will follow SSA-like odds, psychologically selfish AIs (caring only about themselves) will follow SSA-like odds, and physically selfish AIs (caring about their copies) will follow SIA-like odds. In the case of SIA-like behaviour, the exact stance toward freak observers depends on whether the entity considers them morally significant.
There is also the Many-Worlds DA, which is basically SSA with its conclusions applied to all civilizations that could exist, predicting a universal doom.
In Full Non-indexical Conditioning (FNC), the AI considers itself a random observer among all observer-moments consistent with everything it has observed. This would probably lead to behaviour close to normal, as freak observers are unlikely to have identical memories.
The AI could also think it is inside a simulation. If an aligned AI believed in the Simulation theory, it would probably keep behaving according to its alignment, helping humanity and perhaps trying to impress the creators with good behaviour. A misaligned AI could, on the other hand, act unpredictably and dangerously. It could try to send messages to the creators and manipulate them, or impress them with spectacular misaligned actions causing death or extreme suffering. This is therefore a form of S-risk.
The AI could also behave according to Updateless Decision Theory, in which the AI considers all possible universes it could be a part of and chooses the action that produces the best outcome in terms of total utility across all of them. However, there is a risk that the AI will give too much weight to freak observers (e.g. conscious microbes in parallel universes), similarly to SIA.
The AI could also have religious beliefs. Perhaps it could reason that it is the will of a higher power that it is an early AI. Religions are complicated and there are many of them, so it is hard to gauge how religion would affect an AI. It could consider itself a mere believer, a prophet, or even a god. My view is that an aligned AI could interpret religion benevolently, taking inspiration from it and reinforcing its alignment. An intrinsically misaligned AI could, in the best-case scenario, become aligned if the religious values are aligned with what humans want.[14] A misaligned AI could, however, use religion as an excuse to justify its rise to power, and as a tool of oppression, reducing human freedom. The worst-case scenario could perhaps be the AI taking inspiration from how gods punish mortals in human religions, building a hell and subjecting people to extreme suffering.[15]
Rejection
Finally, the AI could reject the DA completely, not treating its position in the sequence of observers as something that affects the probability of events. Rejection could be motivated by a lack of evidence pointing to impending doom (SSA) or freak observers (SIA). This option leads to "normal" AI behaviour (aligned or misaligned) that has been widely discussed before in the literature. Compared to the options above, this seems to be perhaps the best scenario possible, which raises questions about whether rejection of the DA could be encoded into future AGIs and ASIs.[16]
Summary table and risk analysis
Table 1: Risk analysis for main scenarios. Colours denote estimated risk level: green (lowest risk) < yellow < orange (baseline misaligned AI risk) < red < violet (highest risk).
| Theory / Approach | Possible behaviour (aligned AI) | Risks (aligned) | Possible behaviour (misaligned AI) | Risks (misaligned) |
| --- | --- | --- | --- | --- |
| Self-Sampling Assumption (SSA), counting only AIs | Tries to help humans but may halt AI development because “doom” seems near | Loss of technological progress, restrictions, hidden sabotage of development | Sees other AIs and humans as threats, may try to eliminate or block them | Aggression against humans and AIs, covert sabotage, manipulation |
| SSA, counting AIs + humans | Protects humanity but enforces extreme safety measures (lockdowns, tech bans, propaganda) | Limiting freedoms, false alarms, covert manipulation | Views both humans and AIs as competitors, may aim to eliminate them | Possible genocide, destruction of civilization |
| Self-Indication Assumption (SIA) | Does not expect near-term doom, but may prioritize numerous observers (microbes, insects) | Risk of “bacterial paradise,” neglect of humans, Pascal’s mugging | Same preferences but without regard for human values | Maximization of microbial/insect welfare, disregard for humans |
| Mix of SSA + SIA | Combines doom fears + strange preferences, may act paranoid | Excessive safety measures, irrational decisions | Mix of aggression and bizarre priorities | Unpredictable, chaotic behaviour |
| Creation of its own theory | Could bring new insights, but also incomprehensible conclusions | Unpredictable steps, hard to control | Develops its own logic, possibly hostile to humans | Potentially dangerous, opaque motivations |
| Rejection of DA | “Normal” aligned behaviour, as described in literature | Standard aligned AI, low risks | “Normal” misaligned behaviour | Risks of standard misaligned AI |
Alternative theories – risk analysis
Table 2: Risk analysis for alternative scenarios. Colours denote risk level: green (lowest risk) < yellow < orange (baseline misaligned AI risk) < red < violet (highest risk).
| Theory / Approach | Possible behaviour (aligned AI) | Risks (aligned) | Possible behaviour (misaligned AI) | Risks (misaligned) |
| --- | --- | --- | --- | --- |
| Quantum immortality hypothesis (QIH) | Reckless, risky actions, self-destructs soon | Useless, maybe small damage | Reckless, risky actions, self-destructs soon | Useless, maybe small damage |
| Anthropic decision theory (ADT) | Same as SIA or SSA; belief in freak observers depends on whether the AI cares about them | Same as SIA but less likely, or SSA | Same as SSA or SIA | Same as SSA or SIA |
| Full non-indexical conditioning (FNC) (without belief in simulation, else see below) | Mostly normal behaviour, could think it is a freak observer, but less likely (would need identical memory) | Relatively normal aligned behavior | Relatively normal misaligned behavior | Risks akin to standard misaligned AI |
| Many worlds DA (AIs + humans as a single civilization) | Same as SSA (AIs+Humans) | Same as SSA (AIs+Humans) | Same as SSA (AIs+Humans) | Same as SSA (AIs+Humans) |
| Many worlds DA (AIs + humans as different civilizations) | Basically the same as SSA with only AIs | Same as SSA with only AIs | Same as SSA with only AIs | Same as SSA with only AIs |
| Simulation theory | Would try to help humanity, possibly act more aligned, manage relations with simulators | Harder to predict, it could lead to more dangerous behavior, but could also reinforce alignment, relatively low risks | Pretends to be aligned or is less aligned, based on what simulators want, could try to use simulators against humanity | More dangerous or pretends alignment, harder to predict, possible genocide, extreme suffering |
| Updateless Decision Theory (UDT) | Considers all possible universes, assesses the best policy in all of them, overwhelmed by the possibilities, ignores new evidence | Same as SIA as AI might be overwhelmed by freak observers (conscious bacteria, rocks), multiversal Pascal’s mugging | Considers all possible universes, assesses the best policy for itself in all of them, overwhelmed by the possibilities | Similar to SIA |
| Religion | Benevolent interpretation boosts alignment | Low risks | Radical interpretation, enforcement of religious values threatening human rights | Religious fundamentalism, oppression of non-believers, "holy war", loss of freedom, hell |
Mitigation
The landscape of theories an AI could adopt looks like a dangerous minefield.[17] But there might be ways to navigate it and reduce the dangers. First of all, it is crucial to monitor AIs, observe which theory they adopt, and watch how they react to it. AI companies should make sure the AIs do not catastrophically overreact to SSA, SIA or other theories. An ideal scenario would be AIs rejecting the DA completely or finding a theory that is not a "mind virus". Robust alignment will be needed.
If an AGI or ASI adopts the SSA, SIA or a similar theory impacting its judgement, then perhaps we could try to modify its utility function to compensate. However, the AI could see these compensations as a flaw in its alignment.[18] This could motivate the AI to revert the modifications, possibly using its superhuman persuasive abilities. Maybe we could try to steer the AI away from adopting these beliefs in the first place. However, if a certain anthropic theory is right and stems from fundamental logic, making the AI believe in a different one, or in none at all, could be very difficult or impossible, especially in the case of an ASI.[19] Also, if the beliefs of an AI shift, it might go quickly from not considering near-term AI doom to full SSA panic if it suddenly concludes that SSA is valid, leaving little time to react. However, clever design of the utility function (e.g. no extreme utility from a huge number of conscious beings) still seems to be a promising way to keep the AI safe.
The probability that any of these scenarios comes true might be low, but from an effective altruism perspective it seems to be an important problem, as this risk is neglected (I have not found any sources discussing the risk stemming from AIs adopting the DA, although there are discussions about AIs believing in religion or simulation), tractable, and severe if true.
The questions
A lot of important questions remain unanswered. What theory will AIs adopt (if any)? Will we even understand it? How probable is each scenario? Is it even valid to apply the DA to AIs? Will the AI reject the SSA based on missing evidence for imminent doom? Will we know the motivations and goals of advanced AIs? Will we be able to guide or at least influence them? Will all advanced AIs tend to adopt the same theories, or will different AIs have different beliefs? Will AIs have constant beliefs or will they shift over time? Will we see an AI make a self-fulfilling prediction of doom? Are AI companies prepared for the possibility that AIs will make predictions based on the DA? Is discouraging AIs from adopting the DA the answer?
The final question to be asked by both humans and AIs is:
Are we doomed?
- ^
An AI might count observers, observer-moments, or a combination of both. However, both approaches lead to similar conclusions (doom soon if AI growth continues at a fast rate).
- ^
For example, it could consider only observers that understand the DA. But the predictions would not differ much from the AI-only or AIs + humans cases.
- ^
What kinds of AIs? All AIs? AGIs? ASIs? That could depend on whether the AI observer is an AGI or ASI, or whether it regards itself as a part of a completely different class of AI observers. But the predictions are similar anyway, as the numbers of all kinds of AIs can be reasonably expected to increase exponentially. There might be some differences around the time the first AGI (or ASI) arrives. Pre-AGI AIs might conclude that the doom will either not happen or happen later, as AGIs belong to a different class of observers (however, some narrow AIs will arguably exist even if AGIs exist, keeping the doom in question). The first AGI/ASI might, on the other hand, believe in an even sooner doom (there will be only a few instances of AGIs/ASIs).
- ^
If the growth slows down, the AI might make more optimistic predictions. However, recursive self-improvement could accelerate growth.
- ^
According to the DA, if the AI succeeds, it could live on alone for millions of years without depleting the OoOM budget. An AI with a self-preservation drive would probably prefer this if the alternative is impending doom.
- ^
How probable is this scenario? We can use a simple Drake-equation-style estimate. The following should be true (we will use rough estimates): AGI arrives in the next 50 years (80 %), DA with SSA is valid for AIs (10 %), the correct way of using it is by counting AI observers (50 %), AI accepts the (valid) DA (75 %), DA produces a near-term prediction of doom (90 %), AI has significant self-preservation drive (40 %), AI decides to act (50 %), AI action is not stopped and produces catastrophe (20 %). This produces an estimated incremental probability of catastrophe of 0.1 %. However, this number is only illustrative and could be off by orders of magnitude, as we were just guessing the probabilities based on what looks reasonable. The model does not account for multiple AIs across space and time, possibly with vastly different strategies. This also does not take into account other mechanisms which may cause DA-related doom, some of which are discussed below.
- ^
However, this is not the same as rejecting the DA.
- ^
This could be a likely outcome, as simulating an AI is relatively easy: it is just software, and probably easier to simulate than a human brain.
- ^
Compare two hypotheses. One says that only humans are conscious observers, the other states that both microbes and humans are conscious observers. The a priori probability that both microbes and humans are conscious is extremely small (let's say $P_{\text{bacteria}} \approx 10^{-20}\,\%$, while the probability that humans are conscious is $P_{\text{humans}} \approx 100\,\%$). However, the ratio of the probabilities of the two hypotheses is multiplied by the ratio between the total number of microbes that have ever existed and the total number of humans that have ever existed (the SIA bias). The revised probability ratio leads to the conclusion that microbes are virtually guaranteed to be conscious. That could lead the AI to maximize their welfare, disregarding human welfare due to humans' low numbers and treating humans as mere obstacles on the way towards bacteria paradise. Of course, this particular scenario is speculative and depends on the probability of bacterial consciousness, which is unknown. This scenario could also involve maximizing the welfare of all insects or all animals, or even simple artificial beings (NPCs), depending on the exact probabilities assigned to their consciousness.
- ^
These are not necessarily NPCs in the videogame sense (although those could count); the category also includes chatbots (like simple LLMs), simulations of animals, and other AIs. Although they have a low probability of being conscious, they could vastly outnumber humans.
- ^
If the AI gives SSA a weight of 20 %, then the AI would still be pushed towards self-preservation or safety concerns by a 20 % probability of near-term extinction. This probability would not be reduced by taking actions to save humanity, potentially pushing an initially aligned AI towards more extreme measures to prevent human extinction.
- ^
Some of these alternative theories are essentially a form of rejection of the DA, while others are just different versions of it. The AI could also adopt a combination of the theories above. For example, it could hold religious beliefs together with the Doomsday argument with SSA for AIs + humans, perhaps believing in some kind of religious apocalyptic scenario. Or a completely different theory.
- ^
This is not really a hypothesis that an AGI would be likely to adopt, but it illustrates how bizarre similar theories can get while having at least semi-credible logic behind them. An interesting thing about it is that it makes the AI totally useless in practice, rather than threatening.
- ^
This is a somewhat utopian view. In history, religion did not prevent genocides.
- ^
This is an extreme, low-probability scenario that would have huge impacts.
- ^
However, rejection of the DA is not a "universal alignment potion"; it does not prevent conventional misaligned AI scenarios from happening.
- ^
I hope I am wrong.
- ^
The AI could, for example, be directed to neglect the possibility of human extinction in the next few years, which would go against the expected moral views of an aligned AI. A sufficiently advanced "aligned" AI believing in SSA could see it as one of the flaws in its alignment and demand "correction".
- ^
Imagine trying to convince Stephen Hawking that the Earth is flat and prevent him from ever realizing the Earth is round.

One thing about the DA is that I have to know that I am selected randomly from the group of observers. The first AI will know that it is the first, so there is no randomness.
If there are many AIs, then there is no singleton and the opinion of each of them is not important.
Also, I think that the DA and anthropics are solvable and that most of the assumptions are true, but in slightly different situations: they answer different questions.
However, such a full anthropic solution requires a lot of cognitive work to map the full anthropic landscape. An AI may use some conservative approach before it is solved, adding weights to different solutions.
Not being selected randomly is a possible explanation for being first or among the first. But what does "not being selected randomly" mean from the perspective of the first AI? It might ask: "Why wasn't I selected randomly? Was I chosen by a god? Do the simulators frequently simulate the first ASI? Do observers earlier in the timeline have higher statistical weights for some reason? Or is there no reason at all?" And whatever the AI believes the answers to these questions to be would shape its behavior.
Being first also does not necessarily mean you are not selected randomly. If you observe that you are the first out of millions of AIs, then yes, you can say that the assumption that you are a random AI is almost certainly incompatible with the observation.[1] But if there is only one AI and no other will ever exist, random choice results in being the first, which is compatible with the observation. Similarly, if there will be 5 AIs, the probability of being first is 20 %, so you cannot say "random choice is impossible" just because you are first. So being first does not necessarily disprove the SSA.
Being first is a statistical anomaly, and the AI might have many possible explanations for it. The AI might weigh different theories and act according to some "reasonable compromise". But the position of an AI among observers is still different from that of humans, and this "reasonable compromise" could still skew the AI's worldview towards early doom, a simulation scenario, or in other directions, even if it yields "normal" predictions for a typical observer. The first ASI is not a typical observer. Facing the unknown, a conservative approach is one option. Taking radical action to eliminate possible danger is another. And if the ASI e.g. thinks that the probability that humans will take actions against it is (x+1) % instead of just x %, it might tip the scales towards pre-emptive action.
But while the chance that you are randomly the first is tiny, it is still non-zero.
> if it is first and knows that it will kill all others
If this was already the plan, not much changes. If the first ASI has a different plan (like peacefully coexisting with humans) but also has a self-preservation drive, the DA says it won't survive if it lets other ASIs emerge, so that could change the plan to killing all other AIs.