Epistemic status: This text presents a thought experiment suggested by James Miller, along with Alexey Turchin's musings on possible solutions. While our thoughts are largely aligned (we both accept high chances of quantum immortality and the timeline selection principle), some ideas in Part 2 are more personal (e.g., Turchin's "transcendental advantage").
TL;DR: If quantum immortality is true, I will survive AI doom either by unlikely luck or because P(doom) is small. Knowing that I will survive anyway, can I bet that P(doom) is small? Can we now observe a "future anthropic shadow," such as a Taiwan war, which would slow AI development?
Part 1. Thought experiment
Guessing the digit of π via quantum immortality
Before sleeping, I try to guess the 10th digit of π, presently a mystery to me. After falling asleep, seven coins will be flipped. Assume quantum uncertainty affects how the coins land. I survive the night only if I correctly guess the 10th digit of π and/or all seven coins land heads, otherwise I will be killed in my sleep.
Convinced of quantum immortality, I am confident of surviving the night. How then should I expect my future self to explain this survival? According to simple Bayesian reasoning, the most probable cause of my survival would be accurately guessing the 10th digit of π, because I have a 10% chance of correctly guessing a digit of π but only a 1 in 128 chance of surviving because all coins land heads. However, this suggests that before sleeping, I ought to consider my guess regarding the 10th digit of π as probably correct, a conclusion that appears nonsensical.
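The "simple Bayesian reasoning" above can be written out explicitly. This is a minimal sketch, assuming the guess is uniformly random over ten digits and independent of the coins; the function name `survival_posterior` is hypothetical and only serves the illustration.

```python
from fractions import Fraction

def survival_posterior(p_guess=Fraction(1, 10), n_coins=7):
    """Return (P(survive), P(guess correct | survive)) for the sleep experiment."""
    p_all_heads = Fraction(1, 2) ** n_coins  # all seven coins land heads: 1/128
    # I survive if the guess is right OR all coins are heads (inclusive or).
    p_survive = p_guess + (1 - p_guess) * p_all_heads
    # Bayes: every correct-guess branch survives, so the likelihood is 1.
    p_correct_given_survive = p_guess / p_survive
    return p_survive, p_correct_given_survive

p_survive, p_correct = survival_posterior()
print(p_survive, p_correct, float(p_correct))
```

Under these assumptions, conditioning on survival moves the credence in a correct guess from 1/10 to 128/137, which is exactly the shift that looks nonsensical when applied before going to sleep.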
Quantum immortality should influence my belief about whether the future me will think all the coins came up heads because my consciousness is more likely to persist in the branches of the multiverse where this happens. But quantum immortality should not affect whether future me thinks I have already guessed the 10th digit of π correctly because the accuracy of my guess is consistent across the multiverse. By this chain of logic, if I am convinced future me will survive, I should think it far more likely I will survive because of the coin flips than guessing the 10th digit of π correctly.
Now imagine that I am an AI doomer who thinks there are two ways I will survive: (a) if I am wrong about AI existential risk, (b) if humanity gets extremely lucky. Furthermore, assume that (a) is not influenced by quantum luck, but (b) is. Imagine I estimate (a) at 10% and (b) at 1/128. If I am convinced of quantum immortality, I assume that (a) and/or (b) will occur. Which possibility should I consider more probable?
In short, we have three basic ways of handling the paradox:
(1) Give up on estimating probabilities (or just ignore QI).
(2) Bite the Bayesian Bullet and accept I can use quantum immortality to have a very accurate prediction of a digit of π, and
(3) “Anthropic future shadow”: future events can manifest themselves now if they help my future survival, e.g. the current development of life extension technologies. In AI Doom, future anthropic shadow can manifest itself, for example, as higher chances of war around Taiwan which presumably would slow AI development.
(The difference between 2 and 3 is that they give different interpretations, while giving similar predictions)
(3) should be seriously considered because (1) and (2) are so unsatisfactory.
Yudkowsky used a similar experiment with guessing the π digit to claim the inconsistency of anthropics in general.
Giving Up on Estimating Probabilities
Perhaps the notion of quantum immortality makes it impossible to estimate probabilities, and so AI doomers who believe in quantum immortality should not seek to estimate their likely cause of survival. But giving up has practical consequences compared to going with straightforward Bayesian probabilities.
Assume I am very status-conscious and would only publicly support AI doomers if it bolsters my future reputation for wisdom. If humanity survives just due to quantum luck, validating the AI doomers' accuracy, future generations may well perceive them as wise, as it will be apparent that we only survived because of amazing luck. On the other hand, if AI doomers are proven incorrect, they will be deemed foolish by posterity. Thus, demonstrating that simplistic Bayesian estimation often overreaches might persuade status-conscious individuals to endorse the AI doomer viewpoint.
This issue might also be relevant to investment strategies. Imagine that, assuming the AI doomers are right, AI will likely become much more powerful in the short run. This is largely due to a primary reason for the doomers' potential miscalculation: AI might only reach human-level intelligence in several vital areas. Assuming the doomers are correct yet humanity survives through quantum luck, a long-term investment in an AI-heavy company like Microsoft would yield the highest returns. Since I will only benefit from my long-term investment if humanity survives, giving up on estimating the likely causes for my survival would make it nearly impossible to develop an optimal investment strategy.
Part 2. Anthropic Reasoning
(The rest of the article is mostly the work of Alexey Turchin.)
1. It is actually a dilemma
Yudkowsky wrote a similar argument in his famous The Anthropic Trilemma about manipulating future observable probabilities by creating many copies and later merging them. The trilemma is the following:
1. Bite the bullet: "You could say, 'There's no reason why you shouldn't be able to exert anthropic psychic powers.'"
2. You will be a winner in 5 seconds but a loser in 15.
3. No-continuing-personal identity: "That there's any meaningful sense in which I can anticipate waking up as myself tomorrow, rather than Britney Spears."
And two additional possibilities: "The fourth horn of the trilemma... would be denying that two copies of the same computation had any more weight of experience than one" and the use of the quantum measure procedure which does not allow cheating this way.
The solution of the paradox discussed here can also be presented as a trilemma:
(1) Give up on estimating probabilities.
(2) Bite the Bayesian Bullet and accept that I can use quantum immortality and suicide to have a very accurate prediction of a digit of π.
(3) "Anthropic future shadow": future events can manifest themselves now if they help my future survival, e.g., the current development of life extension technologies, but it works only for non-deterministic events.
There is an obvious similarity between our trilemma and Yudkowsky's. Actually, both trilemmas boil down to dilemmas: we either bite the bullet in some form or accept inconsistency in probabilities and/or personal identity, that is, we are paying a theoretical cost. In our case, (3) is also a type of accepting that something weird is happening, and all counterarguments boil down to (1).
In Yudkowsky's trilemma, he either bites the bullet that he can manipulate probabilities, OR accepts that either probability updating or consciousness continuity is wrong.
In our case, the form of biting the bullet worth serious consideration is (3): that I observe a world in which I have higher chances of future survival.
2. War in Taiwan and “future anthropic shadow”
It was suggested (by gwern) that a possible war in Taiwan would cause hardware shortages that would pause AI development globally, and that US sanctions on China’s AI tech increase the chances of such a war. Moreover, commentators suggested that the fact that we are in such a timeline can be explained by quantum immortality. (They incorrectly use the wording “anthropic shadow”, which Bostrom originally used to denote something like survivorship bias, that is, the underestimation of past risks, not a change in future probabilities caused by quantum immortality; let’s call this modified idea “future anthropic shadow”.)
‘This is also related to the concept of an anthropic shadow: if artificial intelligence was to cause human extinction but required a lot of computing power, you would be more likely to find yourself in world lines in which the necessary conditions for cheap computing are not met. In such world lines, crypto miners causing a GPU shortage, supply chain disruptions due to a pandemic, and a war between the United States and China over Taiwan in which important chip fabrication plants are destroyed are more likely to occur in world lines that are not wiped out. An anthropic shadow hides evidence in favour of catastrophic and existential risks by making observations more likely in worlds where such risks did not materialize, causing an underestimation of actual risk’ https://twitter.com/XiXiDu/status/1582440301716992000
We can define “future anthropic shadow” as finding evidence now that you will survive an impending catastrophe via QI.
Note that there is a significant difference between “AI Doomers are wrong globally because alignment is easy” and this idea. An AI hardware shortage will happen only in some worlds: it is not a deterministic outcome.
However, hardware shortages are not completely equivalent to the random coins in our thought experiment: hardware shortages may already be happening, but the coins will be tossed only in the future. Thus, hardware shortages are more deterministic in the sense that we already know that they are here. (This assumes, for the sake of the argument, that such shortages are real: it looks like NVIDIA will produce 3 million H100 GPUs in 2024, but the risk of war in Taiwan remains high, and recent earthquake swarms indicate a high risk of natural disasters hindering AI progress.)
In some sense, future anthropic shadow is a reverse version of the Doomsday argument: instead of “I live in a world which will end soon”, we have “I live in the world best suited for survival”.
We may address this in future writings, but there is an important difference between the AI Doomers' thought experiment and the War in Taiwan: the first predicts a universal distribution, and the second is only about local circumstances. This type of difference appears many times in discussions about anthropics, like discussions about SIA, the Presumptuous Philosopher and the local vs. universal Doomsday argument.
3. The counterargument based on path-dependent identity
One can suggest the following counterargument to the proposed thought experiment:
• If my π-guess is wrong, my only chance to survive is getting all-heads.
• With 0.9 probability, my π-guess is wrong (but I will survive anyway), so I will survive because of all-heads.
• The low chances of all-heads don't matter, as quantum immortality will "increase" the probability to 1.
• So, I should expect my guess about π to be wrong and be more likely to survive because of random tossing of all-heads.
The argument is based on counting not the final states of the experiments, but the paths to the final states: if I am in the path with a wrong π digit, I will survive anyway, but by another mechanism.
Path dependence often appears in thought experiments about copies. Another example where the way of calculating copies affects the result: if 10 copies are created from me simultaneously, my chances of being each will be 0.1. But if each copy is created from a previous copy, then the last copy will have only 1 in 1024 chances of being me. The difference here is similar – we either follow paths or calculate probabilities by comparing the pools of resulting copies. The difference depends on the nature of personal identity – is it path-dependent (continuity as a carrier of identity) – or state-dependent?
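The two ways of counting copies can be compared numerically. This is a minimal sketch, assuming that each sequential copying step splits the weight 50/50 between the source and the new copy and that there are ten such steps; both function names are hypothetical.

```python
from fractions import Fraction

def simultaneous_share(n_copies):
    """State-based count: all copies are made at once and weighted equally."""
    return Fraction(1, n_copies)

def chained_last_share(n_steps):
    """Path-based count: each copying step halves the weight passed to the newest copy."""
    share = Fraction(1)
    for _ in range(n_steps):
        share /= 2
    return share

print(simultaneous_share(10))  # 1/10
print(chained_last_share(10))  # 1/1024
```

The end state (a pool of copies) looks the same from the outside in both cases; only the assumed copying history changes the weight assigned to the last copy.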
Note that quantum immortality based on MWI is path-dependent, but big-world immortality based on chaotic inflation is state-dependent. Calculating probabilities in big-world immortality is more difficult as we don't know the distribution of all possible worlds, including simulations and non-exact copies. A deeper answer here would require an understanding of the relationship of continuity, qualia, and identity, which is a difficult question outside the scope of this paper.
In this thought experiment, we get different probabilities depending on the order in which we compute anthropic effects, which is a rather typical situation for anthropic paradoxes – e.g., Sleeping Beauty.
In other words:
- From the outside point of view: roughly 9 out of 10 of my surviving copies (128 of 137) survive because they guessed the π-digit correctly.
- From my point of view: there is only a 1 in 10 chance of surviving by guessing π correctly; if I guess incorrectly, I am sure to survive because of coins.
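The two views can be contrasted by brute-force enumeration of branches. This is a sketch under stated assumptions: the ten possible guesses are treated as equally weighted epistemic possibilities, each of the 128 coin outcomes is one equally weighted branch, and `TRUE_DIGIT` is a placeholder value (any fixed digit gives the same counts).

```python
from itertools import product

TRUE_DIGIT = 5  # placeholder; the counts do not depend on which digit is true
N_COINS = 7

survivors_by_guess = 0  # surviving branches where the guess was right
survivors_by_coins = 0  # surviving branches saved only by all-heads
wrong_guess_paths = 0   # all branches (alive or dead) with a wrong guess
total = 0

for guess in range(10):
    for coins in product("HT", repeat=N_COINS):
        total += 1
        correct = guess == TRUE_DIGIT
        all_heads = all(c == "H" for c in coins)
        if correct:
            survivors_by_guess += 1
        else:
            wrong_guess_paths += 1
            if all_heads:
                survivors_by_coins += 1

# Outside view: sample uniformly among the surviving end states.
outside = survivors_by_guess / (survivors_by_guess + survivors_by_coins)
# Inside (path) view: weight by the path entered before the coins are tossed;
# on a wrong-guess path, QI is assumed to carry me to the all-heads branch.
inside = 1 - wrong_guess_paths / total
print(round(outside, 3), round(inside, 3))
```

The same 1280 branches yield about 0.934 or 0.1 for "I survived because of the guess", depending only on whether end states or paths are counted.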
The Self-sampling assumption states that I am randomly selected from all my copies (in some reference class). If applied to survivors, it supports the outside view, but not an inside-path-dependent view. But Bostrom suggested the Strong SSA, in which not observers, but observer-moments are selected. SSSA is not path-dependent. Bostrom applied it to his "hybrid solution" in Sleeping Beauty. SSSA also creates strange anthropic effects – see Turchin's recent post "Magic by forgetting."
However, abandoning SSSA also has a serious theoretical cost:
If observed probabilities have a hidden subjective dimension (because of path-dependency), all hell breaks loose. If we agree that probabilities of being a copy are distributed not in a state-dependent way, but in a path-dependent way, we agree that there is a 'hidden variable' in self-locating probabilities. This hidden variable does not play a role in our π experiment but appears in other thought experiments where the order of making copies is defined.
In other words, both views produce strange probability shifts: SSSA over future states provides the ability to guess a digit of π, and the path-dependent view gives strange probabilities based on the way copies are created.
An interesting question arises: Are path-dependent and state-dependent views similar to the SSA and SIA dichotomy? The state-dependent view clearly looks like SSA. SIA uses the mere fact of my existence as evidence (of a larger group), so there appears to be a similarity between SIA and path-dependent identity, which assumes an externally invisible "measure" of existence.
It is tempting to apply this line of reasoning to the Sleeping Beauty problem. In a nutshell, SB is about path-dependency: either two copies are created at the first step using a coin, and after that the tails copy is split by choosing the day of the week (halfers), or all three copies are created simultaneously (thirders).
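The halfer and thirder answers can be reproduced from the two copy-creation orders just described. This is a minimal sketch, assuming a fair coin and equal splitting at each step; the function names are hypothetical.

```python
from fractions import Fraction

def sequential_weights():
    """Halfer-style: a fair coin first, then the tails branch is split by day."""
    half = Fraction(1, 2)
    return {
        ("heads", "Monday"): half,
        ("tails", "Monday"): half / 2,
        ("tails", "Tuesday"): half / 2,
    }

def simultaneous_weights():
    """Thirder-style: all three awakenings are created at once, equally weighted."""
    awakenings = [("heads", "Monday"), ("tails", "Monday"), ("tails", "Tuesday")]
    return {a: Fraction(1, len(awakenings)) for a in awakenings}

def p_heads(weights):
    """Total weight of awakenings in which the coin landed heads."""
    return sum(w for (coin, _), w in weights.items() if coin == "heads")

print(p_heads(sequential_weights()), p_heads(simultaneous_weights()))
```

Path-based weighting gives 1/2 for heads and state-based weighting gives 1/3, mirroring the path-dependent vs. state-dependent split discussed above.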
Conclusion: In the state-dependent model, we get a paradoxical ability to predict the future, but this is a well-known feature of SSA: even the Doomsday Argument, which is based on SSA, predicts the future.
The hidden subjective (path-dependent) part of probability makes "future anthropic shadow" hypothetically possible. But we haven't proved it yet, as it is still not clear how the measure will move back in time.
One way this could happen is if the subjects of selection are not observer-moments, but whole paths: in that case, "fatter" paths with more observers are more likely, and I should find myself in the observer path which has more observers in the future.
I call this view the "two thirder position" in SB: In that case, I update on the fact that there are more tails than heads but later do not update on the fact that today is Monday. I will write a separate post about this idea.
4. God incubator and logical anthropic shadow
Another difference between P(Doom) and π digit guessing is that in the whole universe there will be many π-guessing experiments and there will always be survivors, but "easy alignment" applies to any civilization, and there are no regions of the multiverse with different results.
Surviving through 'easy alignment' is different from surviving via guessing the correct digits of π, as the whole world history will be different in the easy-alignment-world; for example, neural networks will be more effective than symbolic AI. Thus, my surviving variant will not be an exact copy of me in the worlds where I will not survive, as I will know about what is going on in the AI field. But type-me, that is, my psychological sameness, will be the same, as my identity core is not affected by my knowledge about the news in the AI field. (This may not be true for a scientist who has devoted herself to a particular field of AI that has become part of her personality core, like neural nets for Hinton.) Here we want to say that quantum immortality works not only for exact copies but for type-copies too when some known information is not used for self-identification, and this is not a problem for our thought experiment.
The problem with the experiment with π is that in other regions of the universe, there are worlds absolutely similar to ours, but the experiment is performed on another digit of π. Thus, there is a class of my type-copies who win in similar worlds even if I lose here. But it is difficult to imagine a non-arbitrary variable that affects the distribution of my copies in all possible worlds, and AI alignment difficulty is one of them (see more in the section "other x-risks").
This is similar to the Incubator gedankenexperiment (God incubator thought experiment) by Bostrom discussed by Perrira in the sense that the number of copies is pre-defined but you just don't know how: in this experiment, God creates some number of copies and nothing else exists anywhere, so I should not think about my copies in other variants of the experiment. In the experiment, God flips a coin and creates either 1 copy on heads or 1000 copies on tails. What should be my estimation of the result of the coin toss based on the fact that I exist at all? It either remains one-half (as the fact of my existence doesn't provide any new information, the non-updating position) or is 1000/1001 (as I am more likely to be in the position where I am one of many copies, the updating position.)
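The two positions in the incubator experiment amount to a single switch in the calculation. This is a minimal sketch, assuming a fair coin and that the updating position weights each outcome by the number of copies it creates; the function name is hypothetical.

```python
from fractions import Fraction

def p_tails_given_i_exist(copies_heads=1, copies_tails=1000, updating=True):
    """Credence that the coin landed tails, given only that I exist."""
    prior = Fraction(1, 2)
    if not updating:
        # Non-updating position: my existence carries no new information.
        return prior
    # Updating position: weight each outcome by the number of copies created.
    w_heads = prior * copies_heads
    w_tails = prior * copies_tails
    return w_tails / (w_heads + w_tails)

print(p_tails_given_i_exist(updating=False))  # 1/2
print(p_tails_given_i_exist(updating=True))   # 1000/1001
```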
Expecting a low a priori probability of AI Doom based on QI is similar to the updating position in the God incubator thought experiment. It is a much stronger claim than just the future anthropic shadow, which acts "locally", and says that only in our timeline do I observe more chances to survive. In other words, the future anthropic shadow predicts only the random part of survival – the 7 coin tosses in our initial thought experiment, as if I have a premonition about how the coins will land. Observing increasing chances of war in Taiwan is an example.
If I survive because P(AI doom) is low universally, there is no need for coincidence-based anthropic shadow, like wars: alignment will be easy and/or AI will be inherently limited. Though there can be a logical anthropic shadow: I will observe that AI is producing diminishing returns or that some alignment method works better than expected. If I were Gary Marcus, I would say that this is what happens with neural nets and RLHF.
Note that both shadows may work together, if P(AI doom) is small but not zero.
5. Other universal x-risks similar to AI Doomers
There are several deterministic x-risks-related factors which affect the survival chances of any civilization (they will also help to explain the Fermi paradox as they apply to all civilizations, if they are bad):
- Timing of AI relative to the timing of other disruptive technologies (likely bad if it is too long).
- General tendency of different x-risks to interact in a complex and chaotic manner. Chaos is bad for x-risk prevention.
- The general ability to prevent x-risks by a civilization and more generally, the ability to cooperate.
- Some more concrete but universal things: false vacuum decay, biological risk easiness.
If we expect that the universal AI Doom probability should be low, then by the same reasoning we should expect these factors to be favorable too.
6. Transcendental advantage
Generalized QI and no free lunch
The idea of quantum immortality in a nutshell is that the personal history of me, the observer, is different from the average person's history – I will achieve immortality. But there is no free lunch here – it could be a bad quantum immortality, like eternal aging without the ability to die.
What we have suggested here could be called "generalized quantum immortality" – and it is even better news, at first glance, than normal QI. Generalized QI says that I am more likely to be born in a world in which life extension technologies will be successfully developed in my lifetime, so bad QI like eternal aging is unlikely. It is a "future anthropic shadow" but for immortality.
However, even generalized QI doesn't provide a free lunch, as it doesn't exclude s-risk worlds.
I am most likely to be born in the universe where life extension is possible
If we think that updating the probability that I correctly guess π before going to sleep is the right line of reasoning, then all hypotheses about the nature of the universe which increase my survival chances must also be true. For example, if I survive for 10,000 years, I shouldn't be surprised to have been born into a world conducive to my survival.
For example, there are two main theories of aging, and one of them makes life extension easier. This is the theory that aging is a program (and it is a general evolutionary principle everywhere in the multiverse), and therefore it will be much easier to stop aging for any type of organism just by finding the correct switch. Alternatively, aging may have many mechanisms, which are pre-synchronized by evolution, and in that case, fighting aging will be extremely difficult. (See more in Turchin’s [No theory for old man]). Applying the same logic as for AI Doomers, we should conclude that the first theory is more likely, as aging will be defeated sooner, and more likely during my lifetime.
An alternative view is that some local properties increase my personal chances of survival:
- I was born in a period of history when life extension technologies are likely to appear (this could be explained by confounders, like thinking about anthropics naturally coinciding with the development of life extension technologies).
- Or that my personal life history already has elements which ensure chances of my more likely survival (interest in life extension, cryocontract – but all this again can be confounders).
This includes not just beliefs but also unknown circumstances. This leads to a situation which I term 'transcendental advantage': if all unknown factors favor me, I should be in a highly advantageous position for extended survival. I should find myself in a world where life extension and mind uploading are imminent, where AI doomsday scenarios are false, and where I will eventually merge with an eternal AI. Some of these conditions may already be true.
Transcendental advantage: attractor in the space of measure
We can take one more step beyond generalized quantum immortality (QI). For this, we need to remind the reader of the idea of 'measure'. This concept originated from quantum mechanics, initially denoting something like blobs of probability or amplitude, but ultimately settled as an amount of existence of an observer or reality fluid.
The measure can be defined as follows: If there are two identical observers, but one has a higher measure, I am more likely to find myself to be the one with a higher measure. It is similar to the Ebborians described by Eliezer Yudkowsky – two-dimensional beings with different thicknesses: thickness here represents the measure.
If my timeline splits, my measure declines.
Now I can state the following: An observer who lives longer has a higher measure in time. Therefore, QI can be reformulated: I will have a future with the highest measure in time. However, we can drop 'in time' here, as the measure is not necessarily accumulated only in time. If there are several copies of me in the future, I am more likely to be the one with the highest level of reality fluid or measure, by definition.
This means that my personal life history has an attractor – a being with the highest possible measure among all possible Many-Worlds Interpretation (MWI) timelines. Who is it? Is it God? Not necessarily. Another idea is that I will merge with a future superintelligent AI which will also be able to cooperate between MWI branches and thus increase its measure – in a way described by Scott Alexander in The hour I first believed. In some theories, measure can grow if MWI timelines fail to split: If theories that consciousness causes wave function collapse are true (as proposed by David Chalmers), an observer may accumulate measure (e.g., by withholding the act of measurement and not splitting its timeline) – but this is purely speculative.
I call this idea transcendental advantage – the notion that the observer's fate will slowly but inevitably bend towards becoming god-like. I call it “transcendental” because it is only observable from the first-person perspective, but not objectively. This may sound absurd, but it is similar to the anthropic principle, projected into the future. In the anthropic principle, the whole set of properties of the observable universe, including the existence of neutron stars, supernovas, and Jupiter, as well as the entire history of biological evolution and civilization, are necessary conditions for my appearance as an observer who can think about anthropics.