I am currently working on a research project as part of CEA’s summer research fellowship. I am building a simple model of so-called “multiverse-wide cooperation via superrationality” (MSR). The model should incorporate the most relevant uncertainties for determining possible gains from trade. To make this model as useful as possible, I would like to ask others for their opinions on the idea of MSR. For instance, what are the main reasons you think MSR might be irrelevant or might not work as intended? Which questions remain unanswered and need to be addressed before the merit of the idea can be assessed? I would appreciate any input in the comments to this post or via mail to johannes@foundational-research.org.
An overview of resources on MSR, including introductory texts, can be found at the link above. To briefly illustrate the idea, consider two artificial agents with identical source code playing a prisoner’s dilemma. Even if the two agents cannot causally interact, one agent’s action provides strong evidence about the other agent’s action. Evidential decision theory and recently proposed variants of causal decision theory (Yudkowsky and Soares, 2018; Spohn, 2003; Poellinger, 2013) say that agents should take such evidence into account when making decisions. MSR is based on two ideas: (i) humans on Earth are in a situation similar to that of the two agents: there is probably a large or infinite multiverse containing many exact copies of the humans on Earth (Tegmark 2003, p. 464), but also agents that are similar yet not identical to humans. (ii) If humans and these other, similar agents take each other’s preferences into account, then, due to gains from trade, everyone is better off than if everyone pursued only their own ends. It follows from (i) and (ii) that humans should take the preferences of other, similar agents in the multiverse into account, thereby producing evidence that those agents in turn take humans’ preferences into account, which leaves everyone better off.
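To make the evidential reasoning concrete, here is a minimal sketch (my own illustration, with made-up payoff numbers, not part of any cited formalism) of two agents running the same decision procedure. Because the other agent executes the exact same code, conditioning on one's own output pins down the other's output, so the evidential calculation favors cooperation, whereas a purely causal calculation, which treats the other agent's action as fixed, favors defection:

```python
# Illustrative prisoner's dilemma payoffs for (my_action, other_action).
PAYOFFS = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 4,
    ("D", "D"): 1,
}

def edt_choice():
    """Evidential reasoning for an agent whose twin runs identical code.

    Learning my own output tells me (with certainty, in this idealized
    case) what the other agent outputs, so only the diagonal outcomes
    (C, C) and (D, D) get any weight.
    """
    expected = {a: PAYOFFS[(a, a)] for a in ("C", "D")}
    return max(expected, key=expected.get)

def cdt_choice(p_other_cooperates=0.5):
    """Causal reasoning: the other agent's action is treated as fixed and
    independent of my choice, so defection dominates whatever the
    probability of the other agent cooperating."""
    expected = {
        a: p_other_cooperates * PAYOFFS[(a, "C")]
           + (1 - p_other_cooperates) * PAYOFFS[(a, "D")]
        for a in ("C", "D")
    }
    return max(expected, key=expected.get)

print(edt_choice())  # "C": mutual cooperation, payoff 3 each
print(cdt_choice())  # "D": defection dominates under causal independence
```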
According to Oesterheld (2017, sec. 4), this idea could have far-reaching implications for prioritization. For instance, given MSR, some forms of moral advocacy could become ineffective: advocating for one’s particular values provides evidence that other agents do the same, so these efforts could potentially neutralize each other. Moreover, MSR could play a role in deciding which strategies to pursue in AI alignment. It could become especially valuable to ensure that an AGI will engage in multiverse-wide trade.
One way I imagine dealing with this is to posit an oracle that tells us with certainty, for two algorithms and their decision situations, what the counterfactual possible joint outputs are. The smoothness then comes from our uncertainty about (i) the other agents' algorithms, (ii) their decision situations, and (iii) potentially the outputs of the oracle. The correlations vary smoothly as we vary our probability distributions over these things, but for a fully specified algorithm, situation, etc., the algorithms are always either logically identical or not.
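As a toy illustration of this picture (my own sketch, with the oracle crudely reduced to an exact-identity check and a hypothetical credence distribution over the other agent's algorithm), the quantity of interest, our credence that the other agent's choice is correlated with ours, varies smoothly with our probabilities even though each individual oracle verdict is binary:

```python
def oracle(my_algorithm, other_algorithm):
    """Stand-in for the oracle: returns True iff the two algorithms'
    outputs are logically tied in the given decision situation. Here this
    is reduced to the crudest possible criterion, exact identity."""
    return my_algorithm == other_algorithm

def probability_of_correlation(my_algorithm, hypotheses):
    """Our credence that the other agent's choice is correlated with ours,
    given a probability distribution over which algorithm they run."""
    return sum(p for algo, p in hypotheses if oracle(my_algorithm, algo))

# Hypothetical distribution over the other agent's algorithm.
hypotheses = [
    ("EDT-with-my-values", 0.2),
    ("EDT-with-similar-values", 0.5),
    ("CDT", 0.3),
]

print(probability_of_correlation("EDT-with-my-values", hypotheses))  # 0.2
# Shifting probability mass between hypotheses moves this number smoothly,
# while the oracle's verdict for any fully specified pair stays 0 or 1.
```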
Unfortunately, I don't know what the oracle would be doing in general. I could also imagine that, when the problem is formulated this way, the conclusion turns out to be that humans never correlate with anything, for instance.