I am currently working on a research project as part of CEA’s summer research fellowship. I am building a simple model of so-called “multiverse-wide cooperation via superrationality” (MSR). The model should incorporate the most relevant uncertainties for determining possible gains from trade. To make this model as useful as possible, I would like to ask others for their opinions on the idea of MSR. For instance, what are the main reasons you think MSR might be irrelevant or might not work as intended? Which questions are unanswered and need to be addressed before the merit of the idea can be assessed? I would welcome any input in the comments on this post or via email to johannes@foundational-research.org.

An overview of resources on MSR, including introductory texts, can be found at the link above. To briefly illustrate the idea, consider two artificial agents with identical source code playing a prisoner’s dilemma. Even though the two agents cannot causally interact, each agent’s action provides it with strong evidence about the other agent’s action. Evidential decision theory and recently proposed variants of causal decision theory (Yudkowsky and Soares, 2018; Spohn, 2003; Poellinger, 2013) say that agents should take such evidence into account when making decisions. MSR is based on the idea that (i) humans on Earth are in a situation similar to that of the two AI agents: there is probably a large or infinite multiverse containing many exact copies of humans on Earth (Tegmark 2003, p. 464), as well as agents that are similar but not identical to humans; and (ii) if humans and these other, similar agents take each other’s preferences into account, then, due to gains from trade, everyone is better off than if everyone pursued only their own ends. It follows from (i) and (ii) that humans should take the preferences of other, similar agents in the multiverse into account, thereby producing evidence that those agents in turn take humans’ preferences into account, which leaves everyone better off.
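To make the evidential reasoning concrete, here is a minimal toy calculation (my own illustration; the payoff numbers and the correlation parameter `p_same` are assumptions, not part of any formal model) showing how cooperating becomes the better choice once an agent's action is sufficiently strong evidence about a near-copy's action:

```python
# Toy one-shot prisoner's dilemma: (my action, their action) -> my payoff.
# Payoff numbers and p_same are illustrative assumptions.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def evidential_value(my_action: str, p_same: float) -> float:
    """Expected payoff if choosing my_action is evidence (with probability
    p_same) that the near-copy chooses the same action."""
    other_diff = "D" if my_action == "C" else "C"
    return (p_same * PAYOFF[(my_action, my_action)]
            + (1 - p_same) * PAYOFF[(my_action, other_diff)])

for p_same in (0.5, 0.9, 1.0):
    values = {a: evidential_value(a, p_same) for a in ("C", "D")}
    best = max(values, key=values.get)
    print(f"p_same={p_same}: {values} -> choose {best}")
```

With low correlation the usual dominance reasoning wins and the agent defects; once `p_same` is high enough, cooperation has the higher evidential expected value.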

According to Oesterheld (2017, sec. 4), this idea could have far-reaching implications for prioritization. For instance, given MSR, some forms of moral advocacy could become ineffective: advocating for one’s particular values provides agents with evidence that others do the same, potentially neutralizing each other’s efforts. Moreover, MSR could play a role in deciding which strategies to pursue in AI alignment. It could become especially valuable to ensure that an AGI will engage in multiverse-wide trade.

Comments

A few doubts:

  1. It seems like MSR requires a multiverse large enough to contain many well-correlated agents, but not so large that it runs into the problems of infinite ethics. Most of my credence is on either no multiverse or an infinite multiverse, although I'm not particularly well-read on this issue.

  2. My broad intuition is something like "Insofar as we can know about the values of other civilisations, they're probably similar to our own. Insofar as we can't, MSR isn't relevant." There are probably exceptions, though (e.g. we could guess the direction in which an r-selected civilisation's values would vary from our own).

  3. I worry that MSR is susceptible to self-mugging of some sort. I don't have a particular example, but the general idea is that you're correlated with other agents even if you're being very irrational. And so you might end up doing things which seem arbitrarily irrational. But this is just a half-fledged thought, not a proper objection.

  4. And lastly, I would have much more confidence in FDT and superrationality in general if there were a sensible metric of similarity between agents, apart from correlation (because if you always cooperate in prisoner's dilemmas, then your choices are perfectly correlated with CooperateBot's, but intuitively it'd still be more rational to defect against CooperateBot, because your decision algorithm isn't similar to CooperateBot's in the same way that it's similar to your psychological twin's). I guess this requires a solution to logical uncertainty, though.
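To spell out the worry in point 4 with a toy calculation (standard PD payoffs, my own illustration): your past choices can be perfectly correlated with CooperateBot's, yet CooperateBot's action doesn't depend on yours, so defecting still wins; against an exact twin, your choice settles both actions, so cooperating wins.

```python
# Standard one-shot PD payoffs: (my action, their action) -> my payoff.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def value_vs_cooperate_bot(my_action):
    # CooperateBot plays C no matter what I do, so my action only moves my payoff row.
    return PAYOFF[(my_action, "C")]

def value_vs_twin(my_action):
    # An exact twin runs my algorithm, so it outputs whatever I output.
    return PAYOFF[(my_action, my_action)]

print({a: value_vs_cooperate_bot(a) for a in ("C", "D")})  # {'C': 3, 'D': 5} -> defect
print({a: value_vs_twin(a) for a in ("C", "D")})           # {'C': 3, 'D': 1} -> cooperate
```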

Happy to discuss this more with you in person. Also, I suggest you cross-post to Less Wrong.

[anonymous]

Re 4): Correlation or similarity between agents is not really a necessary condition for cooperation in the open-source PD. LaVictoire et al. (2014) and related papers showed that 'fair' agents with completely different implementations can cooperate. A fair agent, roughly speaking, can have any structure as long as it implements "I'll cooperate with you if I can show that you'll cooperate with me". So maybe that's the measure you're looking for.

A population of fair agents is also typically a Nash equilibrium in such games, so you might expect that they would sometimes evolve.

Source: LaVictoire, P., Fallenstein, B., Yudkowsky, E., Barasz, M., Christiano, P., & Herreshoff, M. (2014). Program equilibrium in the prisoner’s dilemma via Löb’s theorem. In AAAI Workshop on Multiagent Interaction without Prior Coordination.
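As a rough illustration (my own toy sketch, which replaces the paper's Löbian proof search with bounded mutual simulation, so it is not the actual construction), a fair agent can be written as: cooperate iff the opponent, when handed my source, would cooperate.

```python
# Toy sketch of a "fair agent" with bounded mutual simulation in place of
# Löbian proof search (an assumed simplification, not the paper's construction).

def cooperate_bot(opponent, depth):
    return "C"

def defect_bot(opponent, depth):
    return "D"

def fair_bot(opponent, depth):
    # Cooperate iff the opponent, given my source, would cooperate (up to a depth limit).
    if depth == 0:
        return "C"  # optimistic base case stands in for the Löbian step
    return "C" if opponent(fair_bot, depth - 1) == "C" else "D"

for other in (cooperate_bot, defect_bot, fair_bot):
    print("fair_bot vs", other.__name__, "->",
          fair_bot(other, depth=3), other(fair_bot, depth=3))
```

In this toy version FairBot cooperates with CooperateBot and with itself while not being exploited by DefectBot; the proof-based version in the paper extends mutual cooperation to fair agents with quite different implementations.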

The example you've given me shows that agents which implement exactly the same (high-level) algorithm can cooperate with each other. The metric I'm looking for is: how can we decide how similar two agents are when their algorithms are non-identical? Presumably we want a smoothness property for that metric, such that if our algorithms are very similar (e.g. they only differ with respect to some radically unlikely edge case), the reduction in cooperation is negligible. But it doesn't seem like anyone knows how to do this.

One way I imagine dealing with this is that there is an oracle that tells us with certainty, for two algorithms and their decision situations, what the counterfactual possible joint outputs are. The smoothness then comes from our uncertainty about (i) the other agents' algorithms, (ii) their decision situations, and (iii) potentially the outputs of the oracle. The correlations vary smoothly as we vary our probability distributions over these things, but for a fully specified algorithm, situation, etc., the algorithms are always either logically identical or not.

Unfortunately, I don't know what the oracle would do in general. I could also imagine that, when formulated this way, the conclusion turns out to be that humans never correlate with anything.
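Here is a minimal sketch of the kind of calculation I have in mind (the `oracle_same_output` stand-in and the hypothesis distribution are made up for illustration): per fully specified hypothesis the answer is a hard yes/no, but the resulting correlation varies smoothly with the credences.

```python
def my_algorithm(situation):
    return "C"

def near_copy(situation):
    # Differs from my_algorithm only in an unlikely edge case.
    return "D" if situation == "weird_edge_case" else "C"

def unrelated_algorithm(situation):
    return "D"

def oracle_same_output(alg_a, alg_b, situation):
    # Stand-in for the oracle: for these toy algorithms we can simply run both.
    return alg_a(situation) == alg_b(situation)

# Credences over the other agent's algorithm and decision situation (made up).
hypotheses = [
    (0.6, near_copy, "prisoners_dilemma"),
    (0.3, near_copy, "weird_edge_case"),
    (0.1, unrelated_algorithm, "prisoners_dilemma"),
]

p_same_choice = sum(p for p, alg, situation in hypotheses
                    if oracle_same_output(my_algorithm, alg, situation))
print(p_same_choice)  # 0.6 -- smooth in the credences, hard 0/1 per hypothesis
```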

Hey, a rough point on a doubt I have. Not sure if it's useful/novel.

Your going through the mental processes of a utilitarian (roughly defined) correlates with others making more utilitarian decisions as well (especially when they're similar to you in relevant personality traits and past exposure to philosophical ideas).

For example, if you act in less scope-insensitive, omission-bias-y, or ingroup-y ways, others will tend to do so as well. This includes edge cases – e.g. people who otherwise would have made decisions that roughly fall in the deontologist or virtue ethics bucket.

Therefore, for every moment you end up shutting off utilitarian-ish mental processes in favour of ones where you think you're doing moral trade (including hidden motivations like rationalising acting on social proof or on discomfort in diverging from your peers), your multi-universal compatriots will do likewise (especially in similar contexts).

(In case it looks like I'm justifying being a staunch utilitarian here, I have a more nuanced anti-realist view mixed in with lots of uncertainty about what makes sense.)

With MSR, I remain unsure how to calculate the measure of agents in other worlds who hold positions one could trade with, so that we can figure out how much we should acausally trade with each. I'm also unsure how to address uncertainty about whether anyone will independently arrive at the same position you hold, and so be able to acausally trade with you, given that you can't tell them what you would actually prefer.

I still have doubts as to whether you should pay in Counterfactual Mugging, since I believe that (non-quantum) probability is in the map rather than the territory. I haven't had the opportunity to write up these thoughts yet, as my current posts are building up towards it, but I can link you when I do.
