
Maxime Riché 🔸

Research Engineer

Comments

> I want to very briefly argue that given the complexity of long-term trajectories, the lack of empirical evidence, and the difficulty of identifying robust interventions, efforts to improve future value are significantly less tractable than reducing existential risk.
> 
> [...]
> 
> And compared to existential risk, where specific interventions may have clear leverage points, such as biosecurity or AI safety, increasing the quality of long-term futures is a vast and nebulous goal.

I think there is a misunderstanding in your analysis. Please correct me if I am wrong.

"Increasing the quality of long-term futures" reduces existential risks. When longtermists talk about "increasing the quality of long-term futures," they include progress on aligning AIs as one of the best interventions they have in mind.

To compare their relative tractability, let's look at the best interventions to reduce Extinction-Risks on the one hand, and at the best interventions for "increasing the quality of long-term futures" (what I call reducing Alignment-Risks) on the other.

  • Illustrative best PTIs for Extinction-Risks reduction: improving AI control, reducing AI misuse. These reduce the chance of AI destroying future Earth-originating intelligent agents.
  • Illustrative best PTIs for Alignment-Risks reduction: technical AI alignment, improving AI governance. These improve the quality of long-term futures.

Now, let's compare their tractability. It is not clear how these interventions differ in tractability; they actually overlap significantly. It is not obvious that reducing misuse risks is harder than improving technical alignment or AI governance.

Interestingly, this leads to a plausible contradiction in arguments against prioritizing Alignment-Risks: some will say that the interventions to reduce Alignment-Risks and Extinction-Risks are the same, and some will say they have vastly different tractability. One of the two groups must be incorrect; interventions can't be the same and have different tractability.

> It would only reduce the value of extinction risk reduction by an OOM at most, though?

Right, at most one OOM (see the toy calculation below). A larger update would require us to learn that the universe is more Civ-Saturated than our current best guess. This could be the case if:
  • humanity's extinction would not prevent another intelligent civilization from appearing quickly on Earth,
  • OR intelligent life in the universe is much more frequent than we currently expect (e.g., if we learn that intelligent life can appear around red dwarfs, whose lifespans are 100B to 1T years).
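
To make the "at most one OOM" claim concrete, here is a toy calculation. The 84% figure is taken from the preliminary evaluation linked in another comment; treating the Civ-Similarity Hypothesis as exactly true is a simplifying assumption, not a result.

```python
# Toy calculation: how much the value of extinction-risk reduction shrinks as a
# function of the fraction of our cosmic resources other civilizations would
# claim anyway. Simplifying assumption: the Civ-Similarity Hypothesis holds
# exactly (other SFCs produce the same value per unit of resource as ours).

def update_factor(recovery_fraction: float) -> float:
    """Factor by which the counterfactual value of preventing extinction is divided."""
    return 1.0 / (1.0 - recovery_fraction)

for recovery in [0.50, 0.84, 0.90, 0.99]:
    print(f"recovery = {recovery:.0%} -> value divided by ~{update_factor(recovery):.1f}")

# recovery = 84% -> value divided by ~6.2, i.e., less than one OOM.
# The update only exceeds one OOM if other civilizations would recover more
# than 90% of the resources, i.e., if the universe is more Civ-Saturated than
# our current best guess.
```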
 

> Suppose that Earth-originating civilisation's value is V, and if we all worked on it we could increase that to V+ or to V-. If so, then which is the right value for the alien civilisation? Choosing V rather than V+ or V- (or V+++ or V--- etc) seems pretty arbitrary.

I guess that as long as V ~ V+++ ~ V--- (i.e., the relative difference is less than 1%), it is likely not a big issue. The relative difference may only become large once we are significantly more certain about the impact of our actions, e.g., if we are the operators choosing the moral values of the first ASI.

You can find a first evaluation of the Civ-Saturation Hypothesis in "Other Civilizations Would Recover 84+% of Our Cosmic Resources - A Challenge to Extinction Risk Prioritization". It seems pretty accurate as long as you assume EDT.

> Civ-Similarity seems implausible. I at least have some control over what humans do in the future

Maybe there is a misunderstanding here. The Civ-Similarity Hypothesis is not about having control; it is not about marginal utility. It is the claim that the expected utility (not the marginal utility) produced by a space-faring civilization, given either human ancestry or alien ancestry, is similar. The single strongest argument in favour of this hypothesis is that we are too uncertain about how conditioning on human versus alien ancestry changes the utility produced in the far future by a space-faring civilization. We are too uncertain to say that U(far future | human ancestry) significantly differs from U(far future | alien ancestry).

Thank you for organizing this debate! 

Here are several questions. They are related to two hypotheses that could, if both were significantly true, make impartial longtermists update the value of Extinction-Risk reduction downward (potentially by 75% to 90%).

  • Civ-Saturation Hypothesis: Most resources will be claimed by Space-Faring Civilizations (SFCs) regardless of whether humanity creates an SFC.
  • Civ-Similarity Hypothesis: Humanity's Space-Faring Civilization would produce utility similar to other SFCs (per unit of resource controlled).
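
To make the "75% to 90%" range concrete, here is a minimal sketch of how the two hypotheses would combine. The parameter values are illustrative assumptions, not estimates from the series.

```python
# Minimal sketch: how Civ-Saturation and Civ-Similarity combine into a downward
# update on the value of Extinction-Risk reduction. Parameter values are
# illustrative assumptions only.

def downward_update(saturation: float, similarity: float) -> float:
    """Fraction by which the counterfactual loss from extinction is reduced.

    saturation: fraction of the resources humanity's SFC would have used that
                other SFCs would claim anyway if humanity never creates an SFC.
    similarity: value other SFCs produce per unit of resource, relative to
                humanity's SFC.
    """
    return saturation * similarity

for saturation, similarity in [(0.84, 0.90), (0.84, 1.00), (0.95, 0.95)]:
    print(f"saturation={saturation:.2f}, similarity={similarity:.2f} "
          f"-> ~{downward_update(saturation, similarity):.0%} downward update")

# With both hypotheses largely true, the update lands roughly in the 75%-90%
# range mentioned above.
```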

For context, I recently introduced these hypotheses here, and I will publish a few posts with preliminary evaluations of them during the debate week.

General questions:

  • What are the best arguments against these hypotheses?
  • Is the AI Safety community already primarily working on reducing Alignment-Risks and not on reducing Extinction-Risks?
    • By Alignment-Risks, I mean "increasing the value of futures where Earth-originating intelligent-life survive".
    • By Extinction-Risks, I mean "reducing the chance of Earth-originating intelligent-life extinction".
  • What relative importance is currently given to Extinction-Risks and Alignment-Risks in the EA community? E.g., what are the relative grant allocations?
  • Should the EA community do more to study the relative priorities of Extinction-Risks and Alignment-Risks, or are we already allocating significant attention to this question?

Specific questions:

  • Should we prioritize interventions given EDT (or other evidential decision theories) or CDT? How should we deal with uncertainty there?
    • I am interested in this question because the Civ-Saturation Hypothesis may be significantly true when assuming EDT (and thus at least assuming we control our exact copies and that they exist). However, this hypothesis may be largely incorrect when assuming CDT.
  • We are strongly uncertain about how the characteristics of the ancestors of space-faring civilizations (e.g., Humanity) would impact the value those civilizations produce in the far future. Given this uncertainty, should we expect it to be hard to argue that Humanity's future space-faring civilization would produce significantly different value from other space-faring civilizations?
    • I am interested in this question because I believe we should use the Mediocrity Principle as a starting point when comparing our future potential impact with that of aliens, and because in practice it seems very hard to find arguments robust enough to update significantly away from this principle, especially given the many arguments reinforcing this prior (e.g., selection pressures and convergence arguments).
  • What are our best arguments supporting that Humanity's space-faring civilization would produce significantly more value than other space-faring civilizations?
  • How should we aggregate beliefs over possible worlds in which we could have OOMs of difference in impact?

You may want to see a series I am currently publishing, which includes some preliminary investigation of this question: https://forum.effectivealtruism.org/s/hi2DyFuqHmt9ieoCi

Great news! 

> If there are other posts you think more people should read, please comment them below. I might highlight them during the debate week, or before. 

I am in the process of publishing a series of posts ("Evaluating the Existence Neutrality Hypothesis") related to the theme of the debate ("Extinction risks" vs. "Alignment risks / Future value"). The series is about evaluating how to update on these questions given our best knowledge about potential space-faring civilizations in the universe.

I will aim to publish several of the remaining posts during the debate week.

I somewhat agree with your points. Here are some contributions and pushbacks:

> I get that there's been a lot of work on this and that we can make progress on it (I know, I'm an astrobiologist), but I'm sure there are so many unknown unknowns associated with the origin of life, development of sentience, and spacefaring civilisation that we just aren't there yet. The universe is so enormous and bonkers and our brains are so small - we can make numerical estimates sure, but creating a number doesn't necessarily mean we have more certainty.

Something interesting about these hypotheses and their implications is that they get stronger the more uncertain we are, as long as one uses some form of EDT (e.g., CDT + exact copies). The less we know about how conditioning on human ancestry impacts utility production, the closer the Civ-Similarity Hypothesis is to being correct. The broader our distribution over the density of SFCs in the universe, the closer the Civ-Saturation Hypothesis is to being correct. This holds as long as you account for the impact of correlated agents (e.g., exact copies) and assume that they exist. For the Civ-Similarity Hypothesis, this comes from the application of the Mediocrity Principle. For the Civ-Saturation Hypothesis, it comes from the fact that we have orders of magnitude more exact copies in saturated worlds than in empty worlds.
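
A minimal sketch of that last point, with made-up numbers: weight each possible world not only by its prior probability but also by how many exact copies of us it contains, and the expectation of "how saturated the universe is" shifts strongly toward the saturated worlds.

```python
# Toy sketch of the EDT / exact-copies weighting. All numbers are made up.
# Each world has: a prior probability, a civilization density (which the number
# of our exact copies is assumed to scale with), and a saturation level (the
# fraction of resources that other SFCs would claim even without humanity).

worlds = [
    # (prior, civ_density, saturation)
    (0.4, 1e-6, 0.01),  # nearly empty universe
    (0.4, 1e-3, 0.50),  # intermediate
    (0.2, 1.0,  0.99),  # saturated universe
]

# Prior-only expectation (roughly the CDT perspective).
prior_weighted = sum(p * sat for p, _, sat in worlds)

# Copy-weighted expectation (EDT-style): our decision is "made" by every exact
# copy of us, and saturated worlds contain orders of magnitude more copies.
copy_weights = [p * density for p, density, _ in worlds]
copy_weighted = sum(w * sat for w, (_, _, sat) in zip(copy_weights, worlds)) / sum(copy_weights)

print(f"prior-weighted expected saturation: {prior_weighted:.2f}")  # ~0.40
print(f"copy-weighted expected saturation:  {copy_weighted:.2f}")   # ~0.99
```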

> I think you're posing a post-understanding of consciousness question. Consciousness might be very special or it might be an emergent property of anything that synthesises information, we just don't know. But it's possible to imagine aliens with complex behaviour similar to us, but without evolving the consciousness aspect, like superintelligent AI probably will be like. For now, the safe assumption is that we're the only conscious life, and I think it's very important that we act like it until proven otherwise.

Consciousness is indeed one of the arguments pushing the Civ-Similarity Hypothesis toward lower values (humanity being more important), and I am eager to discuss its potential impact. Here are several reasons why the update from consciousness may not be that large:

  • Consciousness may not be binary. In that case, we don't know whether humans have low, medium, or high consciousness; I only know that I am not at zero. We should then likely assume we are average. The relevant comparison is then no longer between P(humanity is "conscious") and P(aliens creating SFCs are "conscious"), but between P(humanity's consciousness > 0) and P(aliens-creating-SFCs' consciousness > 0) (see the toy sketch after this list).
  • If human consciousness is a random fluke and has no impact on behavior (or, if it does have an impact, it could be selected in or out), then we have no reason to think that aliens will create more or less conscious descendants than us. Consciousness needs to have a significant impact on behavior to change the chance that (artificial) descendants are conscious. But the larger the effect of consciousness on behavior, the more likely consciousness is to be a result of evolution/selection.
  • We don't understand much about how the consciousness of SFC creators would influence the consciousness of (artificial) SFC descendants. Even if Humans are abnormal in being conscious, it is very uncertain how much that changes how likely our (artificial) descendants are to be conscious.
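
As a rough sketch of the first bullet, with hypothetical probabilities only: the reframing changes which probability the comparison hinges on.

```python
# Hypothetical probabilities only, to illustrate the reframing in the first
# bullet. Assumes both humans and alien SFC creators, conditional on having
# consciousness > 0, sit at the average of that distribution, so the
# conditional expectations cancel out of the comparison.

p_alien_binary_conscious = 0.3   # P(aliens creating SFCs are "conscious"), binary framing
p_alien_any_consciousness = 0.9  # P(aliens-creating-SFCs' consciousness > 0), graded framing

# Binary framing: humanity is known to be conscious, so the gap in humanity's
# favour is driven by:
gap_binary = 1.0 / p_alien_binary_conscious   # ~3.3x

# Graded framing: all we know is that our consciousness is > 0, so the gap is
# only driven by the probability of being above zero at all:
gap_graded = 1.0 / p_alien_any_consciousness  # ~1.1x

print(f"binary framing gap: ~{gap_binary:.1f}x, graded framing gap: ~{gap_graded:.1f}x")
```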

I am very happy to get pushback and to debate the strength of the "consciousness argument" on Humanity's expected utility.

> What's the difference between "P(Alignment | Humanity creates an SFC)" and "P(Alignment AND Humanity creates an SFC)"?

I will try to explain it more clearly. Thanks for asking.

P(Alignment AND Humanity creates an SFC) = P(Alignment | Humanity creates an SFC) x P(Humanity creates an SFC)

So the difference is that when you optimize for P(Alignment | Humanity creates an SFC), you no longer optimize for the term P(Humanity creates an SFC), which was included in the conjunctive probability.
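
A toy numeric example with made-up probabilities, to show how the two optimization targets come apart:

```python
# Made-up probabilities: compare an intervention that raises P(Humanity creates
# an SFC) with one that raises P(Alignment | Humanity creates an SFC).

p_sfc = 0.8              # P(Humanity creates an SFC)
p_align_given_sfc = 0.5  # P(Alignment | Humanity creates an SFC)

# Intervention A: raises P(Humanity creates an SFC) from 0.8 to 0.9.
joint_A = 0.9 * p_align_given_sfc   # P(Alignment AND SFC) = 0.45
cond_A = p_align_given_sfc          # P(Alignment | SFC)   = 0.50 (unchanged)

# Intervention B: raises P(Alignment | SFC) from 0.5 to 0.6.
joint_B = p_sfc * 0.6               # P(Alignment AND SFC) = 0.48
cond_B = 0.6                        # P(Alignment | SFC)   = 0.60

print(f"A: joint={joint_A:.2f}, conditional={cond_A:.2f}")
print(f"B: joint={joint_B:.2f}, conditional={cond_B:.2f}")

# Both interventions raise the conjunction P(Alignment AND SFC), but only B
# raises the conditional P(Alignment | SFC): optimizing the conditional drops
# the P(Humanity creates an SFC) term from the objective.
```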
 

> Can you maybe run us through 2 worked examples for bullet point 2? Like what is someone currently doing (or planning to do) that you think should be deprioritised? And presumably, there might be something that you think should be prioritised instead?

Bullet point 2 is: (ii) Deprioritizing to some degree AI Safety agendas mostly increasing P(Humanity creates an SFC) but not increasing much P(Alignment | Humanity creates an SFC).

Here are speculative examples. The degree to which their priorities should be updated is to be debated. I only claim that they may need to be updated conditional on the hypotheses being significantly correct.

  • AI Misuse reduction: If the PTIs are (a) to prevent extinction through misuse and chaos, (b) to prevent the loss of alignment power resulting from a more chaotic world, and (c) to provide more time for Alignment research, then it is plausible that the PTI (a) would become less impactful.
  • Misaligned AI Control: If the PTIs are (c) as above, (d) to prevent extinction by controlling early misaligned AIs trying to take over, (e) to control misaligned early AIs to make them work on Alignment research, and (f) to create fire alarms (note: which somewhat contradicts path (b) above), then it is plausible that the PTI (d) would be less impactful, since these early misaligned AIs may have a higher chance of not creating an SFC after taking over (e.g., they may not survive destroying humanity or may not care about space colonization).
    • Here is another vague, diluted effect: If an intervention, like AI control, increases P(Humanity creates an SFC | Early Misalignment), then this intervention may need to be discounted more than if it only increased P(Humanity creates an SFC). Changing P(Humanity creates an SFC) may have no impact when the hypotheses are significantly correct, but increasing P(Humanity creates an SFC | Misalignment) is net negative, and Early Misalignment and (Late) Misalignment may be strongly correlated (see the toy sketch after this list).
  • AI evaluations: The reduction of the impact of (a) and (d) may also impact the overall importance of this agenda.
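
As a rough illustration of the "diluted effect" sub-bullet above, here is a toy expected-value sketch. All values are made-up assumptions: other SFCs are taken as the baseline (value 1.0 per unit of resources), an aligned human SFC is assumed to be only slightly better, and a misaligned human SFC much worse.

```python
# Toy expected-value sketch (made-up numbers), conditional on the Civ-Saturation
# and Civ-Similarity hypotheses being largely correct: whenever humanity creates
# no SFC, other SFCs claim the resources and produce baseline value.

v_aliens = 1.0      # value per unit of resources produced by other SFCs
v_aligned = 1.05    # aligned human SFC: assumed only slightly better (Civ-Similarity)
v_misaligned = 0.2  # misaligned human SFC: assumed much worse

def expected_value(p_sfc_given_misalign, p_sfc_given_align, p_misalign):
    ev_if_misaligned = p_sfc_given_misalign * v_misaligned + (1 - p_sfc_given_misalign) * v_aliens
    ev_if_aligned = p_sfc_given_align * v_aligned + (1 - p_sfc_given_align) * v_aliens
    return p_misalign * ev_if_misaligned + (1 - p_misalign) * ev_if_aligned

baseline = expected_value(p_sfc_given_misalign=0.3, p_sfc_given_align=0.8, p_misalign=0.5)
# An intervention that mostly raises P(Humanity creates an SFC | Early Misalignment):
with_more_control = expected_value(p_sfc_given_misalign=0.6, p_sfc_given_align=0.8, p_misalign=0.5)

print(f"baseline EV: {baseline:.2f}")                          # ~0.90
print(f"EV after the intervention: {with_more_control:.2f}")   # ~0.78
# Under these assumptions, raising P(SFC | Misalignment) lowers expected value,
# which is why such interventions may need an extra discount.
```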

These updates are, at the moment, speculative.

Sorry if that's not clear.

Are the reformulations in the initial summary helping? The second bullet point is the most relevant.
 

  • (i) Deprioritizing significantly extinction risks, such as nuclear weapon and bioweapon risks.
  • (ii) Deprioritizing to some degree AI Safety agendas mostly increasing P(Humanity creates an SFC) but not increasing much P(Alignment | Humanity creates an SFC).
  • (iii) Giving more weight to previously neglected AI Safety agendas. E.g., a "Plan B AI Safety" agenda that would focus on decreasing P(Humanity creates an SFC | Misalignment), for example, by implementing (active & corrigible) preferences against space colonization in early AI systems.

Interesting and nice to read!

Do you think the following is right?

The larger the Upside-focused Colonist Curse, the fewer resources agents caring about suffering will control overall, and thus the smaller the risk of conflicts causing S-risks?

This may balance out the effect that the larger the Upside-focused Colonist Curse, the more neglected S-risks are.

A high Upside-focused Colonist Curse would produce fewer S-risks while at the same time making them more neglected.
