Right. How can we prevent a misaligned AI from locking in bad values?
A misaligned AI surviving takeover counts as "no extinction"; see the comment by MacAskill: https://forum.effectivealtruism.org/posts/TeBBvwQH7KFwLT7w5/william_macaskill-s-shortform?commentId=jbyvG8sHfeZzMqusJ
The extinction of intelligent life could be prevented by creating a misaligned AI that locks in bad moral values, no?
Maybe see the comment by MacAskill https://forum.effectivealtruism.org/posts/TeBBvwQH7KFwLT7w5/william_macaskill-s-shortform?commentId=jbyvG8sHfeZzMqusJ
I am curious about the lower tractability.
Do you think that changing the moral values/goals of the ASIs Humanity would create is not a tractable way to influence the value of the future?
If yes, is that because we are not able to change them, or because we don't know which moral values to input, or something else?
In the second case, what about inputting the goal of figuring out which goals to pursue ("long reflection")?
There is a misunderstanding: "Increasing value of futures where we survive" is an X-risk reduction intervention.
See the comment by MacAskill https://forum.effectivealtruism.org/posts/TeBBvwQH7KFwLT7w5/william_macaskill-s-shortform?commentId=jbyvG8sHfeZzMqusJ, which clarifies that the debate is between Extinction-Risks and Alignment-Risks (a.k.a. increasing the value of the future), both of which are X-risks. The debate is not between X-risks and Alignment-Risks.
One of the most impactful ways to "increase the value of futures where we survive" is to work on AI governance and technical AI alignment.
I want to very briefly argue that given the complexity of long-term trajectories, the lack of empirical evidence, and the difficulty of identifying robust interventions, efforts to improve future value are significantly less tractable than reducing existential risk.
[...]
And compared to existential risk, where specific interventions may have clear leverage points, such as biosecurity or AI safety, increasing the quality of long-term futures is a vast and nebulous goal.
I guess there is a misunderstanding in your analysis. Please correct me if I am wrong.
"Increasing the quality of long-term futures" reduces existential risks. When longtermists talk about "increasing the quality of long-term futures," they include progress on aligning AIs as one of the best interventions they have in mind.
To compare their relative tractability, let's look, on the one hand, at the best interventions to reduce Extinction-Risks and, on the other hand, at the best interventions for "increasing the quality of long-term futures", which I call reducing Alignment-Risks.
Now, comparing their tractability: it is not clear how these interventions differ, because they actually overlap significantly. It is not clear that reducing misuse risks is harder than improving alignment or improving AI governance.
Interestingly, this leads to a plausible contradiction in arguments against prioritizing Alignment-Risks: some will say that the interventions to reduce Alignment-Risks and Extinction-Risks are the same, and others will say they have vastly different tractability. One of the two groups must be incorrect; interventions can't be the same and yet have different tractability.
It would only reduce the value of extinction risk reduction by an OOM at most, though?
Right, at most one OOM (see the rough arithmetic sketch below the list). Higher updates would require us to learn that the universe is more Civ-Saturated than our current best guess. This could be the case if:
- humanity's extinction would not prevent another intelligent civilization from appearing quickly on Earth
- OR intelligent life in the universe is much more frequent than we currently estimate (e.g., if we learn that intelligent life can appear around red dwarfs, whose lifespans range from 100B to 1T years).
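To make the arithmetic behind the "at most one OOM" claim concrete, here is a minimal sketch under a deliberately simple model (my assumptions for illustration, not a result from any of the linked posts): if other civilizations would eventually recover a fraction r of our cosmic resources (Civ-Saturation) and produce a fraction s of the value per unit of resources that we would have produced (Civ-Similarity), then the value of extinction-risk reduction is multiplied by roughly 1 - r*s.

```python
# Toy arithmetic for the "at most one OOM" claim (illustrative model and numbers).
# Assumptions (mine, for this sketch): if humanity goes extinct, other civilizations
# eventually recover a fraction r of our cosmic resources (Civ-Saturation) and produce
# a fraction s of the value per unit of resources we would have produced (Civ-Similarity).

def extinction_risk_value_multiplier(r: float, s: float) -> float:
    """Fraction of the naive value of extinction-risk reduction that remains
    once the counterfactual value produced by other civilizations is accounted for."""
    return 1.0 - r * s

for r, s in [(0.84, 1.0), (0.84, 0.9), (0.95, 1.0)]:
    m = extinction_risk_value_multiplier(r, s)
    print(f"r={r:.2f}, s={s:.2f} -> remaining value = {m:.2f} (~{1/m:.1f}x reduction)")

# With r = 0.84 (the headline figure of the "84+%" post mentioned below) and s = 1,
# about 0.16 of the naive value remains, i.e. a ~6x reduction: less than one OOM.
# Going beyond one OOM requires r*s > 0.9, i.e. a more Civ-Saturated universe
# (or stronger Civ-Similarity) than the current best guess.
```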
Suppose that Earth-originating civilisation's value is V, and if we all worked on it we could increase that to V+ or to V-. If so, then which is the right value for the alien civilisation? Choosing V rather than V+ or V- (or V+++ or V--- etc) seems pretty arbitrary.
I guess that as long as V ~ V+++ ~ V--- (i.e., the relative differences are less than 1%), it is likely not a big issue. However, the relative difference may only become large once we become significantly more certain about the impact of our actions, e.g., if we are the operators choosing the moral values of the first ASI.
You can find a first evaluation of the Civ-Saturation hypothesis in "Other Civilizations Would Recover 84+% of Our Cosmic Resources - A Challenge to Extinction Risk Prioritization". The hypothesis seems pretty accurate as long as you assume EDT.
> Civ-Similarity seems implausible. I at least have some control over what humans do in the future
Maybe there is a misunderstanding here. The Civ-Similarity hypothesis is not about having control; it is not about marginal utility. It is that the expected utilities (not the marginal utilities) produced by space-faring civilizations, given either human ancestry or alien ancestry, are similar. The single strongest argument in favour of this hypothesis is that we are too uncertain about how conditioning on human ancestry versus alien ancestry changes the utility produced in the far future by a space-faring civilization; we are too uncertain to say that U(far future | human ancestry) significantly differs from U(far future | alien ancestry).
Thank you for organizing this debate!
Here are several questions. They relate to two hypotheses that could, if both are significantly true, make impartial longtermists update the value of Extinction-Risk reduction downward (potentially by 75% to 90%).
For context, I recently introduced these hypotheses here, and I will publish a few posts with preliminary evaluations of them during the debate week.
General questions:
Specific questions:
I think it's a better model to think of humanity and alien civilizations as randomly sampled from among the Intelligent Civilizations (ICs) with the potential to create a space-faring civilization. Alien civilizations also have a chance of succeeding at aligning their ASI with positive moral values. Thus, by assuming the Mediocrity Principle, we can say that the expected value produced by both is similar (as long as we don't gain information that we are different).
Misaligned AIs are not sampled from this distribution, so letting loose a misaligned AGI does not produce the same expected value. I.e., letting loose humanity is equivalent to letting loose an alien IC (if we can't predict their differences and the impact of such differences), but letting loose a misaligned AGI is not.
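As a purely illustrative sketch of this sampling argument (all probabilities and payoffs below are made up; only the structure matters): humanity and alien ICs are modelled as draws from one common distribution over "value produced by the resulting space-faring civilization", so their expected values coincide by construction, while a misaligned AGI is drawn from a different distribution and so carries a different expected value.

```python
import random

random.seed(0)

# Toy model (made-up numbers). Under the Mediocrity Principle, humanity and alien ICs
# are draws from the SAME distribution over the value of the resulting space-faring
# civilization, because we have no information distinguishing them. A misaligned AGI
# is NOT a draw from that distribution.

def sample_ic_value() -> float:
    # An IC aligns its ASI with positive values with some probability; otherwise it
    # locks in bad values. The 0.4 probability and the payoffs are arbitrary placeholders.
    return 1.0 if random.random() < 0.4 else -0.2

def sample_misaligned_agi_value() -> float:
    # A misaligned AGI locks in bad values by assumption.
    return -0.2

n = 100_000
ev_human = sum(sample_ic_value() for _ in range(n)) / n
ev_alien = sum(sample_ic_value() for _ in range(n)) / n
ev_agi = sum(sample_misaligned_agi_value() for _ in range(n)) / n

print(f"E[value | human ancestry]  ~ {ev_human:.2f}")
print(f"E[value | alien ancestry]  ~ {ev_alien:.2f}")  # same distribution -> same expected value
print(f"E[value | misaligned AGI]  ~ {ev_agi:.2f}")    # different distribution -> different expected value
```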
I hope that makes sense. You can also see the comment by MacAskill just below.
For clarity, I think that letting loose a misaligned AGI is strongly negative, as argued in posts I published.