This is a Draft Amnesty Week draft. It may not be polished, up to my usual standards, fully thought through, or fully fact-checked.
TLDR: I made a back-of-the-envelope model for the value of steering the future of AI (link here).
I started with four questions:
a) How morally aligned can we expect the goals of an ASI to be?
b) How morally aligned can we expect future human goals to be?
c) How much can we expect ASI to increase or decrease human agency?
d) How would [an intervention] affect these expectations?
Here, by agency, I mean the proportion of decisions made based on someone’s expressed preferences. In my model, I compare a world in which a superintelligence (ASI) suddenly arises (World A) with a world in which there is a boom of AI safety research (World B) or one possible goal of AI governance is achieved (World C). More in the doc.
Although Bostrom (1, 2), Ord (Precipice, Chapter 1) and MacAskill (WWOTF) tackle all of these questions, I’m not aware of a post that puts them into a single “interactive” Excel sheet, so that’s what I tried to do. Seeing how they weigh in my mind makes me think consciousness and the reliability of progress are somewhat under-discussed parts of the equation.
I was influenced by Joscha Bach’s arguments from this debate based on
The growth extinction angle is based on resource depletion, which I explored in this post, but I was unable to find a credible basis for Bach’s argument. However, I think it’s reasonable to question the value of x-risk reduction if one is uncertain that civilization without AGI could yield much positive value. Similarly, I think Bach’s specific theory of valence is likely wrong, but I grant that the hypothesized conclusion should be taken seriously based on a wider range of views on consciousness and AI.
As a result, my guess is that whether or not the AI safety movement succeeds at steering the values of ASI, the future will be better than today. However, these considerations haven’t changed my general outlook: It’s much more likely that the future will be good if humanity makes a conscious effort to shape the trajectory and values of ASI, and this conclusion seems robust even to quite exotic considerations.
Nevertheless, my reflection highlighted a few ideas:
1. Alignment isn’t just about the control problem
Yes, AIS increases human agency (question C), but it also increases the probability that the amount of agency given to humans or ASI will depend on their moral alignment (interactions C-A and C-B), and it directly improves the probability that any AI that is developed will be morally aligned (question A). To a limited extent, AIS may also improve human (moral) decision-making (question B) via the routes discussed within AI ethics (such as preventing the rise of extremism via AI manipulation).
2. Increasing human agency does not guarantee positive outcomes.
It seems a truly long-lasting value lock-in is only possible with heavy help from AI. Therefore, the risk that we would solve the alignment problem but nevertheless irrationally prevent ourselves from building a friendly AI seems very low - relative to the billions of years we’ve got to realize our potential, cultural evolution is quick. More on this in point 5).
This consideration also suggests that one possible risk of increasing humanity’s attempts to carefully shape AI values could be increasing the chances of a value lock-in. However, I think that if we solve the control problem (i.e. humans stay in the decision loop), an AI capable of a value lock-in would understand how our meta-values interact with our true values. In other words, coherent extrapolated volition (CEV) is a more rational way of interpreting goals than taking them literally, so I have decent faith that an aligned ASI would recognize that. And it doesn’t seem like there are important differences in CEV, that is, in meta-values (more in point 6).
I think there's a big chance I'm wrong here. If ASI arises by scaling an LLM, it could be analogous to a human who is very smart in terms of System 1 (can instantly produce complex plans to achieve goals) but not so rational, i.e. not so bright in terms of System 2 (doesn't care to analyze how philosophically coherent these goals are). However, these scenarios seem like precisely the kind of problem reduced by increasing the attention oriented towards AI safety.
3. Consciousness, progress and uncertainty seem like key factors.
Understanding consciousness seems important for evaluating what value we would lose if an AI proceeded to convert the universe's resources according to whatever value function happened to win the AI race. I explored this interaction more in a previous post.
Understanding progress seems important for evaluating whether humanity would be better equipped to create an ASI in 100 or 1000 years. For this purpose, I think "better equipped" can be nicely operationalized in a very value-uncertain way as "making decisions based on more reflection & evidence and higher-order considerations". Part of this question is whether morally misaligned actors, such as authoritarian regimes or terrorists, may utilize this time to catch up and perhaps use an AI to halt humanity's potential (5).
The specific flavor of uncertainty we choose seems crucial. If it pushes us towards common-sense morality or if it pushes us to defer to later generations, AIS seems like a clear top priority. If it pushes us towards views that assign moral patienthood to AI, it may lower the priority of some forms of AIS (an infinite pause) while raising the priority of others (e.g. implementing reliable AI philosophy / meta-cognition, see Chi's recent post) (6).
4. Increasing ASI agency does not guarantee negative outcomes.
The orthogonality thesis, as proposed by Bostrom, is hard to disagree with - it does seem possible to imagine an AI holding any combination of goals and intelligence. However, the thesis alone doesn’t rule out a possible correlation - i.e. the possibility that, given somewhat flexible goals, it’s more likely that an AI will be morally aligned than misaligned.
Given the grand uncertainty and importance of these questions, hoping that such a correlation exists would be a terrible plan. Nevertheless, there are a few interesting reasons one might think it does:
5. Progress with humans in charge seems reliable
As long as humans have agency - a collective leverage against actors who would like to take power into their own hands - any value dissatisfaction creates tension, and therefore, over the long run, systems that are positive for human well-being seem more stable. A much more speculative question is to what extent this dynamic also selects for worldviews that are more congruent with people's belief that they are rational and moral. Both of these questions seem uncertain; however, I think there are good reasons to believe that the evolution of democracy and the expansion of the moral circle were not flukes, but a result of selection for belief systems that are indeed congruent with evidence, higher-order reflection and human well-being.
Let’s say humans won’t become an interplanetary species. In such a case, I’d expect our species to continue thriving on this planet for the remaining lifetime of Earth, i.e. something like 500 million years. Let’s say current AI safety efforts do overshoot and, as a result, our civilization implements a tough international law that prevents us from making use of the positive side of AI and spreading between the stars. This could constitute a suboptimal lock-in. However, it seems unlikely to me that without AI, humans would be able to lock in a bad idea for long enough to matter. In the 17th century, slavery and witch trials were commonly accepted. If it took us a hundred times longer to reach some moral threshold, we would have just used up 0.006% of the remaining lifetime of our planet. In the optimistic scenario where we utilize the full lifetime of the universe, the available time could be a trillion times longer.
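The 0.006% figure above can be checked with a quick back-of-the-envelope calculation. This is only a sketch: the 300-year span since the 17th century is my assumed reading of the paragraph (it is the value that reproduces the stated percentage), and the variable and function names are purely illustrative.

```python
# Rough inputs; only the first and third are stated in the text.
EARTH_REMAINING_YEARS = 500_000_000  # "something like 500 million years"
YEARS_SINCE_17TH_CENTURY = 300       # assumption: roughly three centuries
SLOWDOWN_FACTOR = 100                # "a hundred times longer"


def share_of_remaining_time(years_elapsed, slowdown, remaining_years):
    """Fraction of the remaining time consumed by a slowed-down moral shift."""
    return years_elapsed * slowdown / remaining_years


delay_years = YEARS_SINCE_17TH_CENTURY * SLOWDOWN_FACTOR
share = share_of_remaining_time(
    YEARS_SINCE_17TH_CENTURY, SLOWDOWN_FACTOR, EARTH_REMAINING_YEARS
)

print(f"Delay: {delay_years:,} years")
print(f"Share of Earth's remaining lifetime: {share:.3%}")
```

Changing the assumed span to 400 years moves the result to 0.008%, so the conclusion is not sensitive to the exact starting point.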
6. “Indiscriminate moral uncertainty” supports AIS
Naively, absolute moral uncertainty would imply practical moral nihilism. Every moral claim would have a 50% probability of being true - therefore, there’s no reason to judge actions on a moral basis. However, such a position requires ~100% credence that for each claim, this probability is indeed 0.5 and no further inspection can move it by any margin, which is paradoxically an expression of ridiculous certainty. True moral uncertainty probably leads to attempts to increase humanity’s philosophical reflection. This seems philosophically very straightforward:
AIS could be a necessary precursor to make sure we have time for such reflection. This is a less “obviously true” statement, but the uncertainty is epistemic, not moral. And provided ASI doesn’t happen in our lifetimes, such effort would merely be a waste, not actively harmful, which seems positive from the position of "sincere" moral uncertainty.
Lastly, more uncertainty about cause X increases the necessity to develop an (aligned) ASI. For instance, one could argue that perhaps the universe is full of deadly rays that wipe life out the moment they meet it, but we can’t observe any signs of them, because by the time we could observe them, we’d already be dead. However, I think the Grabby Aliens model provides an interesting argument against this reasoning - just based on conventional assumptions about the great filters, our civilization is suspiciously early in the universe (see this fun animated explainer). Therefore, any additional strong historical selection effect seems unlikely on priors.