Conditional on AGI being developed by 2070, what is the probability that humanity will suffer an existential catastrophe due to loss of control over an AGI system?
(Note: This is a submission for the Open Philanthropy AI Worldviews Contest. It was submitted before the deadline but posted here after.)
In examining this issue, I argue that (1) our primitive frameworks for consensus-making as a global society present massive s-risks and x-risks: AGI erodes core “tenets of human identity” such as collaboration, abstraction, and creativity, and has the power to cause (1a) repeated episodes of class warfare and (1b) suffering caused by a way of living that has not been accepted. I also argue that (2) constitutional AI using existing legal frameworks decreases x-risks significantly. Lastly, I discuss (3) the monoculture of a utopia of progress that AGI represents and implicitly advances, and how this presents an s-risk.
The phrase “loss of control” in the question suggests a traditional approach to discussing AI risk: that existential catastrophe results from failed intent alignment and the inability of humans to bind an AGI to a moral code, as exemplified by ‘Optimal Policies Tend to Seek Power’, the orthogonality thesis, and the instrumental convergence thesis. However, I explore how human interaction with AGI itself represents a form of control that can be lost. Morality alone does not define the scope of control; our interaction with AGI shapes its role in society, and it is this process that can spiral out of control.
Regarding timelines, theoretical analysis is limited to the near future due to our limited knowledge of future societies. While a numerical analysis based on models could predict the probability of existential catastrophe on an indefinite timeline, it is crucial to address immediate concerns that would arise when AGI is invented before a political and moral consensus is established. Therefore, a near-term analysis that allows us to examine concrete concepts is appropriate.
A recap on the idea of existential catastrophe
Existential catastrophe refers to a disaster that threatens humanity’s survival and presence. An intuitive understanding of this is (1) extinction. After all, to most people, the value of life is immeasurable, and this consensus is reflected in the International Bill of Human Rights. Extinction can be caused externally, by a rogue AI seeking power in opposition to human desires, or internally, as a result of class struggle. (2) Suffering is equally unpalatable. When suffering exceeds the level of discomfort that the majority of humans find acceptable, we can consider that an existential catastrophe.
Core tenets of humanity, and how they are challenged by AGI
Our current primitive frameworks for global consensus-making pose significant x- and s-risks when confronted with AGI. To illustrate this, let's consider a scenario where people are divided into two groups: those who believe apples should be eaten and those who believe apples should never be consumed. Apples are edible, nutritious, and non-poisonous, and neither belief about the morality of the consumption of apples is better than the other.
Imagine that this hypothetical world is our reality right now. Imagine trying to convince others of your stance in 2023. What information would you encounter in the news, on your phone, or on social media? How would the situation unfold? How long would it take for one side’s beliefs to collapse?
This hypothetical society’s split in beliefs was never resolved. Soon, it was found that apples can provide for the full nutritional needs of humans, leading grocery stores to stock only apples even though non-apple eaters formed a large majority. The non-apple eaters were distressed by this shift in food options. How could something as fundamental as food be so swiftly altered?
AGI is technology that challenges what it means to be human. Our current methods of discourse do not allow us to redefine an identity across a global population. Without improved tools for discourse, AGI may cause suffering due to an imposed identity change. At its worst, this could lead to recurring episodes of class warfare each time a new iteration of AGI surpasses human limitations.
The existential catastrophe stemming from AGI lies not solely in its "superintelligence" or the erosion of human identity, but rather in our collective failure to chart a clear path for integrating AGI into society. In the following sections, I will explain why this failure is highly probable and how the absence of a shared agreement on AGI's societal role poses a significant risk.
How AGI challenges what it means to be human
(This section attempts to break down human identity and explains how AGI challenges it. Skip this section to read the argumentation for primitive discourse → existential catastrophe)
During the 2023 Hollywood writers' guild strikes, amidst the rise of large language model-aided writing, interviews with writers on the picket line revealed a common refrain: "A writer is a writer and a writer is a person" and "Scripts are written by writers. Writers are people." These refrains encapsulate the concerns and anxieties surrounding the intrusion of AI into the creative space of entertainment, a domain traditionally regarded as a uniquely human endeavor requiring the complex skill of eliciting emotional responses from audiences. They also reflect a deeper unease and an eroding consensus regarding what it is to be human.
AGI presents a profound challenge to the core tenets of what it means to be human. Here, I identify five key aspects of humanity and explain how AGI poses risks to each of them: collaboration, abstraction, creativity, the sole contribution of humans to human culture, and intuition.
1. Collaboration
In interdisciplinary collaboration, social conventions, time constraints, and the way meetings are set up prevent team members from freely asking questions and gaining a comprehensive understanding of unfamiliar domains, limiting their ability to contribute meaningfully to all aspects of the collaboration. Consequently, the collaborative process may devolve into mere consultation rather than true collaboration.
Ideally, a solution to this issue is to have an environment that allows for unrestricted dialogue. We can see that this is difficult to achieve in human systems, but easy to achieve in an AGI system of networked agents. An AGI system whose agents could share knowledge whenever and wherever they want would outperform an average network of humans.
2. Abstraction
Humans generally comprehend graphical representations with up to three axes. However, classical machine learning techniques can already compute relationships within datasets of many dimensions. Against this backdrop, it becomes apparent that AGI systems might surpass humans in abstraction and, since abstraction is a key tool for solving challenging problems, might surpass human performance in tackling such problems.
How do humans think of new ideas and mull over complex issues? Bits of the puzzle come to us in bursts, and then we look at these bits of the puzzle and come up with an idea that synthesizes all of this information. However, our working memory is limited. It cannot hold that many pieces of a puzzle simultaneously.
To illustrate this concept, imagine examining multiple subparts of a problem as if we were viewing a holographic display with multiple cubes. In the center of our vision, we possess the ability to rotate and explore all sides of the cubes, gaining a comprehensive understanding of their intricate details. However, in our peripheral vision, the cubes may appear fixed, allowing us to perceive only one side of the less important cubes. An AGI would be able to consider all parts of a problem simultaneously, achieving revelation and discovery at a far higher rate than humans.
3. Creativity
What is creativity? Here are some examples to elucidate what it means.
- In art: Coming up with a novel concept (e.g. a style of cinematography) and applying it to an artistic product
- In engineering: Taking an unsolved problem statement (e.g. create a robot vacuum that can clean all 6 sides of a room up to its corners) and coming up with a solution
- In policy-making and professional fields of work: Devising novel solutions to problems
Creativity is important to humans because it is a catalyst for self-expression, problem solving, and cultural progress. It is also the foundation for human ingenuity and our pride in technological and cultural achievements, which is a core part of human identity, as this defines what sets us apart from other animals. Assuming human-level AGI were to possess the capacity for creativity, it would obviously challenge our perception of ourselves.
4. That humans are sole contributors to human culture
An AGI would be seen as a separate sentient group, similar to how aliens are portrayed in science fiction. New ideas about identity will have to be formed.
5. Contextual Intuition
Human understanding of a situation often relies on intuition, a sense or "vibe" that helps us make sense of complex contexts after absorbing a lot of sensory input. However, this falls short in the area of cross-cultural communication and understanding.
Firstly, language acts as a barrier. Attempts to explain Eastern philosophical thought, for instance, often rely on mapping it to Western philosophical terms, logic, and assumptions. This is why culture is often seen as incommensurable.
Now let’s say we try to understand a culture that we have never experienced personally. Can we predict how a person from a different cultural background would feel in a certain situation, and what decisions they would make? Anthropological texts, while informative, are uncanny and fall short of capturing the essence of a foreign environment through text and pictures. It seems that humans cannot understand a culture just by reading everything there is to know about it. Interestingly, large language models demonstrate the ability to understand concepts purely from textual information. Furthermore, infants effortlessly absorb and understand cultural nuances through exposure to images, text, and sound in their environment. An AGI exposed to just raw sensory input could learn the “vibe” of cultural identities like that of an infant, but it would also have the potential to experience and comprehend a multitude of lives beyond the capacity of a single human.
An AGI equipped with superior intuition would outperform a human who can only read a translated text.
How poor methods of discourse lead to a lack of consensus and cause severe suffering as human identity is eroded
Contemporary human society lacks the necessary tools to construct moral consensus on issues as controversial as AGI’s role in society. The process of settling on an agreed-upon response to these issues takes a long time, and may not come to a satisfying conclusion at all.
A case in point is the perennial culture war of moral universalism versus moral relativism, which has never been won; the discourse on universal human rights and how they might be at odds with culture has only concluded in vague terms encouraging dialogue.
A moral consensus is needed because there is no true morality. Vivisection was once deemed morally acceptable. Then, when animals were found to have the capacity to experience pain and were not mere automata, it was deemed morally unacceptable. But what is it about pain that makes this debate any different? Why can’t we disregard the pain of animals? It appears that philosophy merely defines right and wrong, and it is up to society to accept a particular belief. The possible roles that AGI should play in society with regard to its replacement or augmentation of human identity are beliefs among which no option is superior.
One might think that democratic discourse (in which people of all backgrounds participate in a discussion) would speed up the rate of agreement. Everyone would hear differing perspectives, change their minds, and ultimately form a majority opinion that could influence the minority, resulting in a coalescence of opinions. So it seems that defining AGI’s role in society, a globally important issue requiring a quick resolution, calls for global democratic discourse. Except the idealized version of it doesn’t exist in reality.
While social media initially promised a platform for global democratic discourse, it has fallen short of fostering discussions that include everyone and result in consensus.
This is the state of today’s discourse framework:
- Debates are dominated by privileged elites and scholars, while the general public shares its views privately. Society agrees upon things either by decree, or extremely slowly under the leadership of a small class of influential individuals.
- The repetition of certain viewpoints can create imbalances in the dissemination of information and impact the formation of public opinion.
- Not all individuals have the tools to think about issues.
- The analysis of information is too time consuming for everyone to do, so we delegate that to researchers and analysts, whose works are often presented to the lay audience in journalism with no context or deeper examination.
- Opinions are presented as standalone statements rather than responses to other viewpoints, hindering the formation of context and discouraging a logical chain of thought that could lead to consensus.
Far from being a Brownian-motion-esque method for social agreement, modern discourse has mimicked history: influential people share their arguments, and the masses discuss short snippets of those views with one another in online forums, never fully reaching an agreement. Meanwhile, technology keeps progressing, dissatisfaction rises, and the sparks of real conflict are lit.
We need a new communication framework that allows billions of humans to rapidly coalesce around consensus on global issues. Without consensus, technological advancements, such as AGI development, will proceed while society focuses on its immediate impacts, like labor market restructuring.
There exists a huge risk of suffering when large swathes of the population are forced to adopt a human identity that they had little say in. This has little to do with the idea that liberty is happiness and is distinct from the suffering experienced under totalitarianism. Rather, human identity is profoundly linked to everything, and any change to it needs to result in a new identity accepted by all.
The inability to form a consensus as a global society of individuals also hampers our ability to address threats in a timely manner, as exemplified by the slow pace of multinational efforts to combat climate change.
How the erosion of human identity leads to class warfare
The emergence of AGI is likely to create a division between those who can match its capabilities and those who cannot, leading to class warfare, as the threshold for what makes human output valuable rises.
Take the example of text2image and text2video models in 2023. They remove the need to learn how to draw and paint, but users still have to have an extremely detailed idea of what they want and represent that idea in text. Text2video models might make it easy for anyone to make a panning shot of a can of soup, but if you want to make a commercial for soup and have no idea what a soup commercial should look like, or how many shots and angles it would take, the text2video model is useless. Assuming that not every cinematographer and director has novel ideas, laypeople using these models may replace some creative professionals, placing higher demands on humans to have unique insights.
In a market-driven society, an individual's worth is often determined by their productive capabilities, which are closely tied to their intelligence. As AGI advances and sets higher thresholds for competition, individuals who cannot keep up may be left behind.
If we extend this to a future AGI that can perform all intellectual tasks, there may still be tasks where humans outperform AGI due to human attempts to boost intelligence to match the abilities of AGI as it constantly improves. Given their superior financial resources and connections, the upper echelons of society are the most likely to access these performance-enhancing technologies, thereby creating an underclass of unemployed or underemployed individuals, who will face financial insecurity. Seeing that intelligence holds a significant place in human identity, they may also face the prospect of shame and dehumanization. These consequences are fertile ground for intense disaffection that could give rise to political instability and possibly violence, as the underclass seek to correct widening disparities, while the upper class seek to secure their privilege.
We can expect this class warfare to be an event that repeats across generations. As AGI improves, humans will attempt to enhance their performance to close that gap, before AGI advances again. The squeezing out of humans will occur repeatedly as human limits reach the limits of AGI again and again. As the fight to stay relevant in a market economy recurs throughout generations, class warfare is likely to recur, especially in cultures that emphasize individual action and have low expectations of responsibility from the ruling class.
Constitutional AI, shifting morality, and won’t laws work on an AGI that is basically a human?
Constitutional AI has been suggested by Anthropic as a more reliable method of alignment as AI scales and capabilities emerge that humans might be unable to detect. Instead of giving an AI rewards and running the risk of misspecifying them, resulting in an AI that does something morally wrong for the sake of its goals, we place a moral boundary around AGI.
Using our current laws as a foundation for constitutional AI is more effective than creating a new moral code. Civil and criminal laws, national constitutions, and human rights laws codify and enforce societal moral codes through court hearings and verdicts.
Crafting a bespoke moral code for AGI would have unintended consequences. A constitution consisting of only positive statements and without the detractions and caveats that exist in law and legal precedents would be too permissive. The attempt to simplify our moral code into a few statements would also be prone to missing out on key considerations. By binding AGI to the same proxy of moral code as we do humans, we ensure that their values are intimately aligned with ours.
If these laws can guide the moral behavior of a country, it is reasonable to expect AGI, possessing human-level intelligence, to adhere to the same laws. Similar moral tensions and contradictions exist in human decision-making, and humans navigate them successfully. For example, a killing in self-defense is only justified if the person was defending themselves from serious harm. Similarly, if we bind AGI to the same laws that bind humans, the AGI would prioritize certain wrongs over others just as humans would.
What about ‘vagueness’ in these laws?
Concerns about vagueness in laws arise from unresolved tensions between competing value systems. Cases heard at the US Supreme Court or the European Court of Human Rights reflect such unresolved tensions. Two cases come to mind as illustrations. Kokkinakis v. Greece concerns whether proselytizing goes against religious freedom, and Jersild v. Denmark concerns a journalist's responsibility for aiding the dissemination of racist statements made by interviewees, balanced against the freedom of the press. In each of these cases, verdicts that read more like opinions on morality were passed, clarifying how moral codes should be applied in practice in these specific instances. The likelihood of an AGI acting dangerously within these zones of vagueness is small, mirroring the incremental nature of human conflict resolution.
Our laws also address shifting morality through updates, and if these rules bind humans, they also bind AGIs. This approach helps to prevent the need to update the moral codes of AGIs.
Hence, I dispute the idea that AGIs will tend to seek power due to instrumental convergence. That idea reflects a way of looking at AGI that is too dependent on RL conceptualizations of reward-directed learning. Although training creates a goal for an AI, an AGI that learns a multitude of goals using a multitude of rewards would be able to average them out, just as a human learns about the nuances of the world through experience.
I also point out how an effective consensus-creating global discourse framework as described above can help resolve these contentions efficiently. In Malcolm Evans’ Religious Liberty and International Law in Europe, he writes, “Clearly the time is not yet ripe for a convention: not because of the unwillingness of States to adopt such an instrument, but because of the reluctance of the international community to accept that in the religious beliefs of others the dogmas of human rights are met with an equally powerful force which must be respected, not overcome.” Society stalls when things are open for discussion, and moves when it agrees.
Utopias of progress versus utopias of righteousness, and how AGI squashes cultural diversity
A tension exists between utopias of progress and utopias of righteousness, with implications for cultural diversity. AGI is often associated with progress, efficiency, and the potential to surpass human limitations. The discussion about AGI in today’s society is often accompanied by the idea of human flourishing, implicitly advocating for utopias of progress. Additionally, societies that view pushing boundaries and technological advancement as an ideal would tend to adopt AGI.
In contrast, utopias of righteousness prioritize moral principles, viewing the use of AGI and technological progress as secondary. However, technocentric perspectives tend to overshadow non-technocentric worldviews, as societies emphasizing technology are more likely to thrive in conflicts.
This carries the risk of cultural homogenization, where diverse cultural narratives are marginalized or assimilated into a dominant mainstream culture driven by AGI. As cultural diversity is considered beneficial for long-term societal development, the prevalence of cultural monoculture poses a potential existential threat.
Conclusion: Timelines and probability
- It is more important to predict and control for existential catastrophe events right around the invention of AGI, as we have the most leverage on that.
- We have a primitive global discourse framework → Failure to create a new human identity acceptable to all & Risk of class warfare.
- Constitutional AI can decrease the risk of existential catastrophe. Constitutional AI should mimic our current laws that bind humans.
- AGI development may decrease cultural diversity as it is associated with a utopia of progress, increasing the risk of existential catastrophe.
- Probability of a primitive discourse framework: Highly likely
- Probability of class warfare: Likely
- Probability of a lack of consensus on a new human identity, resulting in suffering: Highly likely
Clifford, J., & Marcus, G. E. (2011). Writing culture: The poetics and politics of ethnography. University of California Press.
Brown et al. (2020). Language Models are Few-Shot Learners. arXiv:2005.14165 [cs.CL].
Liao, Chen & Du. (2022). Concept Understanding in Large Language Models: An Empirical Study. https://openreview.net/forum?id=losgEaOWIL7
By 2070, most societies are expected to still be market-driven economies. Additionally, as long as humans are not completely pushed out of any production of value by AI, and have some space to produce goods of value, it is likely that society will still expect its members to produce value and trade value.
Bai et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073 [cs.CL].