
What is the counterfactual to an aligned AI?

I think of the counterfactual to an aligned AI as being a misaligned AI. A significant amount of research has arguably gone into highlighting the benefits of an aligned AI and the risks of a misaligned AI, but how much has gone into the counterfactual? How much research has been done on the benefits of a misaligned AI and the risks of an aligned AI?

 

Particularly, how does the extinction risk of misaligned AI compare to the suffering risk of an aligned AI?

Misaligned AI is often referenced as a global catastrophic risk, as it relates to extinction, and there are a lot of resources on the subject. Here, however, I want to delve into how significant the suffering risk (i.e., an unrecoverable dystopia) posed by an aligned AI is.

My hypothesis, prior to researching the subject, is that the suffering risk from aligned AI is likely significantly more probable (and likely more neglected, albeit perhaps for good reasons) than the extinction risk from misaligned AI. At the center of this hypothesis is the question of how “safe” the entity is to whose commands the AI is aligned.

The hypothesis is that, more likely than not, the entity controlling the AI will be the resource-rich board and executive team of whichever for-profit Silicon Valley corporation builds the AI first. Imagine what it would look like when this corporation, or handful of individuals, controls a practically omnipotent, omniscient tool like transformative AI.

We could ask: what are this group’s motivations today? By its nature, it exists to maximize profits. How would those motivations evolve if the group were, in essence, to achieve control of the whole world? Would its motivations change, and if so, why? What would likely change is that it could no longer be checked by the rest of the world (which arguably doesn’t happen enough even today).

Hypothetically, how confident are you in any single individual’s judgement if they have absolute power over the world? Is there such an individual or group you would trust with this responsibility today? I want to highlight an old quote: “Power tends to corrupt, and absolute power corrupts absolutely.” A transformative AI would provide just that: absolute power. Is there an individual or small group you trust completely today with absolute power? If so, can you be certain you will be able to trust the future generations to whom this power is passed on? Or trust the process by which such power is transferred?

Today we barely have systems in place that can control some of the biggest corporations in the world; instead, these corporations end up controlling even the most powerful governments. There would no longer be any feasible checks on whoever controls the AI.

Taking all this into account, the hypothesis is that creating an aligned transformative AI will very likely lead to a suffering risk: an unrecoverable dystopia.

So the endgame question might become: does humanity have better odds with aligned AI or with misaligned AI? Which do you trust more: random all-powerful individual(s) not to abuse absolute power, or a transformative AI not to destroy everything?

Please share some thoughts to the contrary; I would love to see aligned AI in a more positive light, judged not on its positives but on its risks.

 

PS

As the title says, I’m (embarrassingly) new to AI Safety. It’s November 2023 as I’m writing this, and even the initial thoughts that make up this post originated only a couple of months ago. Four months ago, I can truthfully say, I thought of AI Safety only as science fiction.

 

It has been a rollercoaster of an initiation, which began with taking the Intro to EA Virtual Program with CEA this past summer. That experience cracked the door open, and attending EAGxNYC and following the EA Forum since then has felt somewhat like the floodgates opening.

 

In that time I’ve tried to learn more, and I will make it a point to continue learning, most likely through more formal courses in the future. My point is that there is so much to learn and catch up on, especially for someone without a software, deep learning, or machine learning background, that it often feels overwhelming. It also feels confusing, and filled with questions that seem like they have easy answers, but not necessarily for new arrivals on the scene (like myself).

 

I wanted to share some of these ideas and thoughts, but especially the questions, to start a discussion around them and get some feedback and answers, particularly from those with greater technical savvy and experience with AI Safety. My interest in and draw to AI Safety is inversely proportional to my expertise, and even from the first articles covering AI alignment or how deep learning models work, I’ve felt that I should do more in this cause area. Since I don’t have the experience or skills to contribute directly, my best way of giving back and making any impact is by sharing ideas, so I’m running with it.

Comments (5)



There's a fair amount of discussion in AI alignment about what outer alignment requires, and how it's not just pursuing the goals of a single person who is supposed to be in control.

As a few examples, you might be interested in some of these:
https://www.alignmentforum.org/posts/Cty2rSMut483QgBQ2/what-should-ai-owe-to-us-accountable-and-aligned-ai-systems

https://www.cs.utexas.edu/~pstone/Papers/bib2html-links/ASIMOV2021-REUTH.pdf 

https://www.lesswrong.com/posts/Tmvvvx3buP4Gj3nZK/learning-societal-values-from-law-as-part-of-an-agi 

I appreciate you sharing these! I've already started to read them.

Hey Andreas! Thanks for writing this up, it was a really interesting read and I'm glad you shared it! 

Some quick rambling thoughts after reading:

I think some of the distinctions might be semantic - some of what you describe would fall under misuse risk/malicious use, which could indeed be a real problem (if an AI is causing harm because its values are aligned with a malicious human, is it aligned or misaligned overall? I'm not sure, but the human alignment problem seems to be the issue here) - and I'm not sure how to weigh that against the risk of unaligned AI. I think, given that we are nowhere close to solving the alignment problem, people tend to assume that if we have AGI, it will be misaligned "by definition". In terms of s-risks, I would really recommend checking out the work of CLR, as they seem to be the ones who have spent the most time thinking about s-risks. I think they also have a course on s-risks coming up sometime!

Awesome, thanks for the links and thoughts. I had actually been debating applying to the s-risk fellowship, and with your mention I finally applied.

Agreed that the big picture falls under human alignment / malicious human use. It's likely the area that has been more researched historically, and I need to delve deeper into it. I've been putting off getting more involved on LessWrong, but given your recommendation I will now make an account there as well.

Thank you

Executive summary: The risk of suffering from an aligned AI controlled by a profit-seeking entity may be higher than the extinction risk from a misaligned AI.

Key points:

  1. An aligned AI controlled by a corporation risks being used to maximize profits without checks and balances. This could lead to dystopia.
  2. Absolute power granted by an aligned AI risks corrupting those in control, with no way to transfer power safely.
  3. Today's corporations already control governments; an aligned AI would remove any remaining checks on their power.
  4. Random all-powerful individuals with an aligned AI may be more dangerous than a misaligned AI.
  5. More analysis is needed on the potential suffering enabled by aligned AI rather than just extinction risks.
  6. The author is new to AI safety and wants feedback, especially from technical experts, on these ideas and questions.

 

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
