What is the counterfactual to an aligned AI?

I think of the counterfactual to an aligned AI as being a misaligned AI. A significant amount of research has arguably gone into highlighting the benefits of an aligned AI and the risks of a misaligned AI, but how much has gone into the counterfactual? How much research has examined the benefits of a misaligned AI and the risks of an aligned AI?


Particularly, how does the extinction risk of misaligned AI compare to the suffering risk of an aligned AI?

Misaligned AI is often discussed as a global catastrophic risk, specifically as an extinction risk, and there are many resources on the subject. However, I want to examine how significant the suffering risk (i.e., an unrecoverable dystopia) posed by an aligned AI is.

My hypothesis, before researching the subject, is that suffering risk from aligned AI is likely significantly more probable (and likely more neglected, albeit maybe for good reasons) than extinction risk from misaligned AI. At the center of this hypothesis is the question of how "safe" the entity is whose commands the AI is aligned to.

The hypothesis assumes that the entity controlling the AI will most likely be the resource-rich board and executive team of whichever for-profit Silicon Valley corporation builds the AI first. Imagine what it would look like for this corporation, or a handful of individuals, to control a practically omnipotent, omniscient tool like transformative AI.

We could ask: what are this group's motivations today? By the nature of a for-profit corporation, they are to maximize profits. How would that group's motivations evolve if it were, in essence, to achieve control of the whole world? Would its motivations change, and if so, why? What would likely change is that it could no longer be checked by the rest of the world (which arguably doesn't happen enough even today).

Hypothetically, how confident are you in any single individual's judgement if they have absolute power in the world? Is there such an individual or group you would trust with this responsibility today? I want to highlight Lord Acton's well-known line: "Power tends to corrupt, and absolute power corrupts absolutely." A transformative AI would provide exactly that: absolute power. Even if there is an individual or small group you trust completely with absolute power today, can you be certain you will be able to trust the future generations to whom this power passes? Or trust the process by which such power is transferred?

Today we barely have systems in place that can control some of the biggest corporations in the world; instead, these corporations end up controlling even the most powerful governments. With transformative AI, there would no longer be any feasible checks on whoever controls it.

Taking all this into account, the hypothesis is that creating an aligned transformative AI would very likely lead to a suffering risk: an unrecoverable dystopia.

So the endgame question might become: does humanity have better odds with aligned AI or with unaligned AI? Which do you trust more: random all-powerful individual(s) not to abuse absolute power, or a transformative AI not to destroy everything?

Please share any thoughts to the contrary. I would love to see aligned AI in a more positive light, based not on its upsides but on its risks.



As the title says, I'm (embarrassingly) new to AI Safety. It's November 2023 as I write this, and even the initial thoughts that make up this post only originated a couple of months ago. Four months ago, I honestly thought of AI Safety only as science fiction.


It has been a rollercoaster of an initiation, which began with the Intro to EA Virtual Program with CEA this past summer. That experience cracked the door open, and since then, going to EAGxNYC and following the EA Forum has felt somewhat like the floodgates opening.


In that time I've tried to learn more, and I will make a point of continuing to learn, most likely through more formal courses in the future. My point is that there is so much to learn and catch up on, especially for someone without a software, deep learning, or machine learning background, that it often feels overwhelming. It also feels confusing and full of questions that seem to have easy answers, just not to new arrivals on the scene like me.


I wanted to share some of these ideas and thoughts, and especially questions, to start a discussion around them and to get feedback and answers, especially from those with more technical savvy and experience in AI Safety. My interest in AI Safety is inversely proportional to my expertise, and ever since the first articles I read on AI alignment and how DL models work, I've felt that I should do more in this cause area. Since I don't have the experience or skills to contribute directly, my best way of giving back and making an impact is by sharing ideas, so I'm running with it.






There's a fair amount of discussion in AI alignment about what outer alignment requires, and how it isn't just about pursuing the goals of a single person who is supposed to be in control.

As a few examples, you might be interested in some of these:



I appreciate you sharing these! I've already started to read them.

Hey Andreas! Thanks for writing this up, it was a really interesting read and I'm glad you shared it! 

Some quick rambling thoughts after reading:

I think some of the distinctions might be semantic. Some of what you describe would fall under misuse risk/malicious use, which could indeed be a real problem (if an AI is causing harm because its values are aligned with a malicious human, is it aligned or misaligned overall? I'm not sure, but the human alignment problem seems to be the issue here), and I'm not sure how to weigh that against the risk of unaligned AI. I think that, given we are nowhere close to solving the alignment problem, people tend to assume that if we have AGI, it will be misaligned "by definition". In terms of s-risks, I would really recommend checking out the work of CLR, as they seem to be the ones who have spent the most time thinking about s-risks. I think they also have a course on s-risks coming up sometime!

Awesome, thanks for the links and thoughts. I had actually been debating applying to the s-risk fellowship, and with your mention I finally applied.

Agreed that the big picture falls under human alignment / malicious human use. It's likely the area that has been more researched historically, and I need to delve deeper into it. I've been putting off getting more involved in LessWrong, but given your suggestion, I will now make an account there as well.

Thank you

Executive summary: The risk of suffering from an aligned AI controlled by a profit-seeking entity may be higher than the extinction risk from a misaligned AI.

Key points:

  1. An aligned AI controlled by a corporation risks being used to maximize profits without checks and balances. This could lead to dystopia.
  2. Absolute power granted by an aligned AI risks corrupting those in control, with no way to transfer power safely.
  3. Today's corporations already control governments; an aligned AI would remove any remaining checks on their power.
  4. Random all-powerful individuals with an aligned AI may be more dangerous than a misaligned AI.
  5. More analysis is needed on the potential suffering enabled by aligned AI rather than just extinction risks.
  6. The author is new to AI safety and wants feedback, especially from technical experts, on these ideas and questions.


This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
