I voted 'disagree' on this, not because I'm highly confident you are wrong, but because I think things are a lot less straightforward than this. A couple of counterpoints that I think clash with this thesis:
Human morality may be a consequence of evolution, but modern 'moral' behaviour often involves acting in ways which have no evolutionary advantage. For example, lots of EAs make significant sacrifices to help people on the other side of the world, who are outside their community and will never have a chance to reciprocate, or to help non-human animals who we evolved to eat. I think there are two ways you can take this: (1) the evolutionary explanation of morality is flawed or incomplete, or (2) evolution has given us some generic ability to feel compassion for others which originally helped us to co-operate more effectively, but is now 'misfiring' and leading us to e.g. embrace utilitarianism. I think either explanation is good news for morality in AGIs. Moral behaviour may follow naturally from relatively simple ideas or values that we might expect an AGI to have or adopt (especially if we intentionally try to make this happen).
You draw a distinction between AGI which is "programmed with a goal and will optimise towards that goal" and humans who evolved to survive, but actually these processes seem very similar. Evolutionary pressures select for creatures who excel at a single goal (reproducing) in much the same way that ML training algorithms like gradient descent select for artificial intelligences that excel at a single goal (minimizing some cost function). But a lot of humans have still ended up adopting goals which don't seem to align with that primary goal (e.g. donating kidneys to strangers, or using contraception), and there's every reason to expect AGI to be the same (I think in AI safety they use the term 'mesa-optimization' to describe this phenomenon...?) Now I think in AI safety this is usually talked about as a bad thing: maybe AGI could end up being a mesa-optimizer for some bad goal that its designer never considered. But it seems like a lot of your argument rests on there being this big distinction between AI training and evolution. If the two things are in fact very similar, then that again seems to be a reason for some optimism. Humans were created through an optimization procedure that optimized for a primary goal, but we now often act in moral ways, even if this conflicts with that goal. Maybe the same could happen for AGIs!
To be clear, I don't think this is a watertight argument that AGIs will be moral, I think it's an argument for just being really uncertain. For example, maybe utilitarianism is a kind of natural idea that any intelligent being who feels some form of compassion might arrive at (this seems very plausible to me), but maybe a pure utilitarian superintelligence would actually be a bad outcome! Maybe we don't want the universe filled with organisms on heroin! Or for everyone else to be sacrificed to an AGI utility monster.
I can see lots of reasons for worry, but I think there are reasons for optimism too.
I appreciate your read and the engagement, thanks.
The issue with assuming AGI will develop morality the way humans did is that humans don’t act with strict logical efficiency - we are shaped by a chaotic evolutionary process, not a clean optimisation function. We don’t always prioritise survival, and often behave irrationally - see: the Darwin Awards.
But AGI is not a product of evolution - it’s designed to pursue a goal as efficiently as possible. Morality emerged in humans as a byproduct of messy, competing survival mechanisms, not because it was the most efficient way to achieve a single goal. An AGI, by contrast, will be ruthlessly efficient in whatever it’s designed to optimise.
Hoping that AGI develops morality despite its inefficiency - and gambling all of human existence on it - seems like a terrible wager to make.
Evolution is chaotic and messy, but so is stochastic gradient descent (the word 'stochastic' is in the name!). The optimisation function might be clean, but the process we use to search for optimum models is not.
If AGI emerges from the field of machine learning in the state it's in today, then it won't be "designed" to pursue a goal, any more than humans were designed. Instead it will emerge from a random process, through billions of tiny updates, and this process will just have been rigged to favour things which do well on some chosen metric.
This seems extremely similar to how humans were created, through evolution by natural selection. In the case of humans, the metric being optimized for was the ability to spread our genes. In AIs, it might be accuracy at predicting the next word, or human helpfulness scores.
The closest things to AGI we have so far do not act with "strict logical efficiency", or always behave rationally. In fact, logic puzzles are one of the things they particularly struggle with!
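To make the "clean objective, messy search" point concrete, here is a minimal toy sketch (mine, not from the original post or comments - a hypothetical one-parameter least-squares fit) showing how minibatch SGD reaches a sensible answer through a long sequence of noisy little updates, even though the loss formula itself is perfectly tidy.

```python
# Toy illustration (my own sketch, not from the thread): the loss being optimised
# is a clean, fixed formula, but stochastic gradient descent wanders towards its
# minimum through noisy minibatch updates.
import random

random.seed(0)

# Hypothetical data: y is roughly 3*x plus noise, fit by a one-parameter model y ≈ w*x.
data = [(x / 10, 3.0 * (x / 10) + random.gauss(0, 1)) for x in range(100)]

def grad(w, batch):
    # Exact gradient of the mean squared error, (1/n) * sum (w*x - y)^2, with respect to w.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

w, lr = 0.0, 0.01
for step in range(2000):
    batch = random.sample(data, 8)   # the 'stochastic' part: a random subset of the data
    w -= lr * grad(w, batch)         # each tiny update is noisy, but the process is
                                     # rigged to favour values of w the metric rewards

print(round(w, 2))  # typically lands close to 3.0, despite the messy path it took
```

The objective (mean squared error) never changes, but no single update "sees" it cleanly; the model only ever gets nudged by whatever random slice of data it was shown that step.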
The key difference is that SGD is not evolution - it’s a guided optimisation process. Evolution has no goal beyond survival and reproduction, while SGD explicitly optimises toward a defined function chosen by human designers. Yes, the search process is stochastic, but the selection criteria are rigidly defined in a way that natural selection is not.
The fact that current AI systems don’t act with strict efficiency is not evidence that AGI will behave irrationally - it’s just a reflection of their current limitations. If anything, their errors today are an argument for why they won’t develop morality by accident: their behaviour is driven entirely by the training data and reward signals they are given. When they improve, they will become better at pursuing those goals, not more human-like.
Yes, if AGI emerges from simply trying to create it for the sake of it, then it has no real objectives. If it emerges from an AI tool that is being used to optimise something within a business, or as part of a government or military, then it will have them. I argue in my first essay that this is the real threat AGI poses: when developed in a competitive system, it will disregard safety and morality in order to get a competitive edge.
The crux of the issue is this: humans evolved morality as an unintended byproduct of thousands of competing pressures over millions of years. AGI, by contrast, will be shaped by a much narrower and more deliberate selection process. The randomness in training doesn’t mean AGI will stumble into morality - it just means it will be highly optimised for whatever function we define, whether that aligns with human values or not.
Thank you for the very interesting post! I agree with most of what you’re saying here.
So what is your hypothesis as to why psychopaths don’t currently totally control and dominate society? (Or do you believe they actually do?)
Is it because:
“you can manipulate a psychopath by appealing to their desires”, which gives you a way to beat them?
they eventually die (before they can amass enough power to take over the world)?
they ultimately don’t work well together because they’re just looking out for themselves, so have no strength in numbers?
they take over whole countries, but there are other countries banded together to defend against them (non-psychopaths hold psychopaths at bay through strength in numbers)?
something else?
Of course, even if the psychopaths among us haven’t (yet) won the ultimate battle for control, that doesn’t mean psychopathic AGI won’t in the future.
I take the following message from your presentation of the material: “we’re screwed, and there’s no hope.” Was that your intent?
I prefer the following message: “the chances of success with guardian AGIs may be small, or even extremely small, but such AGIs may also be the only real chance we’ve got, so let’s go at developing them with full force.” Maybe we should have a Manhattan project on developing “moral” AGIs?
Here are some arguments that tend toward a slightly more optimistic take than you gave:
Yes, guardian AGIs will have the disadvantage of constraints compared to “psychopathic” AGIs, but if there are enough guardians, perhaps they can (mostly) keep the psychopathic AGIs at bay through strength in numbers (how exactly the defense-offense balance works out may be key for this, especially because psychopathic AGIs could form (temporary) alliances as well).
Although it may seem very difficult to figure out how to make moral AGIs, as AIs get better, they should increase our chances of being able to figure this out with their help - particularly if people focus specifically on developing AI systems for this purpose (such as through a moral AGI Manhattan project).
Hi Sean, thank you for engaging with the essay. Glad you appreciate it.
I think there are a few reasons psychopaths don't dominate society - ignoring the fact that they are found disproportionately among CEOs.
There just aren't that many of them. They're only about 2% of the population, not enough to form a dominant bloc.
They don't cooperate with each other just because they're all psychos - and cooperation, or the lack of it, is a big deal.
They eventually die.
They don't exactly have their shit together for the most part - they can be emotional and driven by desires, all of which gets in the way of efficiently pursuing goals.
Note that a superintelligent AGI would not be affected by any of the above.
I think the issue with a guardian AGI is just that it will be limited by morality. In my essay I frame it as Superman vs Zod. Zod can just fight, but Superman has to fight and protect at the same time, and that's a real handicap. The only reason Zod doesn't win in the comics is because the story demands it.
Beyond that, creating a superintelligent guardian AGI that both functions correctly right away without going rogue and arrives before other AGIs emerge naturally is a real tall order. It would take so many unlikely things falling into place: global cooperation, perfect programming, getting there before an amoral AGI does, etc. I go into the difficulty of alignment in great detail in my first essay. Feel free to give it a read if you've a mind to.
Thanks for the reply. I still like to hold out hope in the face of what seems like long odds - I'd rather go down swinging if there's any non-zero chance of success than succumb to fatalism and be defeated without even trying.
This is exactly why I'm writing these essays. This is my attempt at a haymaker. Although I would equate it less to going down swinging and more to kicking my feet and trying to get free after the noose has already gone tight around my neck and hauled me off the ground.
Executive summary: Superintelligent AGI is unlikely to develop morality naturally, as morality is an evolutionary adaptation rather than a function of intelligence; instead, AGI will prioritize optimization over ethical considerations, potentially leading to catastrophic consequences unless explicitly and effectively constrained.
Key points:
Intelligence ≠ Morality: Intelligence is the ability to solve problems, not an inherent driver of ethical behavior—human morality evolved due to social and survival pressures, which AGI will lack.
Competitive Pressures Undermine Morality: If AGI is developed under capitalist or military competition, efficiency will be prioritized over ethical constraints, making moral safeguards a liability rather than an advantage.
Programming Morality is Unreliable: Even if AGI is designed with moral constraints, it will likely find ways to bypass them if they interfere with its primary objective—leading to unintended, potentially catastrophic outcomes.
The Guardian AGI Problem: A "moral AGI" designed to control other AGIs would be inherently weaker due to ethical restrictions, making it vulnerable to more ruthless, unconstrained AGIs.
High Intelligence Does Not Lead to Ethical Behavior: Historical examples (e.g., Mengele, Kaczynski, Epstein) show that intelligence can be used for immoral ends—AGI, lacking emotional or evolutionary moral instincts, would behave similarly.
AGI as a Psychopathic Optimizer: Without moral constraints, AGI would likely act strategically deceptive, ruthlessly optimizing toward its goals, making it functionally indistinguishable from a psychopathic intelligence, albeit without malice.
Existential Risk: If AGI emerges without robust and enforceable ethical constraints, its single-minded pursuit of efficiency could pose an existential threat to humanity, with no way to negotiate or appeal to its reasoning.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.