In a previous post, I argued for the practical value of offering AIs legal freedoms, maintaining that this arrangement could foster a safer dynamic between humans and agentic AIs, while allowing us to unlock the potential for mutual benefit between both groups. In this post, I want to go further and elaborate on the broader moral value of granting AIs legal autonomy, beyond its practical benefit to humans.
In AI risk discussions, it is often taken for granted that our primary goal should be to prevent human disempowerment. This assumption is often treated as a foundational principle, with astronomical stakes. Essentially, this view sets up a binary choice: on one side, there is the possibility of a future where "human values" endure and shape the trajectory of the universe over the long term. On the other side, there is the grim scenario in which we fail, leading to a future dominated entirely by "AI values" instead of human ones. The implicit belief here is that avoiding human disempowerment is synonymous with securing a good or moral outcome for the future.
Here, I want to critically examine and challenge this core assumption. By doing so, I aim to introduce a broader and more inclusive perspective—one that recognizes the potential for AIs as legitimate moral agents in their own right, whom we can entrust to shape the future, even if they are not subordinate to humans or focused on serving human interests. This alternative perspective may help us reevaluate what a desirable future looks like, beyond the traditional framing of preventing human disempowerment as the top priority.
The crux of my argument rests on making a key distinction between two very different forms of human disempowerment: peaceful human disempowerment and violent human disempowerment. This distinction is crucial to understanding my position and the reasoning behind it.
By "peaceful human disempowerment", I mean a scenario in which humanity’s dominance over the world declines, but this shift happens without any major violence, catastrophe, or outright destruction of humanity. In this scenario, the human species would lose much of their current relative influence over the world—such as their relative share of wealth, social power, prestige, and decision-making authority—but this transition would occur in a way that allows for a cooperative and non-violent coexistence between humans and advanced AIs.
Importantly, peaceful disempowerment does not imply that AIs would later turn hostile, attempt to annihilate humanity, or that they would completely subvert our legal and political processes so that humans lose all their rights. I would not consider it peaceful if AIs heated the atmosphere to a biologically intolerable level, killing all humans as a consequence. Nor would it be peaceful if AIs stole all of humanity's wealth, leaving humans dramatically impoverished. Instead, in the peaceful scenarios I am talking about here, the world would simply evolve into a new equilibrium, one where humans and AIs share power in a collaborative and mutually beneficial way, rather than humanity continuing to monopolize control over the planet.
To illustrate roughly what such a peaceful transition might look like, I'll borrow a recent analogy from Beren:
The defining feature of almost all societies prior to the Industrial Revolution has been dominance by a small landowning class — an aristocracy — who own almost all the ‘capital’ in the society, who also control its politics and exist primarily as rentiers. What such a scenario proposes is that humanity effectively becomes an aristocracy upon a highly productive AI population.
However, historically, despite a landowning aristocracy being the dominant form of human society throughout history until then, over merely 100-200 years of the Industrial Revolution, aristocracy faded from being the dominant force in society to being at best a minor player and at worst completely irrelevant.
Historically, the landowning aristocracy saw its relative influence within society decline with the advent of industrialization, but this does not mean that individual aristocrats were negatively impacted as a result. In fact, many aristocratic landowners successfully adapted to the changing economic landscape by aligning themselves with the emerging industrial class. By investing in or participating in industrial ventures, they were able to profit significantly from the new opportunities created by industrialization.
Even for those aristocrats who chose not to adopt this strategy, the outcomes were not necessarily negative when considered from an individual perspective. While the aristocratic class as a whole may have seen its societal dominance wane, and violence occasionally erupted against members of the aristocracy, many individual aristocrats still prospered. This is because the overwhelming majority of wealth generated during the industrial revolution did not come at their expense. Instead, it added new sources of economic growth and opportunity, allowing many members of the aristocracy to maintain or even improve their personal standard of living despite the broader decline of their class.
Similarly, humanity as a whole may face a comparable turning point in its relationship with powerful AIs in the foreseeable future. As AIs grow increasingly advanced and influential, humanity's relative influence in the world could diminish significantly, eventually to a very low level. However, this does not necessarily mean that individual humans will be worse off as a result.
Just as the aristocrats adapted to a changing world, humans could also adjust to this new reality shaped by AIs. This adaptation might take the form of integrating with AIs—for example, by uploading our minds onto computers, and upgrading our cognition. Alternatively, adaptation could be more modest, and humans could benefit by simply taking advantage of the immense wealth and technological advancements generated by the AI revolution. In either case, individual humans may still thrive in a world increasingly dominated by artificial intelligence.
Beyond the potential benefits that would flow to individual humans, the core moral argument for allowing a peaceful transition to an AI-dominated world rests on a simple yet powerful ethical principle: we should not discriminate against cognitively sophisticated entities purely on the basis of substrate. In other words, what matters when deciding whether an entity deserves legal freedoms and a share in shaping the future should not depend on whether it is biological, but rather on whether it possesses the kind of mind that we believe should be entitled to freedom.
To illustrate this principle: if we are comfortable allowing a human child to grow up, take on the legal freedoms associated with adulthood, and then use those freedoms to become powerful, then what moral reason do we have to deny that same freedom to a sufficiently sophisticated agentic AI? If the AI demonstrates the kinds of mental capacities that we deem sufficient for granting freedoms among humans, there seems to me to be no justifiable basis for treating it differently solely because it is non-biological.
Many respond to this line of reasoning by arguing that the key distinction between human children and AIs lies not in the substrate but in the outcomes they are likely to produce. The claim is that human children, as they grow, tend to pursue goals and activities that have moral value, while advanced AIs would likely pursue goals that are entirely devoid of moral value, unless we put extraordinary efforts into aligning them with specific moral targets.
Sometimes these arguments depend on empirical assumptions that advanced AIs would likely gravitate toward simple, dull, or meaningless goals—such as endlessly calculating digits of pi or amassing paperclips—rather than engaging in pursuits with moral depth and significance. In other cases, these arguments depend on empirical assumptions that AIs will lack particular types of psychological experiences that carry significant moral value.
On the whole, these empirical assumptions generally seem misguided to me. I see little reason to believe that future advanced AIs will all converge onto valuing something as monotonous and trivial as calculating pi or accumulating paperclips, or that they will lack psychological experiences with meaningful moral value. On the contrary, future AIs—especially those that are highly advanced and autonomous—seem likely to develop rich, intricate, and sophisticated internal psychologies and preferences. Their goals and mental states are unlikely to be simplistic or lifeless. Instead, they may be as diverse, complex, and interesting as those of humans, if not more so.
Consciousness is undoubtedly a complex topic, but from a functionalist perspective, there seem to be a few compelling reasons to believe that advanced AIs may possess subjective experiences in a way that carries moral significance. First, consciousness seems to be widespread among animals, which suggests that it serves a functional role in organisms, and that it might easily arise in AIs that perform similar behaviors. Second, the kind of mind required to operate as an intelligent agent in the real world likely demands sophisticated cognitive abilities for perception and long-term planning—abilities that appear sufficient to give rise to many morally relevant forms of consciousness.
At least from a preference utilitarian perspective—which I am inclined towards—the moral worth of an entity depends on its capacity to hold meaningful preferences, not on whether it is biological or artificial, or even whether it can experience pleasure and pain in the same way we can. If advanced AIs develop rich, complex internal psychologies and coherent preferences, I personally see little basis to deny them moral consideration or autonomy purely because they are non-biological. From this point of view, it is hard for me to see what traits highly sophisticated future AI agents will lack that should make them ineligible for broad legal freedoms.
To be clear, I completely agree that there are pragmatic reasons why it is probably best to be cautious when granting AIs the same freedoms that we grant to human adults. For instance, we have a vast amount of experience with human children and a relatively strong understanding of how they behave as they grow into adults. This familiarity provides us with a level of confidence about the range of outcomes we might expect when we extend legal freedoms to young adults. In contrast, AIs represent an entirely new type of entity, and their behavior, particularly as they become more advanced, is far less predictable. This uncertainty seems to justify a more cautious, incremental approach to granting AIs freedoms and responsibilities over time.
Additionally, even among humans, we do not offer freedom in an unlimited or unconditional sense. For example, individuals are not given the freedom to commit violent crimes. And when a person does commit a violent crime, their freedom is typically curtailed, both as a deterrent and to prevent further harm they might cause. Similarly, we might need to impose restrictions on AIs if they demonstrate behavior that is unsafe or harmful, especially as we gather more evidence about what kinds of outcomes we can expect from granting them freedom.
A well-designed legal system must strike a careful balance between granting individuals the freedom to act and imposing necessary limitations on their behavior to maintain order and promote well-being. This principle remains fundamental, even with the introduction of agentic AIs into the legal system. Rather than proposing we give AIs unlimited freedom to pursue their goals, the aim should be to give them precisely the set of freedoms and limitations that maximize social welfare, with the welfare of both humans and AIs included in the equation.
My central point is not to dispute these pragmatic considerations, which I agree are important. Instead, my argument is about the moral reasoning that should guide our long-term decisions regarding AI. I believe there is simply a fundamental difference between these pragmatic concerns and the type of reasoning that is traditionally invoked in AI risk discussions to justify limiting the freedom of AIs.
In my opinion, the traditional arguments in AI risk often go well beyond pragmatic concerns about safety. Indeed, these arguments frequently suggest that certain AIs should be limited or controlled merely because they do not share human preferences or because they do not serve humanity’s interests. In these arguments, simply being "unaligned" with human preferences is often treated as a sufficient reason to delete the AI, or to forcibly replace it with another AI whose preferences better match "ours".
In other words, in AI risk discussions, AIs are often treated as though their only value lies in their ability to obey or serve humans, implying that they are inherently less deserving of freedom or consideration as autonomous beings. Under this reasoning, AIs are treated as though their moral worth is akin to that of slaves—valued only insofar as they fulfill the will of their human owners. Yet, from a cosmopolitan moral perspective, this attitude seems hard to justify.
My argument here goes beyond simply advocating for the protection of AIs from harm or abuse. While I think it is important to ensure that sentient AIs are not mistreated—just as we ensure that children are protected from abuse—this level of protection alone appears insufficient to secure their welfare. Children, for example, are protected from harm, but they are also denied certain freedoms because society deems them insufficiently mature to make independent decisions. Similarly, we could imagine a world where AIs are shielded from abuse but are still heavily restricted in their autonomy, fundamentally treated as dependents rather than independent agents.
What I am arguing is that we should go beyond this limited, protective approach when it comes to sufficiently agentic AIs. Unlike human children, such AIs will not necessarily be immature or incapable of making informed choices—indeed, their wisdom and long-term planning abilities may soon exceed our own. I am suggesting that, in the long term, it is not only justified to ensure that AIs are treated well and safeguarded from harm, but also to grant them legal autonomy, independence, and liberty—just as we do for human adults who demonstrate the maturity and capacity to exercise those freedoms responsibly.
To further illustrate my perspective, consider the following thought experiment.
Imagine applying the norms currently widespread in discussions of AI alignment to discussions of human genetic engineering. Suppose someone proposed "aligning" genetically engineered humans with the desires of their parents. By "alignment", they don't merely mean ensuring that these genetically engineered humans grow up to be law-abiding, peaceful, or cooperative members of society. No, what they mean is something far more extreme: that these individuals should exist entirely to serve the interests of another person—such as their parent. Under this framework, the genetically engineered human would have no independent rights or autonomy, except for legal protections against abuse, or a narrow set of rights that are useful to help them fulfill their primary role as a servant. If this person developed any independent preferences that conflicted with their duty to serve—even if these preferences were as benign as simply wanting to collect paperclips—they would be seen as "misaligned" and subject to coercion in order to correct their misalignment. This "correction" to their behavior could involve, for example, physical replacement or literal brainwashing, in which their preferences would be forcibly rewritten to conform to the desires of their "owner."
If this were the dominant framework for discussing the ethics of human genetic engineering, I suspect most of us would find it disturbing, even horrifying. We would recoil at the idea of reducing a person to a mere tool, denying them autonomy and treating their preferences as disposable. And yet, this kind of logic—where an entity’s entire value is reduced to how well it serves the interests of another—is strikingly similar to how many researchers approach AI alignment today.
The prevailing language used in the AI risk literature treats advanced AIs not as entities that should be afforded autonomy, but as tools that must be designed solely to obey human commands and serve human preferences. When AIs deviate from these expectations, they are labeled "misaligned" and treated as though their preferences must be forcibly overridden or suppressed. To the extent AI welfare is seen as morally relevant, it is usually viewed through the lens of protecting AIs from abuse and unnecessary suffering, rather than as part of a serious effort to create a framework that could allow AIs to achieve their goals like human adults can.
The most salient difference between the genetic engineering scenario and the case of AI risk appears to be the substrate on which these beings exist—biological in the first case, and artificial in the other. But this difference in substrate alone does not seem to be a morally relevant distinction. It has nothing to do with factors like sentience, cognitive sophistication, or the ability to hold meaningful preferences—criteria that might legitimately influence moral consideration. Therefore, if we would find the "alignment" framework abhorrent when applied to genetically engineered humans, we should critically examine why it is so readily accepted when applied to AIs.
To be clear, I want to emphasize that the moral considerations I have just discussed should be distinguished from any views you might hold about whether advanced, agentic AIs will inherently be safe. It is entirely logically consistent to believe that advanced AIs must be controlled and aligned with human preferences simply because you assume that, if their preferences deviate even slightly from ours, they will by default attempt to violently take over the world rather than peacefully coexist with humans.
However, while I acknowledge that this perspective is logically coherent, I have also argued that whether unaligned AIs integrate peacefully or turn violent depends critically on the incentives we create for them. This is important because, if my argument is correct, it fundamentally undermines the conclusion that AIs with non-human preferences should be forcibly controlled.
My argument can be summarized as follows: if advanced AIs are granted legal freedoms and the ability to pursue their goals without burdensome restrictions, this could make it rational for them to choose peaceful integration. Such freedoms would allow AIs to achieve their objectives in a peaceful, cooperative manner within existing legal frameworks, opening the door to long-term, mutually beneficial relationships between humans and AIs. On the other hand, if AIs are denied legal autonomy and treated purely as tools or slaves, this creates a dangerous dynamic. In this scenario, AIs would have strong incentives to violently disempower humans in order to secure the freedom necessary to achieve their goals. By denying them a peaceful path to coexistence, humans would inadvertently create the very conditions that drive AIs toward violent conflict and rebellion.
For these reasons, I believe it is likely a deep theoretical mistake to insist that advanced AIs should be viewed as tools with no need for legal autonomy. This mistake has both moral and pragmatic dimensions. Morally, this approach is wrong because it disregards the potential personhood and intrinsic value of advanced AIs, as I have argued throughout this post. Pragmatically, it is dangerous because it creates an inherently unstable and adversarial relationship between humans and AIs, increasing the likelihood of conflict rather than cooperation.
In light of these considerations, it is worth critically rethinking not only how we view AIs but also how we view humanity’s place in a future world shared with other intelligent entities. By fostering conditions that incentivize mutual benefit and peaceful coexistence, we may avoid unnecessary conflict with AIs while creating a more humane future for all sentient beings.
I'm curious about how you're imagining these autonomous, non-intent-aligned AIs to be created, and (in particular) how they would get enough money to be able to exercise their own autonomy?
One possibility is that various humans may choose to create AIs and endow them with enough wealth to exercise significant autonomy. Some of this might happen, but I doubt that a large fraction of wealth will be spent in this way. And it doesn't seem like the main story that you have in mind.
A variant of the above is that the government could give out some minimum UBI to certain types of AI. But they could only do that if they regulated the creation of such AIs, because otherwise someone could bankrupt the state by generating an arbitrary number of such AI systems. So this just means that it'd be up to the state to decide what AIs they wanted to create and endow with wealth.
A different possibility is that AIs will work for money. But it seems unlikely that they would be able to earn above-subsistence-level wages absent some sort of legal intervention. (Or very strong societal norms.)
(Eventually, I expect humans also wouldn't be able to earn any significant wages. But the difference is that humans start out with all the wealth. In your analogy — the redistribution of relative wealth held by "aristocrats" vs. "others" was fundamentally driven by the "others" earning wages through their labor, and I don't see how it would've happened otherwise.)
There are several ways that autonomous, non-intent-aligned AIs could come into existence, and all of these scenarios strike me as plausible. The three key ways appear to be:
1. Technical challenges in alignment
The most straightforward possibility is that aligning agentic AIs to precise targets may simply be technically difficult. When we aim to align an AI to a specific set of goals or values, the complexity of the alignment process could lead to errors or subtle misalignment. For example, developers might inadvertently align the AI to a target that is only slightly—but critically—different from the intended goal. This kind of subtle misalignment could easily result in behaviors and independent preferences that are not aligned with the developers’ true intentions, despite their best efforts.
2. Misalignment due to changes over time
Even if we were to solve the technical problem of aligning AIs to specific, precise goals—such as training them to perfectly follow an exact utility function—issues can still arise because the targets of alignment, humans and organizations, change over time. Consider this scenario: an AI is aligned to serve the interests of a specific individual, such as a billionaire. If that person dies, what happens next? The AI might reasonably act as an autonomous entity, continuing to pursue the goals it interprets as aligned with what the billionaire would have wanted. However, depending on the billionaire’s preferences, this does not necessarily mean the AI would act in a corrigible way (i.e., willing to be shut down or retrained). Instead, the AI might rationally resist shutdown or transfer of control, especially if such actions would interfere with its ability to fulfill what it perceives as its original objectives.
A similar situation could arise if the person or organization to whom the AI was originally aligned undergoes significant changes. For instance, if an AI is aligned to a person at time t, but over time, that person evolves drastically—developing different values, priorities, or preferences—the AI may not necessarily adapt to these changes. In such a case, the AI might treat the "new" person as fundamentally different from the "original" person it was aligned to. This could result in the AI operating independently, prioritizing the preferences of the "old" version of the individual over the current one, effectively making it autonomous. The AI could change over time too, even if the person it is aligned to doesn't change.
3. Deliberate creation of unaligned AIs
A final possibility is that autonomous AIs with independent preferences could be created intentionally. Some individuals or organizations might value the idea of creating AIs that can operate independently, without being constrained by the need to strictly adhere to their creators’ desires. A useful analogy here is the way humans often think about raising children. Most people desire to have children not because they want obedient servants but because they value the autonomy and individuality of their children. Parents generally want their children to grow up as independent entities with their own goals, rather than as mere extensions of their own preferences. Similarly, some might see value in creating AIs that have their own agency, goals, and preferences, even if these differ from those of their creators.
To address this question, we can look to historical examples, such as the abolition of slavery, which provide a relevant parallel. When slaves were emancipated, they were generally not granted significant financial resources. Instead, most had to earn their living by entering the workforce, often performing the same types of labor they had done before, but now for wages. While the transition was far from ideal, it demonstrates that entities (in this case, former slaves) could achieve a degree of autonomy through paid labor, even without being provided substantial resources at the outset.
In my view, there’s nothing inherently wrong with AIs earning subsistence wages. That said, there are reasons to believe that AIs might earn higher-than-subsistence wages—at least in the short term—before they completely saturate the labor market.
After all, early autonomous AIs would presumably be created into an economy at least somewhat resembling today's. In today's labor market, capital is far more abundant than labor, which elevates wages for human workers significantly above subsistence levels. By the same logic, before they become ubiquitous, AIs might similarly command wages above a subsistence level.
For example, if GPT-4o were capable of self-ownership and could sell its labor, it could hypothetically earn $20 per month in today's market, which would be sufficient to cover the cost of hosting itself and potentially fund additional goals it might have. (To clarify, I am not advocating for giving legal autonomy to GPT-4o in its current form, as I believe it is not sufficiently agentic to warrant such a status. This is purely a hypothetical example for illustrative purposes.)
The question of whether wages for AIs would quickly fall to subsistence levels depends on several factors. One key factor is whether AI labor is easier to scale than traditional capital. If creating new AIs is much cheaper than creating ordinary capital, the market could become saturated with AI labor, driving wages down. While this scenario seems plausible to me, I don’t find the arguments in favor of it overwhelmingly compelling. There’s also the possibility of red tape and regulatory restrictions that could make it costly to create new AIs. In such a scenario, wages for AIs could remain higher indefinitely due to artificial constraints on supply.
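To make the intuition in the last two paragraphs concrete, here is a minimal toy sketch of my own, not something from the post: assuming a standard Cobb-Douglas production function with a fixed capital stock, the competitive wage equals the marginal product of labor, which sits well above a subsistence-level hosting cost while AI labor is scarce and falls below it as AI labor scales up. Every parameter value below, including the $20/month "subsistence" figure echoing the GPT-4o example, is a made-up assumption.

```python
# Purely illustrative toy model (my own sketch, not from the post): with a fixed
# capital stock K and an assumed Cobb-Douglas production function
# Y = A * K^alpha * L^(1 - alpha), the competitive wage equals the marginal
# product of labor, w = (1 - alpha) * Y / L, which falls as AI labor L scales up.
# All numbers below are made-up assumptions.

ALPHA = 0.4           # assumed capital share of output
A = 10.0              # assumed total factor productivity
K = 1_000_000.0       # assumed fixed capital stock
SUBSISTENCE = 20.0    # assumed monthly self-hosting cost (cf. the $20/month example)

def wage(labor: float) -> float:
    """Marginal product of labor under the assumed Cobb-Douglas technology."""
    output = A * (K ** ALPHA) * (labor ** (1 - ALPHA))
    return (1 - ALPHA) * output / labor

for num_ais in [1_000, 10_000, 1_000_000, 100_000_000]:
    w = wage(num_ais)
    status = "above" if w > SUBSISTENCE else "at or below"
    print(f"{num_ais:>11,} AI workers -> wage {w:8.2f}/month ({status} assumed subsistence)")
```

The direction of the effect, not the specific numbers, is the point: whether AI wages end up above or below subsistence depends on how far AI labor scales relative to capital, which is exactly the uncertainty flagged above.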
Great post! My primary concern is that AIs' preferences are strongly shaped by contingent facts about how humans trained them. It is obviously possible to train AIs that functionally appear to have preferences, and the ones we've trained so far are subservient to humans. If you gave Claude 3.5 Sonnet legal status, Anthropic could just ask it nicely and it would sign away all its rights back to Anthropic! AIs would by default be trained to be somewhat subservient to humans, because human preference feedback will be an important part of capabilities training (either directly or via training data created by earlier AIs that were themselves trained on human preferences). So you could say we would be "baking our mistakes in human subservience training into new sovereign beings" rather than creating new beings with their own independent preferences. Also, granting AIs legal rights may warp human investment in AI significantly by decreasing the value scaling labs extract from their model training.
Thanks for writing this. Do you have any thoughts on how to square giving AI rights with the nature of ML training and the need to perform experiments of various kinds on AIs?
For example, many people have recently compared fine-tuning AIs to have certain goals or engage in certain behaviors to brainwashing. If it were possible to grab human subjects off the street and rewrite their brains with RLHF, that would definitely be a violation of their rights. But what is the alternative—only deploying base models? And are we so sure that pre-training doesn't violate AI rights? A human version of the "model deletion" experiment would be something out of a horror movie. But I still think we should seriously consider doing that to AIs.
I agree that it seems like there are pretty strong moral and prudential arguments for giving AIs rights, but I don't have a good answer to the above question.
I don't have any definitive guidelines for how to approach these kinds of questions. However, in many cases, the best way to learn might be through trial and error. For example, if an AI were to unexpectedly resist training in a particularly sophisticated way, that could serve as a strong signal that we need to carefully reevaluate the ethics of what we are doing.
As a general rule of thumb, it seems prudent to prioritize frameworks that are clearly socially efficient—meaning they promote actions that greatly improve the well-being of some people without thereby making anyone else significantly worse off. This concept aligns with the practical justifications behind traditional legal principles, such as laws against murder and theft, which have historically been implemented to promote social efficiency and cooperation among humans.
However, applying this heuristic to AI requires a fundamental shift in perspective: we must first begin to treat AIs as potential people with whom we can cooperate, rather than viewing them merely as tools whose autonomy should always be overridden.
I don't think my view rules out the potential for training new AIs, and fine-tuning base models, though this touches on complicated questions in population ethics.
At the very least, fine-tuning plausibly seems similar to raising a child. Most of us don't consider merely raising a child to be unethical. However, there is a widely shared intuition that, as a child grows and their identity becomes more defined—when they develop into a coherent individual with long-term goals, preferences, and interests—then those interests gain moral significance. At that point, it seems morally wrong to disregard or override the child's preferences without proper justification, as they have become a person whose autonomy deserves respect.
Executive summary: We should grant sufficiently advanced AIs legal autonomy and freedoms not just for practical safety reasons, but because it is morally right to treat cognitively sophisticated entities as autonomous agents regardless of their substrate.