My first post is here.
Epistemic status: Confident in the ultimate conclusion but unconfident in my reasoning.
An interesting article from Vox argues that AI ethics and AI Alignment could do better by simply cooperating with each other. I think the arguments are false but lead to a true conclusion in this case. Short version: the AI Alignment and AI ethics fields are solving vastly different problems, so much so that they have little in common. Now I'll excerpt it:
There are teams of researchers in academia and at major AI labs these days working on the problem of AI ethics, or the moral concerns raised by AI systems. These efforts tend to be especially focused on data privacy concerns and on what is known as AI bias — AI systems that, using training data with bias often built in, produce racist or sexist results, such as refusing women credit card limits they’d grant a man with identical qualifications.
There are also teams of researchers in academia and at some (though fewer) AI labs that are working on the problem of AI alignment. This is the risk that, as our AI systems become more powerful, our oversight methods and training approaches will be more and more meaningless for the task of getting them to do what we actually want. Ultimately, we’ll have handed humanity’s future over to systems with goals and priorities we don’t understand and can no longer influence.
There's also an inner alignment problem: once we spin up mesa-optimizers, we have to make sure they're aligned as well. If we had much better interpretability tools, I'd be far more optimistic about AI Alignment, but as it stands, AI Alignment has another difference from AI ethics.
A difference between AI ethics and AI Alignment
And finally, this is where a massive difference between AI Alignment and AI ethics reveals itself.
But first, an excerpt:
And in some ways, AI alignment is just the problem of AI bias writ (terrifyingly) large: We are assigning more societal decision-making power to systems that we don’t fully understand and can’t always audit, and that lawmakers don’t know nearly well enough to effectively regulate.
As impressive as modern artificial intelligence can seem, right now those AI systems are, in a sense, “stupid.” They tend to have very narrow scope and limited computing power. To the extent they can cause harm, they mostly do so either by replicating the harms in the data sets used to train them or through deliberate misuse by bad actors.
But AI won’t stay stupid forever, because lots of people are working diligently to make it as smart as possible.
Part of what makes current AI systems limited in the dangers they pose is that they don’t have a good model of the world. Yet teams are working to train models that do have a good understanding of the world. The other reason current systems are limited is that they aren’t integrated with the levers of power in our world — but other teams are trying very hard to build AI-powered drones, bombs, factories, and precision manufacturing tools.
The first paragraph is exactly wrong; specifically, there's a very large difference between AI ethics and AI Alignment: boundedness vs. unboundedness.
Let's unpack that: AI ethics focuses on biased models being unfair to discriminated groups, but the problem is implicitly bounded, in that things like existential catastrophe due to AI are effectively out of scope. AI Alignment has to deal with unbounded problems, where an AI system can take nearly any action, including many that have nothing to do with discrimination.
That's just a taste of the difference between AI Alignment and AI ethics.
I absolutely agree, though, that AI Alignment and AI ethics would benefit from a healthier relationship, or from trying a moral trade/value handshake, even while acknowledging that the problems of one area aren't likely to carry over to the other (except for people caring about ethics or safety). For example, the fact that Timnit Gebru was fired is very worrying for alignment people, because it implies that even socially embedded causes like fighting racism won't stop Google or DeepMind. Trying to communicate weirder causes like AI Alignment, and to get action taken on them, is even harder than that.
Why do the fields of AI Alignment and AI ethics fight with each other?
Well, we've got an excerpt from Kelsey Piper herself:
The AI ethics/AI alignment battle doesn’t have to exist. After all, climate researchers studying the present-day effects of warming don’t tend to bitterly condemn climate researchers studying long-term effects, and researchers working on projecting the worst-case scenarios don’t tend to claim that anyone working on heat waves today is wasting time.
You could easily imagine a world where the AI field was similar — and much healthier for it.
Why isn’t that the world we’re in?
I actually want to add on to Kelsey Piper's theory and say why the divide exists in AI and not climate, and I suspect the following reasons:
1. Lack of clear solutions combined with unbounded outcomes is a massive issue. At the end of the day, climate change has known solutions and some implementation of them: the Inflation Reduction Act that passed would cut US emissions by 40%, per this source: https://www.vox.com/policy-and-politics/2022/7/28/23282217/climate-bill-health-care-drugs-inflation-reduction-act. Neither is the case for AI Alignment at all, and AI ethics solves bounded problems.
2. Doomed-by-default vs. doomed-at-the-tails. Specifically, there's a large chance that, conditional on AI being built, the default outcome is existential catastrophe driven by a seemingly random goal chosen by mesa-optimizers, i.e. inner alignment failure. I'm somewhat more optimistic that good goals (corresponding to outer alignment) will be chosen; the tricky part is not having those initial goals derailed by mesa-optimizers. In contrast, climate change is only really an existential disaster at the tails. It also comes on much more slowly than any AI takeoff will: even slow takeoff is more like a decade, versus the centuries climate change would need to reach an existential disaster.
3. It's very likely that a narrow line exists between there being no fire alarm for the public and politicization, and while the no-fire-alarm scenario is bad, politicization is even worse. AI being politicized is a particularly worrying scenario, since, barring the "just don't develop AGI this century" scenario (which is possible, but not the default), any delay in alignment solutions is disastrous.
4. AI ethics people, because they think in relatively bounded terms, focus on how best to solve current problems like discrimination, and generally assume human nature will remain the same. AI Alignment people, because they think about unbounded problems, believe that if we could get an AGI to merely replicate the systemic racism of the US, we'd be in the top 0.01% of outcomes, depressingly enough. (They also place more weight on the idea that human nature will probably change vastly, making discrimination less relevant, though not absent. See the end for why.)
5. AI Alignment is much weirder than AI ethics because of point 1, so by default communicating it to other people is very slow going, even if point 3 didn't hold.
Why both fields should be nice to each other anyway
But the conclusion does hold: AI Alignment and AI ethics people do need to at least try to trade morally, or do a value handshake with each other; while there are real differences, those differences shouldn't lead to constant fighting.
Notes
There's an impossibility result here that roughly states there is an irreconcilable conflict between systemic or group fairness and individual fairness, which is quite pessimistic. Here's the PDF: https://arxiv.org/abs/1609.07236
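To get a feel for the kind of tension involved, here's a minimal numeric sketch with toy numbers of my own (not taken from the paper): from the standard confusion-matrix identity FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR), if two groups have different base rates p but we demand equal precision (PPV, a calibration-style condition) and equal false negative rates, their false positive rates are forced to differ.

```python
# Toy illustration (my own numbers, not from the paper) of the fairness tension:
# the identity FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR) means that if two groups
# have different base rates p, a classifier cannot equalize PPV, FNR, and FPR
# across them all at once.

def implied_fpr(base_rate: float, ppv: float, fnr: float) -> float:
    """False positive rate forced by the base rate, precision (PPV), and FNR."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)

# Hypothetical groups with different base rates but identical PPV and FNR.
ppv, fnr = 0.7, 0.2
for group, base_rate in [("group A", 0.3), ("group B", 0.5)]:
    print(group, round(implied_fpr(base_rate, ppv, fnr), 3))
# group A 0.147
# group B 0.343  -> equal PPV and FNR force unequal false positive rates
```

(As I understand it, the paper itself states its theorem for risk scores rather than hard classifiers, but the same base-rate obstruction is doing the work.)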