Criticism of the main framework in AI alignment

Michele Campolo

Comments 9

[anonymous]

Tl;dr: As far as you know, you're the only person in the world directly working on how to build AI that is capable of making moral progress, i.e. thinking critically about goals as humans do.

(I find this pretty surprising and worrying so wanted to highlight.)

Michele Campolo

Maybe "only person in the world" is a bit excessive :)

As far as I know, no one else in AI safety is directly working on it. There is some research in the field of machine ethics, about Artificial Moral Agents, that has a similar motivation or objective. My guess is that, overall, very few people are working on this.

[anonymous]

I dunno, I still think my summary works. (To be clear, I wasn't trying to be like, "You must be exaggerating, tsk tsk," - I think you're being honest and for me it's the most important part of your post so I wanted to draw attention to it.)

Michele Campolo

Thank you!

Tristan Katz

12mo

Another very late comment here :)

So, as with your post on 'Free Agents', I believe that thinking about this is important, because it presents a potential way to align AI if we ourselves are unsure which values AI should be aligned to.

But I'm not convinced by the main reason given in this post: that if AI is controllable, bad actors will be able to use it malevolently. The goal of alignment research is usually to align AI with the values or goals of its designer, not of anyone who uses it. LLMs today already refuse to do many things you might want them to do. So if technical research is successful, I expect it would be hard for malevolent actors to use the same AI in bad ways.

Maybe it won't be impossible - but what seems more likely is that they would simply use the most advanced AI research to build their own AI for malevolent purposes, allowing them to pursue their goals with far more ease. And if they were to do that, then they would simply not train it to do its own moral reasoning. Which leads to another premise of the 'main framework' that you've missed - most people working in alignment (in my own experience) assume that once someone 'wins' the AGI/ASI race, they will be able to use that AI to control or prevent the development of other potentially dangerous AIs. For that reason, it may be sufficient for the AI to act according to the values of its developers (assuming they have good values!), rather than carrying out its own moral reasoning.

To be clear, I don't think it's likely that AI developers will be able to identify exactly the right values to align AGI to. But that's a different concern from the one you've expressed here. So I do think that developing AI moral reasoning might be valuable, but I'm not convinced that it's valuable in order to prevent malevolent actors from using AI.

Michele Campolo

12mo*

I hadn't considered the narrative you bring up here when I wrote the post, that is interesting. As you write, it relies on the assumption that

once someone 'wins' the AGI/ASI race, they will be able to use that AI to control or prevent the development of other potentially dangerous AIs

Here we are entering the realm of forecasting world politics, which I am definitely not an expert on. As far as I know, the probability of that scenario could be extremely low. I can also think of alternative scenarios that don't seem obviously absurd, so I doubt that the probability is extremely high, but it's hard for me to say much more than that. Anyway, as you said, AI moral reasoning might be valuable in that scenario as well.

but I'm not convinced that it's valuable in order to prevent malevolent actors using AI.

That's a bit too much: I don't think I claimed that moral reasoning in AI can directly prevent that. It seems that, in order to prevent malevolent actors from using AI for bad purposes, we would have to either stop AI research completely (because it is not only alignment research that works on the control problem, but also standard AI research), or ensure that bad actors never get access to powerful and controllable AI, which also seems hard to do and is not something AI moral reasoning can help with.

The weaker claim I made in the post is that research on moral reasoning in AI is less likely to help malevolent actors use AI for bad purposes (and/or would help them to a lesser degree) compared with research that aims to make AI controllable.

Tristan Katz

12mo

Regarding your last point: I see. I thought this was an argument for "alignment via moral reasoning as an addition to alignment via control", not "alignment via moral reasoning instead of alignment via control." So you would hope that alignment via moral reasoning would displace or replace alignment via control.

In that case, your argument is plausible but... quite hopeful? I'm sure many people will pursue control methods regardless. I suppose you might argue that, if enough people buy your argument, then research on AI that is merely controlled will advance more slowly, and research on AI that does its own moral reasoning, and is therefore harder to misuse, will advance faster or at least in parallel. Then I would accept that this might reduce the chance of malevolent misuse, but that's quite a hopeful scenario! In less hopeful scenarios, I am unsure whether people concerned with malevolent misuse ought to pursue this kind of work, or whether they would be better off simply advocating for a pause or slowdown.

Michele Campolo

12mo

In short, I am not hoping for a specific outcome, and I can't take into account every single scenario. If someone starts giving more credit to research on moral reasoning in AI after reading this, that's already enough, considering that the topic doesn't seem to be popular within AI alignment, and it was even more niche at the time I wrote this post.

Tristan Katz

12mo

Sure! And like I said, I do think this is valuable: it just seems more obviously valuable as a way to ensure the best outcomes (aligned AI), rather than as a means to avoid the worst outcomes.
