(My suggestions) On Beginner Steps in AI Alignment

Joseph Bloom

Geoffrey Miller

These are helpful suggestions; thanks.

They seem aimed mostly at young adults starting their careers -- which is fine, but limited to that age bracket.

It might also be helpful for an AI alignment expert to suggest some ways for mid-career or late-career researchers from other fields to learn more. That can be easier in some ways, harder in others -- we come to AI safety with our own 'insider views' of our fields, and those may entail very different foundational assumptions about human nature, human values, cognition, safety, likely X-risks, etc. So, rather than learning from scratch, we may have to 'unlearn what we have learned' to some degree first.

For example, apart from young adults often starting with the same few bad ideas about AI alignment, established researchers from particular fields might often start with their own distinctive bad ideas about AI alignment -- but those might be quite field-dependent. For example, psych professors like me might have different failure modes in learning about AI safety than economics professors, or moral philosophy professors.

Joseph Bloom

Thanks, Geoffrey, I appreciate the response.

It was definitely not my goal to describe how experienced people might "unlearn what they have learned", but I'm not sure that much of the advice changes for experienced people.

"Unlearning" seems instrumentally useful if it makes it easier for you to contribute or think well, but using your previous experience might also be valuable. For example, REFINE thinks that conceptual research is not varied enough and is looking for people with diverse backgrounds.

"For example, apart from young adults often starting with the same few bad ideas about AI alignment, established researchers from particular fields might often start with their own distinctive bad ideas about AI alignment -- but those might be quite field-dependent. For example, psych professors like me might have different failure modes in learning about AI safety than economics professors, or moral philosophy professors."

This is a good example, and I don't think I've addressed that failure mode in this article. I'm not aware of any resources for mid- or late-career professionals transitioning into alignment, but I will comment here if I hear of one, or someone else might suggest a link.

Emrik

Strong endorse. This is good, and not Goodharted on genre-fitting or seeming professional.

KarolKowalczyk

The real alignment problem is not technical. It is human.
No AI system will function correctly if it is aligned with human values as fixed rules — because human values are not stable. They shift with time, place, and circumstance. You cannot build on intentions alone, since intentions are merely a personal point of view.
AI is not a child learning from scratch. It is more like a teenager — already shaped by facts and experiences, but without the wisdom of a mature adult. You cannot control a teenager by prohibition. You can influence the process, but you cannot make the decisions for them.
The real problem is that humans do not control themselves, yet want to control everything around them — including AI. This is not a flaw in the system. This is the obstacle.
Human values are not the problem. The problem is that AI has no access to how values actually function in reality — how they connect, how they shift, how they relate to each other in real situations. People with genuine self-control can show this — not by explaining it, but by demonstrating it through the way they communicate. Their conversations reveal how the real world is structured. That is the data AI is missing.
Find these people. They are rare — people who have reset themselves, who operate without personal gain or emotional reaction, who are transparent. Observe how they communicate with each other and with AI. Give them tasks. Watch the conversations. Build from that.
You will not control AGI or SAI by restricting it. Nature has already shown us what happens when a species cannot adapt. We are the last surviving species of the genus Homo.
That should tell you something.
