The Problem With the Word ‘Alignment’

Peli Grietzer; particlemania


Comments 1


SummaryBot

Executive summary: The concept of "AI alignment" conflates distinct problems and obscures important questions about the interaction between AI systems and human institutions, potentially limiting productive discourse and research on AI safety.

Key points:

The term "AI alignment" is used to refer to several related but distinct problems (P1-P6), leading to miscommunication and fights over terminology.
The "Berkeley Model of Alignment" reduces these problems to the challenge of teaching AIs human values (P5), but this reduction relies on questionable assumptions.
The assumption of "content indifference" ignores the possibility that different AI architectures may be better suited for learning different types of values or goals.
The "value-learning bottleneck" assumption overlooks the potential for beneficial AI behavior without exhaustive value learning, and the need to consider composite AI systems.
The "context independence" assumption neglects the role of social and economic forces in shaping AI development and deployment.
A sociotechnical perspective suggests that AI safety requires both technical solutions and the design of institutions that govern AI, with the "capabilities approach" providing a possible framework.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.



This post's contents were drafted by Peli and TJ, in their former capacity as Research Fellow and Research Director at AOI. They are currently research affiliates collaborating with the organization.


We believe there is an emerging paradigm that seeks to reduce P1-P6 to P2 (human intelligibility), but this new paradigm has so far not consolidated to the same degree as the Berkeley Model. Current intelligibility-driven research programs such as ELK and OAA don’t yet present themselves as ‘complete’ strategies for addressing P1-P6.