Defining alignment research

richard_ngo

9 min read · Aug 19, 2024

Comments 1

SummaryBot

Executive summary: The distinction between "alignment research" and "capabilities research" is problematic, and should be replaced with a focus on worst-case scenarios and cognitive understanding of AI systems.

Key points:

Categorizing research as "alignment" or "capabilities" based on impacts is difficult due to unpredictable effects and disagreements about threat models.
Most valuable alignment research should focus on worst-case scenarios rather than average performance.
A scientific, cognitivist approach to understanding AI systems is more useful for alignment than a behaviorist one.
The author proposes a two-dimensional categorization of AI research based on focus (average-case to worst-case) and approach (engineering to cognitivist science).
"Alignment research" should refer to work closer to worst-case, cognitivist science, while "capabilities research" refers to average-case engineering.
This framework may evolve as the field progresses towards a unified science of artificial and biological cognition.

This comment was auto-generated by the EA Forum Team.

	Average-case	Pessimistic-case	Worst-case
Engineering	Scaling	RLHF	Adversarial robustness
Behaviorist science	Optimization science	Scalable oversight	AI control
Cognitivist science	Concept-based interpretability	Mechanistic interpretability	Agent foundations

“Alignment” and “capabilities” are primarily properties of AIs, not of AI research

What types of research are valuable for preventing misalignment?

Valuable property 1: worst-case focus

Valuable property 2: scientific approach

A better definition of alignment research