What do you mean by ‘alignment is solvable in principle’?

Remmelt

1 min read · Jan 17, 2025

Comments 1

Remmelt

Here's how I specify terms in the claim:

AGI is a set of artificial components, connected physically and/or by information signals over time, that in aggregate sense and act autonomously over many domains.
- 'artificial' as configured out of a (hard) substrate that can be standardised to process inputs into outputs consistently (vs. what our organic parts can do).
- 'autonomously' as continuing to operate without needing humans (or any other species that share a common ancestor with humans).
Alignment is, at a minimum, the control of the AGI's components (as modified over time) such that, with probability above some guaranteeable high floor, they do not propagate effects that cause the extinction of humans.
Control is the implementation of one or more feedback loops through which the AGI's effects are detected, modelled, simulated, compared to a reference, and corrected.
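
As a rough illustration of the five stages named in the definition of control, here is a minimal Python sketch of a single feedback loop acting on one scalar "effect". It is a toy under strong assumptions, not a model of AGI or of the claim itself: the class `ControlLoop`, the function `run`, the `gain` and `drift` parameters, the noise-free sensor, and the one-dimensional effect are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ControlLoop:
    """Toy proportional feedback loop. Every name here is hypothetical."""

    reference: float       # value the effect is supposed to stay at
    gain: float = 0.5      # fraction of the predicted error corrected per step
    estimate: float = 0.0  # the controller's internal model of the effect

    def detect(self, true_effect: float) -> float:
        # Assumption: a perfect, noise-free sensor.
        return true_effect

    def model(self, observation: float) -> None:
        # Update the internal model from the latest observation.
        self.estimate = observation

    def simulate(self, drift: float) -> float:
        # Predict the next effect if no correction were applied.
        return self.estimate + drift

    def compare(self, predicted: float) -> float:
        # Deviation of the prediction from the reference.
        return predicted - self.reference

    def correct(self, error: float) -> float:
        # Corrective action that counteracts part of the deviation.
        return -self.gain * error

    def step(self, true_effect: float, drift: float) -> float:
        # One pass through detect -> model -> simulate -> compare -> correct.
        self.model(self.detect(true_effect))
        return self.correct(self.compare(self.simulate(drift)))


def run(loop: ControlLoop, effect: float, drift: float, steps: int) -> list[float]:
    """Apply the loop repeatedly and record the effect after each step."""
    history = []
    for _ in range(steps):
        action = loop.step(effect, drift)
        effect = effect + drift + action
        history.append(effect)
    return history
```

In this toy, `run(ControlLoop(reference=0.0), effect=8.0, drift=0.0, steps=3)` halves the deviation on each step. With a constant nonzero `drift`, the loop settles at an offset from the reference instead of on it, so a loop of this shape bounds the deviation without necessarily removing it.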

1. Some analogies I've seen a few times (rough paraphrases):

- ‘humans are generally intelligent too, and humans can align with humans’
- ‘LLMs appear to do a lot of what we want them to do, so AGI could too’
- ‘other impossible-seeming engineering problems got solved too’

2. E.g. what does ‘in principle’ mean? Does it assert that the problem described is solvable based on certain principles, or some model of how the world works?