AI as a science, and three obstacles to alignment strategies

So8res

13 min read · Oct 25, 2023

Comments 1

SummaryBot

Executive summary: The post argues that AI alignment is hard for three reasons: alignment and capabilities research are intertwined, real solutions are hard to tell apart from fake ones, and theories are at high risk of failing on their first critical application.

Key points:

Efforts to understand and aim AI systems often also uncover ways to make them more capable, allowing unchecked development to continue.
Distinguishing scientifically grounded alignment solutions from insufficient ones will be very difficult for regulators.
Even rigorous alignment theories may fail catastrophically the first time they are tested in a real superintelligence.
The post recommends that civilization instead pursue non-AI routes to transhumanism, such as uploading.
Developing scientific theories of artificial cognition is important but risks accelerating capabilities progress.
Minimal pivotal tasks are worth considering to contain failures of new alignment theories.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.


Albeit slightly less, since there’s nonzero prior probability on this unknown system turning out to be simple, elegant, and well-designed.

An exception to this guess happens if the AI is at the point where it’s correcting its own flaws and improving its own architecture, in which case, in principle, you might not see much room for capabilities improvements if you took a snapshot and comprehended its inner workings, despite still being able to see that the ends it pursues are not the ones you wanted. But in that scenario, you’re already about to die to the self-improving AI, or so I predict.

Not least because there are no sufficiently clear signs that it’s time to stop — we blew right past “an AI claims it is sentient”, for example. And I’m not saying that it was a mistake to doubt AI systems’ first claims to be sentient — I doubt that Bing had the kind of personhood that’s morally important (though I am by no means confident!). I’m saying that the thresholds that are clear in science fiction stories turn out to be messy in practice and so everyone just keeps plowing on ahead.

Background

1. Alignment and capabilities are likely intertwined

2. Distinguishing real solutions from fake ones is hard

3. Most theories don’t work on the first real try