The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda

Marc Carauleanu; Judd R; Trent Hodgeson; Cameron B


Miscellaneous cool accomplishment: before we started getting involved in AI safety in any serious way, two AE engineers with no prior background in alignment developed a framework for studying prompt injection attacks that went on to win Best Paper at the 2022 NeurIPS ML Safety Workshop.

To illustrate this point more precisely, we can consider a highly simplified probabilistic model of the research space. (We recognize this sort of neglect math is likely highly familiar to many EAs, and we don't mean to be pedantic by including it; we've put it here because we think it is a succinct way of demonstrating—if only to ourselves—why taking on multiple neglected approaches is rational.)

Let’s say the total number of plausible alignment agendas is $n$. Let’s stipulate that alignment researchers have so far meaningfully explored $k$ approaches, meaning that $n - k$ approaches remain unexplored. (As stated previously, we suspect that current mainstream alignment research is exploiting only a small subset of the total space of plausible alignment approaches, leaving a large number of alignment strategies either completely or mostly unexplored; i.e., we think that $n - k$ is large.) Each neglected approach $i$ has a very small but nonzero probability $p_{\text{neglect},i}$ of being crucial for making significant progress in alignment. Treating these events as independent for the sake of simplicity, the chance that none of the $n - k$ neglected approaches is key is $\prod_{i = 1}^{n - k} (1 - p_{\text{neglect},i})$. Conversely, the probability that at least one neglected approach is key is $1 - \prod_{i = 1}^{n - k} (1 - p_{\text{neglect},i})$. This implies, at least in our simplified model, that even with low individual probabilities, a sufficiently large number of neglected approaches can collectively hold a high chance of including a crucial solution. For instance, in a world with 100 neglected approaches and a probability of 99% that each approach is not key (i.e., a 1% likelihood of moving the needle on alignment), there’s still about a 63% chance that at least one of these approaches would be crucial ($1 - 0.99^{100} \approx 0.63$); with 1000 approaches and the same 99% probability that each approach is not key, the probability that at least one will be pivotal rises to over 99%. This simple model motivates us to take many shots on goal, pursuing as many plausible neglected alignment agendas as possible.
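For readers who want to check the arithmetic, here is a minimal Python sketch of the simplified model above. The function name `prob_at_least_one_key` and the uniform 1% inputs are our own illustrative assumptions; the independence assumption is the same simplification made in the text, not a claim about how real research agendas relate to one another.

```python
import math


def prob_at_least_one_key(probs):
    """Return 1 - prod(1 - p_i): the chance that at least one approach is key.

    Assumes (as the simplified model does) that the approaches are independent.
    `probs` holds one hypothetical per-approach probability of being crucial.
    """
    return 1.0 - math.prod(1.0 - p for p in probs)


# Illustrative scenarios from the text: every neglected approach has a 1% chance.
print(round(prob_at_least_one_key([0.01] * 100), 3))   # about 0.634
print(round(prob_at_least_one_key([0.01] * 1000), 5))  # about 0.99996
```

Because the function takes a list, the per-approach probabilities need not be equal; passing different values for each approach shows how sensitive the conclusion is to those guesses.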

Please note: (1) to begin, we are primarily interested in aggregating the best ideas, so don’t worry if you have an idea that you think fits the criteria above but that would be challenging to implement or that you wouldn’t want to implement yourself. (2) There is space on the form to indicate that your suggested approach is exfohazardous.

This is a core trade-off in our work and something that we have made substantial progress on since our founding.

We want to call out that this approach is likely the least neglected of the ten we enumerate here—which is not to say it isn’t neglected in an absolute sense.

While there are a good number of newer organizations working on fieldbuilding for alignment, we think it remains highly neglected given the potential impact, especially in likely-impactful fields that are only now starting to be considered within the Overton window.
