A note about differential technological development

So8res



Another problem with the differential development argument is that even if you buy that “alignment can be solved”, it’s not like it’s a vaccine you can apply to all AI so it all suddenly turns beneficial. Other people, companies, nations will surely continue to train and deploy AI models, and why would they all apply your alignment principles or tools?

I've heard two arguments in response to this concern: (1) that the first aligned AGI will kill off all other forms of AGI and make all AI-related problems go away, and (2) that there are more good people than bad people in the world, so once alignment techniques become available, everyone will naturally adopt them. Both of these seem like fairy tales to me.

In other words, the premise that any amount of AI capabilities research is OK so long as we "solve alignment" has serious issues, and you don't even have to believe in AGI for this to bother you.

[anonymous]

Re 1): this relates to the strategy-stealing assumption: your aligned AI can use whatever strategy unaligned AIs use to maintain and grow their power. Killing the competition is one strategy, but there are many others, including defensive actions and earning money/resources.

Edit: I was implicitly assuming that it's okay to have unaligned AIs as long as you have enough aligned ones around. For example, we may not need aligned companies if we have (minimally) aligned government and law enforcement.

Lauro Langosco

I don't think the strategy-stealing assumption holds here: it's pretty unlikely that we'll build a fully aligned 'sovereign' AGI even if we solve alignment. It seems easier to make something corrigible/limited instead, i.e., something that is by design less powerful than would be possible if we were just pushing capabilities.

[anonymous]

I don't mean to imply that we'll build a sovereign AI (I doubt it too).

Corrigible is more what I meant. Corrigible, but not necessarily limited: i.e., minimally intent-aligned AIs that won't kill you but, by the strategy-stealing assumption, can still compete with unaligned AIs.

David Krueger

I'm curious to dig into this a bit more, and hear why you think these seem like fairy tales to you (I'm not saying that I disagree...).

I wonder if this comes down to different ideas of what "solve alignment" means (I see you put it in quotes...)

1) Are you perhaps thinking that realistic "solutions to alignment" will carry a significant alignment tax? Else why wouldn't ~everyone adopt alignment techniques (that align AI systems with their preferences/values)?

2) Another source of ambiguity: there are a lot of different things people mean by "alignment", including:
* AI is aligned with objectively correct values
* AI is aligned with a stakeholder and consistently pursues their interests
* AI does a particular task as intended/expected
Is there one of these in particular (or something else) that you have in mind here?

[anonymous]

I agree that it's not trivial to assume everyone will use aligned AI.

Let's suppose the goal of alignment research is to make aligned AI as easy and cheap to build as unaligned AI, i.e., with no additional cost. If we then suppose aligned AI also has a nonzero benefit, people are incentivized to use it.

The above seems to be the perspective in this alignment research overview https://www.effectivealtruism.org/articles/paul-christiano-current-work-in-ai-alignment.

More ink could be spilled on whether aligning AI has a nonzero commercial benefit. I feel that efforts like prompting and InstructGPT are suggestive, but this may not apply to all alignment efforts.

InquilineKea

How does one translate mathematical/high-level agent-foundations guidelines into code or instructions that an RL agent (or any AI agent, including one built by following scaling laws) can follow?

Karthik Tadepalli

Bandwagoning onto this sensible post, another problem with this argument is that differential technological development is very fuzzy to reason about, since most of the mechanisms by which it could advance alignment are things that haven't happened yet. This means it's possible to reach any conclusion ("this work is good on net", "this work is bad on net") and motivated reasoning will make people want to reach the conclusion that the work they are doing is good on net. It's a classic case of suspicious and surprising convergence.


Footnote:

On the other hand, weirder research is more likely to shorten timelines a lot, if it shortens them at all. More mainstream research progress is less likely to have a large counterfactual impact, because it’s more likely that someone else has the same idea a few months or years later.

"Low probability of shortening timelines a lot" and "higher probability of shortening timelines a smaller amount" both matter here, so I advocate that both niche and mainstream researchers be cautious and deliberate about publishing potentially timelines-shortening work.

Footnote:

"Decades" would require timelines to be longer than my median. But when I condition on success, I do expect we have more time.