Racing through a minefield: the AI deployment problem

Holden Karnofsky

Comments 1

I wonder whether leading groups could hold equity stakes in each other as a way of aligning incentives. An imperfect analogy is the auto industry, where Toyota, Subaru, and others hold equity in each other and share best practices in safety and hybrid tech. https://www.reuters.com/article/us-toyota-subaru/toyota-strengthens-japan-partnerships-with-bigger-subaru-stake-idUSKBN1WC04E

1. Generally, or at least, this is what I’d like it to refer to.
2. Thanks to beta reader Ted Sanders for suggesting this analogy in place of the older one, “removing mines from the minefield.”
3. One genre of testing that might be interesting: manipulating an AI system’s “digital brain” in order to simulate circumstances in which it has an opportunity to take over the world, and seeing whether it does so. This could be a way of dealing with the King Lear problem. More here.
4. Modern AI systems tend to be trained with lots of trial-and-error. The actual code that is used to train them might be fairly simple and not very valuable on its own; but an expensive training process then generates a set of “weights” which are ~all one needs to make a fully functioning, relatively cheap copy of the AI system.
5. I mean, this is part of the challenge. In theory, you should deploy an AI system if the risks of not doing so are greater than the risks of doing so. That’s going to depend on hard-to-assess information about how safe your system is and how dangerous and imminent others’ are, and it’s going to be easy to be biased in favor of “My systems are safer than others’; I should go for it.” Seems hard.

Racing through a minefield: the AI deployment problem

The basic premises of “racing through a minefield”

What success looks like

Alignment (charting a safe path through the minefield²)

Threat assessment (alerting others about the mines)

Avoiding races (to move more cautiously through the minefield)

Selective information sharing - including security (so the incautious don’t catch up)

Global monitoring (noticing people about to step on mines, and stopping them)

Defensive deployment (staying ahead in the race)

So?

Footnotes
