Thoughts on the AI Safety Summit company policy requests and responses

So8res

Comments 3

vaniver

(I'm Matthew Gray)

Inflection is a late addition to the list, so Matt and I won’t be reviewing their AI Safety Policy here.

My sense from reading Inflection's response now is that they say the right things about red teaming and security and so on, but I am pretty worried about their basic plan / they don't seem to be grappling with the risks specific to their approach at all. Quoting from them in two different sections:

Inflection’s mission is to build a personal artificial intelligence (AI) for everyone. That means an AI that is a trusted partner: an advisor, companion, teacher, coach, and assistant rolled into one.

Internally, Inflection believes that personal AIs can serve as empathetic companions that help people grow intellectually and emotionally over a period of years or even decades. **Doing this well requires an understanding of the opportunities and risks that is grounded in long-standing research in the fields of psychology and sociology.** We are presently building our internal research team on these issues, and will be releasing our research on these topics as we enter 2024.

I think AIs thinking specifically about human psychology--and how to convince people to change their thoughts and behaviors--are very dual use (i.e. can be used for both positive and negative ends) and at high risk for evading oversight and going rogue. The potential for deceptive alignment seems quite high, and if Inflection is planning on doing any research on those risks or mitigation efforts specific to that, it doesn't seem to have shown up in their response.

I don't think this type of AI is very useful for closing the acute risk window, and so probably shouldn't be made until much later.

SummaryBot

Executive summary: The post provides thoughts on AI safety policies requested from AI labs by the UK government. It argues the policies are inadequate but some labs like Anthropic and OpenAI are relatively better. It suggests alternative priorities like compute limits, risk assessments, and contingency planning.

Key points:

- The UK government's policy categories seem reasonable but miss key issues like independent risk assessments and contingency planning.
- Current AI systems pose unacceptable risks; progress should halt until risks are addressed. But policies help labs acknowledge risks.
- Anthropic and OpenAI's policies seem best, taking risks more seriously. DeepMind's is much worse. Meta's is far worse.
- Governments should also institute compute limits, monitor chips, halt chip progress, require risk assessments, and develop contingency plans.
- Independent risk assessments from actuaries could help determine which labs can continue operating.
- If risks appear unaddressable before wide availability, governments need a plan for that scenario now.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

Oliver Sourbut

Another high(er?) priority for governments:

start building multilateral consensus and preparations on what to do if/when
- AI developers go rogue
- AI leaked to/stolen by rogue operators
- AI goes rogue

^{^}

Inflection is a late addition to the list, so Matt and I won’t be reviewing their AI Safety Policy here.

Thanks to Rob Bensinger for assembling, editing, and occasionally rephrasing/extending my draft of this post, with shallow-but-not-deep thumbs up from me.

^{^}

And, as OpenAI’s write-up notes: “We refer to our policy as a Risk-Informed Development Policy rather than a Responsible Scaling Policy because we can experience dramatic increases in capability without significant increase in scale, e.g., via algorithmic improvements.”

^{^}

Matthew Gray writes: “I think OpenAI did a surprisingly good job of responding to this with ‘the real deal’.” Matt cites this line from OpenAI’s discussion of “superalignment”:

Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on human ability to supervise AI. But these techniques will not work for superintelligence, because humans will be unable to reliably supervise AI systems much smarter than us.

^{^}

Doing this fully correctly would also require that you in some sense hold the money that goes to possible future people for risking their fate. Taking into account only the interests of people who are presently alive still doesn’t properly line up all the incentives, since present people could then have a selfish excessive incentive to trade away large amounts of future people’s value in exchange for relatively small amounts of present-day gains.

^{^}

I (Nate) agree with Matt here.

^{^}

Unlike the CFI post authors, I (Nate) would give all of the companies here an F. However, some get a much higher F grade than others.

^{^}

From DeepMind:

This is why we are building on our industry-leading general and infrastructure security approach. Our models are developed, trained, and stored within Google’s infrastructure, supported by central security teams and by a security, safety and reliability organisation consisting of engineers and researchers with world-class expertise. We were the first to introduce zero-trust architecture and software security best practices like fuzzing at scale, and we have built global processes, controls, and systems to ensure that all development (including AI/ML) has the strongest security and privacy guarantees. Our Detection & Response team provides a follow-the-sun model for 24/7/365 monitoring of all Google products, services and infrastructure - with a dedicated team for insider threat and abuse. We also have several red teams that conduct assessments of our products, services, and infrastructure for safety, security, and privacy failures.
