Safety evaluations and standards for AI | Beth Barnes | EAG Bay Area 23

Beth Barnes

20 min read · Jun 16, 2023

Background

Our plan

A high level story of AI x-risk

Overall idea

“Autonomous replication” threshold

An example task

Why this threshold?

Downsides of this eval

Missing some threat models

Humans in the loop = subjective and expensive

FAQs: possible issues and how we’re going to deal with them

Why would labs agree to safety standards?

How does alignment come into it?

Are you worried about accelerating capabilities?