Supervised Program for Alignment Research (SPAR) at UC Berkeley: Spring 2023 summary

Dylan Xu; caroq

emily.fan

What is the main difference between SPAR and AI Safety Camp?

Special thanks to Gabe Mukobi and Aaron Scher for sharing a number of invaluable resources from Stanford AI Alignment’s Supervised Program in Alignment Research, which we drew heavily from, not least the program name.

Supervisor	Project Title
Erdem Bıyık and Vivek Myers, UC Berkeley / CHAI	Inferring Objectives in Multi-Agent Simultaneous-Action Systems
Erik Jenner, UC Berkeley / CHAI	Literature Review on Abstractions of Computations
Joe Benton, Redwood Research	Disentangling representations of sparse features in neural networks
Nora Belrose, FAR AI (now at EleutherAI)	Exhaustively Eliciting Truthlike Features in Language Models
Juan Rocamonde, FAR AI	Using Natural Language Instructions to Safely Steer RL Agents
Kellin Pelrine, FAR AI	Detecting and Correcting for Misinformation in Large Datasets
Zac Hatfield-Dodds, Anthropic	Open-source software engineering projects (to help students develop skills for research engineering)
Walter Laurito, FZI / SERI MATS	Consistent Representations of Truth by Contrast-Consistent Search (CCS)
Leon Lang, University of Amsterdam / SERI MATS	RL Agents Evading Learned Shutdownability
Marius Hobbhahn, International Max Planck Research School / SERI MATS (now at Apollo Research)	Playing the auditing game on small toy models (trojans/backdoor detection)
Asa Cooper Stickland, University of Edinburgh / SERI MATS	Understanding to what extent language models “know what they don't know”

Motivation

Research projects

Operational logistics

Room for improvement

Conclusion