Thoughts on AI Safety Camp

Charlie Steiner

crosspost from https://www.lesswrong.com/posts/3kErRpEprB8iJvnNq/thoughts-on-ai-safety-camp

Early this year I interviewed a sample of AISC participants and mentors and spent some time thinking about the problems the AI safety research community is facing. As a result, I have changed my mind about some things.

AI Safety Camp is a program that brings together applicants into teams, and over about a hundred hours of work those teams do AI safety-related projects that they present at the end (one project made it into a Rob Miles video). I think it's really cool, but what exactly it's good for depends on a lot of nitty-gritty details that I'll get into later.

Who am I to do any judging? I'm an independent alignment researcher, past LW meetup organizer, physics PhD, and amateur appliance repairman. What I'm not is a big expert on how people get into alignment research - this post is a record of me becoming marginally more expert.

The fundamental problem is how to build an ecosystem of infrastructure that takes in money and people and outputs useful AI safety research. Someone who doesn't know much about AISC (like my past self) might conceive of many different jobs it could be doing within this ecosystem:

Educating relative newcomers to the field and getting them more interested in doing research on AI alignment.
Providing project opportunities that are a lot like university class projects - contributing to the education of people in the process of skilling up to do alignment research.
Providing potentially-skilled and potentially-interested people a way to "test their fit" to see if they want to commit to doing more AI alignment work.
Catalyzing the formation of groups and connections that will persist after the end of the camp.
Helping skilled and interested people send an honest signal of their alignment research skills to future employers and collaborators.
Producing object-level useful research outputs.

In addition to this breakdown, there are orthogonal dimensions of what parts of AI safety research you might specialize in supporting:

Conceptual or philosophical work.
Machine learning projects.
Mathematical foundations.
Policy development.
Meta-level community-building.

Different camp parameters (length, filters on attendees, etc.) are better-suited for different sorts of projects. This is why AISC does a lot of machine learning projects, and why there's a niche for AISC alum Adam Shimi to start a slightly different thing focused on conceptual work (Refine).

III

Before talking to people, I'd thought AISC was 35% about signalling to help break into the field, 25% about object-level work, and 15% about learning, plus leftovers. Now I think it's actually 35% about testing fit, 30% about signalling, and 15% about object-level work, plus different leftovers.

It's not that people didn't pick projects they were excited about; they did. But everyone I asked acknowledged that the camp wasn't that long, that they weren't maximally ambitious anyhow, and that they just wanted to produce something they were proud of. What was valuable to them was often what they learned about themselves, rather than about AI.

Or maybe that's too pat, and the "testing fit" thing is more about "testing the waters to make it easier to jump in." I stand by the signalling thing, though. I think we just need more organizations trying to snap up the hot talent that AISC uncovers.

Looking back at my list of potential jobs for AISC (e.g. education, testing fit, catalyzing groups, signalling) I ordered them roughly by the assumed skill level of the participants. I initially thought AISC was doing things catered to all sorts of participants (both educating newcomers and helping skilled researchers signal their abilities, etc.), while my revised impression is that they focus on people who are quite skilled and buy into the arguments for why this is important, but don't have much research experience (early grad school vibes). In addition to the new program Refine, another thing to compare to might be MLSS, which is clearly aimed at relative beginners.

When I talked to AISC participants, I was consistently impressed by them - they were knowledgeable about AI safety and had good ML chops (or other interesting skills). AISC doesn't need to be in the business of educating newbies, because it's full of people who've already spent a year or three considering AI alignment and want to try something more serious.

The size of this demographic is actually surprisingly large - sadly the organizers who might have a better idea didn't talk to me, but just using the number applying to AISC as the basis for a Fermi estimate (by guessing that only 10-20% of people who want to try AI alignment research had the free time and motivation to apply) gets you to >2000 people. This isn't really a fixed group of people, either - new people enter by getting interested in AI safety and learning about AI, and leave when they no longer get much benefit from the fit-testing or signalling in AISC. I would guess this population leaves room for ~1 exact copy of AISC (on an offset schedule), or ~4 more programs that slightly tweak who they're appealing to.
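
The arithmetic behind that Fermi estimate can be sketched in a few lines. This is only an illustration: the applicant count below is a hypothetical placeholder (the post does not state the real number), and the function name is made up for this example. The only inputs taken from the text are the 10-20% guess for the fraction of interested people who apply.

```python
def fermi_population(applicants, apply_rate_low, apply_rate_high):
    """Estimate the total interested population from an applicant count.

    If only a fraction `r` of interested people apply, the population is
    roughly applicants / r. A higher application rate implies a smaller
    population, so the high rate gives the lower bound.
    """
    if not 0 < apply_rate_low <= apply_rate_high <= 1:
        raise ValueError("rates must satisfy 0 < low <= high <= 1")
    lower_bound = applicants / apply_rate_high
    upper_bound = applicants / apply_rate_low
    return lower_bound, upper_bound


# Hypothetical placeholder, NOT a real AISC figure.
HYPOTHETICAL_APPLICANTS = 400

low, high = fermi_population(HYPOTHETICAL_APPLICANTS, 0.10, 0.20)
print(f"Estimated population: {low:.0f} to {high:.0f} people")
```

With a different applicant count the range scales linearly, so the conclusion is only as good as that input and the guessed application rate.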

Most participants cut their teeth on AI alignment through independent study and local LW/EA meetup groups. People are trying various things (see MLSS above) to increase the amount of tooth-cutting going on, and eventually the end game might be to have AI safety just be "in the water supply," so that people get exposed to it in the normal course of education and research, or can take a university elective on it to catch up most of the way to the AISC participants.

The people I talked to were quite positively disposed to AISC. At the core, people were glad to be working on projects that excited them, and liked working in groups and with a bit of extra support/motivational structure.

Some people attended AISC and decided that alignment research wasn't for them, which is a success in its own way. On average, I think attending made AI alignment research feel "more real," and increased people's conviction that they could contribute to it. Several people I talked to came away with ideas only tangentially related to their project that they were excited to work on - but of course it's hard to separate this from the fact that AISC participants are already selected for being on a trajectory of increasing involvement in AI safety.

In contrast, the mentorship aspect was surprisingly (to me) low-value to people. Unless the mentor really put in the hours (which most understandably did not), decisions about each project were left in the hands of the attendees, and the mentor was more like an occasional shoulder angel plus useful proofreader of their final report. Not pointless, but not crucial. This made more sense as I came to see AISC as not being in the business of supplying education from outside.

Note that in the most recent iteration, which I haven't interviewed anyone from, the format of the camp has changed - projects now come from the mentors rather than the groups. I suspect this is intended to solve a problem where some people just didn't pick good projects and ran into trouble. But it's not entirely obvious whether the (probable) improvement of topics dominates the effects on mentor and group engagement etc., so if you want to chat about this in the comments or with me via video call, I have more questions I'd be interested to ask.

Another thing I'd expected people to care about, but they didn't, was remote vs. in-person interaction. In fact, people tended to think they'd prefer the remote version (albeit not having tried both). Given the lower costs and easier logistics, this is a really strong point in favor of doing group projects remotely. It's possible this is peculiar to machine learning projects, and [insert other type of project here] would really benefit from face-to-face interaction. But realistically, it looks like all types should experiment with collaborating over Discord and Google Docs.

What are the parameters of AISC that make it good at some things and not others?

Here's a list of some possible topics to get the juices flowing:

Length and length variability.
Filtering applicants.
Non-project educational content.
Level of mentor involvement.
Expectations and evaluation.
Financial support.
Group size and formation conditions.
Setting and available tools.

Some points I think are particularly interesting:

Length and length variability: Naturally shorter time mandates easier projects, but you can have easy projects across a wide variety of sub-fields. However, a fixed length (if somewhat short) also mandates lower-variance projects, which discourages the inherent flailing around of conceptual work and is better suited to projects that look more like engineering.

Level of mentor involvement: Giving participants more supervision might reduce length variability pressure and increase the object-level output, but reduce the signalling power of doing a good job (particularly for conceptual work). On the other hand, participating in AISC at all seems like it would still be a decent sign of having interesting ideas. The more interesting arguments against increasing supervision are that it might not reduce length variability pressure by much (mentors might have ideas that are both variable between-ideas and that require an uncertain amount of time to accomplish, similar to the participants), and might not increase the total object-level output, relative to the mentor and participants working on different topics on the margin.

Evaluation: Should AISC be grading people or giving out limited awards to individuals? I think that one of its key jobs is certainly giving honest private or semi-private feedback to the participants. But should it also be helping academic institutions or employers discriminate between participants to increase its signalling power? I suspect that with current parameters there's enough variation in project quality to serve as a signal already if necessary, and trying to give public grades on other things would be shouldering a lot of trouble with perverse incentives and hurt feelings for little gain.

You can get lots of variations on AISC's theme by tweaking the parameters, including variations that fill very different niches in the AI safety ecosystem. For example, you could get the ML for Alignment Bootcamp with different settings of applicant filtering, educational content, group size, and available tools.

On the other hand, there are even more different programs that would have nontrivial values of "invisible parameters" that I never would have thought to put on the list of properties of AISC (similar to how "group size" might be an invisible parameter for MLAB). These parameters are merely an approximate local coordinate system for a small region of infrastructure-space.

What niches do I think especially need filling? For starters, things that fit into a standard academic context. We need undergrad- and graduate-level courses developed that bite off various chunks of the problems of AI alignment. AISC and its neighbors might tie into this by helping with the development of project-based courses - what project topics support a higher amount of educational content / teacher involvement, while still being interesting to do?

We also need to scale up the later links in the chain, focused on the production of object-level research. Acknowledging that this is still only searching over a small part of the space, we can ask what tweaks to the AISC formula would result in something more optimized for research output. And I think the answer is that you can basically draw a continuum between AISC and a research lab in terms of things like financial support, filtering applicants, project length, etc. Some of these variables are "softer" than others - it's a lot easier to match MIRI on project length than it is to match them on applicant filtering.

VII

Should you do AISC? Seems like a reasonable thing for me to give an opinion about, so I'll try to dredge one up.

You should plausibly do it IF:

(

You have skills that would let you pull your weight in an ML project.

You've looked at the AISC website's list of topics and see something you'd like to do.

)

AND

You know at least a bit about the alignment problem - at the very least you are aware that many obvious ways to try to get what we want from AI do not actually work.

AND

(

You potentially want to do alignment research, and want to test the waters.

You think working on AI alignment with a group would be super fun and want to do it for its own sake.

You want to do alignment research with high probability but don't have a signal of your skillz you can show other people.

)
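
For anyone who prefers their recommendations as boolean logic, the rule above can be sketched as a small function. This assumes the items inside each pair of parentheses are alternatives (any one of them is enough), which is my reading of the grouping. The class and field names are hypothetical labels invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # First group: ways of fitting a project (assumed to be alternatives).
    can_pull_weight_in_ml_project: bool
    sees_appealing_listed_topic: bool
    # Standalone requirement.
    knows_alignment_basics: bool
    # Last group: motivations (assumed to be alternatives).
    wants_to_test_the_waters: bool
    wants_group_work_for_its_own_sake: bool
    needs_a_signal_of_skills: bool


def should_plausibly_do_aisc(c: Candidate) -> bool:
    """Return True when (fit) AND (basics) AND (motivation) all hold."""
    has_project_fit = (
        c.can_pull_weight_in_ml_project or c.sees_appealing_listed_topic
    )
    has_motivation = (
        c.wants_to_test_the_waters
        or c.wants_group_work_for_its_own_sake
        or c.needs_a_signal_of_skills
    )
    return has_project_fit and c.knows_alignment_basics and has_motivation


# Example: someone with no ML skills who likes a listed topic,
# knows the basics, and wants to test the waters.
example = Candidate(False, True, True, True, False, False)
print(should_plausibly_do_aisc(example))
```

Because each group needs only one true item, many different combinations pass, which is what makes the recommendation broad.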

This is actually a sneakily broad recommendation, and I think that's exactly right. It's the people on the margins, those who aren't sure of themselves, the people who could only be caught by a broad net that most benefit from something like this. So if that's you, think about it.
