Crossposted on Less Wrong.
Summary: AI alignment needs many great people working on research, engineering, governance, social science, security, operations, and more. I think AI alignment-focused university groups are uniquely underexploited sources of that future talent. Here are my plans for running such a group at Stanford, broadly involving mass outreach to a wide range of students, efficient fit-testing and funneling to a core group of promising students, and robust pipelining of those students into a connected network of impactful opportunities.
Confidence: Moderate. This theory of change is heavily inspired by similar plans from Michael Chen (Effective Altruism at Georgia Tech) and Alexander Davies (Harvard AI Safety Team) as well as EA field-building learnings from Stanford Effective Altruism. While parts of this have been tested in those communities, parts are new or different. That said, I’m fairly confident this is at least a significant improvement over our current local AI alignment field-building operations. The purpose of posting this is both to refine these plans and to share ideas that might be useful for groups with similar goals.
Terminology: I use the term “AI alignment” as an umbrella term throughout this to refer not only to technical AI alignment/existential AI safety, but also to AI governance, AI strategy, and other work that directly reduces existential risk from AI. The specific choice of this term (as opposed to e.g. “AI safety”) is to practically distinguish the long-term/existential risk focus of our new group from the near-term-leaning Stanford Center for AI Safety.
Stanford Effective Altruism has gradually been increasing its focus on AI alignment over the last year. We invited more AI-related guests to our open Monday dinners, ran events like an ELK weekend and a Rob Miles binge party, organized one and a half alignment-focused retreats, and ran a local AGI Safety Fundamentals Technical Alignment reading group in the spring. From these activities, we saw some promising signs, like:
- Many people we’d never met before attended our Monday dinners because of cold email list messages advertising an AI alignment speaker.
- Among our group of core EAs, many got more interested in AI alignment and asked for more activities like these.
However, we also found some potential issues:
- Many of those students we’d never met before, suggesting our EA outreach was failing to reach and capture them (e.g. some people interested in AI alignment who attended our Monday dinners had heard about but decided not to apply to our intro EA fellowship).
- For our core EAs who did get significantly interested in AI alignment, there was a big gap between “I have read and discussed AI enough to think this is the most pressing problem” and “I feel like I have the skills and opportunities to have an impact.”
- It could feel at times like Stanford EA was just about AI alignment and that there wasn’t a space for other cause areas.
These initial activities showed a lot of promise for expanding and formalizing AI alignment opportunities at Stanford, but they also made us feel like we should try a different approach.
1.2 SAIA: A New Organization
One day in spring 2022, the Stanford EA organizers decided to split off a sub-group solely dedicated to AI alignment field-building. The initial idea was simple: Run most of our existing AI-focused activities under this new AI-focused group in order to keep Stanford EA more EA/cause-prioritization focused and to reach out to students who could be good fits for AI alignment without the direct EA association. And thus, Stanford AI Alignment (SAIA, pronounced "sigh-uhh") was (unofficially) born.
But we wanted to go further. We wanted to build a group and scale up a community far more effective at AI alignment field-building than Stanford EA. This post is a high-level sketch of our plans for robustly scaling up our outreach, support systems, and formal operations over the coming 2022-23 school year and beyond. We hope this helps achieve our goals of building the AI alignment community at Stanford, getting Stanford students into highly impactful careers around AI, and ultimately making AGI go as well as possible.
It should be noted that much of this is heavily inspired by similar plans from Michael Chen (Effective Altruism at Georgia Tech) and Alexander Davies (Harvard AI Safety Team) who have been devising and locally testing ways to run AI alignment student groups. Credit for the parts that work goes to them!
Several assumptions inform these decisions, and different takes on these assumptions could lead to different plans:
- AI alignment is talent constrained, and getting more talented people working on various problems in the field will effectively improve the world.
- Universities have many talented students who would be good fits for AI alignment work in research, engineering, governance, social science, security, operations, and more.
- Of those many talented students, many aren’t working on AI alignment simply because they have yet to be exposed to it.
- EA groups are not the best way to expose students to AI alignment because working on AI alignment does not require an EA mindset and many students who could be good fits for AI alignment work are not good fits for EA (e.g. they might be turned off by certain aspects). That said, we still want to select students with decent levels of altruism, responsibility, and safety-focus to minimize the negative externalities of growing AI capabilities (more on this later).
- Many students who could be doing impactful AI alignment work are held back because they care but lack the skills necessary to do the work, they can get the skills but lack the confidence to start doing work, or they feel that they need a mentor in order to effectively learn. In all these cases, clear career guides, regular support, and simple accountability structures can significantly amplify their impact.
- 1-on-1 meetings between a developing student and an engaged community member are effective ways to guide and advance the student’s career.
- Peer accountability systems where many people work on similar things together and publicly agree to SMART goals are effective mechanisms for ensuring work gets done.
2. The Tree of Life
Below is the theory of change diagram for Stanford AI Alignment (click here to view a zoomable version). I explain each section in detail later, but in general, the plan consists of four phases: A Talent Pool that searches much more widely than our EA outreach currently does to find promising students and expose them to AI alignment; a Funnel that efficiently selects the students who are interested in learning more about and could be good fits for AI alignment; a Pipeline to robustly up-skill and deploy those students into impactful careers; and a Network of professional communities to connect the university and “real-world” AI alignment spheres.
2.1 Soil: The Talent Pool
For the first phase, we aim to draw from a very wide pool of potentially impactful students, so we’ll have to reach far beyond our EA student group community. While we will direct EAs who go through the intro fellowship and express interest in AI alignment to this group, we’ll also directly invite non-EA connections, ask professors to speak at the beginning of their classes, send emails to school-wide email lists and especially to established student groups focused on AI/policy/entrepreneurship, and reach out to students involved in various forms of research.
Because we will mostly have low-bandwidth communication channels to each of these groups, the message we reach out to them with is something along the lines of: “Transformative Artificial Intelligence could come within our lifetimes and be tremendously good or bad for the world, but there’s a lot of evidence that indicates it will be bad by default. AI alignment is an exciting and growing field focused on the technical and societal problems of making it good, and the field greatly needs talented students like you studying engineering, policy, social science, entrepreneurship, communication, and more. If this sounds interesting, weird, or concerning to you, come to Stanford AI Alignment’s introductory session. We’ll have free food and books about alignment for all who attend! RSVP here now” (Note: this is placeholder language to communicate the general idea in this post—I will thoroughly workshop this invitation with help before using it). We want the invitation to be low-cost and enticing for students to attend this event. A key part of this will involve operations-style preparation to be able to send out these invitations very early in the term (especially at the start of each school year) and catch students before their schedules are already full with commitments.
The actual introductory session, which should also happen early in the term, seeks to rapidly expose these students to AI alignment ideas. With our audience captive with free food and books for even an hour, we will do a few things:
- Present a medium-bandwidth introduction to some of the core ideas of AI alignment such as Rob Miles’ 20-minute Intro to AI Safety talk or the first 23 minutes of Vael Gates' Risks from Advanced AI talk.
- Lead group discussion through a few compelling prompts in the style of a reading group.
- Express the need for talented students across a wide range of skills (not just technical) and show some exciting recent work in the field.
- Call for the action of joining an AI alignment fundamentals program by simply signing up their name and email on a simple form or paper.
We now have a pool of students wanting to learn more about AI alignment! From here, we can send a follow-up message to ask for their availability at a few candidate times and their focus area preferences (e.g. technical vs governance), match them to reading groups, and send them down the Funnel.
2.2 Roots: The Funnel
The second phase’s role is to funnel down that pool of students by exposing them to high-bandwidth communication about AI alignment and testing their fit for understanding the ideas of and participating in the work in the field.
A reading group seems like the best way to facilitate this. Many student groups have run local reading groups of the EA Cambridge AGI Safety Fundamentals (AGISF) Technical Alignment or AI Governance curricula, the Columbia EA AI Safety Discussion Group curriculum, or a custom AI alignment and policy research curriculum (e.g. we might want to design one focused on empirical AI alignment research aimed at empirical ML researchers). By default, we’ll put everyone through the AGISF Technical Alignment Curriculum, or the AGISF AI Governance curriculum for groups of students who think they would better fit governance work (e.g. public policy students).
Although we think it’s generally good for everyone to go through some kind of reading group (e.g. future field-builders who won’t be doing direct work still benefit from knowing the problems and nature of the field), we will also have skip-connections around the Funnel for certain exceptions. These special cases might include students who have already read a lot about AI alignment and were planning to work on it anyway (but might still need a community and support structure to help them get there), or in rarer cases, perhaps people who are really good fits for non-technical roles and already motivated by AI alignment but who don’t necessarily need an in-depth reading group.
Naturally, we expect a significant number of students to be selected-out through this phase, primarily due to them realizing they don't want to work in AI safety. This isn’t a bad thing: such a broad outreach strategy is bound to catch many false positives, and we don’t want to force those people into careers they aren’t good fits for. However, we do want to minimize promising students dropping out because they feel they don’t belong, are too busy, get bored, or forget to do readings. To protect against that, we’ll use a mixture of careful matching (e.g. putting non-technical students together, putting higher-context students together), direct accountability measures (e.g. sending reminder messages to everyone 1-2 days before each meeting), peer accountability systems (e.g. “When you’re done with the readings, write 1 question next to your name in this spreadsheet that the rest of your cohort can see”), and direct incentives (e.g. free snacks/drinks and guest alignment experts at certain meetings).
2.3 Trunk: The Pipeline
All students who graduate from the reading group Funnel, are still interested, and seem promising to SAIA organizers are then promoted to Core Member status and move on to the third phase. In this Pipeline, we are no longer funneling down the numbers and testing students for fit—rather, we are aiming to retain a high percentage of these talented students as we efficiently up-skill and deploy them into impactful careers. Our various AI-related activities at Stanford EA were aimed at different audiences and didn’t quite have a unifying telos, but here the end goal is clear: If you are a Core Member in the Pipeline, we are going to do everything we can to get you an impactful job in AI alignment.
For most students, this will generally involve up-skilling in a particular career track like theoretical research science, empirical research science, research engineering, governance, field-building, etc. We’re lucky to be in a time when people are publishing great career guides specific to AI alignment subfields, so if we could just have them work through those, that might get the job done.
Of course, the reality isn’t so simple. Many students need specific advice about their specific situations, a space to talk through their uncertainties, or just someone to act as a mentor and push their “start” button, so we’ll connect Core Members to regular (e.g. every week/2 weeks/month) 1-on-1 meetings with SAIA organizers or people in our networks who are engaged in AI alignment work. Additionally, we’ll run regular (i.e. weekly) coworking sessions for students to block out time to get things done while making regular accountability pledges to the group (likely with free food as an incentive). As a group at a university, we can use our resources to suggest our students register for specific useful classes (e.g. in ML/policy/economics) and form study groups around them. And to build a sense of community, we will adopt and expand Stanford EA’s AI alignment activities like workshops, retreats, and VIP guest events as invite-only SAIA events. One of these activities should feel like a consistent central commitment that people can prioritize as the core of their activities in the group—for SAIA, that will be the weekly coworking sessions, but other groups have had success running a core paper reading group.
Additionally, the hope of the Pipeline is not only to get students into impactful work but to also be a space for Core Members to do impactful work while they are still students. For example, we’ll encourage theorists to produce distillations and proposed research directions, engineers to carry out AI safety experiments and publish empirical results, governance students to conduct policy and strategy research, entrepreneurs to start new projects, and field-builders to get involved in helping SAIA.
2.4 Branches: The Network
Eventually, Core Members will move out of the pipeline and into impactful roles beyond our student community (e.g. by graduating and working, participating in summer pre-professional opportunities, or conducting independent research). While this is where the work of SAIA organizers mostly stops, we don’t want this to be a one-way street—rather, it should be an interconnected intersection full of grand collisions in the most wonderful of ways!
I think there could be an even tighter relationship between students and professionals, so we’ll highly encourage our alumni and professionals in the Network to re-engage with and support Core Members through volunteering for 1-on-1 mentorship, coming as guests for special events, sharing job and apprenticeship opportunities, or just being great role models. The particular arrows in this phase of the theory of change diagram as presented don’t especially matter—the idea is more to cultivate a vibrant and cooperative community of AI alignment professionals in and out of university.
There might be several different ways this could fail in practice. To reduce that risk, here are some concerns that I or others have identified and responses to them.
3.1 How do we measure success?
There’s no perfect solution here, but the best I can come up with is a combination of a quantitative output metric and a qualitative effectiveness metric. Quantitatively, our main measure will be the number of students we counterfactually get to steer their careers into impactful AI alignment work. However, maximizing person-count alone is probably a bad strategy, so we’ll balance this by qualitatively evaluating how skilled and effective those students are after they graduate, the sense of community we’ve built through our group, and our group’s reputation amongst other factions at our university (e.g. the general student body or AI professors).
3.2 Don’t we need a few exceptional people, not many good people?
I think university groups are currently missing many exceptional students who never hear about AI alignment and go on to work in non-impactful roles in finance/tech/consulting/etc., and this broad outreach strategy within a university is a way to miss fewer of them (i.e. sampling more from heavy-tailed distributions gets you more samples from the tail). Note that this depends on the crux that not all exceptional AI alignment workers will discover AI alignment on their own (which I think is especially true of non-technical-research roles).
Also, while this may be true for research scientists right now, I don’t think this is true for other neglected disciplines in governance, operations, security, engineering, entrepreneurship, or field-building. Not to mention, the field as a whole 4 years from now (the usual length of a U.S. undergraduate degree) might look very different and be able to absorb even more talent for shovel-ready projects.
3.3 This seems like a lot of work for students!
It does look complicated when viewed all at once, but hopefully what this looks like to an individual student over the course of several terms/years is “Go to an info session with free food” → “Join an interesting reading group” → “Participate in a welcoming and focused community where I learn cool skills” → “Oh, you really think I’m ready to apply for a job at that AI alignment org?!”
Regarding time, I worry about promising students having to drop out of things because they’re too busy with classes/extracurriculars/life. To address this, we'll have accountability and support processes in each phase, we’ll support varying levels of commitment in the Pipeline (e.g. perhaps the best thing for some members is to spend a lot of time on classes and research and minimally engage with our group), and we’ll run the introductory session and start the reading group very early in the quarter (i.e. within the first 2ish weeks) to get students to commit to this before committing to other things.
3.4 This seems like a lot of work for organizers!
Yeah… I probably wouldn’t recommend this to newer communities without an established EA group or at least 4 part-time AI alignment organizers. However, with a dedicated group of people who really want to make this happen, I think it’s very doable. Most of the critical work comes before each academic term to make the introductory session happen and start the reading group—beyond that, it’s mostly facilitating reading group discussions, doing 1-on-1s, and running some core events.
As a very rough calculation, if an organizer each week led 2 reading groups of 8 students (4 hours), ran 5 30-minute 1-on-1s (2.5 hours), and organized 1 event per week (3.5 hours), then with 10 hours of total work they could support 16 reading group fellows and 5 Core Members (or 64 fellows and 20 Core Members with 4 organizers). Not to mention some of these responsibilities could be shared with or outsourced to EA groups, locally hired EA operations managers, or professionals in the Network.
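For anyone who wants to plug in their own numbers, here is a minimal sketch of the same back-of-the-envelope arithmetic in Python. The names (`WeeklyLoad`, `capacity`) are hypothetical, and two things are assumptions read into the figures above rather than stated in them: each reading group costs 2 hours per week (4 hours across 2 groups), and each Core Member gets one 1-on-1 per week.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeeklyLoad:
    """One organizer's weekly commitments (defaults mirror the rough estimate above)."""
    reading_groups: int = 2
    students_per_group: int = 8
    hours_per_group: float = 2.0       # assumption: 4 hours total across 2 groups
    one_on_ones: int = 5
    hours_per_one_on_one: float = 0.5  # 30-minute meetings
    events: int = 1
    hours_per_event: float = 3.5

    def hours(self) -> float:
        """Total organizer hours per week."""
        return (
            self.reading_groups * self.hours_per_group
            + self.one_on_ones * self.hours_per_one_on_one
            + self.events * self.hours_per_event
        )

    def fellows(self) -> int:
        """Reading group fellows supported by one organizer."""
        return self.reading_groups * self.students_per_group

    def core_members(self) -> int:
        """Core Members supported, assuming one 1-on-1 per member per week."""
        return self.one_on_ones


def capacity(load: WeeklyLoad, organizers: int) -> dict:
    """Scale one organizer's load linearly across a team of organizers."""
    return {
        "hours_per_organizer": load.hours(),
        "fellows": load.fellows() * organizers,
        "core_members": load.core_members() * organizers,
    }


if __name__ == "__main__":
    print(capacity(WeeklyLoad(), organizers=1))
    print(capacity(WeeklyLoad(), organizers=4))
```

Linear scaling is itself a simplification: it ignores coordination overhead between organizers and any work that is shared or outsourced, so treat the output as a rough upper bound rather than a forecast.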
3.5 How will this be funded?
Organizing a program like this for an academic term would likely incur costs for:
- Food/books for the Intro Session
- Snacks/drinks for certain Reading Group meetings
- Food for Core Coworking Sessions
- Other positive accountability incentives
- Rideshares/travel for guests
- Organizer/facilitator pay
When you add it up, the numbers are significant (though well within the range of a standard EA Infrastructure Fund grant). But if such a funding opportunity can produce just a few talented people working on AI alignment per year (and potentially generating millions of dollars worth of impact), then judiciously ambitious funders would be remiss to pass it up. Additionally, several new AI alignment infrastructure organizations are currently scaling up to provide funding as well as specific services like free books, professional networks, and community organizers to impactful groups.
3.6 How reputable will this group be?
Such a group might need a strong reputation to easily interact with academia and certain professional organizations. Unfortunately, it’s also the case that people often need a sense of status to be motivated. The hope is that by directly producing impactful work in the Pipeline (e.g. publishing research to mainstream ML conferences, advertising members winning thousands of dollars in prizes for open challenges like ELK), signaling that we’re very serious about AI alignment (e.g. hosting events with professional guests, working with external AI alignment organizations and funders), and demonstrating a path to a rewarding career (e.g. graduating several students each year into well-paid positions at top alignment organizations), such a group can eventually be held about as high as the top entrepreneurial and tech groups within a university. At Stanford specifically, we will try to incorporate SAIA as a child organization of the Stanford Existential Risks Initiative (a major contributor of work/opportunities in AI alignment) or even the Stanford Center for AI Safety (a leading research group that is the first Google search result for “AI safety”) in order to inherit some of their reputations and connections.
3.7 What about the risks of exposing many people to AI alignment?
There are attention hazards associated with broadly sharing information about the dangers and possibilities of AGI. These risks are genuinely concerning for a new, largely-untested approach to university AI alignment field-building, so organizers should thoughtfully consider possible failure modes and consult other field-builders for review before following plans like these.
Optimistically, I think the highly-supervised nature of this group will help us filter and steer most of the students whom we send through the whole up-skilling Pipeline away from AI capabilities and towards AI safety (e.g. by selecting for safety-focused students after the Funnel before offering up-skilling or discussing the risks of capabilities-advancing job opportunities in 1-on-1s). But even under a pessimistic view that this process will produce just as many capabilities researchers as safety researchers, for example, I still think the marginal upside of having an additional talented person work on AI alignment significantly outweighs the marginal downside risk of having an additional talented person work on AI capabilities. This is because many more people are already working on capabilities in industry (compared to a neglected community working on AI alignment), and only a small fraction of capabilities research is likely to be good enough to speed up AGI counterfactually (compared to a lot of potential for low-hanging fruit and future shovel-ready work in AI alignment).
There are also reputational hazards associated with building a public AI alignment community among people who disagree that AI alignment is worth working on. For this reason, we’ll prioritize students and avoid targeted outreach to unaligned AI professors (besides asking to briefly talk at the start of a class) until we can establish a more solid reputation and perhaps until the divide between the professional alignment and capabilities communities is narrowed.
Whew, those were a lot of plans! As I said before, only parts of this have been solidly tested (mostly the reading group as a Funnel at other universities and the various accountability/advising measures of the Pipeline at Stanford EA). The rest are mostly educated guesses, so if you’ve tested some of this or have ideas about what to improve, I’d love for you to let me know! We at SAIA will keep you posted on how this goes over the next year.
And if you’re an organizer at another university group or non-academic local group, definitely feel free to duplicate, modify, and test as much of this as you like. My hope in publishing this is that we as a community can form solid systems that leverage the full potential of local AI alignment groups and make AGI go as well as it possibly can!
Many thanks to EJ Watkins, Eleanor Peng, Esben Kran, Jonathan Rystrøm, Madhu Sriram, Michael Chen, Nicole Nohemi, Nikola Jurković, Oliver Zhang, Thomas Woodside, and Victor Warlop for valuable feedback and suggestions!