How to Diversify Conceptual AI Alignment: the Model Behind Refine

adamShimi

This is a linkpost for https://www.alignmentforum.org/posts/5uiQkyKdejX3aEHLM/how-to-diversify-conceptual-alignment-the-model-behind

This work was done while at Conjecture.

Tl;dr: We need far more conceptual AI alignment research approaches than we have now if we want to increase our chances of solving the alignment problem. However, the conceptual alignment field remains hard to access, and what feedback and mentorship there is focuses on a few existing research directions rather than stimulating new ideas. This model led to the creation of Refine, a research incubator for potential conceptual alignment researchers funded by the LTFF and hosted by Conjecture. Its goal is to help conceptual alignment research grow in both number and variety, through some minimal teaching and a lot of iteration and feedback on incubatees’ ideas. The first cohort has been selected, and will run from August to October 2022. In the bigger picture, Refine is an experiment within Conjecture to find ways of increasing the number of conceptual researchers and improving the rate at which the field is making productive mistakes.

The Problem: Not Enough Varied Conceptual Research

I believe that in order to solve the alignment problem, we need significantly more people attacking it from a lot of different angles.

Why? First, because none of the current approaches appears to yield a full solution. I expect many of them to be productive mistakes we can and should build on, but they don't appear sufficient, especially with shorter timelines.

In addition, the history of science teaches us that for many important discoveries, especially in difficult epistemic situations, the answers don't come from one lone genius seeing through the irrelevant details, but instead from bits of evidence revealed by many different takes and operationalizations^[1] (possibly unified and compressed together at the end). And we should expect alignment to be hard based on epistemological vigilance.

So if we accept that we need more people tackling alignment in more varied ways, why are we falling short of that ideal? Note that I will focus here on conceptual researchers, as they are the source of most variations on the problem, and because they are so hard to come by.

I see three broad issues with getting more conceptual alignment researchers working on wildly different approaches:

(Built-in Ontological Commitments) Almost all current attempts to create more conceptual alignment researchers (SERI MATS, independent mentoring...) rely significantly on mentorship by current conceptual researchers. Although this obviously comes with many benefits, it also leads to many ontological commitments being internalized when one is learning the field. As such, it's hard to go explore a vastly different approach because the way you see the problem has been moulded by this early mentorship.
(Misguided Requirements) I see many incorrect assumptions about what it takes to be a good conceptual researcher floating around, both from field-builders and from potential candidates. Here's a non-exhaustive list of the most frustrating ones:
- You need to know all previous literature on alignment (the field has more breadth than depth, and so getting a few key ideas is more important than knowing everything)
- You need to master maths and philosophy (a lot of good conceptual work only uses basic maths and philosophy)
- You need to have an ML background (you can pick up the relevant part and just work on approaches different to pure prosaic alignment)
(No Feedback) If you want to start on your own, you will have trouble getting any feedback at all. The AF doesn't provide much feedback even for established researchers, and it has almost nothing in store for newcomers. Really, the main source of feedback in the field is asking other researchers, but when you start you usually don't know anyone. And without feedback, it's hard to stay motivated and ensure your work is relevant to the core problem.

Refine, the incubator for conceptual researchers and research bets that I'm running at Conjecture, aims at addressing these issues.

Description of Refine

Research Incubator

Refine is a research incubator: that is, a program for helping potential conceptual researchers improve and create relevant ideas and research. It's inspired by startup incubators like Y Combinator, but with a focus on research. As such, the point is not to make participants work on already trusted research directions, but to give them all the help they need to create exciting new research questions and ideas that are highly relevant to alignment.

In broad strokes, Refine starts with two weeks focused on studying and discussing core ideas in the History and Philosophy of Science and in the Epistemology of Alignment, followed by 10 weeks of intense idea-generation-feedback-writing loops (for a total of 3 months).

At the end, the research produced will be evaluated by established conceptual researchers, and we'll help the incubatees get funding or get hired (at Conjecture or other places).

In more detail, the first cohort of Refine will follow this process:

Selection: in order of priority (more details in the call for participants)
- Relentlessly resourceful
- Access to weird and different ideas and frames
- Understanding of the alignment problem (by default applicants have a minimum understanding to even care to apply)
Initial power-up (2 weeks): the program begins with two weeks of reading, presentations, discussions and debates about core ideas in the epistemology of alignment. The goal is to give people tools and keys for thinking about the problem and bias them towards the core questions while still leaving them a lot of margin for innovation.
- Before start of cohort: reading group of posts presenting different takes on alignment
  1. What Multipolar Failures Look Like by Andrew Critch
  2. Why Agent Foundations? An Overly Abstract Explanation by John Wentworth
  3. How do we become confident in the safety of a machine learning system? by Evan Hubinger
  4. My research methodology by Paul Christiano
  5. A central AI alignment problem: capabilities generalization, and the sharp left turn by Nate Soares
- Week 1: History and Philosophy of Science and Models of Progress
  1. Productive Mistakes
  2. Epistemological Vigilance
  3. Mosaic and Palimpsests
  4. Pluralism (Posts about it in the works)
- Week 2: Epistemology of Alignment
  1. High-level Map of Conceptual Alignment Research
  2. Unbounded Atomic Optimization (Posts about it in the works)
Intense iteration (10 weeks):
- Incubatee generates and explores an idea
- We discuss the ideas, along a bunch of lines
  1. Assumptions made
  2. Interesting parts of the productive mistake
  3. Failings/limits
- Based on the discussion and feedback, the idea is either closed (because there is no clear way to improve upon it, or it is relevant but not a priority now, or it is not relevant, or there are no clear ways of extending it) or open
- If the idea is closed, produce an artifact about it and go back to the first step (idea generation) with a new direction
- If the idea is open, go back to the first step, but exploring the directions that came from questioning the idea
Evaluation
- Final write-up
- Help them write grant applications and get funding/jobs
- Gather feedback from established conceptual alignment researchers

Generalist Mentors

Rather than having current researchers act as PhD advisors on their own topics, Refine aims at leveraging more generalist mentors (currently me) who can see value and issues in almost all approaches, while understanding the problem deeply enough to give relevant feedback. The hope is that this kind of support will minimize ontological commitments while still biasing the work towards the hard problem.

In addition, generalist mentors avoid the overuse of the scarce resource of conceptual researchers, and might be a great fit for thinkers focused on the sort of epistemological work I'm doing at Conjecture.

Selection and Respect

(The Black Swan, Nassim Nicholas Taleb, 2007)

Many people labor in life under the impression that they are doing something right, yet they may not show solid results for a long time. They need a capacity for continuously adjourned gratification to survive a steady diet of peer cruelty without becoming demoralized. They look like idiots to their cousins, they look like idiots to their peers, they need courage to continue. No confirmation comes to them, no validation, no fawning students, no Nobel, no Shnobel. “How was your year?” brings them a small but containable spasm of pain deep inside, since almost all of their years will seem wasted to someone looking at their life from the outside. Then bang, the lumpy event comes that brings the grand vindication. Or it may never come.
Believe me, it is tough to deal with the social consequences of the appearance of continuous failure. We are social animals; hell is other people.
[...]
We favor the sensational and the extremely visible. This affects the way we judge heroes. There is little room in our consciousness for heroes who do not deliver visible results—or those heroes who focus on process rather than results.
[...]
But this does not mean that the person insulated from materialistic pursuits becomes impervious to other pains, those issuing from disrespect. Often these Black Swan hunters feel shame, or are made to feel shame, at not contributing. “You betrayed those who had high hopes for you,” they are told, increasing their feeling of guilt. The problem of lumpy payoffs is not so much in the lack of income they entail, but the pecking order, the loss of dignity, the subtle humiliations near the watercooler.
It is my great hope someday to see science and decision makers rediscover what the ancients have always known, namely that our highest currency is respect

Building and running a program like Refine leads to a conundrum. On the one hand, there are obviously tests and evaluations involved: at the beginning to select people, during the program, and at the end to decide if the program was successful. On the other hand, the anxiety of being always judged and evaluated is corrosive, as Taleb expresses so clearly.

I don't have a perfect solution. The dark world is that both need to be taken into account for the program to succeed.

My current choice is to use these two different frames in distinct contexts. During the selection process, and when making the post-mortem, I should take an evaluative frame, while remembering that historical progress is incredibly more subtle than the parody we often make of it. And during the actual running of the program, I shouldn't be in an evaluative mindset, but only focus on how to help the participants do the best they can.

Difference with Other Programs

With more and more programs around alignment in the last few years, it makes sense to ask if the problem we're tackling with Refine has not been addressed already. I'm definitely excited about all these programs; yet they all target different enough problems that I don't think they are addressing the lack of varied conceptual research completely.

SERI MATS attacks the problem of creating more researchers for already established agendas — what I call the accelerated PhD model. As such, its participants are heavily directed and biased towards the current ontological commitments, rather than pushed to try completely new things.
AI Safety Camp has been shifting around recently, but the earlier editions lacked the detailed feedback of generalist mentors, while the most recent edition (which I was involved with) was a form of the accelerated PhD model and thus had the same issues as MATS for generating new takes.
PIBBSS aims at diversification, not directly creating new conceptual researchers or even new approaches necessarily. Still, the PIBBSS fellows could definitely constitute a strong group to select future cohorts from.
AGI Safety Fundamentals focuses on education rather than production of research, and is strongly colored by the ontological commitments of Richard Ngo.

Some Concrete Details

The first cohort of Refine, funded by the Long-Term Future Fund, will happen from August to October 2022. The ops are managed by Conjecture, and it will happen in France initially (for administrative reasons), then in London at Conjecture's offices. We pay incubatees a stipend, and also cover all their travel and housing.

The first cohort is composed of Alexander Gietelink Oldenziel, Chin Ze Shen, Tamsin Leake, Linda Linsefors, and Paul Bricman. In terms of statistics, it's interesting to notice that none of the participants are British or American: 4 out of 5 are from continental Europe, and one is from Southeast Asia. In terms of knowledge of alignment, 2 have a deep interaction with the field, 2 have thought independently about it a lot, and one is relatively new to it.

For the final evaluation, Steve Byrnes, Vanessa Kosoy, Evan Hubinger, Ramana Kumar, and John Wentworth have all committed to looking at and evaluating the output of at least a few participants, and to giving their judgment on whether they are excited by the research produced.

The Long View: Refine and Conjecture

The idea for Refine mostly came from my own frustrations with the small growth of conceptual alignment research, and from a project of an independent lab with Jessica Cooper.

Yet Conjecture management has been excited about it since even before I joined officially, and Refine fits well within the core mission of Conjecture: to improve and scale alignment research by finding many angles of attack on the problem and then supporting researchers to do the best possible work.

From this perspective, Refine is an experiment to find ways of diversifying alignment research and making more productive mistakes. It's a tentative way of converting resources into more varied and unexplored alignment research directions and, more generally, of helping create more and better conceptual alignment researchers.

If Refine is successful at producing exciting new research and researchers, then finding ways to replicate it, improve it, and scale it (maybe in a decentralized way) will become one of Conjecture's priorities. If it isn't successful, then we will learn the most we can from the failure and iterate on other options to create great and varied conceptual alignment research.

I also see a strong synergy between the needs of Refine-like programs and the epistemology team I'm leading at Conjecture. More specifically, researchers focused on the History and Philosophy of Science and the Epistemology of Alignment seem like great fits for generalist mentors, because they are steeped in the details of progress and alignment enough to provide useful and subtle feedback while minimizing ontological commitments.

^[1]
I will dig into this in future posts, but if you want pointers now, you can see my post on productive mistakes, Chapter 2 (on electrolysis) and Chapter 3 (on chemical atomism) of Is Water H2O? by Hasok Chang, and Rock, Bone, and Ruin by Adrian Currie.
