Hiring engineers and researchers to help align GPT-3

by Paul_Christiano, 1st Oct 2020

Tags: Job listing (closed), OpenAI, AI alignment, Existential risk, Community
Frontpage

My team at OpenAI, which works on aligning GPT-3, is hiring ML engineers and researchers. Apply here for the ML engineer role and here for the ML researcher role.

GPT-3 is similar enough to "prosaic" AGI that we can work on key alignment problems without relying on conjecture or speculative analogies. And because GPT-3 is already being deployed in the OpenAI API, its misalignment matters to OpenAI’s bottom line — it would be much better if we had an API that was trying to help the user instead of trying to predict the next word of text from the internet.

I think this puts our team in a great place to have an impact:

  • If our research succeeds, I think it will directly reduce existential risk from AI. This is not meant to be a warm-up problem; I think it’s the real thing.
  • We are working with state of the art systems that could pose an existential risk if scaled up, and our team’s success actually matters to the people deploying those systems.
  • We are working on the whole pipeline from “interesting idea” to “production-ready system,” building critical skills and getting empirical feedback on whether our ideas actually work.

We have the real-world problems to motivate alignment research, the financial support to hire more people, and a research vision to execute on. Our bottleneck is finding excellent researchers and engineers who are excited to work on alignment.

What the team does

In the past, Reflection focused on fine-tuning GPT-3 using a reward function learned from human feedback. Our most recent results are here; they had the unusual virtue of being exciting enough to ML researchers to be accepted at NeurIPS while also being described by Eliezer as “directly, straight-up relevant to real alignment problems.”
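For readers unfamiliar with the idea, here is a minimal sketch of what training "a reward function learned from human feedback" can look like, assuming the feedback arrives as pairwise comparisons between two model outputs. The function name `preference_loss` and the scalar reward inputs are illustrative assumptions, not the team's actual code.

```python
import math


def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the human-preferred sample wins.

    Assumes a Bradley-Terry style model (an assumption of this sketch):
    P(A preferred over B) = sigmoid(r_A - r_B).
    """
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(margin)) equals log(1 + exp(-margin)).
    # The two branches are algebraically identical; splitting them
    # keeps exp() from overflowing when the margin is very negative.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

Minimizing this loss over many human comparisons pushes the reward model to score preferred outputs higher than rejected ones; the learned reward can then be used as the training signal for fine-tuning.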

We’re currently working on three things:

  • [20%] Applying basic alignment approaches to the API, aiming to close the gap between theory and practice.
  • [60%] Extending existing approaches to tasks that are too hard for humans to evaluate; in particular, we are training models that summarize more text than human trainers have time to read. Our approach is to use weaker ML systems operating over shorter contexts to help oversee stronger ones over longer contexts. This is conceptually straightforward but still poses significant engineering and ML challenges.
  • [20%] Conceptual research on domains that no one knows how to oversee and empirical work on debates between humans (see our 2019 writeup). I think the biggest open problem is figuring out whether and how human overseers can leverage “knowledge” the model acquired during training (see an example here).
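One way to picture the second item above (summarizing more text than a trainer has time to read) is recursive decomposition: summarize short chunks, then summarize the summaries, so that every individual step is short enough to check. The sketch below is only an assumption about the general shape of such a scheme, not a description of the team's system. `summarize_chunk` is a hypothetical stand-in for a short-context model, and chunking by characters is a simplification.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def summarize_long_text(
    text: str,
    summarize_chunk: Callable[[str], str],
    chunk_size: int,
) -> str:
    """Summarize chunks repeatedly until the text fits in a single chunk.

    summarize_chunk is a hypothetical short-context summarizer. Each call
    sees at most chunk_size characters, so each step is small enough for
    a person (or a weaker model) to review.
    """
    while len(text) > chunk_size:
        chunks = split_into_chunks(text, chunk_size)
        shorter = " ".join(summarize_chunk(chunk) for chunk in chunks)
        # Guard against a summarizer that fails to compress, which would
        # otherwise make this loop run forever.
        if len(shorter) >= len(text):
            raise ValueError("summarizer did not shorten the text")
        text = shorter
    return summarize_chunk(text)


def toy_summarizer(chunk: str) -> str:
    """Toy stand-in: keep the first quarter of the chunk."""
    return chunk[: max(1, len(chunk) // 4)]
```

Calling `summarize_long_text(long_text, toy_summarizer, chunk_size=200)` returns a string no longer than 200 characters; in a real system the toy summarizer would be replaced by a trained model.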

If successful, ideas will eventually move up this list, from the conceptual stage to ML prototypes to real deployments. We’re viewing this as practice for integrating alignment into transformative AI deployed by OpenAI or another organization.

What you’d do

Most people on the team do a subset of these core tasks:

  • Design+build+maintain code for experimenting with novel training strategies for large language models. This infrastructure needs to support a diversity of experimental changes that are hard to anticipate in advance, work as a solid base to build on for 6-12 months, and handle the complexity of working with large language models. Most of our code is maintained by 1-3 people and consumed by 2-4 people (all on the team).
  • Oversee ML training. Evaluate how well models are learning, figure out why they are learning badly, and identify+prioritize+implement changes to make them learn better. Tune hyperparameters and manage computing resources. Process datasets for machine consumption; understand datasets and how they affect the model’s behavior.
  • Design and conduct experiments to answer questions about our models or our training strategies.
  • Design+build+maintain code for delegating work to ~70 people who provide input to training. We automate workflows like sampling text from books, getting multiple workers’ answers to questions about that text, running a language model on those answers, then showing the results to someone else for evaluation. It also involves monitoring worker throughput and quality, automating decisions about what tasks to delegate to whom, and making it easy to add new work or change what people are working on.
  • Participate in high-level discussion about what the team should be working on, and help brainstorm and prioritize projects and approaches. Complicated projects seem to go more smoothly if everyone understands why they are doing what they are doing, is on the lookout for things that might slip through the cracks, is thinking about the big picture and helping prioritize, and cares about the success of the whole project.
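To make the delegation workflow in the list above more concrete, here is a small sketch of one pass through it: sample a passage, collect several workers' answers, run a model on those answers, and package the result for a separate evaluator. Every name here (`EvaluationTask`, `build_evaluation_task`, the callables standing in for workers and the model) is hypothetical and chosen for illustration only.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class EvaluationTask:
    """Everything an evaluator would need to review one item."""
    passage: str
    question: str
    worker_answers: List[str]
    model_output: str


def build_evaluation_task(
    passages: Sequence[str],
    question: str,
    workers: Sequence[Callable[[str, str], str]],
    model: Callable[[List[str]], str],
    rng: random.Random,
) -> EvaluationTask:
    """Run one pass of the workflow and return the item to be evaluated."""
    # Step 1: sample a passage of text (e.g. from a book).
    passage = rng.choice(passages)
    # Step 2: collect each worker's answer to the same question.
    answers = [worker(passage, question) for worker in workers]
    # Step 3: run the (stand-in) language model on those answers.
    model_output = model(answers)
    # Step 4: bundle everything so someone else can evaluate it.
    return EvaluationTask(passage, question, answers, model_output)
```

A real system would add the parts the list mentions but this sketch omits: tracking worker throughput and quality, and deciding which tasks to route to whom.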

If you are excited about this work, apply here for the ML engineer role and here for the ML researcher role.

Comments

Very interesting role, but my understanding was that job posts were not meant to be posted on the forum.

My process was to check the "About the forum" link on the left hand side, see that there was a section on "What we discourage" that made no mention of hiring, then search for a few job ads posted on the forum and check that no disapproval was expressed in the comments of those posts.

Aaron Gertler (moderator comment)

That's not my understanding. As the lead moderator, here's what I've told people who ask about job posts:

If we start to have a lot of them such that they're getting in the way of more discussion-ready content, I'd want to keep them off the frontpage. Right now, we only get them very occasionally, and I'm generally happy to have them be more visible...

...especially if it's a post like this one which naturally leads to a bunch of discussion of an org's actual work. (If the job were something like "we need a copyeditor to work on grant reports," it's less likely that good discussion follows, and I'd again consider sorting the content differently.)

If I said something at some point that gave you a different impression of our policy here, my apologies!

Before the revamp of the forum, I was asked to take down job ads, but maybe things have changed since then. I personally don't think it would be good for the forum to become a jobs board, since the community already has several places to post jobs.

Yeah. Well, not that they cannot be posted, but that they will not be frontpaged by the mods, and instead kept in the personal blog / community section, which has less visibility.

Added: As it currently says on the About page:

Community posts

Posts that focus on the EA community itself are given a "community" tag. By default, these posts will be hidden from the list of posts on the Forum's front page. You can change how these posts are displayed by using...

Ok, the post is still labelled as 'front page' in that case, which seems like it should be changed.

To clarify, Halstead, "Community" is now a tag, not a category on the level of "Frontpage". Posts tagged "Community" will still either be "Frontpage" or "Personal Blog".

This comment is a bit out of date (though I think it was made before I made this edit). The current language is:

Posts that focus on the EA community itself are given a "community" tag. By default, these posts will have a weighting of "-25" on the Forum's front page (see below), appearing only if they have a lot of upvotes. 

We don't hide all "community" posts by default, but they will generally be less prominent on the front page unless a user changes the weighting themselves.

Thank you for posting this, Paul. I have questions about two different aspects.

In the beginning of your post you suggest that this is "the real thing" and that these systems "could pose an existential risk if scaled up".
I personally, and I believe other members of the community, would like to learn more about your reasoning.
In particular, do you think that GPT-3 specifically could pose an existential risk (for example, if it falls into the wrong hands or is scaled up sufficiently)? If so, why, and what is a plausible mechanism by which it poses an x-risk?

On a different matter, what does aligning GPT-3 (or similar systems) mean for you concretely? What would the optimal result of your team's work look like?
(This question assumes that GPT-3 is indeed a "prosaic" AI system, and that we will not gain a fundamental understanding of intelligence by this work.)

Thanks again!

I think that a scaled up version of GPT-3 can be directly applied to problems like "Here's a situation. Here's the desired result. What action will achieve that result?" (E.g. you can already use it to get answers like "What copy will get the user to subscribe to our newsletter?" and we can improve performance by fine-tuning on data about actual customer behavior or by combining GPT-3 with very simple search algorithms.)

I think that if GPT-3 were more powerful then many people would apply it to problems like that. I'm concerned that such systems will then be much better at steering the future than humans are, and that none of these systems will be actually trying to help people get what they want.

A bunch of people have written about this scenario and whether/how it could be risky. I wish that I had better writing to refer people to. Here's a post I wrote last year to try to communicate what I'm concerned about.

Thanks for the response.
I believe this answers the first part, why GPT-3 poses an x-risk specifically.

Did you or anyone else ever write about what aligning a system like GPT-3 looks like? I have to admit that it's hard for me to even come up with a definition of being (intent) aligned for a system like GPT-3, which is not really an agent on its own. How do you define or measure something like this?

Quick question - are these positions relevant as remote positions (not in the US)?

(I wrote this comment separately, because I think it will be interesting to a different, and probably smaller, group of people than the other one.)

Hires would need to be able to move to the US.

Hi, quick question. I'm not sure this is the best place for it, but I'm curious:

Does work to "align GPT-3" include work to identify the most egregious uses for GPT-3 and develop countermeasures?

Cheers

No, I'm talking somewhat narrowly about intent alignment, i.e. ensuring that our AI system is "trying" to do what we want. We are a relatively focused technical team, and a minority of the organization's investment in safety and preparedness.

The policy team works on identifying misuses and developing countermeasures, and the applied team thinks about those issues as they arise today.

Hi Paul, I messaged you privately.