
My team at OpenAI, which works on aligning GPT-3, is hiring ML engineers and researchers. Apply here for the ML engineer role and here for the ML researcher role.

GPT-3 is similar enough to "prosaic" AGI that we can work on key alignment problems without relying on conjecture or speculative analogies. And because GPT-3 is already being deployed in the OpenAI API, its misalignment matters to OpenAI’s bottom line — it would be much better if we had an API that was trying to help the user instead of trying to predict the next word of text from the internet.

I think this puts our team in a great place to have an impact:

  • If our research succeeds, I think it will directly reduce existential risk from AI. This is not meant to be a warm-up problem; I think it’s the real thing.
  • We are working with state-of-the-art systems that could pose an existential risk if scaled up, and our team’s success actually matters to the people deploying those systems.
  • We are working on the whole pipeline from “interesting idea” to “production-ready system,” building critical skills and getting empirical feedback on whether our ideas actually work.

We have the real-world problems to motivate alignment research, the financial support to hire more people, and a research vision to execute on. We are bottlenecked on hiring excellent researchers and engineers who are excited to work on alignment.

What the team does

In the past, our team (Reflection) focused on fine-tuning GPT-3 using a reward function learned from human feedback. Our most recent results are here; they had the unusual virtue of being exciting enough to ML researchers to be accepted at NeurIPS while also being described by Eliezer as “directly, straight-up relevant to real alignment problems.”
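To make this concrete, here is a minimal sketch of the central ingredient, a reward model trained from pairwise human comparisons. This is illustrative only, not the team's code: the tiny bag-of-words scorer and the random "comparison data" are stand-ins for a large language model and real human labels.

```python
# Illustrative sketch: learning a reward function from pairwise human
# comparisons. A toy bag-of-words model stands in for a language model,
# and random tensors stand in for tokenized human-labeled comparisons.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 1000  # assumed toy vocabulary


class RewardModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.EmbeddingBag(VOCAB_SIZE, 64)  # mean-pools token embeddings
        self.head = nn.Linear(64, 1)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> one scalar reward per sequence
        return self.head(self.embed(token_ids)).squeeze(-1)


def comparison_loss(model, preferred, rejected):
    """Push reward(preferred) above reward(rejected) for each labeled pair."""
    margin = model(preferred) - model(rejected)
    return -F.logsigmoid(margin).mean()


if __name__ == "__main__":
    model = RewardModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Fake "human comparison" data: pairs of tokenized completions.
    preferred = torch.randint(0, VOCAB_SIZE, (32, 20))
    rejected = torch.randint(0, VOCAB_SIZE, (32, 20))
    for step in range(100):
        loss = comparison_loss(model, preferred, rejected)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

In the full setup the learned reward would then be used to fine-tune the language model itself with reinforcement learning, rather than trained on fake data in isolation as above.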

We’re currently working on three things:

  • [20%] Applying basic alignment approaches to the API, aiming to close the gap between theory and practice.
  • [60%] Extending existing approaches to tasks that are too hard for humans to evaluate; in particular, we are training models that summarize more text than human trainers have time to read. Our approach is to use weaker ML systems operating over shorter contexts to help oversee stronger ones over longer contexts (see the sketch just after this list). This is conceptually straightforward but still poses significant engineering and ML challenges.
  • [20%] Conceptual research on domains that no one knows how to oversee, and empirical work on debates between humans (see our 2019 writeup). I think the biggest open problem is figuring out whether and how human overseers can leverage “knowledge” the model acquired during training (see an example here).
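Here is a minimal sketch of the decomposition idea in the second bullet. The chunk size, fan-out, and summarize_chunk placeholder are assumptions for illustration; the point is just that no single model call, and no single human evaluation of one, ever has to cover more than a short context, even when the input is a whole book.

```python
# Illustrative sketch of recursive summarization: shorter-context calls are
# composed so that long inputs are handled without any single step
# exceeding a human-checkable context length.
from typing import List

CHUNK_CHARS = 2000   # max context a single (weak) model call handles; assumed
GROUP_SIZE = 4       # how many child summaries feed one parent summary; assumed


def summarize_chunk(text: str) -> str:
    """Placeholder for a fine-tuned model call; truncation stands in here."""
    return text[:200]


def split(text: str, size: int) -> List[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]


def recursive_summarize(text: str) -> str:
    # Leaf level: summarize fixed-size chunks of the raw text.
    summaries = [summarize_chunk(chunk) for chunk in split(text, CHUNK_CHARS)]
    # Higher levels: summarize groups of summaries until one remains.
    while len(summaries) > 1:
        groups = [summaries[i:i + GROUP_SIZE]
                  for i in range(0, len(summaries), GROUP_SIZE)]
        summaries = [summarize_chunk("\n".join(group)) for group in groups]
    return summaries[0]


if __name__ == "__main__":
    book = "some very long text " * 5000
    print(recursive_summarize(book))
```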

If successful, ideas will eventually move up this list, from the conceptual stage to ML prototypes to real deployments. We’re viewing this as practice for integrating alignment into transformative AI deployed by OpenAI or another organization.

What you’d do

Most people on the team do a subset of these core tasks:

  • Design+build+maintain code for experimenting with novel training strategies for large language models. This infrastructure needs to support a diversity of experimental changes that are hard to anticipate in advance, work as a solid base to build on for 6-12 months, and handle the complexity of working with large language models. Most of our code is maintained by 1-3 people and consumed by 2-4 people (all on the team).
  • Oversee ML training. Evaluate how well models are learning, figure out why they are learning badly, and identify+prioritize+implement changes to make them learn better. Tune hyperparameters and manage computing resources. Process datasets for machine consumption; understand datasets and how they affect the model’s behavior.
  • Design and conduct experiments to answer questions about our models or our training strategies.
  • Design+build+maintain code for delegating work to ~70 people who provide input to training. We automate workflows like sampling text from books, getting multiple workers’ answers to questions about that text, running a language model on those answers, and then showing the results to someone else for evaluation. The work also involves monitoring worker throughput and quality, automating decisions about what tasks to delegate to whom, and making it easy to add new work or change what people are working on (a toy sketch of this kind of pipeline follows the list).
  • Participate in high-level discussion about what the team should be working on, and help brainstorm and prioritize projects and approaches. Complicated projects seem to go more smoothly if everyone understands why they are doing what they are doing, is on the lookout for things that might slip through the cracks, is thinking about the big picture and helping prioritize, and cares about the success of the whole project.
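A toy sketch of the kind of delegation pipeline described in the fourth bullet. Every name here (sample_passage, ask_workers, model_answer, evaluate) is a hypothetical placeholder for a real task queue, crowdworker interface, or model call.

```python
# Illustrative sketch of a human-data pipeline: sample a passage, collect
# worker answers, run a model over the answers, then have another worker
# evaluate the result. All helpers are placeholders.
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    passage: str
    worker_answers: List[str] = field(default_factory=list)
    model_output: str = ""
    evaluation: str = ""


def sample_passage(book: str, length: int = 500) -> str:
    start = random.randrange(max(1, len(book) - length))
    return book[start:start + length]


def ask_workers(question: str, n_workers: int = 3) -> List[str]:
    # Placeholder: in practice this would enqueue the question to workers
    # and collect their responses, tracking throughput and quality.
    return [f"worker {i} answer to: {question[:40]}..." for i in range(n_workers)]


def model_answer(passage: str, answers: List[str]) -> str:
    # Placeholder for running a language model over the workers' answers.
    return "model synthesis of: " + "; ".join(answers)


def evaluate(output: str) -> str:
    # Placeholder: a different worker rates the model's output.
    return ask_workers("How good is this output? " + output, n_workers=1)[0]


def run_pipeline(book: str) -> Task:
    task = Task(passage=sample_passage(book))
    task.worker_answers = ask_workers("Answer a question about: " + task.passage)
    task.model_output = model_answer(task.passage, task.worker_answers)
    task.evaluation = evaluate(task.model_output)
    return task


if __name__ == "__main__":
    print(run_pipeline("lots of book text " * 1000))
```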

If you are excited about this work, apply here for the ML engineer role and here for the ML researcher role.


Comments (19)


[anonymous]

Very interesting role, but my understanding was that job posts were not meant to be posted on the Forum.

My process was to check the "About the forum" link on the left hand side, see that there was a section on "What we discourage" that made no mention of hiring, then search for a few job ads posted on the forum and check that no disapproval was expressed in the comments of those posts.

Aaron Gertler 🔸 (Moderator Comment)

That's not my understanding. As the lead moderator, here's what I've told people who ask about job posts:

If we start to have a lot of them such that they're getting in the way of more discussion-ready content, I'd want to keep them off the frontpage. Right now, we only get them very occasionally, and I'm generally happy to have them be more visible...

...especially if it's a post like this one which naturally leads to a bunch of discussion of an org's actual work. (If the job were something like "we need a copyeditor to work on grant reports," it's less likely that good discussion follows, and I'd again consider sorting the content differently.)

If I said something at some point that gave you a different impression of our policy here, my apologies!

[anonymous]

Before the revamp of the forum, I was asked to take down job ads, but maybe things have changed since then. I personally don't think it would be good for the forum to become a jobs board, since the community already has several places to post jobs.

Yeah. Well, not that they cannot be posted, but that they will not be frontpaged by the mods, and instead kept in the personal blog / community section, which has less visibility.

Added: As it currently says on the About page:

Community posts

Posts that focus on the EA community itself are given a "community" tag. By default, these posts will be hidden from the list of posts on the Forum's front page. You can change how these posts are displayed by using...

[anonymous]

OK, the post is still labelled as 'front page' in that case, which seems like it should be changed.

To clarify, Halstead, "Community" is now a tag, not a category on the level of "Frontpage". Posts tagged "Community" will still either be "Frontpage" or "Personal Blog".

This comment is a bit out of date (though I think it was made before I made this edit). The current language is:

Posts that focus on the EA community itself are given a "community" tag. By default, these posts will have a weighting of "-25" on the Forum's front page (see below), appearing only if they have a lot of upvotes. 

We don't hide all "community" posts by default, but they will generally be less prominent on the front page unless a user changes the weighting themselves.

Thank you for posting this, Paul. I have questions about two different aspects.

At the beginning of your post you suggest that this is "the real thing" and that these systems "could pose an existential risk if scaled up".
I personally, and I believe other members of the community, would like to learn more about your reasoning.
In particular, do you think that GPT-3 specifically could pose an existential risk (for example, if it falls into the wrong hands or is scaled up sufficiently)? If so, why, and what is a plausible mechanism by which it poses an x-risk?

On a different matter, what does aligning GPT-3 (or similar systems) mean for you concretely? What would the optimal result of your team's work look like?
(This question assumes that GPT-3 is indeed a "prosaic" AI system, and that we will not gain a fundamental understanding of intelligence by this work.)

Thanks again!

I think that a scaled up version of GPT-3 can be directly applied to problems like "Here's a situation. Here's the desired result. What action will achieve that result?" (E.g. you can already use it to get answers like "What copy will get the user to subscribe to our newsletter?" and we can improve performance by fine-tuning on data about actual customer behavior or by combining GPT-3 with very simple search algorithms.)
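(For illustration only: one simple version of "combining GPT-3 with very simple search algorithms" is best-of-n selection against a learned predictor. In the sketch below, sample_candidates and predicted_signup_rate are hypothetical stand-ins for sampling from the model and for a predictor fine-tuned on actual customer behavior.)

```python
# Illustrative sketch of best-of-n search: sample candidate outputs from a
# language model and keep the one a learned predictor scores highest.
# Both helpers are hypothetical placeholders.
import random


def sample_candidates(prompt: str, n: int) -> list:
    # Placeholder for n samples from a language model.
    return [f"{prompt} (variant {i})" for i in range(n)]


def predicted_signup_rate(copy: str) -> float:
    # Placeholder for a predictor fine-tuned on real customer data.
    return random.random()


def best_of_n(prompt: str, n: int = 64) -> str:
    candidates = sample_candidates(prompt, n)
    return max(candidates, key=predicted_signup_rate)


if __name__ == "__main__":
    print(best_of_n("Copy that gets users to subscribe to our newsletter:"))
```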

I think that if GPT-3 was more powerful then many people would apply it to problems like that. I'm concerned that such systems will then be much better at steering the future than humans are, and that none of these systems will be actually trying to help people get what they want.

A bunch of people have written about this scenario and whether/how it could be risky. I wish that I had better writing to refer people to. Here's a post I wrote last year to try to communicate what I'm concerned about.

Thanks for the response.
I believe this answers the first part, about why GPT-3 specifically could pose an x-risk.

Did you or anyone else ever write what aligning a system like GPT-3 looks like? I have to admit that it's hard for me to even have a definition of being (intent) aligned for a system like GPT-3, which is not really an agent on its own. How do you define or measure something like this?

Quick question - can these positions be done remotely (from outside the US)?

(I wrote this comment separately, because I think it will be interesting to a different, and probably smaller, group of people than the other one.)

Hires would need to be able to move to the US.

Hi, quick question, not sure this is the best place for it but curious:

Does work to "align GPT-3" include work to identify the most egregious uses for GPT-3 and develop countermeasures?

Cheers

No, I'm talking somewhat narrowly about intent alignment, i.e. ensuring that our AI system is "trying" to do what we want. We are a relatively focused technical team, and a minority of the organization's investment in safety and preparedness.

The policy team works on identifying misuses and developing countermeasures, and the applied team thinks about those issues as they arise today.

Hi Paul, I messaged you privately.
