
My team at OpenAI, which works on aligning GPT-3, is hiring ML engineers and researchers. Apply here for the ML engineer role and here for the ML researcher role.

GPT-3 is similar enough to "prosaic" AGI that we can work on key alignment problems without relying on conjecture or speculative analogies. And because GPT-3 is already being deployed in the OpenAI API, its misalignment matters to OpenAI’s bottom line — it would be much better if we had an API that was trying to help the user instead of trying to predict the next word of text from the internet.

I think this puts our team in a great place to have an impact:

  • If our research succeeds, I think it will directly reduce existential risk from AI. This is not meant to be a warm-up problem; I think it’s the real thing.
  • We are working with state-of-the-art systems that could pose an existential risk if scaled up, and our team’s success actually matters to the people deploying those systems.
  • We are working on the whole pipeline from “interesting idea” to “production-ready system,” building critical skills and getting empirical feedback on whether our ideas actually work.

We have the real-world problems to motivate alignment research, the financial support to hire more people, and a research vision to execute on. We are bottlenecked on hiring excellent researchers and engineers who are excited to work on alignment.

What the team does

In the past, our team (Reflection) focused on fine-tuning GPT-3 using a reward function learned from human feedback. Our most recent results are here; they had the unusual virtue of being exciting enough to ML researchers to be accepted at NeurIPS while also being described by Eliezer as “directly, straight-up relevant to real alignment problems.”
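To make this concrete, here is a minimal sketch of the central ingredient, a reward model trained from pairwise human comparisons. This is illustrative only, not the team's code: the tiny bag-of-words scorer and the random "comparison data" are stand-ins for a large language model and real human labels.

```python
# Illustrative sketch: learning a reward function from pairwise human
# comparisons. A toy bag-of-words model stands in for a language model,
# and random tensors stand in for tokenized human-labeled comparisons.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 1000  # assumed toy vocabulary


class RewardModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.EmbeddingBag(VOCAB_SIZE, 64)  # mean-pools token embeddings
        self.head = nn.Linear(64, 1)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> one scalar reward per sequence
        return self.head(self.embed(token_ids)).squeeze(-1)


def comparison_loss(model, preferred, rejected):
    """Push reward(preferred) above reward(rejected) for each labeled pair."""
    margin = model(preferred) - model(rejected)
    return -F.logsigmoid(margin).mean()


if __name__ == "__main__":
    model = RewardModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Fake "human comparison" data: pairs of tokenized completions.
    preferred = torch.randint(0, VOCAB_SIZE, (32, 20))
    rejected = torch.randint(0, VOCAB_SIZE, (32, 20))
    for step in range(100):
        loss = comparison_loss(model, preferred, rejected)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

In the full setup the learned reward would then be used to fine-tune the language model itself with reinforcement learning, rather than trained on fake data in isolation as above.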

We’re currently working on three things:

  • [20%] Applying basic alignment approaches to the API, aiming to close the gap between theory and practice.
  • [60%] Extending existing approaches to tasks that are too hard for humans to evaluate; in particular, we are training models that summarize more text than human trainers have time to read. Our approach is to use weaker ML systems operating over shorter contexts to help oversee stronger ones over longer contexts (see the sketch just after this list). This is conceptually straightforward but still poses significant engineering and ML challenges.
  • [20%] Conceptual research on domains that no one knows how to oversee, and empirical work on debates between humans (see our 2019 writeup). I think the biggest open problem is figuring out whether and how human overseers can leverage “knowledge” the model acquired during training (see an example here).
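Here is a minimal sketch of the decomposition idea in the second bullet. The chunk size, fan-out, and summarize_chunk placeholder are assumptions for illustration; the point is just that no single model call, and no single human evaluation of one, ever has to cover more than a short context, even when the input is a whole book.

```python
# Illustrative sketch of recursive summarization: shorter-context calls are
# composed so that long inputs are handled without any single step
# exceeding a human-checkable context length.
from typing import List

CHUNK_CHARS = 2000   # max context a single (weak) model call handles; assumed
GROUP_SIZE = 4       # how many child summaries feed one parent summary; assumed


def summarize_chunk(text: str) -> str:
    """Placeholder for a fine-tuned model call; truncation stands in here."""
    return text[:200]


def split(text: str, size: int) -> List[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]


def recursive_summarize(text: str) -> str:
    # Leaf level: summarize fixed-size chunks of the raw text.
    summaries = [summarize_chunk(chunk) for chunk in split(text, CHUNK_CHARS)]
    # Higher levels: summarize groups of summaries until one remains.
    while len(summaries) > 1:
        groups = [summaries[i:i + GROUP_SIZE]
                  for i in range(0, len(summaries), GROUP_SIZE)]
        summaries = [summarize_chunk("\n".join(group)) for group in groups]
    return summaries[0]


if __name__ == "__main__":
    book = "some very long text " * 5000
    print(recursive_summarize(book))
```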

If successful, ideas will eventually move up this list, from the conceptual stage to ML prototypes to real deployments. We’re viewing this as practice for integrating alignment into transformative AI deployed by OpenAI or another organization.

What you’d do

Most people on the team do a subset of these core tasks:

  • Design+build+maintain code for experimenting with novel training strategies for large language models. This infrastructure needs to support a diversity of experimental changes that are hard to anticipate in advance, work as a solid base to build on for 6-12 months, and handle the complexity of working with large language models. Most of our code is maintained by 1-3 people and consumed by 2-4 people (all on the team).
  • Oversee ML training. Evaluate how well models are learning, figure out why they are learning badly, and identify+prioritize+implement changes to make them learn better. Tune hyperparameters and manage computing resources. Process datasets for machine consumption; understand datasets and how they affect the model’s behavior.
  • Design and conduct experiments to answer questions about our models or our training strategies.
  • Design+build+maintain code for delegating work to ~70 people who provide input to training. We automate workflows like sampling text from books, getting multiple workers’ answers to questions about that text, running a language model on those answers, and then showing the results to someone else for evaluation. The work also involves monitoring worker throughput and quality, automating decisions about what tasks to delegate to whom, and making it easy to add new work or change what people are working on (a toy sketch of this kind of pipeline follows the list).
  • Participate in high-level discussion about what the team should be working on, and help brainstorm and prioritize projects and approaches. Complicated projects seem to go more smoothly if everyone understands why they are doing what they are doing, is on the lookout for things that might slip through the cracks, is thinking about the big picture and helping prioritize, and cares about the success of the whole project.
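A toy sketch of the kind of delegation pipeline described in the fourth bullet. Every name here (sample_passage, ask_workers, model_answer, evaluate) is a hypothetical placeholder for a real task queue, crowdworker interface, or model call.

```python
# Illustrative sketch of a human-data pipeline: sample a passage, collect
# worker answers, run a model over the answers, then have another worker
# evaluate the result. All helpers are placeholders.
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    passage: str
    worker_answers: List[str] = field(default_factory=list)
    model_output: str = ""
    evaluation: str = ""


def sample_passage(book: str, length: int = 500) -> str:
    start = random.randrange(max(1, len(book) - length))
    return book[start:start + length]


def ask_workers(question: str, n_workers: int = 3) -> List[str]:
    # Placeholder: in practice this would enqueue the question to workers
    # and collect their responses, tracking throughput and quality.
    return [f"worker {i} answer to: {question[:40]}..." for i in range(n_workers)]


def model_answer(passage: str, answers: List[str]) -> str:
    # Placeholder for running a language model over the workers' answers.
    return "model synthesis of: " + "; ".join(answers)


def evaluate(output: str) -> str:
    # Placeholder: a different worker rates the model's output.
    return ask_workers("How good is this output? " + output, n_workers=1)[0]


def run_pipeline(book: str) -> Task:
    task = Task(passage=sample_passage(book))
    task.worker_answers = ask_workers("Answer a question about: " + task.passage)
    task.model_output = model_answer(task.passage, task.worker_answers)
    task.evaluation = evaluate(task.model_output)
    return task


if __name__ == "__main__":
    print(run_pipeline("lots of book text " * 1000))
```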

If you are excited about this work, apply here for the ML engineer role and here for the ML researcher role.


Comments (19)


[anonymous]

Very interesting role, but my understanding was that job posts were not meant to be posted on the Forum.

My process was to check the "About the forum" link on the left hand side, see that there was a section on "What we discourage" that made no mention of hiring, then search for a few job ads posted on the forum and check that no disapproval was expressed in the comments of those posts.

Aaron Gertler 🔸 (Moderator Comment)

That's not my understanding. As the lead moderator, here's what I've told people who ask about job posts:

If we start to have a lot of them such that they're getting in the way of more discussion-ready content, I'd want to keep them off the frontpage. Right now, we only get them very occasionally, and I'm generally happy to have them be more visible...

...especially if it's a post like this one which naturally leads to a bunch of discussion of an org's actual work. (If the job were something like "we need a copyeditor to work on grant reports," it's less likely that good discussion follows, and I'd again consider sorting the content differently.)

If I said something at some point that gave you a different impression of our policy here, my apologies!

[anonymous]

Before the revamp of the forum, I was asked to take down job ads, but maybe things have changed since then. I personally don't think it would be good for the forum to become a jobs board, since the community already has several places to post jobs.

Yeah. Well, not that they cannot be posted, but that they will not be frontpaged by the mods, and instead kept in the personal blog / community section, which has less visibility.

Added: As it currently says on the About page:

Community posts

Posts that focus on the EA community itself are given a "community" tag. By default, these posts will be hidden from the list of posts on the Forum's front page. You can change how these posts are displayed by using...

[anonymous]

OK, the post is still labelled as 'front page' in that case, which seems like it should be changed.

To clarify, Halstead, "Community" is now a tag, not a category on the level of "Frontpage". Posts tagged "Community" will still either be "Frontpage" or "Personal Blog".

This comment is a bit out of date (though I think it was made before I made this edit). The current language is:

Posts that focus on the EA community itself are given a "community" tag. By default, these posts will have a weighting of "-25" on the Forum's front page (see below), appearing only if they have a lot of upvotes. 

We don't hide all "community" posts by default, but they will generally be less prominent on the front page unless a user changes the weighting themselves.

Thank you for posting this, Paul. I have questions about two different aspects.

At the beginning of your post you suggest that this is "the real thing" and that these systems "could pose an existential risk if scaled up".
I personally, and I believe other members of the community, would like to learn more about your reasoning.
In particular, do you think that GPT-3 specifically could pose an existential risk (for example, if it falls into the wrong hands or is scaled up sufficiently)? If so, why, and what is a plausible mechanism by which it poses an x-risk?

On a different matter, what does aligning GPT-3 (or similar systems) mean for you concretely? What would the optimal result of your team's work look like?
(This question assumes that GPT-3 is indeed a "prosaic" AI system, and that we will not gain a fundamental understanding of intelligence by this work.)

Thanks again!

I think that a scaled up version of GPT-3 can be directly applied to problems like "Here's a situation. Here's the desired result. What action will achieve that result?" (E.g. you can already use it to get answers like "What copy will get the user to subscribe to our newsletter?" and we can improve performance by fine-tuning on data about actual customer behavior or by combining GPT-3 with very simple search algorithms.)
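(For illustration only: one simple version of "combining GPT-3 with very simple search algorithms" is best-of-n selection against a learned predictor. In the sketch below, sample_candidates and predicted_signup_rate are hypothetical stand-ins for sampling from the model and for a predictor fine-tuned on actual customer behavior.)

```python
# Illustrative sketch of best-of-n search: sample candidate outputs from a
# language model and keep the one a learned predictor scores highest.
# Both helpers are hypothetical placeholders.
import random


def sample_candidates(prompt: str, n: int) -> list:
    # Placeholder for n samples from a language model.
    return [f"{prompt} (variant {i})" for i in range(n)]


def predicted_signup_rate(copy: str) -> float:
    # Placeholder for a predictor fine-tuned on real customer data.
    return random.random()


def best_of_n(prompt: str, n: int = 64) -> str:
    candidates = sample_candidates(prompt, n)
    return max(candidates, key=predicted_signup_rate)


if __name__ == "__main__":
    print(best_of_n("Copy that gets users to subscribe to our newsletter:"))
```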

I think that if GPT-3 was more powerful then many people would apply it to problems like that. I'm concerned that such systems will then be much better at steering the future than humans are, and that none of these systems will be actually trying to help people get what they want.

A bunch of people have written about this scenario and whether/how it could be risky. I wish that I had better writing to refer people to. Here's a post I wrote last year to try to communicate what I'm concerned about.

Thanks for the response.
I believe this answers the first part, about why GPT-3 specifically could pose an x-risk.

Did you or anyone else ever write what aligning a system like GPT-3 looks like? I have to admit that it's hard for me to even have a definition of being (intent) aligned for a system like GPT-3, which is not really an agent on its own. How do you define or measure something like this?

Quick question - can these positions be done remotely (from outside the US)?

(I wrote this comment separately, because I think it will be interesting to a different, and probably smaller, group of people than the other one.)

Hires would need to be able to move to the US.

Hi, quick question, not sure this is the best place for it but curious:

Does work to "align GPT-3" include work to identify the most egregious uses for GPT-3 and develop countermeasures?

Cheers

No, I'm talking somewhat narrowly about intent alignment, i.e. ensuring that our AI system is "trying" to do what we want. We are a relatively focused technical team, and a minority of the organization's investment in safety and preparedness.

The policy team works on identifying misuses and developing countermeasures, and the applied team thinks about those issues as they arise today.

Hi Paul, I messaged you privately.
