Rohin Shah

4176 karma · Joined May 2015


Hi, I'm Rohin Shah! I work as a Research Scientist on the technical AGI safety team at DeepMind. I completed my PhD at the Center for Human-Compatible AI at UC Berkeley, where I worked on building AI systems that can learn to assist a human user, even if they don't initially know what the user wants.

I'm particularly interested in big picture questions about artificial intelligence. What techniques will we use to build human-level AI systems? How will their deployment affect the world? What can we do to make this deployment go better? I write up summaries and thoughts about recent work tackling these questions in the Alignment Newsletter.

In the past, I ran the EA groups at UC Berkeley and the University of Washington.


Oh I see, sorry for misinterpreting you.

So I'm not really seeing anything "bad" here.

I didn't say your proposal was "bad", I said it wasn't "conservative".

My point is just that, if GHD were to reorient around "reliable global capacity growth", it would look very different, to the point where I think your proposal is better described as "stop GHD work, and instead do reliable global capacity growth work", rather than the current framing of "let's reconceptualize the existing bucket of work".

I'll suggest a reconceptualization that may seem radical in theory but is conservative in practice.

It doesn't seem conservative in practice? Like Vasco, I'd be surprised if aiming for reliable global capacity growth would look like the current GHD portfolio. For example:

  1. Given an inability to help everyone, you'd want to target interventions based on people's future ability to contribute. (E.g. you should probably stop any interventions that target people in extreme poverty.)
  2. You'd either want to stop focusing on infant mortality, or start interventions to increase fertility. (Depending on whether population growth is a priority.)
  3. You'd want to invest more in education than would be suggested by typical metrics like QALYs or income doublings.

I'd guess most proponents of GHD would find (1) and (2) particularly bad.

Research Scientist and Research Engineer roles in AI Safety and Alignment at Google DeepMind.

Location: Hybrid (3 days/week in the office) in San Francisco / Mountain View / London.

Application deadline: We don't have a final deadline yet, but will keep the roles open for at least another two weeks (i.e. until March 1, 2024), and likely longer.

For further details, see the roles linked above. You may also find my FAQ useful.

(Fyi, I probably won't engage more here, due to not wanting to spend too much time on this)

Jonas's comment is a high level assessment that is only useful insofar as you trust his judgment.

This is true, but I trust basically any random commenter a non-zero amount (unless their comment itself gives me reasons not to trust them). I agree you can get more trust if you know the person better. But even the amount of trust for "literally a random person I've never heard of" would be enough for the evidence to matter to me.

I'm only saying that I think large updates based on Jonas's statement are a mistake for people who already know Owen was an EA leader in good standing for many years and had many highly placed friends.

SBF was an EA leader in good standing for many years and had many highly placed friends. It's pretty notable to me that there weren't many comments like Jonas's for SBF, while there are for Owen.

In contrast, lyra's comment contains a lot of details I can use to inform my own reasoning.

It seems so noisy to compare karma counts on two different comments. There are all sorts of things we could be missing about why people voted the way they did. Maybe people are voting Jonas's comment up more because they liked that it went more out of its way to acknowledge that the past behavior was bad and that a temporary ban is good.

It seems like a mistake to treat karma as "the community's estimate of the evidence that the comment would provide to a new reader who knows that Owen was a leader in good standing but otherwise doesn't know anything about what's going on". I agree you'll find all sorts of ways that karma counts don't reflect that.

The evidence Jonas provides is equally consistent with “Owen has a flaw he has healed” and “Owen is a skilled manipulator who charms men, and harasses women”.

Surely there are a lot of other hypotheses as well, and Jonas's evidence is relevant to updating on those?

More broadly, I don't think there's any obvious systemic error going on here. Someone who knows the person reasonably well, giving a model for what the causes of the behavior were, that makes predictions about future instances, clearly seems like evidence one should take into account.

(I do agree the comment would be more compelling with more object-level details, but I don't think that makes it a systemic error to be happy with the comment that exists.)

Yeah, I don't think it's accurate to say that I see assistance games as mostly irrelevant to modern deep learning, and I especially don't think that it makes sense to cite my review of Human Compatible to support that claim.

The one quote that Daniel mentions about shifting the entire way we do AI is a paraphrase of something Stuart says, and is responding to the paradigm of writing down fixed, programmatic reward functions. And in fact, we have now changed that dramatically through the use of RLHF, for which a lot of early work was done at CHAI, so I think this reflects positively on Stuart.

I'll also note that in addition to the "Learning to Interactively Learn and Assist" paper Daniel cited above, which applies deep RL to CIRL, I also wrote a paper with several CHAI colleagues that applied deep RL to solve assistance games.

My position is that you can roughly decompose the overall problem into two subproblems: (1) in theory, what should an AI system do? (2) Given a desire for what the AI system should do, how do we make it do that?

The formalization of assistance games is more about (1), saying that AI systems should behave more like assistants than like autonomous agents (basically the point of my paper linked above). These are mostly independent. Since deep learning is an answer to (2) while assistance games are an answer to (1), you can use deep learning to solve assistance games.

I'd also say that the current form factor of ChatGPT, Claude, Bard etc is very assistance-flavored, which at least seems like a clear predictive success. On the other hand, it seems unlikely that CHAI's work on CIRL had much causal impact on this, so in hindsight the research looks less useful to have done.

All this being said, I view (2) as the more pressing problem for alignment, and so I spend most of my time on that, which implies not working on assistance games as much any more. So I think it's overall reasonable to take me as mildly against work on assistance games (but not to take me as saying that it is irrelevant to modern deep learning).

Fyi, the list you linked doesn't contain most of what I would consider the "small" orgs in AI, e.g. off the top of my head I'd name ARC, Redwood Research, Conjecture, Ought, FAR AI, Aligned AI, Apart, Apollo, Epoch, Center for AI Safety, Bluedot, Ashgro, AI Safety Support and Orthogonal. (Some of these aren't even that small.) Those are the ones I'd be thinking about if I were to talk about merging orgs.

Maybe the non-AI parts of that list are more comprehensive, but my guess is that it's just missing most of the tiny orgs that OP is talking about (e.g. OP's own org, QURI, isn't on the list).

(EDIT: Tbc I'm really keen on actually doing the exercise of naming concrete examples -- great suggestion!)

:) I'm glad we got to agreement!

(Or at least significantly closer, I'm sure there are still some minor differences.)

On hits-based research: I certainly agree there are other factors to consider in making a funding decision. I'm just saying that you should talk about those directly instead of criticizing the OP for looking at whether their research was good or not.

(In your response to OP you talk about a positive case for the work on simulators, SVD, and sparse coding -- that's the sort of thing that I would want to see, so I'm glad to see that discussion starting.)

On VCs: Your position seems reasonable to me (though so does the OP's position).

On recommendations: Fwiw I also make unconditional recommendations in private. I don't think this is unusual, e.g. I think many people make unconditional recommendations not to go into academia (though I don't).

I don't really buy that the burden of proof should be much higher in public. Reversing the position, do you think the burden of proof should be very high for anyone to publicly recommend working at lab X? If not, what's the difference between a recommendation to work at org X and an anti-recommendation (i.e. a recommendation not to work there)? I think the three main considerations I'd point to are:

  1. (Pro-recommendations) It's rare for people to do things (relative to not doing things), so we differentially want recommendations vs anti-recommendations, so that it is easier for orgs to start up and do things.
  2. (Anti-recommendations) There are strong incentives to recommend working at org X (obviously org X itself will do this), but no incentives to make the opposite recommendation (and in fact usually anti-incentives). Similarly, I expect that inaccuracies in the case for an anti-recommendation will be pointed out (by org X), whereas inaccuracies in the case for a recommendation will not be. So we differentially want to encourage anti-recommendations, in order to get both sides of the story, by lowering our "burden of proof" for them.
  3. (Pro-recommendations) Recommendations have a nice effect of getting people excited and positive about the work done by the community, which can make people more motivated, whereas the same is not true of anti-recommendations.

Overall I think point 2 feels most important, and so I end up thinking that the burden of proof on critiques / anti-recommendations should be lower than the burden of proof on recommendations -- and the burden of proof on recommendations is approximately zero. (E.g. if someone wrote a public post recommending Conjecture without any concrete details of why -- just something along the lines of "it's a great place doing great work" -- I don't think anyone would say that they were using their power irresponsibly.)

I would actually prefer a higher burden of proof on recommendations, but given the status quo if I'm only allowed to affect the burden of proof on anti-recommendations I'd probably want it to go down to ~zero. Certainly I'd want it to be well below the level that this post meets.
