Rohin Shah

3969 karmaJoined May 2015


Hi, I'm Rohin Shah! I work as a Research Scientist on the technical AGI safety team at DeepMind. I completed my PhD at the Center for Human-Compatible AI at UC Berkeley, where I worked on building AI systems that can learn to assist a human user, even if they don't initially know what the user wants.

I'm particularly interested in big picture questions about artificial intelligence. What techniques will we use to build human-level AI systems? How will their deployment affect the world? What can we do to make this deployment go better? I write up summaries and thoughts about recent work tackling these questions in the Alignment Newsletter.

In the past, I ran the EA UC Berkeley and EA at the University of Washington groups.


Yeah, I don't think it's accurate to say that I see assistance games as mostly irrelevant to modern deep learning, and I especially don't think that it makes sense to cite my review of Human Compatible to support that claim.

The one quote that Daniel mentions about shifting the entire way we do AI is a paraphrase of something Stuart says, and is responding to the paradigm of writing down fixed, programmatic reward functions. And in fact, we have now changed that dramatically through the use of RLHF, for which a lot of early work was done at CHAI, so I think this reflects positively on Stuart.

I'll also note that in addition to the "Learning to Interactively Learn and Assist" paper that does CIRL with deep RL which Daniel cited above, I also wrote a paper with several CHAI colleagues that applied deep RL to solve assistance games.

My position is that you can roughly decompose the overall problem into two subproblems: (1) in theory, what should an AI system do? (2) Given a desire for what the AI system should do, how do we make it do that?

The formalization of assistance games is more about (1), saying that AI systems should behave more like assistants than like autonomous agents (basically the point of my paper linked above). These are mostly independent. Since deep learning is an answer to (2) while assistance games are an answer to (1), you can use deep learning to solve assistance games.

I'd also say that the current form factor of ChatGPT, Claude, Bard etc is very assistance-flavored, which seems like a clear success of prediction at least. On the other hand, it seems unlikely that CHAI's work on CIRL had much causal impact on this, so in hindsight it looks less useful to have done this research.

All this being said, I view (2) as the more pressing problem for alignment, and so I spend most of my time on that, which implies not working on assistance games as much any more. So I think it's overall reasonable to take me as mildly against work on assistance games (but not to take me as saying that it is irrelevant to modern deep learning).

Fyi, the list you linked doesn't contain most of what I would consider the "small" orgs in AI, e.g. off the top of my head I'd name ARC, Redwood Research, Conjecture, Ought, FAR AI, Aligned AI, Apart, Apollo, Epoch, Center for AI Safety, Bluedot, Ashgro, AI Safety Support and Orthogonal. (Some of these aren't even that small.) Those are the ones I'd be thinking about if I were to talk about merging orgs.

Maybe the non-AI parts of that list are more comprehensive, but my guess is that it's just missing most of the tiny orgs that OP is talking about (e.g. OP's own org, QURI, isn't on the list).

(EDIT: Tbc I'm really keen on actually doing the exercise of naming concrete examples -- great suggestion!)

:) I'm glad we got to agreement!

(Or at least significantly closer, I'm sure there are still some minor differences.)

On hits-based research: I certainly agree there are other factors to consider in making a funding decision. I'm just saying that you should talk about those directly instead of criticizing the OP for looking at whether their research was good or not.

(In your response to OP you talk about a positive case for the work on simulators, SVD, and sparse coding -- that's the sort of thing that I would want to see, so I'm glad to see that discussion starting.)

On VCs: Your position seems reasonable to me (though so does the OP's position).

On recommendations: Fwiw I also make unconditional recommendations in private. I don't think this is unusual, e.g. I think many people make unconditional recommendations not to go into academia (though I don't).

I don't really buy that the burden of proof should be much higher in public. Reversing the position, do you think the burden of proof should be very high for anyone to publicly recommend working at lab X? If not, what's the difference between a recommendation to work at org X vs an anti-recommendation (i.e. recommendation not to work at org X)? I think the three main considerations I'd point to are:

  1. (Pro-recommendations) It's rare for people to do things (relative to not doing things), so we differentially want recommendations vs anti-recommendations, so that it is easier for orgs to start up and do things.
  2. (Anti-recommendations) There are strong incentives to recommend working at org X (obviously org X itself will do this), but no incentives to make the opposite recommendation (and in fact usually anti-incentives). Similarly I expect that inaccuracies in the case for the not-working recommendation will be pointed out (by org X), whereas inaccuracies in the case for working will not be pointed out. So we differentially want to encourage the opposite recommendations in order to get both sides of the story by lowering our "burden of proof".
  3. (Pro-recommendations) Recommendations have a nice effect of getting people excited and positive about the work done by the community, which can make people more motivated, whereas the same is not true of anti-recommendations.

Overall I think point 2 feels most important, and so I end up thinking that the burden of proof on critiques / anti-recommendations should be lower than the burden of proof on recommendations -- and the burden of proof on recommendations is approximately zero. (E.g. if someone wrote a public post recommending Conjecture without any concrete details of why -- just something along the lines of "it's a great place doing great work" -- I don't think anyone would say that they were using their power irresponsibly.)

I would actually prefer a higher burden of proof on recommendations, but given the status quo if I'm only allowed to affect the burden of proof on anti-recommendations I'd probably want it to go down to ~zero. Certainly I'd want it to be well below the level that this post meets.

I'm not very compelled by this response.

It seems to me you have two points on the content of this critique. The first point:

I think it's bad to criticize labs that do hits-based research approaches for their early output (I also think this applies to your critique of Redwood) because the entire point is that you don't find a lot until you hit.

I'm pretty confused here. How exactly do you propose that funding decisions get made? If some random person says they are pursuing a hits-based approach to research, should EA funders be obligated to fund them?

Presumably you would want to say "the team will be good at hits-based research such that we can expect a future hit, for X, Y and Z reasons". I think you should actually say those X, Y and Z reasons so that the authors of the critique can engage with them; I assume that the authors are implicitly endorsing a claim like "there aren't any particularly strong reasons to expect Conjecture to do more impactful work in the future".

The second point:

Your statements about the VCs seem unjustified to me. How do you know they are not aligned? [...] I haven't talked to the VCs either, but I've at least asked people who work(ed) at Conjecture.

Hmm, it seems extremely reasonable to me to take as a baseline prior that the VCs are profit-motivated, and the authors explicitly say

We have heard credible complaints of this from their interactions with funders. One experienced technical AI safety researcher recalled Connor saying that he will tell investors that they are very interested in making products, whereas the predominant focus of the company is on AI safety.

The fact that people who work(ed) at Conjecture say otherwise means that (probably) someone is wrong, but I don't see a strong reason to believe that it's the OP who is wrong.

At the meta level you say:

I do not understand where the confidence with which you write the post (or at least how I read it) comes from.

And in your next comment:

I think we should really make sure that we say true things when we criticize people, quantify our uncertainty, differentiate between facts and feelings and do not throw our epistemics out of the window in the process

But afaict, the only point where you actually disagree with a claim made in the OP (excluding recommendations) is in your assessment of VCs? (And in that case I feel very uncompelled by your argument.)

In what way has the OP failed to say true things? Where should they have had more uncertainty? What things did they present as facts which were actually feelings? What claim have they been confident about that they shouldn't have been confident about?

(Perhaps you mean to say that the recommendations are overconfident. There I think I just disagree with you about the bar for evidence for making recommendations, including ones as strong as "alignment researchers shouldn't work at organization X". I've given recommendations like this to individual people who asked me for a recommendation in the past, on less evidence than collected in this post.)

Wait, you think the reason we can't do brain improvement is because we can't change the weights of individual neurons?

That seems wrong to me. I think it's because we don't know how the neurons work.

Did you read the link to Cold Takes above? If so, where do you disagree with it?

(I agree that we'd be able to do even better if we knew how the neurons work.)

Similarly I'd be surprised if you thought that beings as intelligent as humans could recursively improve NNs. Cos currently we can't do that, right?

Humans can improve NNs? That's what AI capabilities research is?

(It's not "recursive" improvement but I assume you don't care about the "recursive" part here.)

I think it's within the power of beings equally as intelligent as us (similarly as mentioned above I think recursive improvement in humans would accelerate if we had similar abilities).

I thought yes, but I'm a bit unhappy about that assumption (I forgot it was there). If you go by the intended spirit of the assumption (see the footnote) I'm probably on board, but it seems ripe for misinterpretation ("well if you had just deployed GPT-5 it really could have run an automated company, even though in practice we didn't do that because we were worried about safety and/or legal liability and/or we didn't know how to prompt it etc").

You could look at these older conversations. There's also Where I agree and disagree with Eliezer (see also my comment) though I suspect that won't be what you're looking for.

Mostly though I think you aren't going to get what you're looking for because it's a complicated question that doesn't have a simple answer.

(I think this regardless of whether you frame the question as "do we die?" or "do we live?", if you think the case for doom is straightforward I think you are mistaken. All the doom arguments I know of seem to me like they establish plausibility, not near-certainty, though I'm not going to defend that here.)

Would you be willing to put this in numerical form (% chance) as a rough expectation?

Idk, I don't really want to make claims about GPT-5 / GPT-6, since that depends on OpenAI's naming decisions. But I'm at < 5% (probably < 1%, but I'd want to think about it) on "the world will be transformed" (in the TAI sense) within the next 3 years.

First off, let me say that I'm not accusing you specifically of "hype", except inasmuch as I'm saying that for any AI-risk-worrier who has ever argued for shorter timelines (a class which includes me), if you know nothing else about that person, there's a decent chance their claims are partly "hype". Let me also say that I don't believe you are deliberately benefiting yourself at others' expense.

That being said, accusations of "hype" usually mean an expectation that the claims are overstated due to bias. I don't really see why it matters if the bias is survival motivated vs finance motivated vs status motivated. The point is that there is bias and so as an observer you should discount the claims somewhat (which is exactly how it was used in the original comment).

what do you make of Connor Leahy's take that LLMs are basically "general cognition engines" and will scale to full AGI in a generation or two (and with the addition of various plugins etc to aid "System 2" type thinking, which are freely being offered by the AutoGPT crowd)?

Could happen, probably won't, though it depends what is meant by "a generation or two", and what is meant by "full AGI" (I'm thinking of a bar like transformative AI).

(I haven't listened to the podcast but have thought about this idea before. I do agree it's good to think of LLMs as general cognition engines, and that plugins / other similar approaches will be a big deal.)

Load more