Rohin Shah

4251 karmaJoined May 2015

Bio

Hi, I'm Rohin Shah! I work as a Research Scientist on the technical AGI safety team at DeepMind. I completed my PhD at the Center for Human-Compatible AI at UC Berkeley, where I worked on building AI systems that can learn to assist a human user, even if they don't initially know what the user wants.

I'm particularly interested in big picture questions about artificial intelligence. What techniques will we use to build human-level AI systems? How will their deployment affect the world? What can we do to make this deployment go better? I write up summaries and thoughts about recent work tackling these questions in the Alignment Newsletter.

In the past, I ran the EA UC Berkeley and EA at the University of Washington groups.

http://rohinshah.com

Posts
11

Sorted by New

Rohin Shah's Quick takes

Rohin Shah

· 4y ago · 1m read

Person-affecting intuitions can often be money pumped

Rohin Shah

· 3y ago · 4m read

102

DeepMind is hiring for the Scalable Alignment and Alignment Teams

Rohin Shah

· 3y ago · 11m read

Shah and Yudkowsky on alignment failures

EliezerYudkowsky

· 3y ago · 110m read

Conversation on technology forecasting and gradualism

RobBensinger

· 3y ago · 37m read

[AN #80]: Why AI risk might be solved without additional intervention from longtermists

Rohin Shah

· 5y ago · 12m read

Summary of Stuart Russell's new book, "Human Compatible"

Rohin Shah

· 5y ago · 18m read

Alignment Newsletter One Year Retrospective

Rohin Shah

· 6y ago · 25m read

Thoughts on the "Meta Trap"

Rohin Shah

· 8y ago · 13m read

EA Berkeley Spring 2016 Retrospective

Rohin Shah

· 9y ago · 2m read

Comments
463

Habryka [Deactivated]'s Quick takes

Rohin Shah1mo17

Of course, it's true that they could ignore serious criticism is they wanted to, but my sense is that people actually quite often feel unable to ignore criticism.

As someone sympathetic to many of Habryka's positions, while also disagreeing with many of Habryka's positions, my immediate reaction to this was "well that seems like a bad thing", c.f.

shallow criticism often gets valorized

I'd feel differently if you had said "people feel obliged to take criticism seriously if it points at a real problem" or something like that, but I agree with you that the mechanism is more like "people are unable to ignore criticism irrespective of its quality" (the popularity of the criticism matters, but sadly that is only weakly correlated with quality).

Notes on risk compensation

Rohin Shah3mo2

Tbc if the preferences are written in words like "expected value of the lightcone" I agree it would be relatively easy to tell which was which, mainly by identifying community shibboleths. My claim is that if you just have the input/output mapping of (safety level of AI, capabilities level of AI) --> utility, then it would be challenging. Even longtermists should be willing to accept some risk, just because AI can help with other existential risks (and of course many safety researchers -- probably the majority at this point -- are not longtermists).

Notes on risk compensation

Rohin Shah3mo4

What you call the "lab's" utility function isn't really specific to the lab; it could just as well apply to safety researchers. One might assume that the parameters would be set in such a way as to make the lab more C-seeking (e.g. it takes less C to produce 1 util for the lab than for everyone else).

But at least in the case of AI safety, I don't think this is the case. I doubt I could easily distinguish a lab capabilities researcher (or lab leadership, or some "aggregate lab utility function") from an external safety researcher if you just gave me their utility functions over C and S. (AI safety has significant overlap with transhumanism; relative to the rest of humanity they are way more likely to think there are huge benefits to development of safe AGI.) In practice it seems like the issue is more like epistemic disagreement.

You could still recover many of the conclusions in this post by positing that an increase to S leads to a proportional decrease in probability of non-survival, and the proportion is the same between the lab and everyone else, but the absolute numbers aren't. I'd still feel like this was a poor model of the real situation though.

EA "Worldviews" Need Rethinking

Rohin Shah6mo6

I agree reductions in infant mortality likely have better long-run effects on capacity growth than equivalent levels of population growth while keeping infant mortality rates constant, which could mean that you still want to focus on infant mortality while not prioritizing increasing fertility.

I would just be surprised if the decision from the global capacity growth perspective ended up being "continue putting tons of resources into reducing infant mortality, but not much into increasing fertility" (which I understand to be the status quo for GHD), because:

Probably the dominant consideration for importance is how good / bad it is to grow the population, and it is unlikely that the differential effects from reducing infant mortality vs increasing fertility end up changing the decision
Probably it is easier / cheaper to increase fertility than to reduce infant mortality, because very little effort has been put into increasing fertility (to my knowledge)

That said, it's been many years since I closely followed the GHD space, and I could easily be wrong about a lot of this.

80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly)

Rohin Shah9mo9

?? It's the second bullet point in the cons list, and reemphasized in the third bullet?

If you're saying "obviously this is the key determinant of whether you should work at a leading AI company so there shouldn't even be a pros / cons table", then obviously 80K disagrees given they recommend some such roles (and many other people (e.g. me) also disagree so this isn't 80K ignoring expert consensus). In that case I think you should try to convince 80K on the object level rather than applying political pressure.

80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly)

Rohin Shah9mo12

... That paragraph doesn't distinguish at all between OpenAI and, say, Anthropic. Surely you want to include some details specific to the OpenAI situation? (Or do your object-level views really not distinguish between them?)

Updates on the EA catastrophic risk landscape

Rohin Shah1y7

There’s currently very little work going into issues that arise even if AI is aligned, including the deployment problem

The deployment problem (as described in that link) is a non-problem if you know that AI is aligned.

Rohin Shah

Bio

Posts 11

Comments463

Posts
11

Comments
463