A conversation with Rohin Shah

AI Impacts

40 min read · Nov 12, 2019

Comments (8)


Pablo

I can’t really imagine what would cause me to believe that AI systems will actually do a treacherous turn without ever trying to deceive us before that. But there might be something there. I don’t really know what evidence would move me, any sort of plausible evidence I could see that would move me in that direction.

Perhaps this?

Rohin Shah

In that sentence I meant "a treacherous turn that leads to an existential catastrophe", so I don't think the example you link updates me strongly on that.

While Luke talks about that scenario as an example of a treacherous turn, you could equally well talk about it as an example of "deception", since the evolved creatures are "artificially" reducing their rates of reproduction to give the supervisor / algorithm a "false belief" that they are bad at reproducing. Another example along these lines is when a robot hand "deceives" its human overseer into thinking that it has grasped a ball, when it is in fact in front of the ball.

Really, though, I think these examples aren't that informative, because it doesn't seem reasonable to say that the AI system is "trying" to do something in these examples, or that it does some things "deliberately". These behaviors were learned through trial and error. An existential-catastrophe-style treacherous turn would presumably not happen through trial and error. (Even if it did, it seems like there must have been at least some cases where it tried and failed to take over the world, which seems like a clear and obvious warning shot that we for some reason completely ignored.)

(If it isn't clear, the thing that I care about is something like "will there be some 'warning shot' that greatly increases the level of concern people have about AI systems, before it is too late".)

Pablo

That makes sense. Thanks for the comment!

cole_haus

Shah thinks there are several things that could change his beliefs, including:

If he learned that evolution actually baked a lot into humans (‘nativism’), he would lengthen the amount of time he thinks there will be before AGI.

Tooby and Cosmides are big advocates for the "massive modularity" view--a huge amount of human cognition takes place in specialized, task-tailored modules rather than on one big, domain-general "computer". Common examples of these sorts of modules are:

Chomsky's universal grammar: There's not enough language data for children to learn languages in the absence of inductive biases.
Social exchange: People perform much better at the Wason selection task when the domain is social exchange rather than fully abstract.

Unfortunately, I don't know of any review collecting and examining evidence for the massive modularity view.

(Not sure how much of this Shah already knows.)

Rohin Shah

(Not sure how much of this Shah already knows.)

Not much, sadly. I don't actually intend to learn about it in the near future, because I don't think timelines are particularly decision-relevant to me (though they are to others, especially funders). Thanks for the links!

Tooby and Cosmides are big advocates for the "massive modularity" view--a huge amount of human cognition takes place in specialized, task-tailored modules rather than on one big, domain-general "computer".

On my view, babies would learn a huge amount about the structure of the world simply by interacting with it (pushing over an object can in principle teach you a lot about objects, causality, intuitive physics, etc), and this leads to general patterns that we later call "inductive biases" for more complex tasks. For example, hierarchy is a very useful way to understand basically any environment we are ever in; perhaps babies develop a sense of "hierarchy" which then gets applied to language, explaining how children learn languages so fast.

From the Wikipedia page you linked, challenges to a "rationality" based view:

1. Evolutionary theories using the idea of numerous domain-specific adaptions have produced testable predictions that have been empirically confirmed; the theory of domain-general rational thought has produced no such predictions or confirmations.

I wish they said what these predictions were. I'm not going to chase down this reference.

2. The rapidity of responses such as jealousy due to infidelity indicates a domain-specific dedicated module rather than a general, deliberate, rational calculation of consequences.

This is a good point; in general emotions are probably not learned, for the most part. I'm not sure what's going on there.

3. Reactions may occur instinctively (consistent with innate knowledge) even if a person has not learned such knowledge.

I agree that reflexes are "built-in" and not learned; reflexes are also pretty different from e.g. language. Obviously not everything our bodies do is "learned": reflexes, breathing, digestion, etc. all fall into the "built-in" category. I don't think this says much about what leads humans to be good at chess, language, plumbing, soccer, gardening, etc., which is what I'm more interested in.

It seems likely to me that you might need the equivalent of reflexes, breathing, digestion, etc. if you want to design a fully autonomous agent that learns without any human support whatsoever, but we will probably instead design an agent that (initially) depends on us to keep the electricity flowing, to fix any wiring issues, to keep up the Internet connection, etc. (In contrast, human parents can't ensure that the child keeps breathing, so you need an automatic, built-in system for that.)

Rohin Shah

perhaps babies develop a sense of "hierarchy" which then gets applied to language, explaining how children learn languages so fast.

Though if we are to believe this paper at face value (I haven't evaluated it), babies start learning in the womb. (The paper claims that the biases depend on which language is spoken around the pregnant mother, which suggests that it must be learned, rather than being "built-in".)

Lukas_Gloor

Chomsky's universal grammar: There's not enough language data for children to learn languages in the absence of inductive biases.

I think there's more recent work in computational linguistics that challenges this. Unfortunately I can't summarize it since I only took an overview course a long time ago. I've been wondering whether I should read up on language evolution at some point. Mostly because it seems really interesting, but also because it's a field I haven't seen being discussed in EA circles, and it seems potentially useful to have this background when it comes to evaluating/interpreting AI milestones and so on. In any case, if someone understands computational linguistics, language evolution and how it relates to the nativism debate, I'd be extremely interested in a summary!

richard_ngo

For reference, here's the post on realism about rationality that Rohin mentioned several times.


