This is a linkpost for https://theinsideview.ai/blake

(crossposted from LW)

I recently interviewed Blake Richards, an Assistant Professor in the Montreal Neurological Institute and the School of Computer Science at McGill University and a Core Faculty Member at Mila. Below you will find some quotes summarizing his takes on AGI.

Blake is not really concerned about existential risk from AI. Like Yann LeCun, he finds that AGI is not a coherent concept, and that it would be impossible for an AI to be truly general (even if we restrict the no free lunch theorem to economically valuable tasks).

Why I Interviewed Blake

Although I do not agree with everything he says, I think there is value in trying to interact with AI researchers outside of the AI Alignment bubble, understanding exactly what arguments they buy and do not buy, eventually nailing down some cruxes that would convince them that AI existential risk is worth thinking about.

Better understanding LeCun's position has been valuable for many on LessWrong (see for instance the 2019 debate with Bengio and Russell), and Blake's thinking is close to Yann's, given that they share a similar philosophical bent.

Why you Might Want to Talk to Skeptics

Another exercise I found insightful was (mostly incorrectly) assessing people's views on AI Alignment and AI timelines, which made me understand better (thanks Cunningham's law!) the views of optimists (they turned out to be pretty close to Richard Ngo's reasons for optimism at 11:36 here).

In any case, I recommend that people who are in touch with ML researchers or practitioners 1) get to a level where they feel comfortable steelmanning them and 2) do a write-up of their positions on LW/EAF. That would help nail down the community's understanding of which arguments are convincing or not, and what would make these researchers change their minds.

To that end, here is what Blake has to say about his position on AGI and what could make him change his mind about existential risk.

 

Generalizing to "All Sorts of Tasks We Might Want It To Do"

"We know from the no free lunch theorem that you cannot have a learning algorithm that outperforms all other learning algorithms across all tasks. [...] Because the set of all possible tasks will include some really bizarre stuff that we certainly don’t need our AI systems to do. And in that case, we can ask, “Well, might there be a system that is good at all the sorts of tasks that we might want it to do?” Here, we don’t have a mathematical proof, but again, I suspect Yann’s intuition is similar to mine, which is that you could have systems that are good at a remarkably wide range of things, but it’s not going to cover everything you could possibly hope to do with AI or want to do with AI."
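To make the no free lunch point concrete, here is a minimal sketch that is not from the interview. It assumes a toy setting: eight inputs with binary labels, five of them seen in training, and a uniform average over all 256 possible labelings. The two learners (`majority_learner` and `anti_majority_learner`) are hypothetical illustrations chosen to be opposites of each other.

```python
from itertools import product

N_POINTS = 8          # toy input space: 8 distinct inputs (assumption)
TRAIN = range(0, 5)   # inputs whose labels the learner gets to see
TEST = range(5, 8)    # held-out inputs the learner must predict


def majority_learner(train_labels):
    """Predict the majority training label for every held-out input."""
    guess = int(2 * sum(train_labels) >= len(train_labels))
    return [guess] * len(TEST)


def anti_majority_learner(train_labels):
    """Predict the opposite of the majority training label."""
    guess = 1 - majority_learner(train_labels)[0]
    return [guess] * len(TEST)


def mean_off_training_accuracy(learner):
    """Average held-out accuracy over every possible labeling of the inputs."""
    correct = 0
    total = 0
    for labeling in product([0, 1], repeat=N_POINTS):
        train_labels = [labeling[i] for i in TRAIN]
        predictions = learner(train_labels)
        correct += sum(p == labeling[i] for p, i in zip(predictions, TEST))
        total += len(TEST)
    return correct / total


print(mean_off_training_accuracy(majority_learner))       # 0.5
print(mean_off_training_accuracy(anti_majority_learner))  # 0.5
```

Both learners land on exactly chance once every labeling counts equally, which is the "really bizarre stuff" end of the argument. The open question in the quote is what happens when the average is restricted to the tasks we actually care about, and this sketch says nothing about that case.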

Contra Transfer Learning from Scaling

"What’s happened with scaling laws is that we’ve seen really impressive ability to transfer to related tasks. So if you train a large language model, it can transfer to a whole bunch of language-related stuff, very impressively. And there’s been some funny work that shows that it can even transfer to some out-of-domain stuff a bit, but there hasn’t been any convincing demonstration that it transfers to anything you want. And in fact, I think that the recent paper… The Gato paper from DeepMind actually shows, if you look at their data, that they’re still getting better transfer effects if you train in domain than if you train across all possible tasks."

On Recursive Self-Improvement

"Per this specificity argument, my intuition is that an AI that is good at writing AI code might not have other types of intelligence. And so this is where I’m less concerned about the singularity because if I have an AI system that’s really good at coding, I’m not convinced that it’s going to be good at other things. [...] Instead, what I can imagine is that you have an AI that’s really good at writing code, it generates other AI that might be good at other things. And if it generates another AI that’s really good at code, that new one is just going to be that: an AI that’s good at writing code."

Scaling is "Something" You Need

"Will scale be literally all you need? No, I don’t think so. In so far as… I think that right off the bat, in addition to scale, you’re going to need careful consideration of the data that you train it on. And you’re never going to be able to escape that. So human-like decisions on the data you need is something you cannot put aside totally. But the other thing is, I suspect that architecture is going to matter in the long run.

I think we’re going to find that systems that have appropriate architectures for solving particular types of problems will again outperform those that don’t have the appropriate architectures for those problems. [...] my personal bet is that we will find new ways of doing transformers or self-attention plus other stuff that again makes a big step change in our capabilities."

On the Bitter Lesson Being Half True

"For RL, meta-learning systems have yet to outperform other systems that are trained specifically using model-free components. [...] a lot of the current models are based on diffusion stuff, not just bigger transformers. If you didn’t have diffusion models and you didn’t have transformers, both of which were invented in the last five years, you wouldn’t have GPT-3 or DALL-E. And so I think it’s silly to say that scale was the only thing that was necessary because that’s just clearly not true."

On the Difficulty of Long-term Credit Assignment

"One of the questions that I already alluded to earlier is the issue of really long-term credit assignment. So, if you take an action and then the outcome of that action is felt a month later, how do you connect that? How do you make the connection to those things? Current AI systems can’t do that."

"The reason Montezuma’s Revenge was so difficult for standard RL algorithms is, if you just do random exploration in Montezuma’s Revenge, it’s garbage, you die constantly. Because there’s all sorts of ways to die. And so you can’t take that approach. You need to basically take that approach of like, “Okay up to here is good. Let’s explore from this point on.” Which is basically what Uber developed."
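The "up to here is good, let's explore from this point on" idea can be sketched on a toy problem. This is an illustration under loud assumptions, not Uber's actual algorithm: the environment is a hypothetical 20-step corridor where every random action kills the agent with probability 0.5, the simulator can be reset to any previously reached position, and the archive always resumes from its furthest position rather than sampling cells with a weighting scheme. All names (`step`, `random_exploration`, `archive_exploration`) are made up for this example.

```python
import random

GOAL = 20       # corridor length in the toy environment (assumption)
P_DEATH = 0.5   # chance that a random action kills the agent (assumption)


def step(pos, rng):
    """Take one random action: return the next position, or None on death."""
    return None if rng.random() < P_DEATH else pos + 1


def random_exploration(episodes, rng):
    """Restart from position 0 each episode; return the furthest position seen."""
    best = 0
    for _ in range(episodes):
        pos = 0
        while pos < GOAL:
            nxt = step(pos, rng)
            if nxt is None:   # the agent died, so the episode ends here
                break
            pos = nxt
        best = max(best, pos)
    return best


def archive_exploration(attempts, rng):
    """Keep an archive of reached positions and always resume from the furthest."""
    archive = {0}
    for _ in range(attempts):
        start = max(archive)      # "up to here is good"
        if start >= GOAL:
            break
        nxt = step(start, rng)    # "let's explore from this point on"
        if nxt is not None:
            archive.add(nxt)      # remember the new furthest position
    return max(archive)


rng = random.Random(0)
print("random restarts:", random_exploration(1000, rng))
print("archive-based:  ", archive_exploration(1000, rng))
```

In this toy setting a restart-from-scratch episode reaches the goal with probability 0.5^20, roughly one in a million, whereas the archive version needs about 40 attempts on average because a death only costs the last step.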

On What Would Make Him Change His Mind

"I suppose what would change my mind on this is, if we saw that with increasing scale, but not radically changing the way that we train the… Like the data we train them on or the architectures we use. And I even want to take out the word radically without changing the architectures or the way we feed data. And if what we saw were systems that really… You couldn’t find weird behaviors, no matter how hard you tried. It always seemed to be doing intelligent things. Then I would really buy it. I think what’s interesting about the existing systems, is they’re very impressive and it’s pretty crazy what they can do, but it doesn’t take that much probing to also find weird silly behaviors still. Now maybe those silly behaviors will disappear in another couple orders of magnitude in which case I will probably take a step back and go, “Well, maybe scale is all you need”."

(disclaimer for commenters: even if you disagree about the reasoning, remember that those are just intuitions from a podcast whose sole purpose is to inform about why ML researchers are not really concerned about existential risk from AI).

13 comments

Good post!

I think in an ideal world, you could confidently say that "there is value in trying to interact with AI researchers outside of the AI Alignment bubble", not only so you can figure out cruxes and better convince them, but actually because you might learn they are right and we are wrong. I don't know whether you believe that, but it seems not only true but also follows very strongly from our movement's epistemic ideals about being open-minded and following evidence and reason where they lead.

If you felt that you would get pushback on suggesting that there's an outside view where AGI Alignment cause area sceptics might be right, I hope you are wrong, but if there are many other people who feel that way, it indicates some kind of epistemic problem in our movement.

Any time we're in a place where someone feels there's something critical they can't say, even when speaking in good faith, to best use evidence and reason to do the most good, that's a potential epistemic failure mode we need to guard against.

Thanks for the reminder on the open-minded epistemics ideal of the movement. To clarify, I do spend a lot of time reading posts from people who are concerned about AI Alignment, and talking to multiple "skeptics" made me realize things that I had not properly considered before, learning where AI Alignment arguments might be wrong or simply overconfident.

(FWIW I did not feel any pushback in suggesting that skeptics might be right on the EAF, and, to be clear, that was not my intention. The goal was simply to showcase a methodology to facilitate a constructive dialogue between the Machine Learning and AI Alignment communities.)

I think I just straightforwardly agree with all of these quotes and I don't really see why any of this should make us skeptical of x-risk from AI.

I maybe disagree somewhat on the position on scale? I expect that you could get GPT-3 with older architectures but significantly more parameters + data + compute. I'd still agree with "we wouldn't have GPT-3" because the extra cost would mean we wouldn't have trained it by now, but plausibly Blake thinks "even with additional parameters + data + compute beyond what we would have had, other architectures wouldn't have worked"? So I don't disagree with any of the explicitly stated positions but it's plausible I disagree with the worldview.

I maybe also disagree with the position on long-term credit assignment? I again agree with the stated point that current RL algorithms mostly can't do long-term credit assignment when trained from scratch on an environment. But maybe Blake thinks this is important to get powerful AI systems, whereas I wouldn't say that.

I think he would agree with "we wouldn't have GPT-3 from an economic perspective". I am not sure whether he would agree with a theoretical impossibility. From the transcript:

"Because a lot of the current models are based on diffusion stuff, not just bigger transformers. If you didn’t have diffusion models [and] you didn’t have transformers, both of which were invented in the last five years, you wouldn’t have GPT-3 or DALL-E. And so I think it’s silly to say that scale was the only thing that was necessary because that’s just clearly not true."

To be clear, the part about the credit assignment problem was mostly when discussing the research at his lab, and he did not explicitly argue that the long-term credit assignment problem was evidence that training powerful AI systems is hard. I included the quote because it was relevant, but it was not an "argument" per se.

Thanks Rohin, I second almost all of this.

Interested to hear more about why long-term credit assignment isn't needed for powerful AI. I think it depends how you quantify those things and I'm pretty unsure about this myself.

Is it because there is already loads of human-generated data which implicitly embody or contain enough long-term credit assignment? Or is it that long-term credit assignment is irrelevant for long-term reasoning? Or maybe long-term reasoning isn't needed for 'powerful AI'?

We're tackling the problem "you tried out a long sequence of actions, and only at the end could you tell whether the outcomes were good or not, and now you have to figure out which actions were responsible for those outcomes".

Some approaches to this that don't involve "long-term credit assignment" as normally understood by RL practitioners:

  • Have humans / other AI systems tell you which of the actions were useful. (One specific way this could be achieved is to use humans / AI systems to provide a dense reward, kinda like in summarizing books from human feedback.)
  • Supervise the AI system's reasoning process rather than the outcomes it gets (e.g. like chain-of-thought prompting but with more explicit supervision).
  • Just don't even bother, do regular old self-supervised learning on a hard task; in order to get good performance maybe the model has to develop "general intelligence" (i.e. something akin to the algorithms humans use in order to do long-term planning; after all our long-term planning doesn't work via trial and error).

I think it's also plausible that (depending on your definitions) long-term reasoning isn't needed for powerful AI.

Here are two quotes you might disagree with. If true, they seem like they would make us slightly more skeptical of x-risk from AI, though without countering the entire argument.

Richards argues that lack of generality will make recursive self-improvement more difficult:

I’m less concerned about the singularity because if I have an AI system that’s really good at coding, I’m not convinced that it’s going to be good at other things. And so it’s not the case that if it produces a new AI system, that’s even better at coding, that that new system is now going to be better at other things. And that you get this runaway train of the singularity. 

Instead, what I can imagine is that you have an AI that’s really good at writing code, it generates other AI that might be good at other things. And if it generates another AI that’s really good at code, that new one is just going to be that: an AI that’s good at writing code. And maybe we can… So to some extent, we can keep getting better and better and better at producing AI systems with the help of AI systems. But a runaway train of a singularity is not something that concerns me...

The problem with that argument is that the claim is that the smarter version of itself is going to be just smarter across the board. Right? And so that’s where I get off the train. I’m like, “No, no, no, no. It’s going to be better at say programming or better at protein folding or better at causal reasoning. That doesn’t mean it’s going to be better at everything.”

He also argues that lack of generality will make deception more difficult:

One of the other key things for the singularity argument that I don’t buy, is that you would have an AI that then also knows how to avoid people’s potential control over it. Right? Because again, I think you’d have to create an AI that specializes in that. Or alternatively, if you’ve got the master AI that programs other AIs, it would somehow also have to have some knowledge of how to manipulate people and avoid their powers over it. Again, if it’s really good at programming, I don’t think it’s going to be able to be particularly good at manipulating people. 

These arguments at least indicate that generality is a risk factor for AI x-risk. Forecasting whether superintelligent systems will be general or narrow seems more difficult but not impossible. Language models have already shown strong potential for both writing code and persuasion, which is a point in favor of generality. Ditto for Gato's success across multiple domains (EDIT: Or is it? See below). More outside view arguments about the benefits or costs of using one model for many different tasks seem mixed and don't sway my opinion much. Curious to hear other considerations.

Very glad to see this interview and the broader series. Engaging with more ML researchers seems like a good way to popularize AI safety and learn something in the process. 

I still feel mostly in agreement with those quotes (though less so than the ones in the original post).

On the first, I mostly agree that if you make an AI that's better at coding, it will be better at coding but not necessarily anything else. The one part I disagree with is that this means "no singularity": I don't think this really affects the argument for a singularity, which according to me is primarily about the more ideas -> more output -> more "people" -> more ideas positive feedback loop. I also don't think the singularity argument or recursive self-improvement argument is that important for AI risk, as long as you believe that AI systems will become significantly more capable than humanity (see also here).

On the second, it seems very plausible that your first coding AIs are not very good at manipulating people. But it doesn't necessarily need to manipulate people; a coding AI could hack into other servers that are not being monitored as heavily and run copies of itself there; those copies could then spend time learning and planning their next moves. (This requires some knowledge / understanding of humans, like that they would not like it if you achieved your goals, and that they are monitoring your server, but it doesn't seem to require anywhere near human-level understanding of how to manipulate humans.)

Thanks for the quotes and the positive feedback on the interview/series!

Re Gato: we also mention it as a reason why training across multiple domains does not increase performance in narrow domains, so there is also evidence against generality (in the sense of generality being useful). From the transcript:

"And there’s been some funny work that shows that it can even transfer to some out-of-domain stuff a bit, but there hasn’t been any convincing demonstration that it transfers to anything you want. And in fact, I think that the recent paper… The Gato paper from DeepMind actually shows, if you look at their data, that they’re still getting better transfer effects if you train in domain than if you train across all possible tasks."

I wrote something similar (with more detail) about the Gato paper at the time.

I don't think this is any evidence at all against AI risk though? It is maybe weak evidence against 'scaling is all you need' or that sort of thing.

This gave me an idea for an experiment/argument. Posting here, in case somebody wants to come up with a more thought-out version of it and do it.

[On describing what would change his mind:] You couldn’t find weird behaviors [in the AI], no matter how hard you tried.

People like to take an AI, poke it, and then argue "it is doing [all these silly mistakes], therefore [not AGI/not something to worry about/...]". Now, the conclusion might be right, but the argument is wrong --- even dangerous things can be stupid in some settings. Nevertheless, the argument seems convincing.

My prediction is that people make a lot of mistakes[1] that would seem equally laughable if it were an AI that made them. Except that we are so used to them that we don't appreciate it. So if one buys the argument above, they should conclude that humans are also [not general intelligence/not something to worry about/...]. So perhaps if we presented the human mistakes right, it could become a memorable counterargument to "AI makes silly mistakes, hence no need to worry about it".

Some example formats:

  • "Look at these silly AI mistakes! Surprise, that's normal people." or
  • "Quiz: AI mistake or human mistake?"

(uhm, or "Quiz: AI or Trump?"; wouldn't mention this, except bots on that guy already exist).

Obligatory disclaimer: It might turn out that humans really don't make [any sorts of] [silly mistakes current AI makes], or make [so few that it doesn't matter]. If you could operationalize this, that would also be valuable.

  1. ^

    What is "these mistakes"? I don't know. Exercise for the reader.

Nice post! Thanks for sharing.