Ngo and Yudkowsky on scientific reasoning and pivotal acts

EliezerYudkowsky; richard_ngo


Comments 1


Jackson Wagner

People interested in the analogy between AGI and how the superpowers managed to keep a lid on nuclear technology might be interested in my post brainstorming how the world might have looked if nukes had been easier to build than they were in real life, but not so terrifyingly easy as in Nick Bostrom's "Vulnerable World Hypothesis": https://forum.effectivealtruism.org/posts/FtEPgeoThqpSMsnn6/nuclear-strategy-in-a-semi-vulnerable-world

But it does seem plausible to me that AI algorithmic improvements mean that the threshold level of control needed to ensure nonproliferation falls from "control semiconductor factories and supply chains" down to "control all individual GPUs", which is equivalent to the difficulty level of the original Bostrom scenario.


Ngo and Yudkowsky on scientific reasoning and pivotal acts — EA Forum


This is a transcript of a conversation between Richard Ngo and Eliezer Yudkowsky, facilitated by Nate Soares (and with some comments from Carl Shulman). This transcript continues the Late 2021 MIRI Conversations sequence, following Ngo's view on alignment difficulty.

Color key:

Chat by Richard and Eliezer

Other chat

14. October 4 conversation

14.1. Predictable updates, threshold functions, and the human cognitive range

[Ngo][15:05]

Two questions which I'd like to ask Eliezer:

1. How strongly does he think that the "shallow pattern-memorisation" abilities of GPT-3 are evidence for Paul's view over his view (if at all)?

2. How does he suggest we proceed, given that he thinks directly explaining his model of the chimp-human difference would be the wrong move?

[Yudkowsky][15:07]

1 - I'd say that it's some evidence for the Dario viewpoint which seems close to the Paul viewpoint. I say it's some evidence for the Dario viewpoint because Dario seems to be the person who made something like an advance prediction about it. It's not enough to make me believe that you can straightforwardly extend the GPT architecture to 3e14 parameters and train it on 1e13 samples and get human-equivalent performance.

[Ngo][15:09]

Did you make any advance predictions, around the 2008-2015 period, of what capabilities we'd have before AGI?

[Yudkowsky][15:10]

not especially that come to mind? on my model of the future this is not particularly something I am supposed to know unless there is a rare flash of predictability.

[Ngo][15:11]

> 1 - I'd say that it's some evidence for the Dario viewpoint which seems close to the Paul viewpoint. I say it's some evidence for the Dario viewpoint because Dario seems to be the person who made something like an advance prediction about it. It's not enough to make me believe that you can straightforwardly extend the GPT architecture to 3e14 parameters and train it on 1e13 samples and get human-equivalent performance.

For the record I remember Paul being optimistic about language when I visited OpenAI in summer 2018. But I don't know how advanced internal work on GPT-2 was by then.

[Yudkowsky][15:13]

2 - in lots of cases where I learned more specifics about X, and updated about Y, I had the experience of looking back and realizing that knowing anything specific about X would have predictably produced a directional update about Y. like, knowing anything in particular about how the first AGI eats computation, would cause you to update far away from thinking that biological analogies to the computation consumed by humans were a good way to estimate how many computations an AGI needs to eat. you know lots of details about how humans consume watts of energy, and you know lots of details about how modern AI consumes watts, so it's very visible that these quantities are so incredibly different and go through so many different steps that they're basically unanchored from each other.

I have specific ideas about how you get AGI that isn't just scaling up Stack More Layers, which lead me to think that the way to estimate the computational cost of it is not "3e14 parameters trained at 1e16 ops per step for 1e13 steps, because that much computation and parameters seems analogous to human biology and 1e13 steps is given by past scaling laws", à la the recent OpenPhil publication. But it seems to me that it should be possible to have the abstract insight that knowing more about general intelligence in AGIs or in humans would make the biological analogy look less plausible, because you wouldn't be matching up an unknown key to an unknown lock.

Unfortunately I worry that this depends on some life experience with actual discoveries to get something this abstract-sounding on a gut level, because people basically never seem to make abstract updates of this kind when I try to point to them as predictable directional updates?

But, in principle, I'd hope there would be aspects of this where I could figure out how to show that any knowledge of specifics would probably update you in a predictable direction, even if it doesn't seem best for Earth for me to win that argument by giving specifics conditional on those specifics actually being correct, and it doesn't seem especially sound to win that argument by giving specifics that are wrong.

[Ngo][15:17]

I'm confused by this argument. Before I thought much about the specifics of the chimpanzee-human transition, I found the argument "humans foomed (by biological standards) so AIs will too" fairly compelling. But after thinking more about the specifics, it seems to me that the human foom was in part caused by a factor (sharp cultural shift) that won't be present when we train AIs.

[Yudkowsky][15:17]

sure, and other factors will be present in AIs but not in humans

[Ngo][15:17]

This seems like a case where more specific knowledge updated me away from your position, contrary to what you're claiming.

[Yudkowsky][15:18]

eg, human brains don't scale and mesh, while it's far more plausible that with AI you could just run more and more of it

that's a huge factor leading one to expect AI to scale faster than human brains did

it's like communication between humans, but squared!

this is admittedly a specific argument and I'm not sure how it would abstract out to any specific argument

[Ngo][15:20]

Again, this is an argument that I believed less after looking into the details, because right now it's pretty difficult to throw more compute at neural networks at runtime.

Which is not to say that it's a bad argument, the differences in compute-scalability between humans and AIs are clearly important. But I'm confused about the structure of your argument that knowing more details will predictably update me in a certain direction.

[Yudkowsky][15:21]

I suppose the genericized version of my actual response to that would be, "architectures that have a harder time eating more compute are architectures which, for this very reason, are liable to need better versions invented of them, and this in particular seems like something that plausibly happens before scaling to general intelligence is practically possible"

[Soares][15:23]

(Eliezer, I see Richard as requesting that you either back down from, or clarify, your claim that any specific observations about how much compute AI systems require will update him in a predictable direction.)

[Ngo: 👍]

[Yudkowsky][15:24]

I'm not saying I know how to make that abstractized argument for exactly what Richard cares about, in part because I don't understand Richard's exact model, just that it's one way to proceed past the point where the obvious dilemma crops up of, "If a theory about AGI capabilities is true, it is a disservice to Earth to speak it, and if a theory about AGI capabilities is false, an argument based on it is not sound."

[Ngo][15:25]

Ah, I see.

[Yudkowsky][15:26]

possible viewpoint to try: that systems in general often have threshold functions as well as smooth functions inside them.

only in ignorance, then, do we imagine that the whole thing is one smooth function.

the history of humanity has a threshold function of, like, communication or culture or whatever.

the correct response to this is not, "ah, so this was the unique, never-to-be-seen-again sort of fact which cropped up in the weirdly complicated story of humanity in particular, which will not appear in the much simpler story of AI"

this only sounds plausible because you don't know the story of AI so you think it will be a simple story

the correct generalization is "guess some weird thresholds will also pop up in whatever complicated story of AI will appear in the history books"

[Ngo][15:28]

Here's a quite general argument about why we shouldn't expect too many threshold functions in the impact of AI: because at any point, humans will be filling in the gaps of whatever AIs can't do. (The lack of this type of smoothing is, I claim, why culture was a sharp threshold for humans - if there had been another intelligent species we could have learned culture from, then we would have developed more gradually.)

[Yudkowsky][15:30]

something like this indeed appears in my model of why I expect not much impact on GDP before AGI is powerful enough to bypass human economies entirely

during the runup phase, pre-AGI won't be powerful enough to do "whole new things" that depend on doing lots of widely different things that humans can't do

just marginally new things that depend on doing one thing humans can't do, or can do but a bunch worse

[Ngo][15:31]

Okay, that's good to know.

Would this also be true in a civilisation of village idiots?

[Yudkowsky][15:32]

there will be sufficient economic reward for building out industries that are mostly human plus one thing that pre-AGI does, and people will pocket those economic rewards, go home, and not be more ambitious than that. I have trouble empathically grasping why almost all the CEOs are like this in our current Earth, because I am very much not like that myself, but observationally, the current Earth sure does seem to behave like rich people would almost uniformly rather not rock the boat too much.

I did not understand the whole thing about village idiots actually

do you want to copy and paste the document, or try rephrasing the argument?

[Ngo][15:35]

Rephrasing:

Claim 1: AIs will be better at doing scientific research (and other similar tasks) than village idiots, before we reach AGI.

Claim 2: Village idiots still have the core of general intelligence (which you claim chimpanzees don't have).

Claim 3: It would be surprising if narrow AI's research capabilities fell specifically into the narrow gap between village idiots and Einsteins, given that they're both general intelligences and are very similar in terms of architecture, algorithms, etc.

(If you deny claim 2, then we can substitute, say, someone at the 10th percentile of human intelligence - I don't know what specific connotations "village idiot" has to you.)

[Yudkowsky][15:37]

My models do not have an easy time of visualizing "as generally intelligent as a chimp, but specialized to science research, gives you superhuman scientific capability and the ability to make progress in novel areas of science".

(this is a reference back to the pre-rephrase in the document)

it seems like, I dunno, "gradient descent can make you generically good at anything without that taking too much general intelligence" must be a core hypothesis there?

[Ngo][15:39]

I mean, we both agree that gradient descent can produce some capabilities without also producing much general intelligence. But claim 1 plus your earlier claims that narrow AIs won't surpass humans at scientific research lead to the implication that the limitations of gradient-descent-without-much-general-intelligence fall in a weirdly narrow range.

[Yudkowsky][15:42]

I do credit the Village Idiot to Einstein Interval with being a little broader as a target than I used to think, since the Alpha series of Go-players took a couple of years to go from pro to world-beating even once they had a scalable algorithm. Still seems to me that, over time, the wall clock time to traverse those ranges has been getting shorter, like Go taking less time than Chess. My intuitions still say that it'd be quite weird to end up hanging out for a long time with AGIs that conduct humanlike conversations and are ambitious enough to run their own corporations while those AGIs are still not much good at science.

But on my present model, I suspect the limitations of "gradient-descent-without-much-general-intelligence" to fall underneath the village idiot side?

[Ngo][15:43]

Oh, interesting.

That seems like a strong prediction

[Yudkowsky][15:43]

Your model, as I understand it, is saying, "But surely, GD-without-GI must suffice to produce better scientists than village idiots, by specializing chimps on science" and my current reply, though it's not a particular question I've thought a lot about before, is, "That... does not quite seem to me like a thing that should happen along the mainline?"

though, as always, in the limit of superintelligences doing things, or our having the Textbook From The Future, we could build almost any kind of mind on purpose if we knew how, etc.

[Ngo][15:44]

For example, I expect that if I prompt GPT-3 in the right way, it'll make some interesting and not-totally-nonsensical claims about advanced science.

Whereas it would be very hard to prompt a village idiot to do the same.

[Yudkowsky][15:44]

eg, a superintelligence could load up chimps with lots of domain-specific knowledge they were not generally intelligent enough to learn themselves.

ehhhhhh, it is not clear to me that GPT-3 is better than a village idiot at advanced science, even in this narrow sense, especially if the village idiot is allowed some training

[Ngo][15:46]

It's not clear to me either. But it does seem plausible, and then it seems even more plausible that this will be true of GPT-4

[Yudkowsky][15:46]

I wonder if we're visualizing different village idiots

my choice of "village idiot" originally was probably not the best target for visualization, because in a lot of cases, a village idiot - especially the stereotype of a village idiot - is, like, a damaged general intelligence with particular gears missing?

[Ngo][15:47]

I'd be happy with "10th percentile intelligence"

[Yudkowsky][15:47]

whereas it seems like what you want is something more like "Homo erectus but it has language"

oh, wow, 10th percentile intelligence?

that's super high

GPT-3 is far far out of its league

[Ngo][15:49]

I think GPT-3 is far below this person's league in a lot of ways (including most common-sense reasoning) but I become much less confident when we're talking about abstract scientific reasoning.

[Yudkowsky][15:51]

I think that if scientific reasoning were as easy as you seem to be imagining(?), the publication factories of the modern world would be much more productive of real progress.

[Ngo][15:51]

Well, a 10th percentile human is very unlikely to contribute to real scientific progress either way

[Yudkowsky][15:53]

Like, on my current model of how the world really works, China pours vast investments into universities and sober-looking people with PhDs and classes and tests and postdocs and journals and papers; but none of this is the real way of Science which is actually, secretly, unbeknownst to China, passed down in rare lineages and apprenticeships from real scientist mentor to real scientist student, and China doesn't have much in the way of lineages so the extra money they throw at stuff doesn't turn into real science.

[Ngo][15:52]

Can you think of any clear-cut things that they could do and GPT-3 can't?

[Yudkowsky][15:53]

Like... make sense... at all? Invent a handaxe when nobody had ever seen a handaxe before?

[Ngo][15:54]

You're claiming that 10th percentile humans invent handaxes?

[Yudkowsky][15:55]

The activity of rearranging scientific sentences into new plausible-sounding paragraphs is well within the reach of publication factories, in fact, they often use considerably more semantic sophistication than that, and yet, this does not cumulate into real scientific progress even in quite large amounts.

I think GPT-3 is basically just Not Science Yet to a much greater extent than even these empty publication factories.

If 10th percentile humans don't invent handaxes, GPT-3 sure as hell doesn't.

[Ngo][15:55]

I don't think we're disagreeing. Publication factories are staffed with people who do better academically than 90+% of all humans.

If 90th-percentile humans are very bad at science, then of course GPT-3 and 10th-percentile humans are very very bad at science. But it still seems instructive to compare them (e.g. on tasks like "talk cogently about a complex abstract topic")

[Yudkowsky][15:58]

I mean, while it is usually weird for something to be barely within a species's capabilities while being within those capabilities at all, such that only relatively smarter individual organisms can do it, in the case of something that a social species has only very recently started to do collectively, it's plausible that the thing appeared at the point where it was barely accessible to the smartest members. Eg, it wouldn't be surprising if it would have taken a long time or forever for humanity to invent science from scratch, if all the Francis Bacons and Newtons and even average-intelligence people were eliminated leaving only the bottom 10%. Because our species just started doing that, at the point where our species was barely able to start doing that, meaning, at the point where some rare smart people could spearhead it, historically speaking. It's not obvious whether or not less smart people can do it over a longer time.

I'm not sure we disagree much about the human part of this model.

My guess is that our disagreement is more about GPT-3.

"Talk 'cogently' about a complex abstract topic" doesn't seem like much of anything significant to me, if GPT-3 is 'cogent'. It fails to pass the threshold for inventing science and, I expect, for most particular sciences.

[Ngo][16:00]

How much training do you think a 10th-percentile human would need in a given subject matter (say, economics) before they could answer questions as well as GPT-3 can?

(Right now I think GPT-3 does better by default because it at least recognises the terminology, whereas most humans don't at all.)

[Yudkowsky][16:01]

I also expect that if you offer a 10th-percentile human lots of money, they can learn to talk more cogently than GPT-3 about narrower science areas. GPT-3 is legitimately more well-read at its lower level of intelligence, but train the 10-percentiler in a narrow area and they will become able to write better nonsense about that narrow area.

[Ngo][16:01]

This sounds like an experiment we can actually run.

[Yudkowsky][16:02]

Like, what we've got going on here is a real breadth advantage that GPT-3 has in some areas, but the breadth doesn't add up because it lacks the depth of a 10%er.

[Ngo][16:02]

If we asked them to read a single introductory textbook and then quiz both them and GPT-3 about items covered in that textbook, do you expect that the human would come out ahead?

[Yudkowsky][16:02]

AI has figured out how to do a subhumanly shallow kind of thinking, and it is to be expected that when AI can do anything at all, it can soon do more of that thing than the whole human species could do.

No, that's nothing remotely like giving the human the brief training the human needs to catch up to GPT-3's longer training.

A 10%er does not learn in an instant - they learn faster than GPT-3, but not in an instant.

This is more like a scenario of paying somebody to, like, sit around for a year with an editor, learning how to mix-and-match economics sentences until they can learn to sound more like they're making an argument than GPT-3 does, despite still not understanding any economics.

A lot of the learning would just go into producing sensible-sounding nonsense at all, since lots of 10%ers have not been to college and have not learned how to regurgitate rearranged nonsense for college teachers.

[Ngo][16:05]

What percentage of humans do you think could learn to beat GPT-3's question-answering by reading a single textbook over, say, a period of a month?

[Yudkowsky][16:06]

¯\_(ツ)_/¯

[Ngo][16:06]

More like 0.5 or 5 or 50?

[Yudkowsky][16:06]

Humans cannot in general pass the Turing Test for posing as AIs!

What percentage of humans can pass as a calculator by reading an arithmetic textbook?

Zero!

[Ngo][16:07]

I'm not asking them to mimic GPT-3, I'm asking them to produce better answers.

[Yudkowsky][16:07]

Then it depends on what kind of answers!

I think a lot of 10%ers could learn to do wedding-cake multiplication, if sufficiently well-paid as adults rather than being tortured in school, out to 6 digits, thus handily beating the current GPT-3 at 'multiplication'.

[Ngo][16:08]

For example: give them an economics textbook to study for a month, then ask them what inflation is, whether it goes up or down if the government prints more money, whether the price of something increases or decreases when the supply increases.

[Yudkowsky][16:09]

GPT-3 did not learn to produce its responses by reading textbooks.

You're not matching the human's data to GPT-3's data.

[Ngo][16:10]

I know, this is just the closest I can get in an experiment that seems remotely plausible to actually run.

[Yudkowsky][16:10]

You would want to collect, like, 1,000 Reddit arguments about inflation, and have the human read that, and have the human produce their own Reddit arguments, and have somebody tell them whether they sounded like real Reddit arguments or not.

The textbook is just not the same thing at all.

I'm not sure we're at the core of the argument, though.

To me it seems like GPT-3 is allowed to be superhuman at producing remixed and regurgitated sentences about economics, because this is about as relevant to Science talent as a calculator being able to do perfect arithmetic, only less so.

[Ngo][16:15]

Suppose that the remixed and regurgitated sentences slowly get more and more coherent, until GPT-N can debate with a professor of economics and sustain a reasonable position.

[Yudkowsky][16:15]

Are these points that GPT-N read elsewhere on the Internet, or are they new good points that no professor of economics on Earth has ever made before?

[Ngo][16:15]

I guess you don't expect this to happen, but I'm trying to think about what experiments we could run to get evidence for or against it.

The latter seems both very hard to verify, and also like a very high bar - I'm not sure if most professors of economics have generated new good arguments that no other professor has ever made before.

So I guess the former.

[Yudkowsky][16:18]

Then I think that you can do this without being able to do science. It's a lot like if somebody with a really good memory was lucky enough to have read that exact argument on the Internet yesterday, and to have a little talent for paraphrasing. Not by coincidence, having this ability gives you - on my model - no ability to do science, invent science, be the first to build handaxes, or design nanotechnology.

I admit, this does reflect my personal model of how Science works, presumably not shared by many leading bureaucrats, where in fact the papers full of regurgitated scientific-sounding sentences are not accomplishing much.

[Ngo][16:20]

So it seems like your model doesn't rule out narrow AIs producing well-reviewed scientific papers, since you don't trust the review system very much.

[Yudkowsky][16:23]

I'm trying to remember whether or not I've heard of that happening, like, 10 years ago.

My vague recollection is that the things in the Sokal Hoax genre where the submissions succeeded used humans to hand-generate the nonsense, rather than any submissions in the genre having been purely machine-generated.

[Ngo][16:24]

Which doesn't seem like an unreasonable position, but it does make it harder to produce tests that we have opposing predictions on.

[Yudkowsky][16:24]

Obviously, that doesn't mean it couldn't have been done 10 years ago, because 10 years ago it was plausibly a lot easier to hand-generate passing nonsense than to write an AI program that does it.

oh, wait, I'm wrong!

https://news.mit.edu/2015/how-three-mit-students-fooled-scientific-journals-0414

In April of 2005 the team’s submission, “Rooter: A Methodology for the Typical Unification of Access Points and Redundancy,” was accepted as a non-reviewed paper to the World Multiconference on Systemics, Cybernetics and Informatics (WMSCI), a conference that Krohn says is known for “being spammy and having loose standards.”

in 2013 IEEE and Springer Publishing removed more than 120 papers from their sites after a French researcher’s analysis determined that they were generated via SCIgen

[Ngo][16:26]

Oh, interesting

Meta note: I'm not sure where to take the direction of the conversation at this point. Shall we take a brief break?

[Yudkowsky][16:27]

The creators continue to get regular emails from computer science students proudly linking to papers they’ve snuck into conferences, as well as notes from researchers urging them to make versions for other disciplines.

Sure! Resume 5p?

[Ngo][16:27]

Yepp

14.2. Domain-specific heuristics and nanotechnology

[Soares][16:41]

A few takes:

1. It looks to me like there's some crux in "how useful will the 'shallow' stuff get before dangerous things happen". I would be unsurprised if this spiraled back into the gradualness debate. I'm excited about attempts to get specific and narrow disagreements in this domain (not necessarily bettable; I nominate distilling out specific disagreements before worrying about finding bettable ones).

2. It seems plausible to me that we should have some much more concrete discussion about possible ways things could go right, according to Richard. I'd be up for playing the role of beeping when things seem insufficiently concrete.

3. It seems to me like Richard learned a couple things about Eliezer's model in that last bout of conversation. I'd be interested to see him try to paraphrase his current understanding of it, and to see Eliezer produce beeps where it seems particularly off.

[Yudkowsky][17:00]

👋

[Ngo][17:02]

Hmm, I'm not sure that I learned too much about Eliezer's model in this last round.

[Soares][17:03]

(dang :-p)

[Ngo][17:03]

It seems like Eliezer thinks that the returns of scientific investigation are very heavy-tailed.

Which does seem pretty plausible to me.

But I'm not sure how useful this claim is for thinking about the development of AI that can do science.

I attempted in my document to describe some interventions that would help things go right.

And the levels of difficulty involved.

[Yudkowsky][17:07]

(My model is something like: there are some very shallow steps involved in doing science, lots of medium steps, occasional very deep steps, assembling the whole thing into Science requires having all the lego blocks available. As soon as you look at anything with details, it ends up 'heavy-tailed' because it has multiple pieces and says how things don't work if all the pieces aren't there.)

[Ngo][17:08]

Eliezer, do you have an estimate of how much slower science would proceed if everyone's IQs were shifted down by, say, 30 points?

[Yudkowsky][17:10]

It's not obvious to me that science proceeds significantly past its present point. I would not have the right to be surprised if Reality told me the correct answer was that a civilization like that just doesn't reach AGI, ever.

[Ngo][17:12]

Doesn't your model take a fairly big hit from predicting that humans just happen to be within 30 IQ points of not being able to get any more science?

It seems like a surprising coincidence.

Or is this dependent on the idea that doing science is much harder now than it used to be?

And so if we'd been dumber, we might have gotten stuck before Newtonian mechanics, or else before relativity?

[Yudkowsky][17:13]

No, humanity is exactly the species that finds it barely possible to do science.

[Ngo][17:14]

It seems to me like humanity is exactly the species that finds it barely possible to do civilisation.

[Yudkowsky][17:14]

If it were possible to do it with less intelligence, we'd be having this conversation over the Internet that we'd developed with less intelligence.

[Ngo][17:15]

And it seems like many of the key inventions that enabled civilisation weren't anywhere near as intelligence-bottlenecked as modern science.

[Yudkowsky][17:15]

Yes, it does seem that there's quite a narrow band between "barely smart enough to develop agriculture" and "barely smart enough to develop computers"! Though there were genuinely fewer people in the preagricultural world, with worse nutrition and no Ashkenazic Jews, and there's the whole question about to what degree the reproduction of the shopkeeper class over several centuries was important to the Industrial Revolution getting started.

[Ngo][17:15]

(e.g. you'd get better spears or better plows or whatever just by tinkering, whereas you'd never get relativity just by tinkering)

[Yudkowsky][17:17]

I model you as taking a lesson from this which is something like... you can train up a villager to be John von Neumann by spending some evolutionary money on giving them science-specific brain features, since John von Neumann couldn't have been much more deeply or generally intelligent, and you could spend even more money and make a chimp a better scientist than John von Neumann.

My model is more like, yup, the capabilities you need to invent aqueducts sure do generalize the crap out of things, though also at the upper end of cognition there are compounding returns which can bring John von Neumann into existence, and also also there's various papers suggesting that selection was happening really fast over the last few millennia and real shifts in cognition shouldn't be ruled out. (This last part is an update to what I was thinking when I wrote Intelligence Explosion Microeconomics, and is from my own perspective a more gradualist line of thinking, because it means there's a wider actual target to traverse before you get to von Neumann.)

[Ngo][17:20]

It's not that "von Neumann isn't much more deeply generally intelligent", it's more like "domain-specific heuristics and instincts get you a long way". E.g. soccer is a domain where spending evolutionary money on specific features will very much help you beat von Neumann, and so is art, and so is music.

[Yudkowsky][17:20]

My skepticism here is that there's a version of, like, "invent nanotechnology" which routes through just the shallow places, which humanity stumbles over before we stumble over deep AGI.

[Ngo][17:21]

Would you be comfortable publicly discussing the actual cognitive steps which you think would be necessary for inventing nanotechnology?

[Yudkowsky][17:23]

It should not be overlooked that there's a very valid sibling of the old complaint "Anything you can do ceases to be AI", which is that "Things you can do with surprisingly-to-your-model shallow cognition are precisely the things that Reality surprises you by telling you that AI can do earlier than you expected." When we saw GPT-3, we were getting some amount of real evidence about AI capabilities advancing faster than I expected, and some amount of evidence about GPT-3's task being performable using shallower cognition than expected.

Many people were particularly surprised by Go because they thought that Go was going to require deeper real thought than chess.

And I think AlphaGo probably was thinking in a legitimately deeper way than Deep Blue. Just not as much deeper as Douglas Hofstadter thought it would take.

Conversely, people thought a few years ago that driving cars really seemed to be the sort of thing that machine learning would be good at, and were unpleasantly surprised by how the last 0.1% of driving conditions were resistant to shallow techniques.

Despite the inevitable fact that some surprises of this kind now exist, and that more such surprises will exist in the future, it continues to seem to me that science-and-engineering on the level of "invent nanotech" is pretty unlikely to be easy to do with shallow thought, by means that humanity discovers before AGI tech manages to learn deep thought?

What actual cognitive steps? Outside-the-box thinking, throwing away generalizations that governed your previous answers and even your previous questions, inventing new ways to represent your questions, figuring out which questions you need to ask and developing plans to answer them; these are some answers that I hope will be sufficiently useless to AI developers that it is safe to give them, while still pointing in the direction of things that have an un-GPT-3-like quality of depth about them.

Doing this across unfamiliar domains that couldn't be directly trained in by gradient descent because they were too expensive to simulate a billion examples of

If you have something this powerful, why is it not also noticing that the world contains humans? Why is it not noticing itself?

[Ngo][17:30]

If humans were to invent this type of nanotech, what do you expect the end intellectual result to be?

E.g. consider the human knowledge involved in building cars

There are thousands of individual parts, each of which does a specific thing

[Yudkowsky][17:30]

Uhhhh... is there a reason why "Eric Drexler's Nanosystems but, like, the real thing, modulo however much Drexler did not successfully Predict the Future about how to do that, which was probably a lot" is not the obvious answer here?

[Ngo][17:31]

And some deep principles governing engines, but not really very crucial ones to actually building (early versions of) those engines

[Yudkowsky][17:31]

that's... not historically true at all?

getting a grip on quantities of heat and their flow was critical to getting steam engines to work

it didn't happen until the math was there

[Ngo][17:32]

Ah, interesting

[Yudkowsky][17:32]

maybe you can be a mechanic banging on an engine that somebody else designed, around principles that somebody even earlier invented, without a physics degree

but, like, engineers have actually needed math since, like, that's been a thing, it wasn't just a prestige trick

[Ngo][17:34]

Okay, so you expect there to be a bunch of conceptual work in finding equations which govern nanosystems.

[Uhhhh... is there a reason why "Eric Drexler's Nanosystems but, like, the real thing, modulo however much Drexler did not successfully Predict the Future about how to do that, which was probably a lot" is not the obvious answer here?]

This may in fact be the answer; I haven't read it though.

[Yudkowsky][17:34]

or other abstract concepts than equations, which have never existed before

like, maybe not with a type signature unknown to humanity, but with specific instances unknown to present humanity

that's what I'd expect to see from humanly designed nanosystems

[Ngo][17:35]

So something like AlphaFold is only doing a very small proportion of the work here, since it's not able to generate new abstract concepts (of the necessary level of power)

[Yudkowsky][17:35]

yeeeessss, that is why DeepMind did not take over the world last year

it's not just that AlphaFold lacks the concepts but that it lacks the machinery to invent those concepts and the machinery to do anything with such concepts

[Ngo][17:38]

I think I find this fairly persuasive, but I also expect that people will come up with increasingly clever ways to leverage narrow systems so that they can do more and more work.

(including things like: if you don't have enough simulations, then train another narrow system to help fix that, etc)

[Yudkowsky][17:39]

(and they will accept their trivial billion-dollar-payouts and World GDP will continue largely undisturbed, on my mainline model, because it will be easiest to find ways to make money by leveraging narrow systems on the less regulated, less real parts of the economy, instead of trying to build houses or do medicine, etc.)

real tests being expensive, simulation being impossibly expensive, and not having enough samples to train your civilization's current level of AI technology, is not a problem you can solve by training a new AI to generate samples, because you do not have enough samples to train your civilization's current level of AI technology to generate more samples

[Ngo][17:41]

Thinking about nanotech makes me more sympathetic to the argument that developing general intelligence will bring a sharp discontinuity. But it also makes me expect longer timelines to AGI, during which there's more time to do interesting things with narrow AI. So I guess it weighs more against Dario's view, less against Paul's view.

[Yudkowsky][17:41]

well, I've been debating Paul about that separately in the timelines channel, not sure about recapitulating it here

but in broad summary, since I expect the future to look like it was drawn from the "history book" barrel and not the "futurism" barrel, I expect huge barriers to doing huge things with narrow AI in small amounts of time; you can sell waifutech because it's unregulated and hard to regulate, but that doesn't feed into core mining and steel production.

we could already have double the GDP if it was legal to build houses and hire people, etc., and the change brought by pre-AGI will perhaps be that our GDP could quadruple instead of just double if it was legal to do things, but that will not make it legal to do things, and why would anybody try to do things and probably fail when there are easier $36 billion profits to be made in waifutech.

14.3. Relatively shallow cognition, Go, and math

[Ngo][17:45]

I'd be interested to see Paul's description of how we would train AIs to solve hard scientific problems. I think there's some prediction that's like "we train it on arxiv and fine-tune it until it starts to output credible hypotheses about nanotech". And this seems like it has a step that's quite magical to me, but perhaps that'll be true of any prediction that I make before fully understanding how intelligence works.

[Yudkowsky][17:46]

my belief is not so much that this training can never happen, but that this probably means the system was trained beyond the point of safe shallowness

not in principle over all possible systems a superintelligence could build, but in practice when it happens on Earth

my only qualm about this is that current techniques make it possible to buy shallowness in larger quantities than this Earth has ever seen before, and people are looking for surprising ways to make use of that

so I weigh in my mind the thought of Reality saying Gotcha! by handing me a headline I read tomorrow about how GPT-4 has started producing totally reasonable science papers that are actually correct

and I am pretty sure that exact thing doesn't happen

and I ask myself about GPT-5 in a few more years, with the same architecture as GPT-3 but more layers and more training, doing the same thing

and it's still largely "nope"

then I ask myself about people in 5 years being able to use the shallow stuff in any way whatsoever to produce the science papers

and of course the answer there is, "okay, but is it doing that without having shallowly learned stuff that adds up to deep stuff which is why it can now do science"

and I try saying back "no, it was born of shallowness and it remains shallow and it's just doing science because it turns out that there is totally a way to be an incredibly mentally shallow skillful scientist if you think 10,000 shallow thoughts per minute instead of 1 deep thought per hour"

and my brain is like, "I cannot absolutely rule it out but it really seems like trying to call the next big surprise in 2014 and you guess self-driving cars instead of Go because how the heck would you guess that Go was shallower than self-driving cars"

like, that is an imaginable surprise

[Ngo][17:52]

On that particular point it seems like the very reasonable heuristic of "pick the most similar task" would say that Go is like chess and therefore you can do it shallowly.

[Yudkowsky][17:52]

but there's a world of difference between saying that a surprise is imaginable, and that it wouldn't surprise you

[Ngo][17:52]

I wasn't thinking that much about AI at that point, so you're free to call that post-hoc.

[Yudkowsky][17:52]

the Chess techniques had already failed at Go

actual new techniques were required

the people around at the time had witnessed sudden progress on self-driving cars a few years earlier

[Ngo][17:53]

My advance prediction here is that "math is like Go and therefore can be done shallowly".

[Yudkowsky][17:53]

self-driving cars were of obviously greater economic interest as well

my recollection is that talk of the time was about self-driving

heh! I have the same sense.

that is, math being shallower than science.

though perhaps not as shallow as Go, and you will note that Go has fallen and Math has not

[Ngo][17:54]

right

I also expect that we'll need new techniques for math (although not as different from the Go techniques as the Go techniques were from chess techniques)

But I guess we're not finding strong disagreements here either.

[Yudkowsky][17:57]

if Reality came back and was like "Wrong! Keeping up with the far reaches of human mathematics is harder than being able to develop your own nanotech," I would be like "What?" to about the same degree as being "What?" on "You can build nanotech just by thinking trillions of thoughts that are too shallow to notice humans!"

[Ngo][17:58]

Perhaps let's table this topic and move on to one of the others Nate suggested? I'll note that walking through the steps required to invent a science of nanotechnology does make your position feel more compelling, but I'm not sure how much of that is the general "intelligence is magic" intuition I mentioned before.

[Yudkowsky][17:59]

How do you suspect your beliefs would shift if you had any detailed model of intelligence?

Consider trying to imagine a particular wrong model of intelligence and seeing what it would say differently?

(not sure this is a useful exercise and we could indeed try to move on)

[Ngo][18:01]

I think there's one model of intelligence where scientific discovery is more actively effortful - as in, you need to be very goal-directed in determining hypotheses, testing hypotheses, and so on.

And there's another in which scientific discovery is more constrained by flashes of insight, and the systems which are producing those flashes of insight are doing pattern-matching in a way that's fairly disconnected from the real-world consequences of those insights.

[Yudkowsky][18:05]

The first model is true and the second one is false, if that helps. You can tell this by contemplating where you would update if you learned any model, by considering that things look more disconnected when you can't see the machinery behind them. If you don't know what moves the second hand on a watch and the minute hand on a watch, they could just be two things that move at different rates for completely unconnected reasons; if you can see inside the watch, you'll see that the battery is shared and the central timing mechanism is shared and then there's a few gears to make the hands move at different rates.

Like, in my ontology, the notion of "effortful" doesn't particularly parse as anything basic, because it doesn't translate over into paperclip maximizers, which are neither effortful nor effortless.

But in a human scientist you've got thoughts being shoved around by all sorts of processes behind the curtains, created by natural selection, some of them reflecting shards of Consequentialism / shadowing paths through time

The flashes of insight come to people who were looking in nonrandom places

If they didn't plan deliberately and looked on pure intuition, they looked with an intuition trained by past success and failure

Somebody walking doesn't plan to walk, but long ago as a baby they learned from falling over, and their ancestors who fell over more didn't reproduce

[Ngo][18:09]

I think the first model is probably more true for humans in the domain of science. But I'm uncertain about the extent to which this is because humans have not been optimised very much for doing science. If we consider the second model in a domain that humans have actually been optimised very hard for (say, physical activity), then maybe we can use the analogy of a coach and a player. The coach can tell the player what to practice, but almost all the work is done by the player practicing in a way which updates their intuitions.

This has become very abstract, though.

14.4. Pivotal acts and historical precedents

14.5. Past ANN progress

[Ngo][18:46]

I don't expect another paradigm shift like that

(in part because I'm not sure the paradigm shift actually happened in the first place - it seems like neural networks were improving pretty continuously over many decades)

[Yudkowsky][18:47]

I've noticed that opinion around OpenPhil! It makes sense if you have short timelines and expect the world to end before there's another paradigm shift, but OpenPhil doesn't seem to expect that either.

Yeah, uh, there was kinda a paradigm shift in AI between say 2000 and now. There really, really was.

[Ngo][18:49]

What I mean is more like: it's not clear to me that an extrapolation of the trajectory of neural networks is made much better by incorporating data about the other people who weren't using neural networks.

[Yudkowsky][18:49]

Would you believe that at one point Netflix ran a prize contest to produce better predictions of their users' movie ratings, with a $1 million prize, and this was one of the largest prizes ever in AI and got tons of contemporary ML people interested, and neural nets were not prominent on the solutions list at all, because, back then, people occasionally solved AI problems not using neural nets?

I suppose that must seem like a fairy tale, as history always does, but I lived it!

[Ngo][18:50]

(I wasn't denying that neural networks were for a long time marginalised in AI)

I'd place much more credence on future revolutions occurring if neural networks had actually only been invented recently.

(I have to run in 2 minutes)

[Yudkowsky][18:51]

The world might otherwise end before the next paradigm shift, but if the world keeps on ticking for 10 years, 20 years, there will not always be the paradigm of training massive networks by even more massive amounts of gradient descent; I do not think that is actually the most efficient possible way to turn computation into intelligence.

Neural networks stayed stuck at only a few layers for a long time, because the gradients would explode or die out if you made the networks any deeper.

There was a critical moment in 2006(?) where Hinton and Salakhutdinov(?) proposed training Restricted Boltzmann machines unsupervised in layers, and then 'unrolling' the RBMs to initialize the weights in the network, and then you could do further gradient descent updates from there, because the activations and gradients wouldn't explode or die out given that initialization. That got people to, I dunno, 6 layers instead of 3 layers or something? But it focused attention on the problem of exploding gradients as the reason why deeply layered neural nets never worked, and that kicked off the entire modern field of deep learning, more or less.
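To make the "gradients would die out" point concrete, here is a minimal Python sketch under simplifying assumptions that are not from the conversation: a hypothetical chain of scalar sigmoid layers, each computing `sigmoid(weight * x)` with the same fixed weight, standing in for a deep network. Because the sigmoid's derivative `s * (1 - s)` never exceeds 0.25, with `weight = 1.0` each extra layer multiplies the backpropagated gradient by a factor below 0.25, so the gradient reaching the input shrinks geometrically with depth. It illustrates only the vanishing half of the problem and does not model RBM pretraining.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def input_gradient(depth: int, weight: float = 1.0, x0: float = 0.5) -> float:
    """Return |d(output)/d(input)| for a hypothetical chain of `depth` scalar
    sigmoid layers, each computing sigmoid(weight * x). Illustrative only."""
    x = x0
    grad = 1.0
    for _ in range(depth):
        s = sigmoid(weight * x)
        # Chain rule: d/dx sigmoid(w * x) = w * s * (1 - s), and s * (1 - s) <= 0.25.
        grad *= weight * s * (1.0 - s)
        x = s  # this layer's output feeds the next layer
    return abs(grad)


if __name__ == "__main__":
    for depth in (3, 6, 20, 50):
        print(f"depth={depth:3d}  |gradient at input| = {input_gradient(depth):.3e}")
```

In this toy setup the gradient falls by more than a factor of four per layer, which is one way to see why, on the account given above, starting gradient descent from a better initialization mattered so much for getting deeper networks to train at all.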

[Ngo][18:56]

Okay, so are you claiming that neural networks were mostly bottlenecked by algorithmic improvements, not compute availability, for a significant part of their history?

[Yudkowsky][18:56]

If anybody goes back and draws a graph claiming the whole thing was continuous if you measure the right metric, I am not really very impressed unless somebody at the time was using that particular graph and predicting anything like the right capabilities off of it.

[Ngo][18:56]

If so this seems like an interesting question to get someone with more knowledge of ML history than me to dig into; I might ask around.

[Yudkowsky][18:57]

[Okay, so are you claiming that neural networks were mostly bottlenecked by algorithmic improvements, not compute availability, for a significant part of their history?]

Er... yeah? There was a long time when, even if you threw a big neural network at something, it just wouldn't work.

Good night, btw?

[Ngo][18:57]

Let's call it here; thanks for the discussion.

[Soares][18:57]

Thanks, both!

[Ngo][18:57]

I'll be interested to look into that claim, it doesn't fit with the impressions I have of earlier bottlenecks.

I think the next important step is probably for me to come up with some concrete governance plans that I'm excited about.

I expect this to take quite a long time

[Soares][18:58]

We can coordinate around that later. Sorry for keeping you so late already, Richard.

[Ngo][18:59]

No worries

My proposal would be that we should start on whatever work is necessary to convert the debate into a publicly accessible document now

In some sense coming up with concrete governance plans is my full-time job, but I feel like I'm still quite a way behind in my thinking on this, compared with people who have been thinking about governance specifically for longer

[Soares][19:01]

(@RobBensinger is already on it 🙂)

[Bensinger: ✅]

[Yudkowsky][19:03]

Nuclear plants might be like narrow AI in this analogy; some designs potentially contribute to proliferation, and you can get more economic wealth by building more of them, but they have no Unlabeled Doom Dial where you can get more and more wealth out of them by cranking them up until at some unlabeled point the atmosphere ignites.

Also a thought: I don't think you just want somebody with more knowledge of AI history, I think you might need to ask an actual old fogey who was there at the time, and hasn't just learned an ordered history of just the parts of the past that are relevant to the historian's theory about how the present happened.

Two of them, independently, to see if the answers you get are reliable-as-in-statistical-reliability.

[Soares][19:19]

My own quick take, for the record, is that it looks to me like there are two big cruxes here.

One is about whether "deep generality" is a good concept, and in particular whether it pushes AI systems quickly from "nonscary" to "scary" and whether we should expect human-built AI systems to acquire it in practice (before the acute risk period is ended by systems that lack it). The other is about how easy it will be to end the acute risk period (e.g. by use of politics or nonscary AI systems alone).

I suspect the latter is the one that blocks on Richard thinking about governance strategies. I'd be interested in attempting further progress on the former point, though it's plausible to me that that should happen over in #timelines instead of here.