This is the second post in a series of transcribed conversations about AGI forecasting and alignment. See the first post for prefaces and more information about the format.
|Chat by Richard Ngo and Eliezer Yudkowsky||Other chat||Inline comments|
5. September 14 conversation
5.1. Recursive self-improvement, abstractions, and miracles
Good morning / good evening.
So it seems like the obvious thread to pull today is your sense that I'm wrong about recursive self-improvement and consequentialism in a related way?
Right. And then another potential thread (probably of secondary importance) is the question of what you mean by utility functions, and digging more into the intuitions surrounding those.
But let me start by fleshing out this RSI/consequentialism claim.
I claim that your early writings about RSI focused too much on a very powerful abstraction, of recursively applied optimisation; and too little on the ways in which even powerful abstractions like this one become a bit... let's say messier, when they interact with the real world.
In particular, I think that Paul's arguments that there will be substantial progress in AI in the leadup to an RSI-driven takeoff are pretty strong ones.
(Just so we're on the same page: to what extent did those arguments end up shifting your credences?)
I don't remember being shifted by Paul on this at all. I sure shifted a lot over events like Alpha Zero and the entire deep learning revolution. What does Paul say that isn't encapsulated in that update - does he furthermore claim that we're going to get fully smarter-than-human in all regards AI which doesn't cognitively scale much further either through more compute or through RSI?
Ah, I see. In that case, let's just focus on the update from the deep learning revolution.
I'll also remark that I see my foreseeable mistake there as having little to do with "abstractions becoming messier when they interact with the real world" - this truism tells you very little of itself, unless you can predict directional shifts in other variables just by contemplating the unknown messiness relative to the abstraction.
Rather, I'd see it as a neighboring error to what I've called the Law of Earlier Failure, where the Law of Earlier Failure says that, compared to the interesting part of the problem where it's fun to imagine yourself failing, you usually fail before then, because of the many earlier boring points where it's possible to fail.
The nearby reasoning error in my case is that I focused on an interesting way that AI capabilities could scale and the most powerful argument I had to overcome Robin's objections, while missing the way that Robin's objections could fail even earlier through rapid scaling and generalization in a more boring way.
It doesn't mean that my arguments about RSI were false about their domain of supposed application, but that other things were also true and those things happened first on our timeline. To be clear, I think this is an important and generalizable issue with the impossible task of trying to forecast the Future, and if I am wrong about other things it sure would be plausible if I was wrong in similar ways.
Then the analogy here is something like: there is a powerful abstraction, namely consequentialism; and we both agree that (like RSI) a large amount of consequentialism is a very dangerous thing. But we disagree on the question of how much the strategic landscape in the leadup to highly-consequentialist AIs is affected by other factors apart from this particular abstraction.
"this truism tells you very little of itself, unless you can predict directional shifts in other variables just by contemplating the unknown messiness relative to the abstraction"
I disagree with this claim. It seems to me that the predictable direction in which the messiness pushes is away from the applicability of the high-level abstraction.
The real world is messy, but good abstractions still apply, just with some messiness around them. The Law of Earlier Failure is not a failure of the abstraction being messy, it's a failure of the subject matter ending up different such that the abstractions you used were about a different subject matter.
When a company fails before the exciting challenge where you try to scale your app across a million users, because you couldn't hire enough programmers to build your app at all, the problem is not that you had an unexpectedly messy abstraction about scaling to many users, but that the key determinants were a different subject matter than "scaling to many users".
Throwing 10,000 TPUs at something and actually getting progress - not very much of a famous technological idiom at the time I was originally arguing with Robin - is not a leak in the RSI abstraction, it's just a way of getting powerful capabilities without RSI.
To me the difference between these two things seems mainly semantic; does it seem otherwise to you?
If I'd been arguing with somebody who kept arguing in favor of faster timescales, maybe I'd have focused on that different subject matter and gotten a chance to be explicitly wrong about it. I mainly see my ur-failure here as letting myself be influenced by the whole audience that was nodding along very seriously to Robin's arguments, at the expense of considering how reality might depart in either direction from my own beliefs, and not just how Robin might be right or how to persuade the audience.
Also, "throwing 10,000 TPUs at something and actually getting progress" doesn't seem like an example of the Law of Earlier Failure - if anything it seems like an Earlier Success
it's an Earlier Failure of Robin's arguments about why AI wouldn't scale quickly, so my lack of awareness of this case of the Law of Earlier Failure is why I didn't consider why Robin's arguments could fail earlier
though, again, this is a bit harder to call if you're trying to call it in 2008 instead of 2018
but it's a valid lesson that the future is, in fact, hard to predict, if you're trying to do it in the past
and I would not consider it a merely "semantic" difference as to whether you made a wrong argument about the correct subject matter, or a correct argument about the wrong subject matter
these are like... very different failure modes that you learn different lessons from
but if you're not excited by these particular fine differences in failure modes or lessons to learn from them, we should perhaps not dwell upon that part of the meta-level Art
Okay, so let me see if I understand your position here.
Due to the deep learning revolution, it turned out that there were ways to get powerful capabilities without RSI. This isn't intrinsically a (strong) strike against the RSI abstraction; and so, unless we have reason to expect another similarly surprising revolution before reaching AGI, it's not a good reason to doubt the consequentialism abstraction.
Consequentialism and RSI are very different notions in the first place. Consequentialism is, in my own books, significantly simpler. I don't see much of a conceptual connection between the two myself, except insofar as they both happen to be part of the connected fabric of a coherent worldview about cognition.
It is entirely reasonable to suspect that we may get another surprising revolution before reaching AGI. Expecting a particular revolution that gives you particular miraculous benefits is much more questionable and is an instance of conjuring expected good from nowhere, like hoping that you win the lottery because the first lottery ball comes up 37. (Also, if you sincerely believed you actually had info about what kind of revolution might lead to AGI, you should shut up about it and tell very few carefully selected people, not bake it into a public dialogue.)
On this point: the implicit premise of "and also nothing else will break this abstraction or render it much less relevant" turns a correct argument about the wrong subject matter into an incorrect argument.
Though I'd also note that there's an important lesson of technique where you learn to say things like that out loud instead of keeping them "implicit".
Learned lessons like that are one reason why I go through your summary documents of our conversation and ask for many careful differences of wording about words like "will happen" and so on.
So I claim that:
1. A premise like this is necessary for us to believe that your claims about consequentialism lead to extinction.
2. A surprising revolution would make it harder to believe this premise, even if we don't know which particular revolution it is.
3. If we'd been told back in 2008 that a surprising revolution would occur in AI, then we should have been less confident in the importance of the RSI abstraction to understanding AGI and AGI risk.
Suppose I put to you that this claim is merely subsumed by all of my previous careful qualifiers about how we might get a "miracle" and how we should be trying to prepare for an unknown miracle in any number of places. Why suspect that place particularly for a model-violation?
I also think that you are misinterpreting my old arguments about RSI, in a pattern that matches some other cases of your summarizing my beliefs as "X is the one big ultra-central thing" rather than "X is the point where the other person got stuck and Eliezer had to spend a lot of time arguing".
I was always claiming that RSI was a way for AGI capabilities to scale much further once they got far enough, not the way AI would scale to human-level generality.
This continues to be a key fact of relevance to my future model, in the form of the unfalsified original argument about the subject matter it previously applied to: if you lose control of a sufficiently smart AGI, it will FOOM, and this fact about what triggers the metaphorical equivalent of a full nuclear exchange and a total loss of the gameboard continues to be extremely relevant to what you have to do to obtain victory instead.
Perhaps we're interpreting the word "miracle" in quite different ways.
I think of it as an event with negligibly small probability.
Events that actually have negligibly small probability are not much use in plans.
Which I guess doesn't fit with your claims that we should be trying to prepare for a miracle.
But I'm not recalling off the top of my head where you've claimed that.
I'll do a quick search of the transcript
"You need to hold your mind open for any miracle and a miracle you didn't expect or think of in advance, because at this point our last hope is that in fact the future is often quite surprising."
Okay, I see. The connotations of "miracle" seemed sufficiently strong to me that I didn't interpret "you need to hold your mind open" as practical advice.
What sort of probability, overall, do you assign to us being saved by what you call a miracle?
It's not a place where I find quantitative probabilities to be especially helpful.
And if I had one, I suspect I would not publish it.
Can you leak a bit of information? Say, more or less than 10%?
Though a lot of that is dominated, not by the probability of a positive miracle, but by the extent to which we seem unprepared to take advantage of it, and so would not be saved by one.
Yeah, I see.
5.2. The idea of expected utility
Okay, I'm now significantly less confident about how much we actually disagree.
At least about the issues of AI cognition.
You seem to suspect we'll get a particular miracle having to do with "consequentialism", which means that although it might be a miracle to me, it wouldn't be a miracle to you.
There is something forbidden in my model that is not forbidden in yours.
I think that's partially correct, but I'd call it more a broad range of possibilities in the rough direction of you being wrong about consequentialism.
Well, as much as it may be nicer to debate when the other person has a specific positive expectation that X will work, we can also debate when I know that X won't work and the other person remains ignorant of that. So say more!
That's why I've mostly been trying to clarify your models rather than trying to make specific claims of my own.
Which I think I'd prefer to continue doing, if you're amenable, by asking you about what entities a utility function is defined over - say, in the context of a human.
I think that to contain the concept of Utility as it exists in me, you would have to do homework exercises I don't know how to prescribe. Maybe one set of homework exercises like that would be showing you an agent, including a human, making some set of choices that allegedly couldn't obey expected utility, and having you figure out how to pump money from that agent (or present it with money that it would pass up).
Like, just actually doing that a few dozen times.
Maybe it's not helpful for me to say this? If you say it to Eliezer, he immediately goes, "Ah, yes, I could see how I would update that way after doing the homework, so I will save myself some time and effort and just make that update now without the homework", but this kind of jumping-ahead-to-the-destination is something that seems to me to be... dramatically missing from many non-Eliezers. They insist on learning things the hard way and then act all surprised when they do. Oh my gosh, who would have thought that an AI breakthrough would suddenly make AI seem less than 100 years away the way it seemed yesterday? Oh my gosh, who would have thought that alignment would be difficult?
Utility can be seen as the origin of Probability within minds, even though Probability obeys its own, simpler coherence constraints.
that is, you will have money pumped out of you, unless you weigh in your mind paths through time according to some quantitative weight, which determines how much resources you're willing to spend on preparing for them
this is why sapients think of things as being more or less likely
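The money-pump homework described a few messages above can be made concrete with a toy sketch. Everything in it is an illustrative assumption rather than something from the conversation: the items `A`, `B`, `C`, the fee, and the function `run_money_pump` are hypothetical. It compares an agent whose preferences run in a cycle with one whose preferences are transitive, where each agent pays a small fee to swap into anything it strictly prefers.

```python
# Hypothetical preference maps: each key is an item, each value is the item
# the agent strictly prefers to it (and will pay a small fee to swap into).
CYCLIC = {"A": "B", "B": "C", "C": "A"}  # A < B < C < A: intransitive
TRANSITIVE = {"A": "B", "B": "C"}        # A < B < C: nothing beats C


def run_money_pump(prefers, item, money, fee, max_offers):
    """Offer swaps until the agent declines, cannot pay, or offers run out."""
    collected = 0
    for _ in range(max_offers):
        better = prefers.get(item)
        if better is None or money < fee:
            break  # no preferred swap exists, or the agent cannot afford it
        item, money, collected = better, money - fee, collected + fee
    return item, money, collected


cyclic_result = run_money_pump(CYCLIC, "A", money=10, fee=1, max_offers=6)
transitive_result = run_money_pump(TRANSITIVE, "A", money=10, fee=1, max_offers=6)
print(cyclic_result)      # ('A', 4, 6)
print(transitive_result)  # ('C', 8, 2)
```

In this sketch the cyclic agent ends up holding the item it started with and is six units poorer, and it would keep paying for as long as offers continue. The transitive agent pays twice, reaches its favorite item, and then declines every further offer.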
Suppose that this agent has some high-level concept - say, honour - which leads it to pass up on offers of money.
then there's two possibilities:
Right, I see.
Hmm, but it seems like humans often don't see concepts as helping to navigate a path in time to a destination. (E.g. the deontological instinct not to kill.)
And yet those concepts were in fact optimised into existence by evolution.
You're describing a defect of human reflectivity about their consequentialist structure, not a departure from consequentialist structure. 🙂
(Sorry, internet was slightly buggy; switched to a better connection now.)
But yes, from my perspective, it creates a very large conceptual gap that I can stare at something for a few seconds and figure out how to parse it as navigating paths through time, while others think that "consequentialism" only happens when their minds are explicitly thinking about "well, what would have this consequence" using language.
Similarly, when it comes to Expected Utility, I see that any time something is attaching relative-planning-weights to paths through time, not when a human is thinking out loud about putting spoken numbers on outcomes
Human consequentialist structure was optimised by evolution for a different environment. Insofar as we are consequentialists in a new environment, it's only because we're able to be reflective about our consequentialist structure (or because there are strong similarities between the environments).
It just generalized out-of-distribution because the underlying coherence of the coherent behaviors was simple.
When you have a very simple pattern, it can generalize across weak similarities, not "strong similarities".
The human brain is large but the coherence in it is simple.
The idea, the structure, that explains why the big thing works, is much smaller than the big thing.
So it can generalize very widely.
Taking this example of the instinct not to kill people - is this one of the "very simple patterns" that you're talking about?
"Reflectivity" doesn't help per se unless on some core level a pattern already generalizes, I mean, either a truth can generalize across the data or it can't? So I'm a bit puzzled about why you're bringing up "reflectivity" in this context.
An instinct not to kill doesn't even seem to me like a plausible cross-cultural universal. 40% of deaths among Yanomami men are in intratribal fights, iirc.
Ah, I think we were talking past each other. When you said "this concept of honor is something that you can see as helping to navigate a path through time to a destination" I thought you meant "you" as in the agent in question (as you used it in some previous messages) not "you" as in a hypothetical reader.
it would not have occurred to me to ascribe that much competence to an agent that wasn't a superintelligence.
even I don't have time to think about why more than
I might now try to throw a high-level (but still inchoate) disagreement at you and see how that goes. But while I'm formulating that, I'm curious what your thoughts are on where to take the discussion.
Actually, let's spend a few minutes deciding where to go next, and then take a break
I'm thinking that, at this point, there might be more value in moving onto geopolitics
Some of my current thoughts are a reiteration of old despair: It feels to me like the typical Other within EA has no experience with discovering unexpected order, with operating a generalization that you can expect will cover new cases even when that isn't immediately obvious, with operating that generalization to cover those new cases correctly, with seeing simple structures that generalize a lot and having that be a real and useful and technical experience; instead of somebody blathering in a non-expectation-constraining way about how "capitalism is responsible for everything wrong with the world", and being able to extend that to lots of cases.
I could try to use much simpler language in hopes that people actually look-at-the-water Feynman-style, like "navigating a path through time" instead of Consequentialism which is itself a step down from Expected Utility.
But you actually do lose something when you throw away the more technical concept. And then people still think that either you instantly see in the first second how something is a case of "navigating a path through time", or that this is something that people only do explicitly when visualizing paths through time using that mental terminology; or, if Eliezer says that it's "navigating time" anyways, this must be an instance of Eliezer doing that thing other people do when they talk about how "Capitalism is responsible for all the problems of the world". They have no experience operating genuinely useful, genuinely deep generalizations that extend to nonobvious things.
And in fact, being able to operate some generalizations like that is a lot of how I know what I know, in reality and in terms of the original knowledge that came before trying to argue that knowledge with people. So trying to convey the real source of the knowledge feels doomed. It's a kind of idea that our civilization has lost, like that college class Feynman ran into.
My own sense (having been back for about 20min) is that one of the key cruxes is in "is it possible that non-scary cognition will be able to end the acute risk period", or perhaps "should we expect a longish regime of pre-scary cognition, that we can study and learn to align in such a way that by the time we get scary cognition we can readily align it".
Some potential prompts for that:
I also have a bit of a sense that there's a bit more driving to do on the "perhaps EY is just wrong about the applicability of the consequentialism arguments" (in a similar domain), and would be happy to try articulating a bit of what I think are the not-quite-articulated-to-my-satisfaction arguments on that side.
I also had a sense - maybe mistaken - that RN did have some specific ideas about how "consequentialism" might be inapplicable. though maybe I accidentally refuted that in passing because the idea was "well, what if it didn't know what consequentialism was?" and then I explained that reflectivity was not required to make consequentialism generalize. but if so, I'd like RN to say explicitly what specific idea got refuted that way. or failing that, talk about the specific idea that didn't get refuted.
That wasn't my objection, but I do have some more specific ideas, which I could talk about.
And I'd also be happy for Nate to try articulating some of the arguments he mentioned above.
I have a general worry that this conversation has gotten too general, and that it would be more productive, even of general understanding, to start from specific ideas and shoot those down specifically.
The other thing is that, for pedagogical purposes, I think it'd be useful for you to express some of your beliefs about how governments will respond to AI
I think I have a rough guess about what those beliefs are, but even if I'm right, not everyone who reads this transcript will be
Why would I be expected to know that? I could talk about weak defaults and iterate through an unending list of possibilities.
Thinking that Eliezer thinks he knows that to any degree of specificity feels like I'm being weakmanned!
I'm not claiming you have any specific beliefs
I suppose I have skepticism when other people dream up elaborately positive and beneficial reactions apparently drawn from some alternate nicer political universe that had an absolutely different response to Covid-19, and so on.
But I'd guess that your models rule out, for instance, the US and China deeply cooperating on AI before it's caused any disasters
"Deeply"? Sure. That sounds like something that has never happened, and I'm generically skeptical about political things that go better than any political thing has ever gone before.
I guess we could talk about that? It doesn't seem like the most productive area, but maybe it lies upstream of more technical disagreements because we disagree about what AGI would actually have to do to have the world not end.
Cool. I claim it's time for a break, and then I nominate a little Eliezer gov't-response-overview followed by specific maybe-consequentialism-based-worries-aren't-a-problem-in-practice ideas from Richard.
See you in 28mins
5.3. Epistemology, and assessing the idea of expected utility
Ooops, didn't see this comment earlier. With respect to discovering unexpected order, one point that seems relevant is the extent to which that order provides predictive power. To what extent do you think that predictive successes in economics are important evidence for expected utility theory being a powerful formalism? (Or are there other ways in which it's predictively powerful that provide significant evidence?)
I'd be happy with a quick response to that, and then on geopolitics, here's a prompt to kick us off:
I think that the Apollo space program is much deeper evidence for Utility. Observe, if you train protein blobs to run around the savanna, they also go to the moon!
If you think of "utility" as having something to do with the human discipline called "economics" then you are still thinking of it in a much much much more narrow way than I do.
I'm not asking about evidence for utility as an abstraction in general, I'm asking for evidence based on successful predictions that have been made using it.
That doesn't tend to happen a lot, because all of the deep predictions that it makes are covered by shallow predictions that people made earlier.
Consider the following prediction of evolutionary psychology: Humans will enjoy activities associated with reproduction!
"What," says Simplicio, "you mean like dressing up for dates? I don't enjoy that part."
"No, you're overthinking it, we meant orgasms," says the evolutionary psychologist.
"But I already knew that, that's just common sense!" replies Simplicio.
"And yet it is very specifically a prediction of evolutionary psychology which is not made specifically by any other theory of human minds," replies the evolutionary psychologist.
"Not an advance prediction, just-so story, too obvious," replies Simplicio.
Yepp, I agree that most of its predictions won't be new. Yet evolution is a sufficiently powerful theory that people have still come up with a range of novel predictions that derive from it.
Insofar as you're claiming that expected utility theory is also very powerful, then we should expect that it also provides some significant predictions.
An advance prediction of the notion of Utility, I suppose, is that if you train an AI which is otherwise a large blob of layers - though this may be inadvisable for other reasons - to the point where it starts solving lots of novel problems, that AI will tend to value aspects of outcomes with weights, and weight possible paths through time (the dynamic progress of the environment), and use (by default, usually, roughly) the multiplication of these weights to allocate limited resources between mutually conflicting plans.
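As a minimal sketch of what "the multiplication of these weights" could look like in the simplest case, the snippet below scores a few plans by summing probability times value over their possible outcomes and picks the highest score. The plan names, probabilities, and values are made up for illustration, and nothing here is a claim about how a trained system would represent any of this internally.

```python
def expected_utility(outcomes):
    """Sum of (weight on the path) * (value of the outcome) for one plan."""
    return sum(probability * value for probability, value in outcomes)


# Hypothetical plans; each is a list of (probability, value) pairs whose
# probabilities sum to 1.
plans = {
    "safe": [(1.0, 3.0)],
    "gamble": [(0.5, 10.0), (0.5, -2.0)],
    "longshot": [(0.1, 20.0), (0.9, 0.0)],
}

scores = {name: expected_utility(outcomes) for name, outcomes in plans.items()}
best_plan = max(scores, key=scores.get)
print(best_plan)  # gamble
```

Here "gamble" wins with a score of 4.0, ahead of "safe" at 3.0, even though "longshot" has the single most valuable outcome. Neither the values alone nor the probabilities alone decide the ranking.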
Again, I'm asking for evidence in the form of successful predictions.
I predict that people will want some things more than others, think some possibilities are more likely than others, and prefer to do things that lead to stuff they want a lot through possibilities they think are very likely!
It would be very strange to me if a theory which makes such strong claims about things we can't yet verify can't shed light on anything which we are in a position to verify.
If you think I'm deriving my predictions of catastrophic alignment failure through something more exotic than that, you're missing the reason why I'm so worried. It doesn't take intricate complicated exotic assumptions.
It makes the same kind of claims about things we can't verify yet as it makes about things we can verify right now.
But that's very easy to do! Any theory can do that.
For example, if somebody wants money, and you set up a regulation which prevents them from making money, it predicts that the person will look for a new way to make money that bypasses the regulation.
And yes, of course fitting previous data is important evidence in favour of a theory
False! Any theory can do that in the hands of a fallible agent which invalidly, incorrectly derives predictions from the theory.
Well, indeed. But the very point at hand is whether the predictions you base on this theory are correctly or incorrectly derived.
It is not the case that every theory does an equally good job of predicting the past, given valid derivations of predictions.
Well, hence the analogy to evolutionary psychology. If somebody doesn't see the blatant obviousness of how sexual orgasms are a prediction specifically of evolutionary theory, because it's "common sense" and "not an advance prediction", what are you going to do? We can, in this case, with a lot more work, derive more detailed advance predictions about degrees of wanting that correlate in detail with detailed fitness benefits. But that's not going to convince anybody who overlooked the really blatant and obvious primary evidence.
What they're missing there is a sense of counterfactuals, of how the universe could just as easily have looked if the evolutionary origins of psychology were false: why should organisms want things associated with reproduction, why not instead have organisms running around that want things associated with rolling down hills?
Similarly, if optimizing complicated processes for outcomes hard enough didn't produce cognitive processes that internally mapped paths through time and chose actions conditional on predicted outcomes, human beings would... not think like that? What am I supposed to say here?
Let me put it this way. There are certain traps that, historically, humans have been very liable to fall into. For example, seeing a theory, which seems to match so beautifully and elegantly the data which we've collected so far, it's very easy to dramatically overestimate how much that data favours that theory. Fortunately, science has a very powerful social technology for avoiding this (i.e. making falsifiable predictions) which seems like approximately the only reliable way to avoid it - and yet you don't seem concerned at all about the lack of application of this technology to expected utility theory.
This is territory I covered in the Sequences, exactly because "well it didn't make a good enough advance prediction yet!" is an excuse that people use to reject evolutionary psychology, some other stuff I covered in the Sequences, and some very predictable lethalities of AGI.
With regards to evolutionary psychology: yes, there are some blatantly obvious ways in which it helps explain the data available to us. But there are also many people who have misapplied or overapplied evolutionary psychology, and it's very difficult to judge whether they have or have not done so, without asking them to make advance predictions.
I talked about the downsides of allowing humans to reason like that, the upsides, the underlying theoretical laws of epistemology (which are clear about why agents that reason validly or just unbiasedly would do that without the slightest hiccup), etc etc.
In the case of the theory "people want stuff relatively strongly, predict stuff relatively strongly, and combine the strengths to choose", what kind of advance prediction that no other theory could possibly make, do you expect that theory to make?
In the worlds where that theory is true, how should it be able to prove itself to you?
I expect deeper theories to make more and stronger predictions.
I'm currently pretty uncertain if expected utility theory is a deep or shallow theory.
But deep theories tend to shed light in all sorts of unexpected places.
The fact is, when it comes to AGI (general optimization processes), we have only two major datapoints in our dataset, natural selection and humans. So you can either try to reason validly about what theories predict about natural selection and humans, even though we've already seen the effects of those; or you can claim to give up in great humble modesty while actually using other implicit theories instead to make all your predictions and be confident in them.
I'm familiar with your writings on this, which is why I find myself surprised here. I could understand a perspective of "yes, it's unfortunate that there are no advance predictions, it's a significant weakness, I wish more people were doing this so we could better understand this vitally important theory". But that seems very different from your perspective here.
Oh, I'd love to be making predictions using a theory that made super detailed advance predictions made by no other theory which had all been borne out by detailed experimental observations! I'd also like ten billion dollars, a national government that believed everything I honestly told them about AGI, and a drug that raises IQ by 20 points.
The very fact that we have only two major datapoints is exactly why it seems like such a major omission that a theory which purports to describe intelligent agency has not been used to make any successful predictions about the datapoints we do have.
This is making me think that you imagine the theory as something much more complicated and narrow than it is.
Just look at the water.
Not very special water with an index.
Just regular water.
People want stuff. They want some things more than others. When they do stuff they expect stuff to happen.
These are predictions of the theory. Not advance predictions, but predictions nonetheless.
I'm accepting your premise that it's something deep and fundamental, and making the claim that deep, fundamental theories are likely to have a wide range of applications, including ones we hadn't previously thought of.
Do you disagree with that premise, in general?
I don't know what you really mean by "deep fundamental theory" or "wide range of applications we hadn't previously thought of", especially when it comes to structures that are this simple. It sounds like you're still imagining something I mean by Expected Utility which is some narrow specific theory like a particular collection of gears that are appearing in lots of places.
Are numbers a deep fundamental theory?
Is addition a deep fundamental theory?
Is probability a deep fundamental theory?
Is the notion of the syntax-semantics correspondence in logic and the notion of a generally semantically valid reasoning step, a deep fundamental theory?
Yes to the first three, all of which led to very successful novel predictions.
What's an example of a novel prediction made by the notion of probability?
Most applications of the central limit theorem.
Then I should get to claim every kind of optimization algorithm which used expected utility, as a successful advance prediction of expected utility? Optimal stopping and all the rest? Seems cheap and indeed invalid to me, and not particularly germane to whether these things appear inside AGIs, but if that's what you want, then sure.
I agree that it is a prediction of the theory. And yet it's also the case that smarter people than either of us have been dramatically mistaken about how well theories fit previously-collected data. (Admittedly we have advantages which they didn't, like a better understanding of cognitive biases - but it seems like you're ignoring the possibility of those cognitive biases applying to us, which largely negates those advantages.)
I'm not ignoring it, just adjusting my confidence levels and proceeding, instead of getting stuck in an infinite epistemic trap of self-doubt.
I don't live in a world where you either have the kind of detailed advance experimental predictions that should convince the most skeptical scientist and render you immune to all criticism, or, alternatively, you are suddenly in a realm beyond the reach of all epistemic authority, and you ought to cuddle up into a ball and rely only on wordless intuitions and trying to put equal weight on good things happening and bad things happening.
I live in a world where I proceed with very strong confidence if I have a detailed formal theory that made detailed correct advance predictions, and otherwise go around saying, "well, it sure looks like X, but we can be on the lookout for a miracle too".
If this was a matter of thermodynamics, I wouldn't even be talking like this, and we wouldn't even be having this debate.
I'd just be saying, "Oh, that's a perpetual motion machine. You can't build one of those. Sorry." And that would be the end.
Meanwhile, political superforecasters go on making well-calibrated predictions about matters much murkier and more complicated than these, often without anything resembling a clearly articulated theory laid forth at length, let alone one that had made specific predictions even retrospectively. They just go do it instead of feeling helpless about it.
These seem better than nothing, but still fairly unsatisfying, insofar as I think they are related to more shallow properties of the theory.
Hmm, I think you're mischaracterising my position. I nowhere advocated for feeling helpless or curling up in a ball. I was just noting that this is a particularly large warning sign which has often been valuable in the past, and it seemed like you were not only speeding past it blithely, but also denying the existence of this category of warning signs.
I think you're looking for some particular kind of public obeisance that I don't bother to perform internally because I'd consider it a wasted motion. If I'm lost in a forest I don't bother going around loudly talking about how I need a forest theory that makes detailed advance experimental predictions in controlled experiments, but, alas, I don't have one, so now I should be very humble. I try to figure out which way is north.
When I have a guess at a northerly direction, it would then be an error to proceed with as much confidence as if I'd had a detailed map and had located myself upon it.
Insofar as I think we're less lost than you do, then the weaknesses of whichever forest theory implies that we're lost are relevant for this discussion.
The obeisance I make in that direction is visible in such statements as, "But this, of course, is a prediction about the future, which is well-known to be quite difficult to predict, in fact."
If my statements had been matters of thermodynamics and particle masses, I would not be adding that disclaimer.
But most of life is not a statement about particle masses. I have some idea of how to handle that. I do not need to constantly recite disclaimers to myself about it.
I know how to proceed when I have only a handful of data points which have already been observed and my theories of them are retrospective theories. This happens to me on a daily basis, eg when dealing with human beings.
(I have a bit of a sense that we're going in a circle. It also seems to me like there's some talking-past happening.)
(I suggest a 5min break, followed by EY attempting to paraphrase RN to his satisfaction and vice versa.)
I'd have more trouble than usual paraphrasing RN because epistemic helplessness is something I find painful to type out.
(I'm also happy to attempt to paraphrase each point as I see it; it may be that this smooths over some conversational wrinkle.)
Seems like a good suggestion. I'm also happy to move on to the next topic. This was meant to be a quick clarification.
nod. It does seem to me like it possibly contains a decently sized meta-crux, about what sorts of conclusions one is licensed to draw from what sorts of observations
that, eg, might be causing Eliezer's probabilities to concentrate but not Richard's.
Yeah, this is in the opposite direction of "more specificity".
I frankly think that most EAs suck at explicit epistemology, OpenPhil and FHI affiliated EAs are not much of an exception to this, and I expect I will have more luck talking people out of specific errors than talking them out of the infinite pit of humble ignorance considered abstractly.
Ok, that seems to me like a light bid to move to the next topic from both of you, my new proposal is that we take a 5min break and then move to the next topic, and perhaps I'll attempt to paraphrase each point here in my notes, and if there's any movement in the comments there we can maybe come back to it later.
Broadly speaking I am also strongly against humble ignorance (albeit to a lesser extent than you are).
I'm off to take a 5-minute break, then!
5.4. Government response and economic impact
A meta-level note: I suspect we're around the point of hitting significant diminishing marginal returns from this format. I'm open to putting more time into the debate (broadly construed) going forward, but would probably want to think a bit about potential changes in format.
[Soares][14:04, moved two up in log]
I actually think that may just be a matter of at least one of us, including Nate, having to take on the thankless job of shutting down all digressions into abstractions and the meta-level.
I'm not so sure about this, because it seems like some of the abstractions are doing a lot of work.
Anyways, government reactions?
It seems to me like the best observed case for government reactions - which I suspect is no longer available in the present era as a possibility - was the degree of cooperation between the USA and Soviet Union about avoiding nuclear exchanges.
This included such incredibly extravagant acts of cooperation as installing a direct line between the President and Premier!
which is not what I would really characterize as very "deep" cooperation, but it's more than a lot of cooperation you see nowadays.
More to the point, both the USA and Soviet Union proactively avoided doing anything that might lead towards starting down a path that led to a full nuclear exchange.
The question I asked earlier:
They still provoked one another a lot, but, whenever they did so, tried to do so in a way that wouldn't lead to a full nuclear exchange.
It was mutually understood to be a strategic priority and lots of people on both sides thought a lot about how to avoid it.
I don't know if that degree of cooperation ever got to the fantastic point of having people from both sides in the same room brainstorming together about how to avoid a full nuclear exchange, because that is, like, more cooperation than you would normally expect from two governments, but it wouldn't shock me to learn that this had ever happened.
It seems obvious to me that if some situation developed nowadays which increased the profile of the possibility of a nuclear exchange between the USA and Russia, we would not currently be able to do anything like installing a Hot Line between the US and Russian offices if such a Hot Line had not already been installed. This is lost social technology from a lost golden age. But still, it's not unreasonable to take this as the upper bound of attainable cooperation; it's been observed within the last 100 years.
Another guess for how governments react is a very simple and robust one backed up by a huge number of observations:
They have the same kind of advance preparation and coordination around AGI, in advance of anybody getting killed, as governments had around the mortgage crisis of 2007 in advance of any mortgages defaulting.
I am not sure I'd put this probability over 50% but it's certainly by far the largest probability over any competitor possibility specified to an equally low amount of detail.
I would expect anyone whose primary experience was with government, who was just approaching this matter and hadn't been talked around to weird exotic views, to tell you the same thing as a matter of course.
Is this also your upper bound conditional on a world that has experienced a century's worth of changes within a decade, and in which people are an order of magnitude wealthier than they currently are?
which one was this? US/UK?
Assuming governments do react, we have the problem of "What kind of heuristic could have correctly led us to forecast that the US's reaction to a major pandemic would be for the FDA to ban hospitals from doing in-house Covid tests? What kind of mental process could have led us to make that call?" And we couldn't have gotten it exactly right, because the future is hard to predict; the best heuristic I've come up with, that feels like it at least would not have been surprised by what actually happened, is, "The government will react with a flabbergasting level of incompetence, doing exactly the wrong thing, in some unpredictable specific way."
I think if we're talking about any single specific government like the US or UK then the probability is over 50% that they don't react in any advance coordinated way to the AGI crisis, to a greater and more effective degree than they "reacted in an advance coordinated way" to pandemics before 2020 or mortgage defaults before 2007.
Maybe some two governments somewhere on Earth will have a high-level discussion between two cabinet officials.
That's one lesson you could take away. Another might be: governments will be very willing to restrict the use of novel technologies, even at colossal expense, in the face of even a small risk of large harms.
I just... don't know what to do when people talk like this.
It's so absurdly, absurdly optimistic.
It's taking a massive massive failure and trying to find exactly the right abstract gloss to put on it that makes it sound like exactly the right perfect thing will be done next time.
This just - isn't how to understand reality.
This isn't how superforecasters think.
This isn't sane.
(be careful about ad hominem)
(Richard might not be doing the insane thing you're imagining, to generate that sentence, etc)
Right, I'm not endorsing this as my mainline prediction about what happens. Mainly what I'm doing here is highlighting that your view seems like one which cherrypicks pessimistic interpretations.
That abstract description "governments will be very willing to restrict the use of novel technologies, even at colossal expense, in the face of even a small risk of large harms" does not in fact apply very well to the FDA banning hospitals from using their well-established in-house virus tests, at risk of the alleged harm of some tests giving bad results, when in fact the CDC's tests were giving bad results and much larger harms were on the way because of bottlenecked testing; and that abstract description should have applied to an effective and globally coordinated ban against gain-of-function research, which didn't happen.
Alternatively: what could have led us to forecast that many countries will impose unprecedentedly severe lockdowns.
Well, I didn't! I didn't even realize that was an option! I thought Covid was just going to rip through everything.
(Which, to be clear, it still may, and Delta arguably is in the more primitive tribal areas of the USA, as well as many other countries around the world that can't afford vaccines financially rather than epistemically.)
But there's a really really basic lesson here about the different style of "sentences found in political history books" rather than "sentences produced by people imagining ways future politics could handle an issue successfully".
Reality is so much worse than people imagining what might happen to handle an issue successfully.
I might nudge us away from covid here, and towards the questions I asked before.
This being one.
And this being the other.
I don't expect this to happen at all, or even come remotely close to happening; I expect AGI to kill everyone before self-driving cars are commercialized.
[Yudkowsky][16:29] (Nov. 14 follow-up comment)
(This was incautiously put; maybe strike "expect" and put in "would not be the least bit surprised if" or "would very tentatively guess that".)
ah, I see
Okay, maybe here's a different angle which I should have been using. What's the most impressive technology you expect to be commercialised before AGI kills everyone?
Very hard to say; the UK is friendlier but less grown-up. We would obviously be VASTLY safer in any world where only two centralized actors (two effective decision processes) could ever possibly build AGI, though not safe / out of the woods / at over 50% survival probability.
Vastly safer and likewise impossibly miraculous, though again, not out of the woods at all / not close to 50% survival probability.
This is incredibly hard to predict. If I actually had to predict this for some reason I would probably talk to Gwern and Carl Shulman. In principle, there's nothing preventing me from knowing something about Go which lets me predict in 2014 that Go will probably fall in two years, but in practice I did not do that and I don't recall anybody else doing it either. It's really quite hard to figure out how much cognitive work a domain requires and how much work known AI technologies can scale to with more compute, let alone predict AI breakthroughs.
I'd be happy with some very rough guesses
If you want me to spin a scifi scenario, I would not be surprised to find online anime companions carrying on impressively humanlike conversations, because this is a kind of technology that can be deployed without major corporations signing on or regulatory approval.
Okay, this is surprising; I expected something more advanced.
Arguably AlphaFold 2 is already more advanced than that, along certain dimensions, but it's no coincidence that afaik people haven't really done much with AlphaFold 2 and it's made no visible impact on GDP.
I expect GDP not to depart from previous trendlines before the world ends, would be a more general way of putting it.
you mean least impressive?
That seems like a structurally easier question to answer
"Most impressive" is trivial. "Dyson Spheres" answers it.
Or, for that matter, "perpetual motion machines".
Ah yes, I was thinking that Dyson spheres were a bit too prosaic
My model mainly rules out that we get to certain points and then hang around there for 10 years while the technology gets perfected, commercialized, approved, adopted, ubiquitized enough to produce a visible trendline departure on the GDP graph; not so much various technologies themselves being initially demonstrated in a lab.
I expect that the people who build AGI can build a self-driving car if they want to. Getting it approved and deployed before the world ends is quite another matter.
OpenAI has commercialised GPT-3
Hasn't produced much of a bump in GDP as yet.
I wasn't asking about that, though
I'm more interested in judging how hard you think it is for AIs to take over the world
I note that it seems to me like there is definitely a kind of thinking here, which, if told about GPT-3 five years ago, would talk in very serious tones about how much this technology ought to be predicted to shift GDP, and whether we could bet on that.
By "take over the world" do you mean "turn the world into paperclips" or "produce 10% excess of world GDP over predicted trendlines"?
Turn world into paperclips
I expect this mainly happens as a result of superintelligence, which is way up in the stratosphere far above the minimum required cognitive capacities to get the job done?
The interesting question is about humans trying to deploy a corrigible AGI thinking in a restricted domain, trying to flip the gameboard / "take over the world" without full superintelligence?
I'm actually not sure what you're trying to get at here.
(my guess, for the record, is that the crux Richard is attempting to drive for here, is centered more around something like "will humanity spend a bunch of time in the regime where there are systems capable of dramatically increasing world GDP, and if not how can you be confident of that from here")
This is not the sort of thing I feel Confident about.
[Yudkowsky][16:31] (Nov. 14 follow-up comment)
(My confidence here seems understated. I am very pleasantly surprised if we spend 5 years hanging around with systems that can dramatically increase world GDP and those systems are actually being used for that. There isn't one dramatic principle which prohibits that, so I'm not Confident, but it requires multiple nondramatic events to go not as I expect.)
Yeah, that's roughly what I'm going for. Or another way of putting it: we have some disagreements about the likelihood of humans being able to get an AI to do a pivotal act which saves the world. So I'm trying to get some estimates for what the hardest act you think humans can get an AI to do is.
(and that a difference here causes, eg, Richard to suspect the relevant geopolitics happen after a century of progress in 10y, everyone being suddenly much richer in real terms, and a couple of warning shots, whereas Eliezer expects the relevant geopolitics to happen the day after tomorrow, with "realistic human-esque convos" being the sort of thing we get instead of warning shots)
I mostly do not expect pseudo-powerful but non-scalable AI powerful enough to increase GDP, hanging around for a while. But if it happens then I don't feel I get to yell "what happened?" at reality, because there's an obvious avenue for it to happen: something GDP-increasing proved tractable to non-deeply-general AI systems.
where GPT-3 is "not deeply general"
Again, I didn't ask about GDP increases, I asked about impressive acts (in order to separate out the effects of AI capabilities from regulatory effects, people-having-AI-but-not-using-it, etc).
Where you can use whatever metric of impressiveness you think is reasonable.
so there's two questions here, one of which is something like, "what is the most impressive thing you can do while still being able to align stuff and make it corrigible", and one of which is "if there's an incorrigible AI whose deeds are being exhibited by fools, what impressive things might it do short of ending the world".
and these are both problems that are hard for the same reason I did not predict in 2014 that Go would fall in 2016; it can in fact be quite hard - even with a domain as fully lawful and known as Go - to figure out which problems will fall to which level of cognitive capacity.
Nate's attempted rephrasing: EY's model might not be confident that there's not big GDP boosts, but it does seem pretty confident that there isn't some "half-capable" window between the shallow-pattern-memorizer stuff and the scary-laserlike-consequentialist stuff, and in particular Eliezer seems confident humanity won't slowly traverse that capability regime
that's... allowed? I don't get to yell at reality if that happens?
and (shakier extrapolation), that regime is where a bunch of Richard's hope lies (eg, in the beginning of that regime we get to learn how to do practical alignment, and also the world can perhaps be saved midway through that regime using non-laserlike-systems)
so here's an example of a thing I don't think you can do without the world ending: get an AI to build a nanosystem or biosystem which can synthesize two strawberries identical down to the cellular but not molecular level, and put them on a plate
this is why I use this capability as the definition of a "powerful AI" when I talk about "powerful AIs" being hard to align, if I don't want to start by explicitly arguing about pivotal acts
this, I think, is going to end up being first doable using a laserlike world-ending system
so even if there's a way to do it with no lasers, that happens later and the world ends before then
Okay, that's useful.
it feels like the critical bar there is something like "invent a whole engineering discipline over a domain where you can't run lots of cheap simulations in full detail"
(Meta note: let's wrap up in 10 mins? I'm starting to feel a bit sleepy.)
This seems like a pretty reasonable bar
Let me think a bit about where to go from that
While I'm doing so, since this question of takeoff speeds seems like an important one, I'm wondering if you could gesture at your biggest disagreement with this post: https://sideways-view.com/2018/02/24/takeoff-speeds/
Oh, also in terms of scifi possibilities, I can imagine seeing 5% GDP loss because text transformers successfully scaled to automatically filing lawsuits and environmental impact objections.
My read on the entire modern world is that GDP is primarily constrained by bureaucratic sclerosis rather than by where the technological frontiers lie, so AI ends up impacting GDP mainly insofar as it allows new ways to bypass regulatory constraints, rather than insofar as it allows new technological capabilities. I expect a sudden transition to paperclips, not just because of how fast I expect cognitive capacities to scale over time, but because nanomachines eating the biosphere bypass regulatory constraints, whereas earlier phases of AI will not be advantaged relative to all the other things we have the technological capacity to do but which aren't legal to do.
[Shah][12:13] (Sep. 21 follow-up comment)
This is a fair point and updates me somewhat towards fast takeoff as operationalized by Paul, though I'm not sure how much it updates me on p(doom).
Er, wait, really fast takeoff as operationalized by Paul makes less sense as a thing to be looking for -- presumably we die before any 1 year doubling. Whatever, it updates me somewhat towards "less deployed stuff before scary stuff is around"
Ah, interesting. What are the two or three main things in that category?
mRNA vaccines, building houses, building cities? Not sure what you mean there.
"things we have the technological capacity to do but which aren't legal to do"
Eg, you might imagine, "What if AIs were smart enough to build houses, wouldn't that raise GDP?" and the answer is that we already have the pure technology to manufacture homes cheaply, but the upright-stick-construction industry already successfully lobbied to get it banned as it was starting to develop, by adding on various constraints; so the question is not "Is AI advantaged in doing this?" but "Is AI advantaged at bypassing regulatory constraints on doing this?" Not to mention all the other ways that building a house in an existing city is illegal, or that it's been made difficult to start a new city, etcetera.
"What if AIs could design a new vaccine in a day?" We can already do that. It's no longer the relevant constraint. Bureaucracy is the process-limiting constraint.
I would - looking in again at the Sideways View essay on takeoff speeds - wonder whether it occurred to you, Richard, to ask about what detailed predictions all the theories there had made.
After all, a lot of it is spending time explaining why the theories there shouldn't be expected to retrodict even the data points we have about progress rates over hominid evolution.
Surely you, being the evenhanded judge that you are, must have been reading through that document saying, "My goodness, this is even worse than retrodicting a few data points!"
A lot of why I have a bad taste in my mouth about certain classes of epistemological criticism is my sense that certain sentences tend to be uttered on incredibly selective occasions.
Some meta thoughts: I now feel like I have a pretty reasonable broad outline of Eliezer's views. I haven't yet changed my mind much, but plausibly mostly because I haven't taken the time to internalise those views; once I ruminate on them a bunch, I expect my opinions will shift (uncertain how far; unlikely to be most of the way).
Meta thoughts (continued): Insofar as a strong disagreement remains after that (which it probably will) I feel pretty uncertain about what would resolve it. Best guess is that I should write up some longer essays that try to tie a bunch of disparate strands together.
Near the end it seemed like the crux, to a surprising extent, hinged on this question of takeoff speeds. So the other thing which seems like it'd plausibly help a lot is Eliezer writing up a longer version of his response to Paul's Takeoff Speeds post.
(Just as a brief comment, I don't find the "bureaucratic sclerosis" explanation very compelling. I do agree that regulatory barriers are a huge problem, but they still don't seem nearly severe enough to cause a fast takeoff. I don't have strong arguments for that position right now though.)
This seems like a fine point to call it!
Some wrap-up notes
In particular, it seems maybe plausible to me we should have a pause for some offline write-ups, such as Richard digesting a bit and then writing up some of his current state, and/or Eliezer writing up some object-level response to the takeoff speed post above?
(I also could plausibly give that a go myself, either from my own models or from my model of Eliezer's model which he could then correct)
I endorse the idea of offline writeups
Cool. Then I claim we are adjourned for the day, and Richard has the ball on digesting & doing a write-up from his end, and I have the ball on both writing up my attempts to articulate some points, and on either Eliezer or I writing some takes on timelines or something.
(And we can coordinate our next discussion, if any, via email, once the write-ups are in shape.)
I also have a sense that there's more to be said about specifics of govt stuff or specifics of "ways to bypass consequentialism" and that I wish we could spend at least one session trying to stick to concrete details only
Even if it's not where cruxes ultimately lie, often you learn more about the abstract by talking about the concrete than by talking about the abstract.
(I, too, would be enthusiastic to see such a discussion, and Richard, if you find yourself feeling enthusiastic or at least not-despairing about it, I'd happily moderate.)
(I'm a little surprised about how poorly I did at staying concrete after saying that aloud, and would nominate Nate to take on the stern duty of blowing the whistle at myself or at both of us.)
I might be interested in this, but I'd be really helped by a TL;DR or similar providing some context on what's being discussed.
My comically oversimplified summary of the above conversation, which is not endorsed by Richard or Eliezer (and skips over a large number of topics and claims, doesn't try to stay close to the text, etc.):
R: I'm skeptical of your claim that capable-enough-to-save-the-world AI systems will (as a strong default) be means-end reasoners that approximate expected utility (EU) maximizers. (And therefore of your claim that like EU maximizers, as a strong default, they'll think about the long-term consequences of their actions and try to consistently "steer" the future in some direction -- properties that would be worrisome if they held, because they imply convergent instrumental goals like killing humans.)
In particular, I worry that you may be putting too much confidence in abstractions like expected utility, in the same way that you were too confident in recursive self-improvement (RSI) and missed that AI (e.g., GPT-3) could get pretty capable without it. The real world is messy, and abstractions like this often fail in surprising ways; so we should be correspondingly less confident that powerful future AGI systems will conform to the particular abstraction you're pointing at ("expected utility").
E: RSI still strikes me as just as good an abstraction as ever. It's true that I was surprised by how fast ML could advance without RSI, but RSI is properly a claim about what happens when AI gets sufficiently capable, not a claim 'there are no other ways to rapidly increase in capability'.
I see my error as 'giving too much attention to interesting complex ways things can go poorly, and neglecting the simple, banal ways things can go wrong earlier'. If I'm messing up, it's plausible that I haven't fully fixed that bias and am messing up in a similar way to that. But that doesn't make me think EU is a worse abstraction for its domain of applicability, or make me more optimistic about AI alignment.
R: If EU is a deep fundamental theory, then it should make some novel, verifiable predictions that other theories don't make.
E: EU makes plenty of mundane predictions about, e.g., how humans reason (via weighing futures according to probabilities, etc.), and how humans will tend to behave tomorrow (usually picking up $50 bills when they see them on the ground, etc.).
R: Those seem too obvious -- we already expected those things, so given things like hindsight bias, it's hard to know how much of an advantage those successful predictions should give EU over rival models, if any. I expect something more surprising and impressive, if EU really is a useful enough framework to let us make confident predictions about capable-enough-to-save-the-world AI systems.
E: Those sorts of prediction successes about everyday human behavior strike me as easily good enough, given that I'm not claiming the level of confidence of, e.g., a law of physics. I think you're being unreasonably skeptical here because, like a lot of EAs, you're overly skeptical about useful predictive abstractions, and overly credulous about modest-epistemology norms.
E: In general, I don't expect governments to prepare, coordinate, or exhibit any competence around AGI.
R: I feel more optimistic because before AGI, I think we might see (e.g.) a decade of non-dangerous AI radically transforming and enriching the world.
E: I don't expect that to happen at all, because (a) I don't expect the technology to go that way, and (b) I expect bureaucratic/regulatory obstacles to mostly prevent AI progress from hugely changing the world, until AGI saves or destroys the world.
Ok, that helps - a little! - but it's still not quite at TL;DR. :)
tl;dr: Eliezer and Richard disagree about how hard alignment is, so they try to resolve that disagreement by talking about various things that might underlie the disagreement.