My personal cruxes for working on AI safety

Buck

The following is a heavily edited transcript of a talk I gave for the Stanford Effective Altruism club on 19 Jan 2020. I had rev.com transcribe it, and then Linchuan Zhang, Rob Bensinger and I edited it for style and clarity, and also to occasionally have me say smarter things than I actually said. Linch and I both added a few notes throughout. Thanks also to Bill Zito, Ben Weinstein-Raun, and Howie Lempel for comments.

I feel slightly weird about posting something so long, but this is the natural place to put it.

Over the last year my beliefs about AI risk have shifted moderately; I expect that in a year I'll think that many of the things I said here were dumb. Also, very few of the ideas here are original to me.

After all those caveats, here's the talk:

Introduction

It's great to be here. I used to hang out at Stanford a lot, fun fact. I moved to America six years ago, and then in 2015, I came to Stanford EA every Sunday, and there was, obviously, a totally different crop of people there. It was really fun. I think we were a lot less successful than the current Stanford EA iteration at attracting new people. We just liked having weird conversations about weird stuff every week. It was really fun, but it's really great to come back and see a Stanford EA which is shaped differently.

Today I'm going to be talking about the argument for working on AI safety that compels me to work on AI safety, rather than the argument that should compel you or anyone else. I'm going to try to spell out how the arguments are actually shaped in my head. Logistically, we're going to try to talk for about an hour with a bunch of back and forth and you guys arguing with me as we go. And at the end, I'm going to do miscellaneous Q and A for questions you might have.

And I'll probably make everyone stand up and sit down again because it's unreasonable to sit in the same place for 90 minutes.

Meta level thoughts

I want to first very briefly talk about some concepts I have that are about how you want to think about questions like AI risk, before we actually talk about AI risk.

Heuristic arguments

When I was a confused 15-year-old browsing the internet around 10 years ago, I ran across arguments about AI risk, and I thought they were pretty compelling. The arguments went something like, "Well, sure seems like if you had these powerful AI systems, that would make the world be really different. And we don't know how to align them, and it sure seems like almost all goals they could have would lead them to kill everyone, so I guess some people should probably research how to align these things." This argument was about as sophisticated as my understanding went until a few years ago, when I became pretty involved with the AI safety community.

I in fact think this kind of argument leaves a lot of questions unanswered. It's not the kind of argument that is solid enough that you'd want to use it for mechanical engineering and then build a car. It's suggestive and heuristic, but it's not trying to cross all the T's and dot all the I's. And it's not even telling you all the places where there's a hole in that argument.

Ways heuristic arguments are insufficient

Something I think is good to do sometimes is, instead of just thinking really loosely and heuristically, to try to have end-to-end stories of what you believe about a particular topic. And then if there are parts that you don't have answers to, you should write them down explicitly with question marks. I guess I'm basically arguing for doing that instead of just saying, "Oh, well, an AI would be dangerous here." And if there are all these other steps as well, then you should write them down, even if your justification for them is just going to be question marks.

So here's an objection I had to the argument I gave before. AI safety is just not important if AI is 500 years away and whole-brain emulation or nanotechnology is going to happen in 20 years. Obviously, in that world, we should not be working on AI safety. Similarly, if some other existential risk might happen in 20 years, and AI is just definitely not going to happen in the next 100 years, we should just obviously not work on AI safety. I think this is pretty clear once I point it out. But it wasn't mentioned at all in my initial argument.

I think it's good to sometimes try to write down all of the steps that you have to make for the thing to actually work. Even if you're then going to say things like, "Well, I believe this because other EAs seem smart, and they seem to think this." If you're going to do that anyway, you might as well try to write down where you're doing it. So in that spirit, I'm going to present some stuff.

- [Guest] There's so many existential risks, like a nuclear war could show up at any minute.

- Yes.

- [Guest] So like, is there some threshold for the probability of an existential risk? What's your criteria for, among all the existential risks that exist, which ones to focus on?

- That's a great question, and I'm going to come back to it later.

- [Guest] Could you define a whole-brain emulation for the EA noobs?

- Whole-brain emulation is where you scan a human brain and run it on a computer. This is almost surely technically feasible; the hardest part is scanning human brains. There are a bunch of different ways you could try to do this. For example, you could imagine attaching a little radio transmitter to all the neurons in a human brain, and having them send out a little signal every time that neuron fires, but the problem with this is that if you do this, the human brain will just catch fire. Because if you just take the minimal possible energy in a radio transmitter, that would get the signal out, and then you multiply that by 100 billion neurons, you're like, "Well, that sure is a brain that is on fire." So you can't currently scan human brains and run them. We'll talk about this more later.

Thanks for the question. I guess I want to do a quick poll of how much background people are coming into this with. Can you raise your hand if you've spent more than an hour thinking about AI risk before, or listening to talks about AI risk?

Can you raise your hand if you know who Paul Christiano is, or if that name is familiar? Can you raise your hand if you knew what whole-brain emulation was before that question was asked?

Great. Can you raise your hand if you know what UDASSA is?

Great, wonderful.

I kind of wanted to ask a “seeing how many people are lying about things they know” question. I was considering saying a completely fake acronym, but I decided not to do that. I mean, it would have been an acronym for something, and they would have been like, "Why is Buck asking about that concept from theoretical biology?"

Ways of listening to a talk

All right, here's another thing. Suppose you're listening to a talk from someone whose job is thinking about AI risk. Here are two ways you could approach this. The first way is to learn to imitate my utterances. You could think, "Well, I want to know what Buck would say in response to different questions that people might ask him."

And this is a very reasonable thing to do. I often talk to someone who's smart. I often go talk to Paul Christiano, and I'm like, well, it's just really decision-relevant to me to know what Paul thinks about all these topics. And even if I don't know why he believes these things, I want to know what he believes.

Here’s the second way: You can take the things that I'm saying as scrap parts, and not try to understand what I overall believe about anything. You could just try to hear glimmers of arguments that I make, that feel individually compelling to you, such that if you had thought of that argument, you'd be like, "Yeah, this is a pretty solid argument." And then you can try and take those parts and integrate them into your own beliefs.

I'm not saying you should always do this one, but I am saying that at least sometimes, your attitude when someone's talking should be, "This guy's saying some things. Probably he made up half of them to confuse me, and probably he's an idiot, but I'm just going to listen to them, and if any of them are good, I'm going to try and incorporate them. But I'm going to assess them all individually."

Okay, that's the meta points. Ready for some cruxes on AI risk?

- [Guest] Just one clarification. So, does that mean that, in your belief, whole-brain emulation is going to happen in 20 years?

- Sorry, what? I think whole-brain emulation is not going to happen in 20 years.

- [Guest] Okay, so the numbers you threw out were just purely hypothetical?

- Oh, yes, sorry, yes. I do in fact work on AI safety. But if I had these other beliefs, which I'm going to explain, then I would not work on AI safety. If I thought whole-brain emulation were coming sooner than AI, I would de-prioritize AI safety work.

- [Guest] Okay.

Norms

Something that would be great is that when I say things, you can write down things that feel uncompelling or confusing to you about the arguments. I think that's very healthy to do. A lot of the time, the way I'm going to talk is that I'm going to say something, and then I'm going to say the parts of it that I think are uncompelling. Like, the parts of the argument that I present that I think are wrong. And I think it's pretty healthy to listen out and try and see what parts you think are wrong. And then I'll ask you for yours.

Crux 1: AGI would be a big deal if it showed up here

Okay, AGI would be a big deal if it showed up here. So I'm going to say what I mean by this, and then I'm going to give a few clarifications and a few objections to this that I have.

This part feels pretty clear. Intelligence seems really important. Imagine having a computer that was very intelligent; it seems like this would make the world look suddenly very different. In particular, one major way that the world might be very different is: the world is currently very optimized by humans for things that humans want, and if I made some system, maybe it would be trying to make the world be a different way. And then maybe the world would be that very different way instead.

So I guess under this point, I want to say, "Well, if I could just have a computer do smart stuff, that's going to make a big difference to what the world is like, and that could be really good, or really bad.”

There's at least one major caveat to this, which I think is required for this to be true. I'm curious to hear a couple of people's confusion, or objections to this claim, and then I'll say the one that I think is most important, if none of you say it quickly enough.

- [Guest] What do you mean by "showed up here"? Because, to my mind, “AGI” actually means general intelligence, meaning that it can accomplish any task that a human can, or it can even go beyond that. So what do you mean by "showed up here"?

- Yeah, so by “here”, I guess I'm trying to cut away worlds that are very different from this one. So for instance, I think that if I just said, "AGI would be a big deal if it showed up", then I think this would be wrong, because I think there are worlds where AGI would not be as much of a big deal. For instance, what if we already have whole-brain emulation? I think in that world, AGI is a much smaller deal. So I'm trying to say that in worlds that don't look radically different from this one, AGI is a big deal.

- [Guest] So you're saying "if the world is identical, except for AGI"?

- That's a good way of putting it. If the world looks like this, kind of. Or if the world looks like what I, Buck, expect it to look like in 10 years. And then we get AGI: that would be a really different world.

Any other objections? I've got a big one.

- [Guest] I'm a bit confused about how agency, and intelligence and consciousness relate, and how an intelligence would have preferences or ways it would want the world to be. Or, like, how broad this intelligence should be.

- Yeah!

I'm going to write down people's notes as I go, sometimes, irregularly, not correlated with whether I think they're good points or not.

- [Guest] Do you have definitions of “AGI” and “big deal”?

- “AGI”: a thing that can do all the kind of smart stuff that humans do. By “big deal”, I mean it basically is dumb to try to make plans that have phases which are concrete, and happen after the AGI. So, by analogy, almost all of my plans would seem like stupid plans if I knew that there was going to be a major alien invasion in a year. All of my plans that are, like, 5-year-time-scale plans are bad plans in the alien invasion world. That's what I mean by “big deal”.

[Post-talk note: Holden Karnofsky gives a related definition here: he defines “transformative AI” as “AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution”.]

- [Guest] I think one objection could be that if AGI were developed, we would be unable to get it to cooperate with us to do anything good, and it may have no interest in doing anything bad, in which case, it would not be a big deal.

- Yep, that makes sense. I personally don't think that's very likely, but that would be a way this could be wrong.

The main objection I have is that I didn't mention what the price of the AGI is. For instance, I think a really important question is "How much does it cost you to run your AGI for long enough for it to do the same intellectual labor that a human could do in an hour?" For instance, if it costs $1 million an hour: almost no human gets paid $1 million an hour for their brains. In fact, I think basically no human gets paid that much. I think the most money that a human ever makes in a year is a couple billion dollars. And there's approximately 2,000 working hours a year, which means that you're making $500,000 an hour. So max human wage is maybe $500,000 per hour. I would love it if someone checks the math on this.

[Linch adds: 500K * 2000 = 1 billion. I assume “couple billion” is more than one. Sanity check: Bezos has ~100 billion accumulated in ~20 years, so 5B/year; though unclear how much of Jeff Bezos' money is paid for his brain vs. other things like having capital/social capital. Also unclear how much Bezos should be valued at ex ante.]
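The arithmetic in Buck's estimate and Linch's note is easy to check directly. This is a minimal sketch, assuming 2,000 working hours per year as in the talk; reading "a couple billion" as $2 billion is an assumption, and `hourly_rate` is just an illustrative helper name.

```python
HOURS_PER_YEAR = 2_000  # approximate working hours in a year, as in the talk


def hourly_rate(annual_income: float, hours: float = HOURS_PER_YEAR) -> float:
    """Implied hourly rate for a given annual income."""
    return annual_income / hours


# $500,000/hour over 2,000 hours is $1 billion a year, as Linch points out.
print(hourly_rate(1e9))             # 500000.0
# Reading "a couple billion" as $2 billion implies about $1 million/hour.
print(hourly_rate(2e9))             # 1000000.0
# Linch's Bezos check: ~$100 billion over ~20 years is ~$5 billion/year.
bezos_per_year = 100e9 / 20
print(hourly_rate(bezos_per_year))  # 2500000.0
```

So the corrected "max human wage" is closer to $1 million per hour than $500,000, which does not change the qualitative point that a human-level AGI costing that much would not transform the world.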

So, a fun exercise that you can do is you can imagine that we have a machine that can do all the intellectual labor that a human can do, at some price, and then we just ask how the world looks different in that world. So for instance, in the world where that price is $500,000 an hour, that just does not change the world very much. Another one is: let's assume that this is an AGI that's as smart as the average human. I think basically no one wants to pay $500,000 an hour to an average human. I think that at $100 an hour, that's the price of a reasonably well-trained knowledge worker in a first-world country, ish. And so I think at that price, $100 an hour, life gets pretty interesting. And at the price of $10 an hour, it's really, really wild. I think at the price of $1 an hour, it's just absurd.

Fun fact: if you look at the computation that a human brain does, and you say, "How much would it cost me to buy some servers on AWS that run this much?", the price is something like $6 an hour, according to one estimate by people I trust. (I don’t think there’s a public citation available for this number; see here for a few other relevant estimates.) The estimate works roughly like this: you estimate the amount of useful computational work done by the brain, using arguments about the amount of noise in various brain components to argue that the brain can't possibly be relying on more than three decimal places of accuracy of how hard a synapse is firing, or something like that, and then you look at how expensive it is to buy that much computing power. This is very much an uncertain median guess rather than a bound, and I think it is also somewhat lower than the likely price of running a whole brain emulation (for that, see “Whole Brain Emulation: A Roadmap”).

But yeah, $6 an hour. So the reason that we don't have AGI is not that we could make AGI as powerful as the brain, and we just don't because it's too expensive.

- [Guest] I'm just wondering, what's some evidence that can make us expect that AGI will be super expensive?

- Well, I don't know. I'm not particularly claiming that it will be particularly expensive to run. One thing that I am comfortable claiming is that if something is extremely valuable, the first time it happens, it's usually about that expensive, meaning you don't make much of a profit. There's some kind of economic efficiency argument: if you can make $1 million from doing something, and the price of doing it is steadily falling, people will probably first do it at the time when the price is about $1 million. And so an interesting question is: if I imagine that people are being reasonable every year, how different is the world in the year when AGI costs you $2,500 an hour to run versus, like, $10 an hour to run? Another fun exercise, which I think is pretty good, is to look at Moore's Law or something and say, "Well, let's just assume the price of a transistor is something like this, and it falls by a factor of two every 18 months. Let's suppose that one year it costs $10,000 an hour to run this thing, and then the cost halves every 18 months." And you look at how the world changes over time, and it's kind of an interesting exercise.
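The halving exercise can be run numerically. This is a minimal sketch, assuming the talk's hypothetical starting cost of $10,000 per hour and a clean halving every 18 months; `cost_per_hour` and `years_until` are illustrative names, not anything from the talk.

```python
import math

START_COST = 10_000.0  # dollars per hour in year 0 (the talk's hypothetical)
HALVING_YEARS = 1.5    # cost halves every 18 months, Moore's-law style


def cost_per_hour(years: float) -> float:
    """Hourly cost of running the hypothetical AGI after `years` years."""
    return START_COST * 0.5 ** (years / HALVING_YEARS)


def years_until(price: float) -> float:
    """Years until the hourly cost first falls to `price`."""
    return HALVING_YEARS * math.log2(START_COST / price)


for price in (2_500, 100, 10, 1):
    print(f"${price:>5,}/hour after {years_until(price):4.1f} years")
```

Under these assumptions each factor-of-ten drop takes about five years (1.5 × log2 10 ≈ 5), so the cost passes $2,500, $100, $10 and $1 per hour after roughly 3, 10, 15 and 20 years. A faster or slower halving time just rescales those numbers.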

Other thoughts or objections?

- [Guest] Even if it's more expensive, if it’s ridiculously faster than a human brain, it could still be valuable.

- Yeah. So for instance, I know people who make a lot of money being traders. These people are probably mostly three standard deviations above average for a human. Some of these humans get paid thousands of dollars an hour, and also if you can just scale how fast they run, linearly in price, it would be worth it to run them many times faster. This is per hour of human labor, but possibly, you can get it faster in serial time. Like, another thing you probably want to do with them is have a bunch of unmanned submarines, where it's a lot less bad if your AI gets destroyed by a missile or something. Okay, any other thoughts?

- [Guest] So, yes, it wouldn’t necessarily be logical to run AGI if it was very expensive, but I still think people would do it, given that you have technology like quantum computers, which right now can't do anything that a normal computer can't do, and yet we pour millions and billions of dollars into building them and running them, and run all kinds of things on them.

- I mean, I think we don't pour billions of dollars. Tell me if I'm wrong, please. But I would have thought that we spend a couple tens of millions of dollars a year, and some of that is because Google is kind of stupid about this, and some of it is because the NSF funds dumb stuff. I could just be completely wrong.

- [Guest] Why is Google stupid about this?

- As in like, sociologically, what's wrong with them?

- [Guest] Yeah.

- I don't know. Whatever, I think quantum computing is stupid. Like, controversial opinion.

- [Guest] There was a bill to inject $1.2 billion into quantum.

- Into quantum.

- [Guest] I think I read it on Gizmodo. I remember when this happened. The U.S. government or someone — I don't know, Europe? — someone put a ton of money, like a billion dollars, into quantum research grants.

- Okay, but quantum... sorry, I'm not disagreeing with you, I'm just disagreeing with the world or something. Chemistry is just quantum mechanics of electrons. Maybe they just like that. I'd be curious if you could tell us. My guess is that we don't pour billions of dollars. The world economy is like $80 trillion a year, right? The U.S. economy's like $20 trillion a year.

- [Guest] Trump did in fact sign a $1.2 billion quantum computing bill.

- Well, that's stupid.

- [Guest] Apparently, this is because we don't want to fall behind in the race with China.

- Well, that's also stupid.

- [Guest] But I can see something similar happening with AGI.

- Yeah, so one thing is, it's not that dangerous if it costs a squillion billion dollars to run, because you just can't run it for long enough for anything bad to happen. So, I agree with your points. I think I'm going to move forward slightly after taking one last comment.

- [Guest] Do you have any examples of technologies that weren't a big deal, purely because of the cost?

- I mean, kind of everything is just a cost problem, right?

[Linch notes: We figured out alchemy in the early 1900s.]

- [Guest] Computers, at the beginning. Computers were so expensive that no one could afford them, except for like NASA.

- [Guest] Right, but over time, the cost decreased, so are you saying that...? Yeah, I'm just wondering, with AGI, it's like, reasonable to think maybe the initial version is very expensive, but then work will be put into it and it'll be less expensive. Is there any reason to believe that trend wouldn't happen for AGI?

- Not that I know of. My guess is that the world looks one of two ways. One is that you have something like the cost of human intellectual labor falling by a factor of ten for a couple of years, starting at way too expensive and ending at dirt cheap. The other is that it happens even faster. I would be very surprised if it's permanently too expensive to run AGI. Or, I'd be very, very, very surprised if we can train an AGI, but we never get the cost below $1 million.

And this isn't even because of the $6 an hour number. Like, I don't know man, brains are probably not perfect. It would just be amazing if evolution figured out a way to do it that's like a squillion times cheaper, but we still figure out a way to do it. Like, it just seems to me that the cost is probably going to mostly be in the training. My guess is that it costs a lot more to train your AGI than to run it. And in the world where you have to spend $500,000 an hour to run your thing, you probably had to spend fifty gazillion dollars to train it. And that would be the place where I expect it to fail.

You can write down your other objections, and then we can talk about them later.

Crux 2: AGI is plausibly soonish, and the next big deal

All right, here’s my next crux. AGI is plausibly soon-ish, as in, less than 50 years, and the next big deal. Okay, so in this crux I want to argue that AGI might happen relatively soon, and also, it might happen before one of the other crazy things happens that would mean we should only focus on that thing instead.

So a couple of things that people have already mentioned, or that I mentioned, as potentially crazy things that would change the world. There's whole-brain emulation. Can other people name some other things that would make the world radically different if they happened?

- [Guest] Very widespread genetic engineering.

- Yeah, that seems right. By the way, the definition of “big deal” that I want you guys to use is “you basically should not make specific concrete plans which have steps that happen after that thing happens”. I in fact think that widespread and wildly powerful genetic engineering of humans is one such thing: you should not have plans that extend past the point when widespread genetic engineering happens, or at least you shouldn't have specific plans.

- [Guest] Nuclear war.

- Yeah, nuclear war. Maybe other global catastrophic risks. So anything which looks like it might just really screw up what the world looks like. Anything which might kill a billion people. If something's going to kill a billion people, it seems plausible that that's really important and you should work on that instead. It's not like a total slam dunk that you should work on that instead, but it seems plausible at least. Yeah, can I get some more?

- [Guest] What about nuclear fusion? I read an article saying that if any government could get that kind of technology, it could potentially trigger a war, just because it breaks the balance of power that is currently in place in international politics.

- Yeah, I can imagine something like that happening, maybe. I want to put that somewhat under other x-risks, or nuclear war. It feels like another example of destabilization of power. But destabilization of various types is mostly a concern because it leads to x-risk.

- [Guest] Do you consider P = NP to be one of those?

- Depends on how good the algorithm is. [Linch: The proof might also not be constructive.]

- [Guest] Yeah, it depends. In public key cryptography, there’s...

- I don't really care about public key cryptography breaking... If P = NP, and there's just like a linear time algorithm for like... If you can solve SAT problems of linear size in linear time, apologies for the jargon, I think that's just like pretty close to AGI. If you have that technology, you can just solve any machine learning problem you want, by saying, "Hey, can you tell me the program which does the best on this particular score?" And that's just a SAT problem. I think that it is very unlikely that there's just like a really fast, linear time, SAT solving algorithm. Yeah, that's an interesting one. Any others?
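For readers unfamiliar with the jargon: a SAT problem asks whether some true/false assignment to a set of variables makes a logical formula true. The naive approach below is an illustration only (not anything from the talk); it tries every assignment, so its running time doubles with each extra variable. No algorithm is known that solves general SAT in polynomial time, let alone the linear time discussed above. `brute_force_sat` is a hypothetical helper name.

```python
from itertools import product
from typing import Dict, List, Optional

# A CNF formula is a list of clauses; each clause is a list of non-zero ints,
# where k means "variable k is true" and -k means "variable k is false".
Clause = List[int]


def brute_force_sat(clauses: List[Clause], n_vars: int) -> Optional[Dict[int, bool]]:
    """Return a satisfying assignment, or None if there is none.

    Tries up to 2**n_vars assignments, so the running time doubles with
    every extra variable.
    """
    for bits in product((False, True), repeat=n_vars):
        assignment = {i + 1: bits[i] for i in range(n_vars)}
        # A clause is satisfied if at least one of its literals is true.
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assignment
    return None


# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
print(brute_force_sat([[1, -2], [2, 3], [-1, -3]], n_vars=3))
# {1: False, 2: False, 3: True}
```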

- [Guest] Like a plague, or a famine. Or like, terrible effects of climate change, or like a super volcano.

- Okay.

Natural x-risks, things that would kill everyone, empirically don't happen that often. You can look at the earth, and you can be like, "How often have things happened that would have killed everyone if they happened now?" And the answer's like, a couple times. Natural disasters which would qualify as GCRs but not x-risks are probably also rare enough that I am not that worried about them. So I think it’s most likely that catastrophic disasters that happen soon will be a result of technologies which were either invented relatively recently (e.g. nukes) or haven’t been developed yet.

In the case of climate change, we can’t use that argument, because climate change is anthropogenic; however, my sense is that experts think that climate change is quite unlikely to cause enough damage to be considered a GCR.

Another one I want to include is sketchy dystopias. We have never had an evil empire which has immortal god emperors, and perfect surveillance, and mind reading and lie detection. There's no particular technical reason why you can't have all these things. They might all be a lot easier than AGI. I don't know, this seems like another one.

If I had to rank these in how likely they seem to break this claim, I'd rank them from most to least likely as:

1. Various biosecurity risks
2. Stable dystopias, nuclear war or major power war, whole brain emulation
3. Climate change
4. Super volcanos, asteroids

I want to say why I think AI risk is more likely than these things, or rather, why I think we're likely to get AGI earlier.

But before I say that, you see how I wrote less than 50 years here? Even if I thought the world in 100 years was going to be just like the world is now, except with mildly better iPhones (maybe mildly worse iPhones, I don't know, it's not clear which direction the trend is going)... I don't know. Affecting the world in 100 years seems really hard.

And it seems to me that most of the stories I have for how my work ends up making a difference to the world just look really unlikely to work if AGI is more than 50 years off. It's really hard to do research that impacts the world positively more than 50 years down the road. It's particularly hard to do research that positively impacts a single event that happens 50 years in the future. I just don't think I can very likely do that. And if I learned that there was just no way we were going to have AGI in the next 50 years, I would then think, "Well, I should probably really rethink my life plans."

AI timelines

Okay, so here's a fun question. When are we going to get AGI? Here's some ways of thinking about it.

One of them is Laplace's Law of Succession. This one is: there is some random variable. It turns out that every year that people try to build an AGI, God draws a ball from an urn. And we see if it's white or black. And if it's white, he gives us an AGI. And if it's black, he doesn't give us an AGI. And we don't know what proportion of balls in the urn are black. So we're going to treat that as a random parameter between zero and one.

Now, the first year, you have a uniform prior on this parameter theta, which is the proportion of years in which God gives you an AGI. The second year, you're like, "Well, it sure seems like God doesn't give us an AGI every year, because he didn't give us one last year." And you end up with a posterior where you’ve updated totally against the “AGI every year” hypothesis, and not at all against the “AGI never” hypothesis. And each following year, when you don't get an AGI, you update against it again, and again, and again.

So this is one way to derive Laplace's Law of Succession. And if you use Laplace's Law of Succession, then it means that after 60 years of trying to build an AGI, there is now a 1 in 62 chance that you get an AGI next year. So you can say, "Okay. Let's just use Laplace's Law of Succession to estimate time until AGI." And this suggests that the probability of AGI in the next 50 years is around 40%. This is not the best argument in the world, but if you're just trying to make arguments that are at least kind of vaguely connected to things, then Laplace's Law of Succession says 40%.
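The urn model above fits in a few lines of Python. This is a minimal sketch, assuming a uniform prior on theta and one draw per calendar year from 1960 to 2020, as in the talk; the function names are illustrative. It reproduces the 1-in-62 figure. For the 50-year window it gives about 45%, in the same ballpark as the rough "around 40%" quoted, and it gives a median wait of 61 years.

```python
from fractions import Fraction


def p_success_next(successes: int, trials: int) -> Fraction:
    """Laplace's rule of succession: (s + 1) / (n + 2), under a uniform prior."""
    return Fraction(successes + 1, trials + 2)


def p_success_within(m: int, failures: int) -> Fraction:
    """Chance of at least one success in the next m draws, after `failures`
    draws with no success. The posterior on theta is Beta(1, failures + 1),
    so P(no success in the next m draws) = (failures + 1) / (failures + m + 1).
    """
    return 1 - Fraction(failures + 1, failures + m + 1)


def median_wait(failures: int) -> int:
    """Smallest number of further draws giving at least a 50% chance of success."""
    m = 0
    while p_success_within(m, failures) < Fraction(1, 2):
        m += 1
    return m


YEARS_TRIED = 60  # 1960 to 2020, one draw per calendar year

print(p_success_next(0, YEARS_TRIED))                      # 1/62
print(round(float(p_success_within(50, YEARS_TRIED)), 3))  # 0.45
print(median_wait(YEARS_TRIED))                            # 61
```

The median result is general: after n failed draws, the Laplace median wait is n + 1 draws, which is why this rule always says to expect to wait about as long as it has been so far.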

- [Guest] What's your threshold for even including such an argument in your overall thought process? I'm guessing there are a lot of arguments at that level of... I don't know.

- I think there are fewer than 10 arguments that are that simple and that good.

- [Guest] This really depends on the size of the step you chose. You chose “one year” arbitrarily. It could have been one second: God draws a ball every second.

- No, that's not it. There's a limit, because in that case, if I choose shorter time steps, then it's less likely that God draws me a white ball in any one time step. But I also get to check more time steps over the next year.

- [Guest] I see.

- [Guest 2] “Poisson process” is the word you're looking for, I think.

- Yes, this is a Poisson process.
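To see why the step size washes out, the sketch below reruns the same Laplace calculation with God drawing monthly, daily, or hourly instead of yearly. It assumes the same 60 failed years and a 50-year forecast window; `p_within` is a hypothetical helper name.

```python
from fractions import Fraction


def p_within(years_ahead: int, years_past: int, steps_per_year: int) -> Fraction:
    """Laplace-rule chance of at least one success within `years_ahead` years,
    when one ball is drawn per time step and `years_past` years produced none."""
    n = years_past * steps_per_year   # failed draws so far
    m = years_ahead * steps_per_year  # draws in the forecast window
    return 1 - Fraction(n + 1, n + m + 1)


for steps in (1, 12, 365, 365 * 24):  # yearly, monthly, daily, hourly draws
    print(steps, round(float(p_within(50, 60, steps)), 4))
```

Shrinking the step moves the answer only from about 0.450 to about 0.455, converging toward 50/110 = 5/11, the continuous-time limit. Finer steps make each draw less likely to succeed, but there are proportionally more draws, which is the point made in the exchange above.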

- [Guest] How is this argument different from anything else, really? Is the input parameter...

- So you might say, what does this say about the risk of us summoning a demon next year? I'm going to say, "Well, we've been trying to summon demons for a long, long while, like 5,000 years." I don’t know... I agree.

Here's another way you can do the Laplace's Law of Succession argument. I gave the previous argument based on years of research since roughly 1960, because that's around when the first conference on AI was. You could also do it on researcher-years. As in: God draws from the urn every time a researcher finishes a year of thinking about AI. In this model, I think that you get a 50% chance in 10 years or something insane like that, maybe less, because there are so many more researchers now than there used to be. So, stating the medians: the calendar-year version gives you around 60 years, because Laplace's Law of Succession always says you should expect to wait about as long as it's been so far. On researcher-years, you get something like 10 years or less.
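The researcher-year version can be sketched the same way. The workforce numbers here are entirely hypothetical: the research workforce is measured in units of its 1960 size and grows at a constant annual rate `growth`, an assumption for illustration rather than data. Each unit-year of research counts as one draw, and fractional draws are treated as if they were whole ones, which is a simplification. After N failed draws, the Laplace median is reached once N + 1 more draws have happened.

```python
def median_years_to_agi(growth: float, years_past: int = 60) -> int:
    """Median calendar-year wait under a researcher-year Laplace rule.

    Hypothetical model: the workforce is 1 unit in year 0 and grows by `growth`
    per year. Each unit-year of research is one draw from the urn. After N failed
    draws, P(no success in the next M draws) = (N + 1) / (N + M + 1), which falls
    to 1/2 once M reaches N + 1.
    """
    past_draws = sum((1 + growth) ** t for t in range(years_past))
    workforce = (1 + growth) ** years_past  # size of the workforce this year
    future_draws, years = 0.0, 0
    while future_draws < past_draws + 1:
        future_draws += workforce
        workforce *= 1 + growth
        years += 1
    return years


for growth in (0.0, 0.07, 10 ** 0.1 - 1):  # flat, ~2x per decade, 10x per decade
    print(f"growth {growth:.3f}/yr -> median {median_years_to_agi(growth)} years")
```

With zero growth this reproduces the calendar-year median of 61 years. At a hypothetical 7% a year (roughly doubling each decade) the median drops to 11 years, and at 10x per decade it is about 4 years. Under steady exponential growth, the median ends up close to the workforce's doubling time.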

All right, here are some other models you can use. I'm just going to name some quickly. One thing you can do is, you can ask, "Look, how big is a human brain? Now, let's pretend AGI will be a neural net. How much compute is required to train a policy that is that big? When will we have that amount of compute?” And you can do these kinds of things. Another approach is, "How big is the human genome? How long does it take to train a policy that big?" Whatever, you do a lot of shit like this.

Honestly, the argument that's compelling to me right now is the following. Maybe to build an AGI, you need to have pretty good machine learning, in the kind of way that you have today. Like, you have to have machine learning that's good enough to learn pretty complex patterns, and then you have to have a bunch of smart people who, when they were 18, decided they were going to try and do really cool machine learning research in college. And then the smart people decide they're going to try and build AGIs. And if this is the thing that you think is the important input to the AGI creation process, then I think you notice that the number of smart 18-year-olds who decided they wanted to go into AGI is way higher than it used to be. It's probably 10 times higher than it was 10 years ago.

And if you apply Laplace's Law of Succession to how many smart 18-year-olds who turn into researchers are required before you get AGI, that also gives you pretty reasonable probabilities of AGI pretty soon. Where it ends up for me: today, I'm feeling ~70% confident of AGI in the next 50 years.

Why do I think AGI is more likely than one of these other things to be the next big deal? Basically, because it seems like it's coming pretty soon.

It seems like whole-brain emulation isn't going to happen that soon. Genetic engineering, I don't know, and I don't want to talk about it right now. Bio risk: there are a lot of people whose job is making really powerful smart ML systems. There are not very many people whose job is trying to figure out how to kill everyone using bioweapons. This just feels like the main argument for why AI is more urgent; it's just really hard for me to imagine a world where people don't try to build really smart ML systems. It's not that hard for me to imagine a world where no very smart person ever dedicates their life to trying really hard to figure out how to kill everyone using synthetic biology. Like, there aren't that many really smart people who want to kill everyone.

- [Guest] Why aren't you worried about nuclear war? Like, people attacking the U.S., a nuclear war hitting a bunch of places where there are AI researchers, and that slowing things down for a while. Why think this is not that concerning?

- Ah, seems reasonably unlikely to happen. Laplace's Law of Succession. We've had nuclear weapons for 75 years. (laughs)

Okay, you were like, "Why are you using this Laplace's Law of Succession argument?" And I'm like, look. When you're an idiot, using Laplace's Law of Succession arguments at least limits how much of an idiot you can be. I think there are just really bad predictors out there. There are people who are just like, "I think we'll get into a nuclear war with China in the next three years, with a 50% probability." And the thing is, I think that it actually is pretty healthy to be like, "Laplace's Law of Succession. Is your current situation really all that different from all the other three-year periods since we've had nuclear weapons?"
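As a rough illustration of that sanity check, here is a minimal sketch of Laplace's rule applied to three-year periods. It assumes about 75 years of nuclear weapons (1945 to the talk in early 2020), so 25 three-year periods, and it treats all 25 as containing none of the nuclear war being forecast. Modeling each period as one independent draw from the urn is an assumption of the sketch, not a model of geopolitics.

```python
from fractions import Fraction


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's rule: P(success on the next trial) = (s + 1) / (n + 2)."""
    return Fraction(successes + 1, trials + 2)


YEARS_OF_NUCLEAR_WEAPONS = 75  # assumption: 1945 to early 2020
PERIOD_YEARS = 3
periods = YEARS_OF_NUCLEAR_WEAPONS // PERIOD_YEARS  # 25 periods with no such war

p_war_next_period = rule_of_succession(0, periods)
print(p_war_next_period, float(p_war_next_period))
```

That gives roughly a 4% chance for the next three years, more than ten times lower than the 50% forecast, so someone claiming 50% owes an account of what makes the current period so different.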

[Linch notes: Anthropics seems like a nontrivial concern, especially if we're conditioning on observer moments (or "smart observer moments") rather than literally "years at least one human is alive".]

- [Guest] Strictly, it places no limit on how much of an idiot you can be, because if you're careful, you can choose a prior that gets you any posterior you want using Laplace's Law of Succession. If you can justify using a uniform prior, then maybe it limits how much of an idiot you can be, but if a uniform prior itself yields idiocy, I'm not sure it places any limit.

- For some reason, I feel like people who do this end up being less of an idiot, empirically.

- [Guest] Okay, that's fine.

- All right, we're going to stand up, and jump up and down five times. And then we're going to sit down again and we're going to hear some more of this.

Crux 3: You can do good by thinking ahead on AGI

Okay, number three. You can do good by thinking ahead on AGI. Can one do good by thinking ahead on particular technical problems? The specific version of this is that the kind of AI safety research that I do is predicated on the assumption that there are technical questions which we can ask now such that if we answer them now, AI will then go better.

I think this is actually kind of sketchy as a claim, and I don't see people push back on it quite enough. So I was very happy about the people I talked to today who pushed back on it; bonus points to them.

So here are two arguments that we can’t make progress now.

Problems solve themselves

One is that, in general, problems solve themselves.

Imagine if I said to you: “One day humans are going to try and take humans to Mars. And it turns out that most designs of a spaceship to Mars don't have enough food on them for humans to not starve over the course of their months-long trip to Mars. We need to work on this problem. We need to work on the problem of making sure that when people build spaceships to Mars they have enough food in them for the people who are in the spaceships.”

I think this is a stupid argument. Because people are just not going to fuck this one up. I would just be very surprised if all these people got on their spaceship and then they realized after a week oh geez, we forgot to pack enough food. Because people don't want to die of starvation on a spaceship and people would prefer to buy things that aren't going to kill them. And I think this is actually a really good default argument.

Another one is: “Most people have cars. It would be a tremendous disaster if everyone bought cars which had guns in the steering wheels such that if you turn on the accelerator, they shoot you in the face. That could kill billions of people.” And I'm like, yep. But people are not going to buy those cars because they don't want to get shot in the face. So I think that if you want to argue for AI safety being important you have to argue for a disanalogy between those two examples and the AI safety case.

Thinking ahead is real hard

The other one is: thinking ahead is real hard. I don't actually know of any examples ever where someone said, “It will be good if we solve this technical problem, because of this problem which is going to come up in 20 years.” I guess the only one I know of is those goddamn quantum computers again, where people started coming up with quantum-resistant cryptography ages ago, so that as soon as we get powerful quantum computers that can break your RSA, you can just use one of these other schemes. But I don't think they did this because they thought it was helpful. I think they did it because they're crypto nerds who like solving random theoretical problems. So I can’t name an example of anyone thinking ahead about a technical problem in a useful way.

- [Student] But even there, there's a somewhat more precise definition of what a quantum computer even is. It's not clear to me that there's anything close for what AGI is going to look like. So even that example strikes me as weird.

- You're saying it's easier for them to solve their problem than it would be for us to do useful work on AI?

- At least there's some definition. I actually don't know what's going on in their field at all. But I don't know that there's any definition of what AGI will look like.

- Yeah. I'm taking that as an argument for why even that situation is an easier case for thinking ahead than the AI safety case.

- Yeah, yeah. Like, here, what assumptions are we very sure about? I think in our previous conversation you mentioned the fact that some objective is going to be optimized, or something like that.

Arguments for thinking ahead

Okay, so I want to argue for the claim that it's not totally crazy to think about the AI alignment problem right now.

So here are some arguments I want to make, about why I think we can maybe do good stuff now.

By the way, another phrasing of this is, if you could trade one year of safety research now for x years of safety research the year that AGI is developed or five years before AGI is developed, what is the value of x at which you're indifferent? And I think that this is just a question that you can ask people. And I think a lot of AI safety researchers think that the research that is done the year of building the AGI is just five times or 10 times more important. And I'm going to provide some arguments for why thinking ahead actually might be helpful.

Relaxations

One is relaxations of the problem. By “relaxation”, I mean you take some problem and instead of trying to solve it, you try to solve a different, easier problem.

Here's what I mean by this: There are a variety of questions whose answer I don't know, which seem like easier versions of the AI safety problem.

Here's an example. Suppose someone gave me an infinitely fast computer on a USB drive and I want to do good in the world using my infinitely fast computer on a USB drive. How would I do this? I think this has many features in common with the AI safety problem, but it's just strictly easier, because all I'm trying to do is figure out how to use this incredibly smart, powerful thing that can do lots of stuff, and anything which you can do with machine learning you can also do with this thing. You can either just run your normal machine learning algorithms, or you can do this crazy optimizing over parameter space for whatever architecture you like, or optimizing over all programs for something.

This is just easier than machine learning, but I still don't know how to use this to make a drug that helps with a particular disease. I’m not even quite sure how to use this safely to make a million dollars on the stock market, though I am relatively optimistic I’d be able to figure that one out. There are a bunch of considerations.

If I had one of these infinitely fast computers, I don't think I know how to do safe, useful things with it. If we don't know how to answer this question now, then no matter how easy it is to align ML systems, it's never going to get easier than this question. And therefore, maybe I should consider trying to solve this now.

Because if I can solve this now, maybe I can apply that solution partially to the ML thing. And if I can’t solve this now, then that's really good to know, because it means that I'm going to be pretty screwed when the ML thing comes along.

Another relaxation you can do is you can pretend you have an amazing function approximator, where by “function approximator” I just mean an idealized neural net. If you have a bunch of labeled training data, you can put it in your magical function approximator and it'll be a really good function approximator on this. Or if you want to do reinforcement learning, you can do this and it'll be great. I think that we don't know how to do safe, aligned things using an amazing function approximator, and I think that machine learning is just strictly more annoying to align than this. So that's the kind of work that I think we can do now, and I think that the work that we do on that might either just be applicable or it might share some problems in common with the actual AI alignment problem. Thoughts, questions, objections?

- [Student] For the halting Oracle thing, are we assuming away the “what if using it for anything is inherently unsafe for spooky universal prior reasons” thing?

- That's a really great question. I think that you are not allowed to assume away the spooky universal prior problems.

- [Student 2] So what was the question? I didn't understand the meaning of the question.

- The question is... all right, there's some crazy shit about the universal prior. It's a really long story. But basically if you try to use the Solomonoff prior, it's... sorry, nevermind. Ask me later. It was a technicality. Other questions or objections?

So all right, I think this claim is pretty strong and I think a lot of you probably disagree with it. The claim is, you can do research on AI safety now, even though we don't know what the AGI looks like, because there are easier versions of the problem that we don't know how to solve now, so we can just try and solve them. Fight me.

- [Student] You could technically make the problem worse by accidentally arriving at conclusions that help actual AI research: not safety, but capabilities research.

- Seems right. Yeah, maybe you should not publish all the stuff that you come up with.

When you're doing safety research, a lot of the time you're implicitly trying to answer the question of what early AGI systems will look like. I think there’s a way in which safety research is particularly likely to run into dangerous questions for this reason.

- [Student] So if we say that AGI is at least as good as a human, couldn't you just relax it to a human? But if you do relax it to just, say, “I'm going to try to make this human or this brain as safe as possible,” wouldn't that be similar to operations research? In business school, where they design systems of redundancies in nuclear plants and stuff like that?

- So, a relaxation where you just pretend that this thing is literally a human — I think that this makes it too easy. Because I think humans are not going to try and kill you, most of the time. You can imagine having a box which can just do all the things that a human does at 10 cents an hour. I think that it’d be less powerful than an AGI in some ways, but I think it's pretty useful. Like, if I could buy arbitrary IQ-100 human labor for 10 cents an hour, I would probably become a reseller of cheap human labor.

- [Student] I got a question from Discord. How interpretable is the function approximator? Do you think that we couldn't align a function approximator with, say, the opacity of a linear model?

- Yes. I mean, in this case, if you have an opaque function approximator, then other problems are harder. I'm assuming away inner alignment problems (apologies for the jargon). Even linear models still have the outer alignment problem.

Analogy to security

Here's another argument I want to make. I'm going to use security as an analogy. Imagine you want to make a secure operating system, which has literally zero security bugs, because you're about to use it as the control system for your autonomous nuclear weapons satellite that's going to be in space and then it's going to have all these nuclear weapons in it.

So you really need to make sure that no one's going to be able to hack it and you're not able to change it and you expect it to be in the sky for 40 years. It turns out that in this scenario you're a lot better off if you've thought about security at the start of the project than if you only try to think about security at the end of the project. Specifically it turns out that there are decisions about how to write software which make it drastically easier or harder to prove security. And you really want to make these decisions right.

And in this kind of a world, it's really important that you know how one goes about building a secure system before you get down to the tricky engineering research of how to actually build the system. I think this is another situation which suggests that work done early might be useful.

Another way of saying this is to think of operating systems. I want to make an operating system which has certain properties, and currently no one knows how to make an operating system with these properties, but it's going to need to be built on top of some other properties that we already understand about operating systems and we should figure out how to do those securely first.

This is an argument that people at MIRI feel good about and often emphasize. It’s easier to put security in from the start. Overall I think this is the majority of my reason for why I think that you can do useful safety work starting right now.

I want to give some lame reasons too, like lame meta reasons. Maybe it's useful for field building. Maybe you think that AI safety research that happens today is just 10 times less useful than AI safety research that happens in the five years before the AGI is built. But if you want to have as much of that as possible it's really helpful if you get the field built up now. And you have to do something with your researchers and if you have them do the best AI safety research they can, maybe that's not crazy.

- [Student] Maybe if you jump the gun and you try to start a movement before it's actually there and then it fizzles out, then it's going to be harder to start it when it's really important.

- Yep. So here's an example of something kinda like that. There are people who think that MIRI, where I work, completely screwed up AI safety for everyone by being crazy on the internet for a long time. And they're like, “Look, you did no good. You got a bunch of weird nerds on the internet to think AI safety is important, but those people aren't very competent or capable, and now you've just poisoned the field, and now when I try to talk to my prestigious, legit machine learning friends they think that this is stupid because of the one time they met some annoying rationalist.” I think that's kind of a related concern that is real. Yeah, I think it's a strong consideration against doing this.

- [Student] I agree with the security argument, but it brings up another objection, which is: even if you “make progress”, people have to actually make use of the things you discovered. That means they have to be aware of it, it has to be cost effective. They have to decide if they want to do it.

- Yeah, all right, I'm happy to call this crux four.

Crux 4: good alignment solutions will be put to use

Good alignment solutions will be put to use, or might be put to use. So I in fact think that it's pretty likely... So there are these terms like “competitiveness” and “safety tax” (or “alignment tax”) which are used to refer to the extent to which it's easier to make an unaligned AI than an aligned AI. I think that if it costs you only 10% more to build an aligned AI, and if the explanation of why this AI is aligned is not that hard, as in you can understand it if you spend a day thinking about it, I would put more than 50% probability on the people who build this AGI using that solution.

The reason I believe this is that when I talk to people who are trying to build AGIs, like people at DeepMind or OpenAI, I'm like, “Yep, they say the right things, like ‘I would like to build my AI to be aligned, because I don't want to kill everyone’”. And I honestly believe them. I think it's just a really common desire to not be the one who killed all of humanity. That's where I'm at.

- [Student] I mean, as a counterargument, you could walk into almost any software company and they'll pay tons of lip service to good security and then not do it, right?

- Yep, that's right. And that’s how we might all die. And what I said is in the case where it's really easy, in the case where it's really cheap, and it costs you only 10% more to build the AGI that's aligned, I think we're fine. I am a lot more worried about worlds where it would have cost you $10 billion to build the subtly unaligned AI but it costs you $100 billion to build the aligned AI, and both of these prices fall by a factor of two every year.

And then we just have to wonder whether someone spends the $100 billion for the aligned AI before someone spends the $10 billion for the unaligned AI, given that all these figures are falling while maintaining a constant ratio. I think thinking about this is a good exercise.
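One way to do that exercise: if both prices halve every year and keep a constant 10x ratio, then for any fixed budget below both prices, the unaligned system becomes affordable log2(10), about 3.3 years, before the aligned one. The sketch below uses the talk's hypothetical $10 billion and $100 billion figures and assumes a clean one-year halving time; the budgets and function names are illustrative only.

```python
import math


def years_until_affordable(initial_cost: float, budget: float, halving_time: float = 1.0) -> float:
    """Years until a price that halves every `halving_time` years falls to `budget`."""
    if budget >= initial_cost:
        return 0.0
    return halving_time * math.log2(initial_cost / budget)


UNALIGNED_COST = 10e9   # hypothetical figure from the talk
ALIGNED_COST = 100e9    # hypothetical figure from the talk


def lag_years(budget: float) -> float:
    """How much later the aligned AI becomes affordable for this budget."""
    return (years_until_affordable(ALIGNED_COST, budget)
            - years_until_affordable(UNALIGNED_COST, budget))


for budget in (1e9, 1e8):  # illustrative budgets, both below either price
    print(f"budget ${budget:,.0f}: aligned AI affordable {lag_years(budget):.2f} years later")
```

Because the ratio is constant, the lag does not shrink as prices fall, which is why the framing in terms of an extra few years of lead time is the natural one.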

And even scarier is the thing that I think is actually likely: building the aligned AI takes an extra three years or something. And the question will be, “How much of a lead time would the people who are trying to build the aligned one actually have? Is it actually three years? I don’t think it is...”

Wouldn’t someone eventually kill everyone?

- [Student] Even if most people would not want to destroy the human race, isn't there still that risk there will just be one really dangerous or crazy person who does deliberately want to cause havoc? And how do we deal with that?

- Yeah. I think that long-term, it's not acceptable to have there be people who have the ability to kill everyone. It so happens that so far no one has been able to kill everyone. This seems good. I think long-term we're either going to have to fix the problem where some portion of humans want to kill everyone or fix the problem where humans are able to kill everyone.

And I think that you could probably do this through regulating really dangerous technology or modifying how humans work so they aren't going to kill everyone.

This isn’t a ridiculous change from the status quo. The U.S. government employs people who will come to your house and arrest you if you are trying to make smallpox. And this seems good, because I don't think it would be good if anyone who wanted to could make smallpox.

Long-term, humanity is not going to let people kill everyone. Maybe it turns out that if you want to build an AGI that can kill everyone, you'd have to have at least three million super GPUs, or maybe you need three TPU pods. Either way, people are going to be like, “Well, you're not allowed to have three TPU pods unless you've got the official licence.” There’ll be regulation and surveillance. Maybe the government runs all the TPU pods, a bit like how governments run all the plutonium and hopefully all of the bioweapons.

So that's the answer to the question, “Wouldn't someone always try to kill everyone?” The answer is yes, unless you modify humans so that they aren't going to do that. But long-term we need to get the risk to zero by making it impossible, and it seems possible to imagine us succeeding at this.

- [Student] Do you think that the solution is better achieved through some sort of public policy thing like that or by something that's a private tool that people can use? Like, should we go through government or should it be something crowdsourced?

- I don't like the term crowdsourced very much.

- [Student] I don't really know why I used that, but something that comes from some open source tool or something like that, or something private.

- I don't have a strong opinion. It seems like it's really hard to get governments to do complicated things correctly. Like their $1.2 billion quantum computing grant. (laughs) And so it seems like we're a bit safer in worlds where we don't need complicated government action. Like, yeah, I just feel pretty screwed if I need the government to understand why and how to regulate TPU pods because otherwise people will make really dangerous AI. This would be really rough. Imagine trying to explain this to various politicians. Not going to be a good time.

- [Student] (unintelligible)

- Yeah. Most humans aren't super evil. When I occasionally talk to senior people who work on general AI research, I’m like, “This person, they’re not a saint, but they’re a solid person”.

Here’s a related question — what would happen if you gave some billionaire 10 squillion dollars? If you gave most billionaires in America 10 squillion dollars and they could just rule the world now, I think there's like at least a 70% chance that this goes really solidly well, especially if they know that one of the things they can do with their AGI is ask it what they should do or whatever. I think that prevents some of the moral risk. That's where I'm at.

[Post talk note: Some other justifications for this: I think that (like most people) billionaires want, all else equal, to do good things rather than bad things, and I think that powerful technologies might additionally be useful for helping people to do a better job of figuring out what actions are actually good or bad according to their values. And to be clear, hopefully something better happens than handing control of the future to a randomly selected billionaire. But I think it’s worth being realistic about how bad this would be, compared to other things that might happen.]

Sounds like there are some disagreements. Anything you want to say?

- [Student] Yeah. The world, and this country especially, is ruled by the 1%, and I don't think they're doing very good things. So I think when it comes to evil and alignment, and especially how money is distributed in this country: they don't have access to AGI just yet, but it would scare me if it was put in their hands. Say, Elon Musk for instance. I mean, I don't think he's an evil person (he's very eccentric, but I don't think he's evil), but he's probably one. Let's say it was put in the hands of the Rockefellers or somebody like that, I don't think they would use it for good.

- Yeah, I think this is a place where people...

- [Student] It's a political argument, yeah.

- Yeah, I don't know. My best guess is that the super rich people are reasonably good, yeah.

So the place where I'm most scared about this is I care a lot about animal welfare and an interesting fact about the world is that things like technology got a lot better and this meant that we successfully harmed farm animals in much greater numbers.

[Post talk note: If you include wild animal suffering, it’s not clear what the net effect of technology on animal welfare has been. Either way, technology has enabled a tremendous amount of animal suffering.]

And this is kind of a reason to worry about what happens when you take people and you make them wealthier. On the other hand, I kind of believe it's a weird fluke about the world that animals have such a bad situation. Like, I kind of think that most humans actually do kind of have a preference against torturing animals. And if you made everyone a squillion babillionaire they would figure out the not-torturing-animals thing. This is where some of my intuition comes from.

Crux 5: My research is the good kind

My research is the good kind. My work, or the things that I do, are related to the argument that there are things that you have to figure out ahead of time if you want things to be good. I can't talk about it in detail, because MIRI doesn’t by default disclose all the research that it does. But that's what I do.

Conclusion

I'm going to give an estimate of how confident I am in each of these. Every time I do this I get confused over whether I want to give every step conditioned on the previous steps. We're going to do that.

AI would be a big deal if it showed up here. I'm ~95% sure that if AGI was really cheap and it showed up in a world like this, the world would suddenly look really different. I don't think I'm allowed to use numbers larger than 95%, because of that one time I made that terrible error, and it's very hard to do enough calibration training that you're allowed to say numbers larger than 95%. But I feel really damn sure that the world would look really different if someone built AGI.
AI is plausibly soonish and the next big deal. Given the previous one, not that the conditional matters that much for this one, I feel ~60% confident.
You can do good by thinking ahead on AGI. It's kind of rough, because the natural product of this isn't a probability; it's more like a weighting of how much worse this work is than doing other things. I'm going to give this 70%.
Alignment solutions might be put to use by goodish people if you have good enough ones. 70%.
My research is the good kind. Maybe 50%?

Okay, cool, those are the numbers. We can multiply them all together. 60% times 95% times 70% times 70% times 50%.

[Post-talk note: This turns out to be about 14%, which is somewhat lower than my actual intuition for how enthusiastic I am about my work.]
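The arithmetic is easy to reproduce. A minimal sketch, treating each number above as a credence conditioned on the earlier cruxes holding, so the chained probability is just their product (the dictionary labels paraphrase the crux titles):

```python
import math

# Credences from the talk, each conditioned on the previous cruxes holding.
cruxes = {
    "AI would be a big deal": 0.95,
    "AI is plausibly soonish and the next big deal": 0.60,
    "You can do good by thinking ahead on AGI": 0.70,
    "Alignment solutions might be put to use": 0.70,
    "My research is the good kind": 0.50,
}

overall = math.prod(cruxes.values())  # chain rule over the conditional steps
print(f"{overall:.1%}")  # prints 14.0%
```

Multiplying only makes sense because each step is conditioned on the previous ones; unconditional credences for correlated cruxes could not be chained this way.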

Q&A

I'm going to take some more questions for a bit.

- [Student] So, is this how MIRI actually chooses what to work on?

- No.

- [Student] So, how does MIRI choose where to allocate resources and then do research?

- I think MIRI is much more into particular mental motions.

- [Student] Mental motions?

- The thinking I’ve been describing is the kind of thinking that I do when I'm saying, “Should I, instead of my job, do a different job?” For instance, I could do EA movement-building work. (Like for example coming to Stanford EA and talking to Stanford students and giving talks.) And I think this is pretty good and I do it sometimes.

When I'm trying to think of what I should do for AI safety in terms of technical research, I would say mostly I just don't use my own judgment. Mostly I'm just like, “Nate Soares, who runs MIRI, thinks that it would be helpful for him if I did this. And on the couple of domains where I feel like I can evaluate Nate, I think he's really smart.”

- [Student] Smart in what way? Like, what's your metric?

- I think that when I talk to him, Nate is just really, really good at taking very general questions about the world and figuring out how to think about them in ways that get new true answers.

E.g., I talk to him about physics (and I feel qualified to think about some areas of physics), and then he just has really smart thoughts and he thinks about them in a really clever way. And I think that whenever I argue with him about AI safety he says pretty smart things.

And then he might tell me he thinks this particular research direction is great. And then I update more based on my respect for Nate and based on his arguments about what type of technical problems would be good to solve, than I update based on my own judgment about the technical problems. This is particularly because there are worldview questions about what type of AI alignment research is helpful that I don't know what I think of.

- [Student] Do you ever consider what you just enjoy doing in a completely outcome-independent way?

- I do occasionally ask the question, what do I enjoy doing? And when I’m considering potential projects, I give a bonus of like 2x or 3x to activities that I really enjoy.

- [Student 2] Maybe this is too broad, but why did you choose, or was it a choice, to place your trust on research directions in Nate Soares versus like Paul Christiano or somebody else?

- Well, once upon a time I was in a position where I could try to work for MIRI or I could try to work for Paul. I have a comparative advantage at working for MIRI and a comparative disadvantage at working for Paul, compared to the average software engineer. MIRI wanted some people who were good at screwing around with functional programming and type theory and stuff, and that's me. And Paul wanted someone who was good at messing around with machine learning, and that's not me. And I said, “Paul, how much worse do you think my work will be if I go to MIRI?” And he said, “Four times.” And then I crunched some numbers. And I was like, “Okay, how right are different people likely to be about what AI alignment work is important?” And I was like, “Well…”

I don’t... look, you asked. I'm going to tell you what I actually thought. I don't think it makes me sound very virtuous. I thought, “Eliezer Yudkowsky from MIRI is way smarter than me. Nate Soares is way smarter than me. Paul Christiano is way smarter than me. That's two to one.” And that's how I'm at MIRI.

I would say, time has gone on and now I’ve updated towards Paul's view of the world in a lot of ways. But the comparative advantage argument is keeping me at MIRI.

- [Student] So if you were to synthesize a human being through biotechnology and create an artificial human then does that count as AI or AGI?

- Eh, I mean I'm interested in defining words inasmuch as they help me reason about the future. And I think an important fact about making humans is that it will change the world if and only if you know how to use that to make really smart humans. In that case I would call that intelligence enhancement, which we didn't really talk about but which does seem like it deserves to be on the list of things that would totally change the world. But if you can just make artificial humans — I don't count IVF as AGI, even though there's some really stupid definition of AGI such that it's AGI. And that's just because it's more useful to have the word “AGI” refer to this computery thing where the prices might fall rapidly and the intelligence might increase rapidly.

- [Student] And what if it's like some cyborg combination of human and computer and then those arguments do apply, with at least the computer part’s price falling rapidly?

- Yep, that's a good question. My guess is that the world is not radically changed by human-computer interfaces, or brain interfaces, before it's radically changed by one of the other things, but I could be wrong. One of the ways in which that seems most likely to change the world is by enabling really crazy mind control or lie detection things.

- [Student] How likely do you think it is that being aware of current research is important for long-term AGI safety work? Because I think a lot of the people from MIRI I talked to were kind of dismissive about knowing about current research, because they think it's so irrelevant that it won't really yield much benefit in the future. What's your personal opinion?

- It seems like one really relevant thing that plays into this is whether the current machine learning stuff is similar in important ways to the AI systems that we're going to build in the future. To the extent you believe that it will be similar, I think the answer is yes, obviously machine learning facts from now are more relevant.

Okay, the following is kind of subtle and I'm not quite sure I'm going to be able to say it properly. But remember when I was saying relaxations are one way you can think about AI safety? I think there's a sense that if you don't know how to solve a problem in the relaxed version — if I don't even know how to do good things with my halting oracle on a USB drive — then I’m not going to be able to align ML systems.

Part of this is that I think facts about machine learning should never make the problem easier. You should never rely on specific facts about how machine learning works in your AI safety solutions, because you can't rely on those to hold as your systems get smarter.

If empirical facts about machine learning systems should never be relied on in your AI safety solutions, and there are just not that many non-empirical facts about machine learning, then thinking of machine learning as magical function approximators captures most of the structure of machine learning that is safe to assume. So that's an argument against caring about machine learning.

- [Student] Or any prior knowledge, I guess? The same argument could be made about any assumptions about a system that might not hold in the future.

- That's right. That's right, it does indeed hold there as well.

- [Student] Yeah.

- So the main reason to know about machine learning from this perspective is that it's really nice to have concrete examples. If you're studying abstract algebra and you've never heard of any concrete examples of a group, you should totally just go out and learn 10 examples of a group. And I think that if you have big theories about how intelligence works or whatever, or how function approximators work, it's absolutely worth it to know how machine learning works in practice because then you might realize that you're actually an idiot and this is totally false. So I think that it's very worthwhile for AI safety researchers to know at least some stuff about machine learning. Feel free to quiz me and see whether you think I'm being virtuous by my own standards. I think it's iffy. I think it's like 50-50 on whether I should spend more or less time learning machine learning, which is why I spend the amount of time I spend on it.

- [Student] From a theoretical standpoint, like Marcus Hutter’s perspective, there's a theory of general AI. So to make powerful AGI, it's just a question of how to create a good architecture which can do Bayesian inference, and it's a question of how to run it well in hardware. It’s not like you need to have great insights which one guy could have; it's more about engineering. And then it's not 10% which is added to the cost to do safety; we need to have a whole team which would try to understand how to do safety. And it seems that people who don't care about safety will build the AGI faster than that, significantly faster than people who care about safety. And I mean how bad is it?

- I heard many sentences and then, “How bad is it?”. And the sentences made sense.

How bad is it? I don't know. Pretty bad?

In terms of the stuff about AIXI, my answer is kind of long and I kind of don't want to give it. But I think it's a pretty bad summary to say “we already know what the theoretical framework is and we're just doing engineering work now”. That's also true of literally every other technical subject. You can say all of chemistry is like: I already know how to write down the Schrödinger equation, so it's “just engineering work” to answer what chemicals you get. Also, all of biology is just the engineering work of figuring out how the Schrödinger equation explains ants or something. So I think that the engineering work is finding good algorithms to do the thing, but this is also work which involves however much theoretical structure. Happy to talk about this more later.

- [Student] Do you disagree with Paul Christiano on anything?

- Yes.

- [Student] Or with other smart people?

- So, Paul Christiano is really smart and it's hard to disagree with him, because every time I try to disagree with him, I’d say something like, “But what about this?” And he's like, “Oh, well I would respond to that with this rebuttal”. And then I'm like, “Oh geez, that was a good rebuttal”. And then he'd say something like, “But I think some similar arguments against my position which are stronger are the following” and then he rattles off four better arguments against his position and then he rebuts those and it's really great. But the places where I most think Paul is wrong, I think Paul is maybe wrong about... I mean, obviously I'm betting on MIRI being better than he thinks. Paul would also think I should quit my job and work on meta stuff probably.

- [Student] Meta stuff?

- Like, work on AI safety movement building.

The biggest thing where I suspect Paul Christiano is wrong is, if I had to pick a thing which feels like the simplest short sweet story for a mistake, it's that he thinks the world is metaphorically more made of liquids than solids.

So he thinks that if you want to think about research you can add up all the contributions to research done by all the individuals and each of these is a number and you add the numbers together. And he thinks things should be smooth. Before the year in which AGI is worth a trillion dollars, it should have been worth half a trillion dollars and you can look at the history of growth curves and you can look at different technological developments and see how fast they were and you can infer all these things from it. And I think that when I talk to him, I think he's more smooth-curve-fitting oriented than I am.

- [Student] Sorry, I didn't follow that last part.

- A thing that he thinks is really compelling is that world GDP doubles every 20 years, and has doubled every 20 years or so for the last 100 years, maybe 200 years, and before that doubled somewhat more slowly. And then before the Industrial Revolution it doubled every couple hundred years. And he's like, “It would be really damn surprising if the time between doublings fell by a factor of two.” And he argues about AI by being like, “Well, this theory about AI can't be true, because if that was true then the world would have doubling times that changed by more than this ratio.”

[Post-talk note: I believe the Industrial Revolution actually involved a fall of doubling times from 600 to 200 years, which is a doubling time reduction of 3x. Thanks to Daniel Kokotajlo for pointing this out to me once.]
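For readers who think in growth rates rather than doubling times, here is a minimal Python sketch (added for illustration, not part of the talk) that converts the doubling times mentioned above into constant annual growth rates via ordinary compound growth, (1 + r)^T = 2. The function name is hypothetical, and treating each era as having a single constant rate is a simplifying assumption.

```python
def annual_growth_rate(doubling_years: float) -> float:
    """Constant yearly growth rate r satisfying (1 + r) ** doubling_years == 2."""
    return 2 ** (1 / doubling_years) - 1


# Doubling times quoted in the talk and in the post-talk note.
for years in (600, 200, 20):
    rate = annual_growth_rate(years)
    print(f"doubling every {years} years ~ {rate:.3%} growth per year")

# The note's comparison: how much the doubling time shrank.
print(600 / 200)  # 3.0
```

When growth rates are small, cutting the doubling time by some factor raises the annual growth rate by roughly the same factor, so the "ratio of doubling times" framing and a "ratio of growth rates" framing mostly agree.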

- [Student] But I guess the most important things are things that are surprising. So all of these kind of, it just strikes me as sort of a—

- I mean, I think he thinks your plans are good according to the expected usefulness they have. And he's like, “Look, the world is probably going to have a lot of smooth curves. There's probably going to be a four-year period in which the economy doubles before there's a one-year period in which the economy doubles.” And I'm less inclined to take that kind of argument as seriously.

We are at time. So I want to get dinner with people. So I'm going to stand somewhere and then if you stand close enough to me you might figure out where I'm getting dinner, if you want to get dinner with me afterwards. Anything else supposed to happen before we leave here? Great, thanks so much.

adamShimi, Feb 13 2020

Thanks a lot for this great post! I think the part I like the most, even more than the awesome deconstruction of arguments and their underlying hypotheses, is the sheer number of times you said "I don't know" or "I'm not sure" or "this might be false". I feel it places you at the same level as your audience (including me), in the sense that you have more experience and technical competence than the rest of us, but you still don't know THE TRUTH, or sometimes even good approximations to it. And the standard way to present ideas and research clearly is to structure them so that the points we don't know are not the focus. So that was refreshing.

On the more technical side, I had a couple of questions and remarks concerning your different positions.

One underlying hypothesis that was not explicitly pointed out, I think, was that you are looking for priority arguments. That is, part of your argument is about whether AI safety research is the most important thing you could do (it might be so obvious in an EA meeting or the EA forum that it's not worth exploring, but I like making the obvious hypotheses explicit). But that's different from whether or not we should do AI safety research at all. One common criticism I have of taking effective altruism career recommendations at face value is that we would not have, for example, pure mathematicians, because pure mathematics is never the priority. Whereas you could argue that without pure mathematics, almost all the positive technological progress we have now (from quantum mechanics to computer science) would not exist. (Note that this is not an argument for having a lot of mathematicians, just an argument for having some.)

For the problems-that-solve-themselves arguments, I feel like your examples have very "good" qualities for solving themselves: both personal and economic incentives are against them, they are obvious when one is confronted with the situation, and at the point where the problems become obvious, you can still solve them. I would argue that not all these properties hold for AGI. What are your thoughts about that?

About the "big deal" argument, I'm not sure that another big deal before AGI would invalidate the value of current AI Safety research. What seems weird in your definition of big deal is that if I assume the big deal, then I can make informed guesses and plans about the world after it, no? Something akin to The Age of Em by Hanson, where he starts with ems (whole-brain emulations) and then tries to derive what our current understanding of the various sciences can tell us about this future. I don't see why you can't do this even if there is another big deal before AGI. Maybe the only cost is more and more uncertainty.

The arguments you point out against the value of research now compared to research closer to AGI seem to forget about incremental research. Not all research is a breakthrough, and most if not all breakthroughs build on previous decades or centuries of quiet research work. In this sense, working on it now might be the only way to ensure the necessary breakthroughs closer to the deadline.

Buck, Feb 21 2020

> For the problems-that-solve-themselves arguments, I feel like your examples have very "good" qualities for solving themselves: both personal and economic incentives are against them, they are obvious when one is confronted with the situation, and at the point where the problems become obvious, you can still solve them. I would argue that not all these properties hold for AGI. What are your thoughts about that?

I agree that it's an important question whether AGI has the right qualities to "solve itself". To go through the ones you named:

"Personal and economic incentives are aligned against them"--I think AI safety has somewhat good properties here. Basically no-one wants to kill everyone, and AI systems that aren't aligned with their users are much less useful. On the other hand, it might be the case that people are strongly incentivised to be reckless and deploy things quickly.
"they are obvious when one is confronted with the situation"--I think that alignment problems might be fairly obvious, especially if there's a long process of continuous AI progress where unaligned non-superintelligent AI systems do non-catastrophic damage. So this comes down to questions about how rapid AI progress will be.
"at the point where the problems become obvious, you can still solve them"--If the problems become obvious because non-superintelligent AI systems are behaving badly, then we can still maybe put more effort into aligning increasingly powerful AI systems after that and hopefully we won't lose that much of the value of the future.

Buck, Feb 20 2020

> One underlying hypothesis that was not explicitly pointed out, I think, was that you are looking for priority arguments. That is, part of your argument is about whether AI safety research is the most important thing you could do (it might be so obvious in an EA meeting or the EA forum that it's not worth exploring, but I like making the obvious hypotheses explicit).

This is a good point.

> Whereas you could argue that without pure mathematics, almost all the positive technological progress we have now (from quantum mechanics to computer science) would not exist.

I feel pretty unsure on this point; for a contradictory perspective you might enjoy this article.

adamShimi, Feb 20 2020

I'm curious about the article, but the link points to nothing. ^^

Kirsten, Feb 13 2020

"And it seems to me that the stories that I have for how my work ends up making a difference to the world, most of those are just look really unlikely to work if AGI is more than 50 years off. It's really hard to do research that impacts the world positively more than 50 years down the road."

This was nice to read, because I'm not sure I've ever seen anyone actually admit this before.

You say you think there's a 70% chance of AGI in the next 50 years. How low would that probability have to be before you'd say, "Okay, we've got a reasonable number of people to work on this risk, we don't really need to recruit new people into AI safety"?

Buck, Feb 13 2020

Not everyone agrees with me on this point. Many safety researchers think that their path to impact is by establishing a strong research community around safety, which seems more plausible as a mechanism to affect the world 50 years out than the "my work is actually relevant" plan. (And partially for this reason, these people tend to do different research to me.)

I don't know what the size of the AI safety field is such that marginal effort is better spent elsewhere. Presumably this is a continuous thing rather than a discrete thing. E.g. it seems to me that there are way more people in AI safety now than five years ago, and so if your comparative advantage is in some other way of positively influencing the future, you should more strongly consider that other thing.

Gordon Seidoh Worley, Feb 13 2020

Regarding the 14% estimate, I'm actually surprised it's this high. I have the opposite intuition: there is so much uncertainty, especially about whether or not any particular thing someone does will have impact, that I place the likelihood that anything a particular person working on AI safety does produces positive outcomes at <1%. The only reason it seems worth working on to me despite all of this is that when you multiply it against the size of the payoff it ends up being worthwhile anyway.

Eli Rose🔸, Feb 13 2020

I agree with this intuition. I suspect the question that needs to be asked is "14% chance of what?"

RomeoStevens, Feb 14 2020

The chance that the full stack of individual propositions evaluates as true in the relevant direction (work on AI vs work on something else).

Eli Rose🔸, Feb 15 2020

Suppose you're in the future and you can tell how it all worked out. How do you know if it was right to work on AI safety or not?

There are a few different operationalizations of that. For example, you could ask whether your work obviously directly saved the world, or you could ask whether, if you could go back and do it over again with what you know now, you would still work in AI safety.

The percentage would be different depending on what you mean. I suspect Gordon and Buck might have different operationalizations in mind, and I suspect that's why Buck's number seems crazy high to Gordon.

RomeoStevens, Feb 15 2020

You don't, but that's a different proposition with a different set of cruxes since it is based on ex post rather than ex ante.

I'm saying we need to specify more than, "The chance that the full stack of individual propositions evaluates as true in the relevant direction." I'm not sure if we're disagreeing, or ... ?

Rohin Shah, Feb 25 2020

I enjoyed this post; it was good to see this all laid out in a single essay, rather than floating around as a bunch of separate ideas.

That said, my personal cruxes and story of impact are actually fairly different: in particular, while this post sees the impact of research as coming from solving the technical alignment problem, I care about other sources of impact as well, including:

1. Field building: Research done now can help train people who will be able to analyze problems and find solutions in the future, when we have more evidence about what powerful AI systems will look like.

2. Credibility building: It does you no good to know how to align AI systems if the people who build AI systems don't use your solutions. Research done now helps establish the AI safety field as the people to talk to in order to keep advanced AI systems safe.

3. Influencing AI strategy: This is a catch-all category meant to include the ways that technical research influences the probability that we deploy unsafe AI systems in the future. For example, if technical research provides more clarity on exactly which systems are risky and which ones are fine, it becomes less likely that people build the risky systems (nobody _wants_ an unsafe AI system), even though this research doesn't solve the alignment problem.

As a result, cruxes 3-5 in this post would not actually be cruxes for me (though 1 and 2 would be).

Buck, Feb 25 2020

Yeah, for the record I also think those are pretty plausible and important sources of impact for AI safety research.

I think that either way, it’s useful for people to think about which of these paths to impact they’re going for with their research.

Matthew_Barnett, Feb 25 2020

I like this way of thinking about AI risk, though I would emphasize that my disagreement comes a lot from my skepticism of crux 2 and in turn crux 3. If AI is far away, then it seems pretty difficult to understand how it will end up being used, and I think even when timelines are 20-30 years from now, this remains an issue [ETA: Note that also, during a period of rapid economic growth, much more intellectual progress might happen in a relatively small period of physical time, as computers could automate some parts of human intellectual labor. This implies that short physical timelines could underestimate the conceptual timelines before systems are superhuman].

I have two intuitions that pull me in this direction.

The first is that it seems like if you asked someone from 10 years ago what AI would look like now, you'd mostly get responses that wouldn't really help us that much at aligning our current systems. If you agree with me here, but still think that we know better now, I think you need to believe that the conceptual distance between now and AGI is smaller than the conceptual distance between AI in 2010 and AI in 2020.

The second intuition is that it seems like safety engineering is usually very sensitive to small details of a system that are hard to get access to unless the design schematics are right in front of you.

Without concrete details, the major approach within AI safety (as Buck explicitly advocates here) is to define a relaxed version of the problem that abstracts low level details away. But if safety engineering mostly involves getting little details right rather than big ones, then this might not be very fruitful.

I haven't discovered any examples of real-world systems where doing extensive abstract reasoning beforehand was essential for making them safe. Computer security is probably the main example where abstract mathematics seems to help, but my understanding is that the math probably could have been developed alongside the computers in question, and that the way these systems are compromised is usually not due to some conceptual mistake.

Rohin Shah, Feb 26 2020

I broadly agree with this, but I feel like this is mostly skepticism of crux 3 and not crux 2. I think to switch my position on crux 2 using only timeline arguments, you'd have to argue something like <10% chance of transformative AI in 50 years.

Matthew_Barnett, Feb 26 2020

> I think to switch my position on crux 2 using only timeline arguments, you'd have to argue something like <10% chance of transformative AI in 50 years.

That makes sense. "Plausibly soonish" is pretty vague, so I pattern-matched it to something closer to "by default it will come within a few decades."

It's reasonable that for people with different comparative advantages, their threshold for caring should be higher. If there were only a 2% chance of transformative AI in 50 years, and I was in charge of effective altruism resource allocation, I would still want some people (perhaps 20-30) to be looking into it.

Neel Nanda, Feb 14 2020

Thanks for writing this up! I thought it was really interesting (and this seems a really excellent talk to be doing at student groups :) ). Especially the arguments about the economic impact of AGI, and the focus on what it costs - that's an interesting perspective I haven't heard emphasised elsewhere.

The parts I feel most unconvinced by:

The content in Crux 1 seems to argue that AGI will be important when it scales and becomes cheap, because of the economic impact. But the argument for the actual research being done seems more focused on AGI as a single monolithic thing, e.g. framings like a safety tax/arms race, comparing costs of building an unaligned AGI vs an aligned AGI.

My best guess for what you mean is that "If AGI goes well, for economic reasons, the world will look very different and so any future plans will be suspect. But the threat from AGI comes the first time one is made", ie that Crux 1 is an argument for prioritising AGI work over other work, but unrelated to the severity of the threat of AGI - is this correct?

The claim that good alignment solutions would be put to use. The fact that so many computer systems put minimal effort into security today seems a very compelling counter-argument.

I'm especially concerned if the problems are subtle. My impression is that a lot of what MIRI in particular thinks about sounds weird and "I could maybe buy this", but I could also maybe not buy it. And I have much lower confidence that companies would invest heavily in security for more speculative, abstract concerns.

This seems bad, because intuitively AI Safety research seems more counterfactually useful the more subtle the problems are - I'd expect people to solve obvious problems before deploying AGI even without AI Safety as a field.

Related to the first point, I have much higher confidence AGI would be safe if it's a single, large project eg a major $100 billion deployment, that people put a lot of thought into, than if it's cheap and used ubiquitously.

RomeoStevens, Feb 14 2020

First, doing philosophy publicly is hard and therefore rare. It cuts against Ra-shaped incentives. Much appreciation to the efforts that went into this.

>he thinks the world is metaphorically more made of liquids than solids.

Damn, the convo ended just as it was getting to the good part. I really like this sentence and suspect that thinking like this remains a big untapped source of generating sharper cruxes between researchers. Most of our reasoning is secretly analogical with deductive and inductive reasoning back-filled to try to fit it to what our parallel processing already thinks is the correct shape that an answer is supposed to take. If we go back to the idea of security mindset, then the representation that one tends to use will be made up of components, your type system for uncertainty will be uncertainty of those components varying. So which sorts of things your representation uses as building blocks will be the kinds of uncertainty that you have an easier time thinking about and managing. Going upstream in this way should resolve a bunch of downstream tangles since the generators for the shape/direction/magnitude (this is an example of such a choice that might impact how I think about the problem) of the updates will be clearer.

This gets at a way of thinking about metaphilosophy. We can ask what more general class of problems AI safety is an instance of, and maybe recover some features of the space. I like the capability amplification frame because it's useful as a toy problem to think about random subsets of human capabilities getting amplified, to think about the non-random ways capabilities have been amplified in the past, and what sorts of incentive gradients might be present for capability amplification besides just the AI research landscape one.

D_M_x, Feb 13 2020

This was great, thank you. I've been asking people about their reasons to work on AI safety as opposed to other world improving things, assuming they want to maximize the world improving things they do. Wonderful when people write it up without me having to ask!

One thing this post/your talk would have benefited from to make things clearer (or well, at least for me) is if you gave more detail on the question of how you define 'AGI', since all the cruxes depend on it.

Thank you for defining AGI as something that can do regularly smart human things and then asking the very important question how expensive that AGI is. But what are those regularly smart human things? What fraction of them would be necessary (though that depends a lot on how you define 'task')?

I still feel very confused about a lot of things. My impression is that AI is much better than humans at quite a few narrow tasks, though this depends on the definition. If AI was suddenly much better than humans at half of all the tasks humans can do, but sucked at the rest, then that wouldn't count as artificial 'general' intelligence under your definition(?), but it's unclear to me whether that would be any less transformative, though this depends a lot on the cost again. Now that I think about it, I don't think I understand how your definition of AGI is different to the results of whole-brain emulation, apart from the fact that they used different ways to get there. I'm also not clear on whether you use the same definition as other people, whether those usually use the same one and how much all the other cruxes depend on how exactly you define AGI.

Linch, Feb 14 2020

(Only attempting to answer this because I want to practice thinking like Buck, feel free to ignore)

> Now that I think about it, I don't think I understand how your definition of AGI is different to the results of whole-brain emulation, apart from the fact that they used different ways to get there

My understanding is that Buck defines AGI to point at a cluster of things such that technical AI Safety work (as opposed to, e.g., AI policy work or AI safety movement building, or other things he can be doing) is likely to be directly useful. You can imagine that "whole-brain emulation safety" will look very different as a problem to tackle, since you can rely much more on things like "human values", introspection, the psychology literature, etc.

omernevo, Feb 16 2020

Thank you for writing this!

I really appreciate your approach of thoroughly going through potential issues with your eventual conclusion. It's a really good way of getting to the interesting parts of the discussion!

The area where I'm left least convinced is the use of Laplace's Law of Succession (LLoC) to suggest that AGI is coming soonish (that isn't to say there aren't convincing arguments for this, but I think this argument probably isn't one of them).

There are two ways of thinking that make me skeptical of using LLoC in this context (they're related but I think it's helpful to separate them):

1. Given a small number of observations, there's not enough information to "get away" from our priors. So whatever prior we load into the formula, we're bound to get something relatively close to it. This works if we have a good reason to use a uniform prior, or in contexts where we're only trying to separate hypotheses that aren't "far enough away" from the uniform prior, which I don't think is the case here:

In my understanding, what we're really trying to do is separate two hypotheses: The first is that the chance of AGI appearing in the next 50 years is non-negligible (it won't make a huge difference to our eventual decision making if it's 40% or 30% or 20%). The second is that it is negligible (let's say, less than 0.1%, or one in a thousand).

When we use a uniform prior (which starts out with a 50% chance of AGI appearing within a year) - we have already loaded the formula with the answer and the method isn't helpful to us.

2. Continuing from the "demon objection" within the text, I think the objection there could be strengthened to become a lot more convincing. The objection is that LLoC doesn't take the specific event it's trying to predict into account, which is strange and sounds problematic. The example given turns out ok: We've been trying to summon demons for thousands of years so the chance of it happening in the next 50 years is calculated to be small.

But of course, that's just not the best example to show that LLoC is problematic in these areas:

Example 1: I have thought up a completely new and original demon. Obviously, nobody attempted to summon my new and special demon until this year, when, apparently, it wasn't summoned. The LLoC chance of summoning my demon next year is quite high (and over the next 50 years is incredibly high). It's also larger than the chance of summoning any demon (including my own) over those time periods.

The problematic nature of it isn't just because I picked an extreme example with a single observation -

Example 2: What is the chance that the movie Psycho is meant to hypnotize everyone watching it and we'll only realize it when Hitchcock takes over the world? Well, it turns out that this hasn't happened for exactly 60 years. So, it seems like the chance of this happening soon is precisely the same as the chance of AGI appearing.

Next, what is the chance of Hitchcock doing this AND Harper Lee (To Kill a Mockingbird came out in the same year) attempting to do this in a similar fashion AND Andre Cassagnes (Etch-A-Sketch is also from 1960) doing so (I want to know the chance of all three happening at the exact same time)? It turns out that this specific and convoluted scenario is just as likely, since it could only start happening in 1960… This is both obviously wrong and an instance of the conjunction fallacy.
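The arithmetic behind these examples can be checked with a short sketch. Under Laplace's Law of Succession (a uniform prior on a fixed per-year chance), after n years with no occurrence the chance of at least one occurrence in the next m years works out to m / (n + m + 1). The Python below uses a hypothetical helper written only for illustration. Its only inputs are the length of the record and the horizon, which is why the three-way conjunction necessarily gets the same answer as the single Psycho event.

```python
from fractions import Fraction


def p_at_least_one(failures: int, horizon: int) -> Fraction:
    """Laplace's Law of Succession with a uniform prior on a fixed yearly chance.

    After `failures` years with no occurrence, the chance that year i of the
    horizon is also empty is (failures + 1 + i) / (failures + 2 + i). The
    product telescopes to (failures + 1) / (failures + horizon + 1).
    """
    p_none = Fraction(1)
    for i in range(horizon):
        p_none *= Fraction(failures + 1 + i, failures + 2 + i)
    return 1 - p_none


# No observations yet: the uniform prior already says 50% for the first year.
print(p_at_least_one(0, 1))    # 1/2

# The brand-new demon: one empty year observed so far.
print(p_at_least_one(1, 1))    # 1/3 for next year
print(p_at_least_one(1, 50))   # 25/26 over the next 50 years

# Psycho (1960): 60 empty years. The same record, and so the same answer,
# applies to the Hitchcock + Harper Lee + Cassagnes conjunction.
print(p_at_least_one(60, 50))  # 50/111, about 0.45
```

Because the function never looks at what the event is, anything first possible in 1960 gets 50/111 over the next 50 years, conjunction or not.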

EdoArad, Feb 16 2020

This reminds me of the discussion around the Hinge of History Hypothesis (and the subsequent discussion of Rob Wiblin and Will MacAskill).

I'm not sure that I understand the first point. What sort of prior would be supported by this view?

The second point I definitely agree with, and the general point of being extra careful about how to use priors :)

omernevo, Feb 17 2020

Sorry, I wasn't very clear on the first point: There isn't a 'correct' prior.

In our context (by context I mean both the small number of observations and the implicit hypotheses that we're trying to differentiate between), the prior has a large enough weight that it affects the eventual result in a way that makes the method unhelpful.
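One way to see how much weight the prior carries is to swap Laplace's uniform prior for a Beta(a, b) prior on the per-year chance. This is a standard generalization chosen here for illustration, not something proposed in this thread, and both priors below are hypothetical examples. With the same 60-year record and 50-year horizon, the answer moves by a factor of ten depending only on which prior is loaded in.

```python
from fractions import Fraction


def p_at_least_one_beta(a, b, failures: int, horizon: int) -> Fraction:
    """Chance of at least one occurrence in the next `horizon` years.

    Assumes a Beta(a, b) prior on a fixed yearly chance and `failures` years with
    no occurrence so far. Each empty year adds 1 to the second Beta parameter, so
    year i of the horizon is empty with probability
    (b + failures + i) / (a + b + failures + i). Beta(1, 1) is Laplace's rule.
    """
    a, b = Fraction(a), Fraction(b)
    p_none = Fraction(1)
    for i in range(horizon):
        p_none *= (b + failures + i) / (a + b + failures + i)
    return 1 - p_none


# Same 60-year record and 50-year horizon; only the (hypothetical) prior changes.
uniform = p_at_least_one_beta(1, 1, failures=60, horizon=50)
skeptical = p_at_least_one_beta(1, 1000, failures=60, horizon=50)  # prior mean 1/1001 per year
print(uniform, float(uniform))      # 50/111, about 0.45
print(skeptical, float(skeptical))  # 5/111, about 0.045
```

Sixty observations are small next to the skeptical prior's pseudo-count of about a thousand, so the data barely move it, which is the sense in which the prior rather than the record decides the outcome here.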

MichaelA🔸, Feb 26 2020

Thanks for writing this! As others have commented, I thought the focus on your actual cruxes and uncertainties, rather than just trying to lay out a clean or convincing argument, was really great. I'd be excited to see more talks/write-ups of a similar style from other people working on AI safety or other causes.

I think that long-term, it's not acceptable to have there be people who have the ability to kill everyone. It so happens that so far no one has been able to kill everyone. This seems good. I think long-term we're either going to have to fix the problem where some portion of humans want to kill everyone or fix the problem where humans are able to kill everyone.

This, and the section it's a part of, reminded me quite a bit of Nick Bostrom's Vulnerable World Hypothesis paper (and specifically his "easy nukes" thought experiment). From that paper's abstract:

Scientific and technological progress might change people’s capabilities or incentives in ways that would destabilize civilization. For example, advances in DIY biohacking tools might make it easy for anybody with basic training in biology to kill millions; novel military technologies could trigger arms races in which whoever strikes first has a decisive advantage; or some economically advantageous process may be invented that produces disastrous negative global externalities that are hard to regulate. This paper introduces the concept of a vulnerable world: roughly, one in which there is some level of technological development at which civilization almost certainly gets devastated by default, i.e. unless it has exited the ‘semi-anarchic default condition’. [...] A general ability to stabilize a vulnerable world would require greatly amplified capacities for preventive policing and global governance.

I'd recommend that paper for people who found that section of this post interesting.

MaxDalton, Jan 9 2022

Review for the Decade Review

[I'm doing a bunch of low-effort reviews of posts I read a while ago and think are important. Unfortunately, I don't have time to re-read them or say very nuanced things about them.]

I really like the direct, personal, thoughtful style of this talk, and would like to see more posts like it. Seems like maybe one of the best intros-of-this-length to the reasons for working on AI alignment.

Rohin Shah, Feb 25 2020

Planned summary for the Alignment Newsletter:

This post describes how Buck's cause prioritization within an effective altruism framework leads him to work on AI risk. The case can be broken down into a conjunction of five cruxes. Specifically, the story for impact is that 1) AGI would be a big deal if it were created, 2) has a decent chance of being created soon, before any other "big deal" technology is created, and 3) poses an alignment problem that we both **can** and **need to** think ahead in order to solve. His research 4) would be put into practice if it solved the problem and 5) makes progress on solving the problem.

Planned opinion:

I enjoyed this post, and recommend reading it in full if you are interested in AI risk because of effective altruism. (I've kept the summary relatively short because not all of my readers care about effective altruism.) My personal cruxes and story of impact are actually fairly different: in particular, while this post sees the impact of research as coming from solving the technical alignment problem, I care about other sources of impact as well. See this comment for details.

Buck, Feb 25 2020

I think your summary of crux three is slightly wrong: I didn’t say that we need to think about it ahead of time, I just said that we can.

Rohin Shah, Feb 25 2020

My interpretation was that the crux was

We can do good by thinking ahead

One thing this leaves implicit is the counterfactual: in particular, I thought the point of the "Problems solve themselves" section was that if problems would be solved by default, then you can't do good by thinking ahead. I wanted to make that clearer, which led to

we both **can** and **need to** think ahead in order to solve [the alignment problem].

Where "can" talks about feasibility, and "need to" talks about the counterfactual.

I can remove the "and **need to**" if you think this is wrong.

Buck, Feb 25 2020

I'd prefer something like the weaker and less clear statement "we **can** think ahead, and it's potentially valuable to do so even given the fact that people might try to figure this all out later".

adamShimi, Feb 13 2020

On a tangent, what are your issues with quantum computing? Is it the hype? That might indeed be overblown relative to what we can do now. But the theory is fascinating, there are concrete applications where we should get positive benefits for humanity, and the actual researchers in the field try really hard to clarify what we do and don't know about quantum computing.

EdoArad, Feb 16 2020

Jaime Sevilla wrote a long (albeit preliminary) and interesting report on the topic

Aaron Gertler 🔸, May 22 2020

This post was awarded an EA Forum Prize; see the prize announcement for more details.

My notes on what I liked about the post, from the announcement:

“I edited [the transcript] for style and clarity, and also to occasionally have me say smarter things than I actually said.”
The “enhanced transcript” format seems very promising for other Forum content, and I hope to see more people try it out!
As for this enhanced transcript: here, Buck reasons through a difficult problem using techniques we encourage — laying out his “cruxes,” or points that would lead him to change his mind if he came to believe they were false. This practice encourages discussion, since it makes it easier for people to figure out where their views differ from yours and which points are most important to discuss. (You can see this both in the Q&A section of the transcript and in comments on the post itself.)
I also really appreciated Buck’s introduction to the talk, where he suggested to listeners how they might best learn from his work, as well as his concluding summary at the end of the post.
Finally, I’ll quote one of the commenters on the post:
I think the part I like the most, even more than the awesome deconstruction of arguments and their underlying hypotheses, is the sheer number of times you said "I don't know" or "I'm not sure" or "this might be false".

mic, Oct 5 2021

What does AI safety movement building look like? What sorts of projects or tasks does this involve? What are the relevant organizations where one could do AI safety movement building work?

Sam Clarke, Mar 18 2020

On crux 4: I agree with your argument that good alignment solutions will be put to use in worlds where AI risk comes from AGI being an unbounded maximiser. I'm less certain that they would be in worlds where AI risk comes from structural loss of control leading to influence-seeking agents (the world still gets better in Part I of the story, so I'm uncertain whether there would be sufficient incentive for corporations to use AIs aligned with complex values rather than AIs aligned with profit maximisation).

Do you have any thoughts on this or know if anyone has written about it?

Effective Altruism Forum