@ Machine Intelligence Research Institute
6582 karmaJoined Sep 2014Berkeley, CA, USA


Late 2021 MIRI Conversations


Topic Contributions

Dustin Moskovitz comments on Twitter:

The deployment problem is part of societal response to me, not separate.

[...] Eg race dynamics, regulation (including ability to cooperate with competitors), societal pressure on leaders, investment in watchdogs (human and machine), safety testing norms, whether things get open sourced, infohazards.

"The deployment problem is hard and weird" comes from a mix of claims about AI (AGI is extremely dangerous, you don't need a planet-sized computer to run it, software and hardware can and will improve and proliferate by default, etc.) and about society ("if you give a decent number of people the ability to wield dangerous AGI tech, at least one or them will choose to use it").

The social claims matter — two people who disagree about how readily Larry Page and/or Mark Zuckerberg would put the world at risk might as a result disagree about whether a Good AGI Project has median 8 months vs. 12 months to do a pivotal act.

When I say "AGI ruin rests on strong claims about the alignment problem and deployment problem, not about society", I mean that the claims you need to make about society in order to think the alignment and deployment problems are that hard and weird, are weak claims (e.g. "if fifty random large AI companies had the ability to use dangerous AGI, at least one would use it"), and that the other claims about society required for high p(doom) are weak too (e.g. "humanity isn't a super-agent that consistently scales up its rationality and effort in proportion to a problem's importance, difficulty, and weirdness").

Arguably the difficulty of the alignment problem itself also depends in part on claims about society. E.g., the difficulty of alignment depends on the difficulty of the task we're aligning, which depends on "what sort of task is needed to end the acute x-risk period?", which depends again on things like "will random humans destroy the world if you hand them world-destroying AGI?".

The thing I was trying to communicate (probably poorly) isn't "Alignment, Deployment, and Society partitions the space of topics", but rather:

  1. High p(doom) rests on strong claims about AI/compute/etc. and quite weak claims about humanity/society.
  2. The most relevant claims (~all the strong ones, and an important subset of the weak ones) are mostly claims about the difficulty, novelty, and weirdness of the alignment and deployment problems.

To chime in, I think it would be helpful to distinguish between:

1. AI risks on a 'business as usual' model, where society continues as it was before, ie not doing much


2. AI risks given different levels of society response.

I like this! Richard Ngo and Eliezer discuss this a bit in Ngo's view on alignment difficulty:

[Ngo]  (Sep. 25 [2021] Google Doc) 

Perhaps the best way to pin down disagreements in our expectations about the effects of the strategic landscape is to identify some measures that could help to reduce AGI risk, and ask how seriously key decision-makers would need to take AGI risk for each measure to be plausible, and how powerful and competent they would need to be for that measure to make a significant difference. Actually, let’s lump these metrics together into a measure of “amount of competent power applied”. Some benchmarks, roughly in order (and focusing on the effort applied by the US):

  • Banning chemical/biological weapons
    • Key points: mRNA vaccines, lockdowns, mask mandates
  • Nuclear non-proliferation
  • The International Space Station
    • Cost to US: ~$75 billion
  • Climate change
    • US expenditure: >$154 billion (but not very effectively)
  • Project Apollo
    • Wikipedia says that Project Apollo “was the largest commitment of resources ($156 billion in 2019 US dollars) ever made by any nation in peacetime. At its peak, the Apollo program employed 400,000 people and required the support of over 20,000 industrial firms and universities.”
  • WW1
  • WW2

[Yudkowsky][12:02]  (Sep. 25 [2021] comment) 


This level of effort starts to buy significant amounts of time.  This level will not be reached, nor approached, before the world ends.

See the post for more discussion, including an update from Eliezer: "I've updated somewhat off of Carl Shulman's argument that there's only one chip supply chain which goes through eg a single manufacturer of lithography machines (ASML), which could maybe make a lock on AI chips possible with only WW1 levels of cooperation instead of WW2."

Eliezer's Pausing AI Developments Isn't Enough. We Need to Shut it All Down is also trying to do something similar, as is his (written-in-2017) post Six Dimensions of Operational Adequacy in AGI Projects.

I interpret "Pausing AI Developments Isn't Enough" as saying "if governments did X, then we'd still probably be in enormous amounts of danger, but there would now be a non-tiny probability of things going well". (Maybe even a double-digit probability of things going well for humanity.)

Eliezer doesn't think governments are likely to do X, but he thinks we should make a desperate effort to somehow pull off getting governments to do X anyway on EV grounds: there aren't any markedly-more-hopeful alternatives, and we're all dead if we fail.

(Though there may be some other similarly-hopeless-but-worth-trying-anyway options, like moonshot attempts to solve the alignment problem, or a Manhattan Project to build nanotechnology, or what-have-you. My Eliezer-model wants highly competent and sane people pursuing all of these unlikely-to-work ideas in parallel, because then it's more likely that at least one succeeds.)

Six Dimensions of Operational Adequacy in AGI Projects divides amounts of effort into "token", "improving", "adequate", "excellent", and "unrealistic", but it doesn't say how high the risk level is under different buckets. I think this is mostly because Eliezer's model gives a macroscopic probability to success if an AGI project is "adequate" on all six dimensions at once, and a tiny probability to success if it falls short of adequacy on any dimension.

My Eliezer-model thinks that "token" and "improving" both mean you're dead, and he doesn't necessarily think he can give meaningful calibrated confidences that distinguish degrees of deadness when the situation looks that bad.

(A) we're almost certainly screwed, whatever we do

(B) we might be screwed, but not if we get our act together, which we're not doing now

(C) we might be screwed, but not if we get our act together, which I'm confident will happen anyway 

(D) there's nothing to worry about in the first place.

Obviously, these aren't the only options. (A), (C), and (D) imply that few or no additional resources are useful, whereas (B) implies extra resources are worthwhile. My impression is Yudowsky's line is (A).

Seems like a wrong framing to me. My model (and Eliezer's) is that A and B are both right: We're almost certainly screwed, whatever we do; but not if humanity gets its act together in a massive way (which we're currently not doing, but should try to do because otherwise we're dead).

"No additional resources are useful" makes it sound like Eliezer is advocating for humanity to give up, which he obviously isn't doing. Rather, my view and Eliezer's is that we should try to save the world (because the alternative is ruin), even though some things will have to go miraculously right in order for our efforts to succeed.

Note that if it were costless to make the title way longer, I'd change this post's title from "AGI ruin mostly rests on strong claims about alignment and deployment, not about society" to the clearer:

The AGI ruin argument mostly rests on claims that the alignment and deployment problems are difficult and/or weird and novel, not on strong claims about society

Your example with humanity fails because humans have always and continue to be a social species that is dependent on each other.

I would much more say that it fails because humans have human values.

Maybe a hunter-gatherer would have worried that building airplanes would somehow cause a catastrophe? I don't exactly see why; the obvious hunter-gatherer rejoinder could be 'we built fire and spears and our lives only improved; why would building wings to fly make anything bad happen?'.

Regardless, it doesn't seem like you can get much mileage via an analogy that sticks entirely to humans. Humans are indeed safe, because "safety" is indexed to human values; when we try to reason about non-human optimizers, we tend to anthropomorphize them and implicitly assume that they'll be safe for many of the same reasons. Cf. The Tragedy of Group Selectionism and Anthropomorphic Optimism.

You argue that most paths to some ambitious goal like whole-brain emulation end terribly for humans, because how else could the AI do whole-brain emulation without subjugating, eliminating or atomising everyone?

'Wow, I can't imagine a way to do something so ambitious without causing lots of carnage in the process' is definitely not the argument! On the contrary, I think it's pretty trivial to get good outcomes from humans via a wide variety of different ways we could build WBE ourselves.

The instrumental convergence argument isn't 'I can't imagine a way to do this without killing everyone'; it's that sufficiently powerful optimization behaves like maximizing optimization for practical purposes, and maximizing-ish optimization is dangerous if your terminal values aren't included in the objective being maximized.

If it helps, we could maybe break the disagreement about instrumental convergence into three parts, like:

  • Would a sufficiently powerful paperclip maximizer kill all humans, given the opportunity?
  • Would sufficiently powerful inhuman optimization of most goals kill all humans, or are paperclips an exception?
  • Is 'build fast-running human whole-brain emulation' an ambitious enough task to fall under the 'sufficiently powerful' criterion above? Or if so, is there some other reason random policies might be safe if directed at this task, even if they wouldn't be safe for other similarly-hard tasks?

For a STEM-capable AGI (or any intelligence for that matter) to do new science, it would have to interact with the physical environment to conduct experiments.

Or read arXiv papers and draw inferences that humans failed to draw, etc.

Doesn't this significantly throttles the speed of AGI gaining advantage over humanity, giving us more time for alignment?

I expect there's a ton of useful stuff you can learn (that humanity is currently ignorant about) just from looking at existing data on the Internet. But I agree that AGI will destroy the world a little slower in expectation because it may get bottlenecked on running experiments, and it's at least conceivable that at least one project will decide not let it run tons of physical experiments.

(Though I think the most promising ways to save the world involve AGIs running large numbers of physical experiments, so in addition to merely delaying AGI doom by some number of months, 'major labs don't let AGIs run physical experiments' plausibly rules out the small number of scenarios where humanity has a chance of surviving.)

The full text of the TIME piece is now available on the EA Forum here, with two clarifying notes by Eliezer added at the end.

(Meta note: The TIME piece was previously discussed here. I've cross-posted the contents because the TIME version is paywalled in some countries, and is plastered with ads. This version adds some clarifying notes that Eliezer wrote on Twitter regarding the article.)

Gordon Worley adds:

yeah been complaining about this for a while. I'm not sure exactly when things started to fall apart, but it's been in about the last year. the quality of discussion there has fallen off a cliff because it now seems to be full of folks unfamiliar with the basics of rationality or even ea thought. ea has always not been exactly rationality, but historically there was enough overlap to make eaf a cool place. now it's full of people who don't share a common desire to understand the world.

(obviously still good folks on the forum, just enough others to make it less fun and productive to post there)

I posted it there and on Twitter. :) Honestly, it plausibly deserves a top-level EA Forum post as well; I don't usually think memes are the best way to carry out discourse, but in this case I feel like it would be healthy for EA to be more self-aware and explicit about the fact that this dynamic is going on, and have a serious larger conversation about it.

(And if people nit-pick some of the specific factual claims implied by my meme, all the better!)

Some comments Duncan made in a private social media conversation:

(Resharing because I think it's useful for EAs to be tracking why rad people are bouncing off EA as a community, not because I share Duncan's feeling—though I think I see where he's coming from!)

I have found that the EA forum is more like a "search for traitors" place, now, than like a "allies in awesomeness" place.

Went there to see what's up just now, and the first thing in recent comments is:

May be an image of text

Which, like. If I had different preconceptions, might land as somebody being like "Oh! Huh! What cool thing was happening here, that I don't know about?"

But in the current atmosphere, with my current preconceptions, feels like a knife sliding from a sheath.

That seemed like a potential warning sign to me of cultural unhealth on the EA Forum, especially given that others shared Duncan's sentiments.

I asked if Duncan would like to share his impression on the EA Forum so EAs could respond and talk it out, and he said:

Not worth it at this point! Feels like jumping into a pool of piranhas and saying "I don't like that you're piranhas."

More charitably: the EA forum has made it clear that it's not for people like me, and doesn't want people like me around; I can respect that and will only show up for rare and important things where my being on a different wavelength is a cost worth the benefit.

(He was willing to let me cross-post it myself, however.)

Load more