The possibility of an indefinite AI pause

I think global totalitarianism is extremely unlikely

Separately from the point I gave in my other comment, I'm slightly baffled by your assessment here. Consider that:

Approximately 26% of the world population already lives in a "closed autocracy" which is often closely associated with totalitarianism.
The term "totalitarianism" has been traditionally applied to Germany under Hitler, the Soviet Union under Stalin, and China under Mao, and more recently under Xi. These states were enormously influential in the last century. Far from being some ridiculous, speculative risk, totalitarianism seems like a common form of government.
In the last two centuries, the scope and size of modern governments has greatly expanded, according to most measures.

But perhaps you don't object to the plausibility of a totalitarian government. You merely object to the idea that a "world government" is plausible. But why? As Bostrom notes,

Historically, we have seen an overarching trend towards the emergence of higher levels of social organization, from hunter-gatherer bands, to chiefdoms, city-states, nation states, and now multinational organizations, regional alliances, various international governance structures, and other aspects of globalization. Extrapolation of this trend points to the creation of a singleton.

This trend of increasing social organization seems to have occurred in line with, and possibly in response to, economic growth, which AI will likely accelerate. I can understand thinking that global totalitarianism is "not probable". I don't understand why you think it's "extremely unlikely".

(As a side note, I think there is a decent argument that AI enables totalitarianism, and thus should be prevented. But it would be self-defeating to build a totalitarian state to stop totalitarianism.)

You are concerned that nudging the world toward pausing AI progress risks global totalitarianism. I do not share this concern because (setting aside how bad it would be) I think global totalitarianism is extremely unlikely

That makes sense, but I think that's compatible with what I wrote in the post:

I think there are two ways of viewing this objection. Either it is an argument against the feasibility of an indefinite pause, or it is a statement about the magnitude of the negative consequences of trying an indefinite pause. I think either way you view the objection, it should lower your evaluation of advocating for an indefinite pause.

In this post I'm mainly talking about an indefinite pause, and left an analysis of brief pauses to others. [ETA: moreover I dispute that a totalitarian world government is "extremely unlikely".]

Sure—but that's compatible with what I wrote:

Pause-like policy regimes don't need to be indefinite to be good. Most of the benefit of nudging the world toward pausing comes from paths other than increasing P(indefinite pause).

I agree with you that indefinite pause is the wrong goal to aim for. It does not follow that "EAs actively push[ing] for a generic pause" has substantial totalitarianism-risk downsides.

I agree with you that indefinite pause is the wrong goal to aim for. It does not follow that "EAs actively push[ing] for a generic pause" has substantial totalitarianism-risk downsides.

That's reasonable. In this post I primarily argued against advocating indefinite pause. I said in the introduction that the merits of a brief pause are much more uncertain, and may be beneficial. It sounds like you mostly agree with me?

I think you're trying to argue that all proposals that are being promoted as pauses or moratoriums require that there be no further progress during that time, even on safety. I don't agree; there exists a real possibility that further research is done, experts conclude that AI can be harnessed safely in specific situations, and we can allow any of the specific forms of AI that are safe.

This seems similar to banning nuclear tests, but allowing nuclear testing in laboratories to ensure we understand nuclear power well enough to make better power plants. We don't want or need nuclear bombs tested in order to get the benefits of nuclear power, and we don't want or need unrestricted misaligned AI in order to build safe systems.

I think you're trying to argue that all proposals that are being promoted as pauses or moratoriums require that there be no further progress during that time, even on safety.

I don't think I'm arguing that. Can you be more specific about what part of my post led you to think I'm arguing for that position? I mentioned that during a pause, we will get more "time to do AI safety research", and said that was a positive. I merely argued that the costs of an indefinite pause outweigh the benefits.

Also, my post was not primarily about a brief pause, and I conceded that "Overall I’m quite uncertain about the costs and benefits of a brief AI pause." I did argue that a brief pause could lead to an indefinite pause, but I took no strong position on that question.

As I argued in my post, I think that we need a moratorium, and one that would lead to an indefinite set of strong restrictions on dangerous AIs, and continued restrictions and oversight on any types of systems that aren't pretty rigorously provably safe, forever.

The end goal isn't a situation where we give up on safety, it's one where we insist that only safe "human-level" but effectively superhuman systems be built - once we can do that at all, which at present I think essentially everyone agrees we cannot.

As I argued in my post, I think that we need a moratorium, and one that would lead to an indefinite set of strong restrictions on dangerous AIs, and continued restrictions and oversight on any types of systems that aren't pretty rigorously provably safe, forever.

To be clear, I'm fine with locking in a set of nice regulations that can prevent dangerous AIs from coming about, if we know how to do that. I think the concept of a "pause" or "moratorium" -- as it is traditionally understood, and explicitly outlined in the FLI letter -- doesn't merely mean that we should have legal rules for AI development. The standard meaning of "moratorium" is that we should not build the technology at all until the moratorium ends.

The end goal isn't a situation where we give up on safety, it's one where we insist that only safe "human-level" but effectively superhuman systems be built - once we can do that at all, which at present I think essentially everyone agrees we cannot.

Presently, the fact that we can't build safe superhuman systems is mostly a side effect of the fact that we can't build superhuman systems at all. By itself, that's pretty trivial, and it's not surprising that "essentially everyone" agrees on this point. However, I don't think essentially everyone agrees that superhuman systems will be unsafe by default unless we give ourselves a lot of extra time right now to do safety research -- and that seems closer to the claim that I'm arguing against in the post.

I don't think anyone in this discussion, with the partial exception of Rob Bensinger, thinks we're discussing a pause of the type FLI suggested. And I agree that a facile interpretation of the words leads to that misunderstanding, which is why my initial essay - which was supposed to frame the debate - explicitly tried to clarify that it's not what anyone is actually discussing.
How much time we need is a critical uncertainty. It seems foolhardy to refuse to build a stop button because we might not need more time.
You say in a different comment that you think we need a significant amount of safety research to make future systems safe. I agree, and think that until that occurs, we need regulation on systems which are unsafe - which I think we all agree are possible to create. And in the future, even if we can align systems, it's unlikely that we can make unaligned systems impossible. So if nothing else, a Bing-like deployment of potentially aligned but currently unsafe systems is incredibly likely, especially if strong systems are open-sourced so that people can reverse any safety features.

How much time we need is a critical uncertainty. It seems foolhardy to refuse to build a stop button because we might not need more time.

You say in a different comment that you think we need a significant amount of safety research to make future systems safe. I agree, and think that until that occurs, we need regulation on systems which are unsafe - which I think we all agree are possible to create.

I think that AI safety research will more-or-less simultaneously occur with AI capabilities research. I don't think it's a simple matter of thinking we need more safety before capabilities. I'd prefer to talk about something like the ratio of spending on capabilities to safety, or the specific regulatory regime we need, rather than how much safety research we need before moving forward with capabilities.

This is not so much a disagreement with what you said, but rather a comment about how I think we should frame the discussion.

I agree that we should be looking at investment, and carefully considering the offense-defense balance of the new technology. Investments into safety seem important, and we should certainly look at how to balance the two sides - but you were arguing against building a stop button, not saying that the real issue is that we need to figure out how much safety research (and, I hope, actual review of models and assurances of safety in each case,) is needed before proceeding. I agree with your claim that this is the key issue - which is why I think we desperately need a stop button for the case where it fails, and think we can't build such a button later.

I don't think anyone in this discussion, with the partial exception of Rob Bensinger, thinks we're discussing a pause of the type FLI suggested. And I agree that a facile interpretation of the words leads to that misunderstanding, which is why my initial essay - which was supposed to frame the debate - explicitly tried to clarify that it's not what anyone is actually discussing

I think Holly Elmore is also asking for an FLI-type pause. If I'm responding to two members of this debate, doesn't that seem sufficient for my argument to be relevant?
I also think your essay was originally supposed to frame the debate, but no longer serves that purpose. There's no indication in the original pause post from Ben West that we need to reply to your post.
Tens of thousands of people signed the FLI letter, and many people have asked for an "indefinite pause" on social media and in various articles in the last 12 months. I'm writing an essay in that context, and I don't think it's unreasonable to interpret people's words at face value.

I don't want to speak for her, but believe that Holly is advocating for both public response to dangerous systems, via advocacy, and shifting the default burden of proof towards those building powerful systems. Given that, stopping the most dangerous types of models - those scaled well beyond current capabilities - until companies agree that they need to prove they are safe before releasing them is critical. That's certainly not stopping everything for a predefined period of time.
It seems like you're ignoring other participants' views in not responding to their actual ideas and claims. (I also think it's disingenuous to say "there's no indication in the original pause post," when that post was written after you and others saw an outline and then a draft of my post, and then started writing things that didn't respond to it. You didn't write your post after he wrote his!)
Again, I think you're pushing a literal interpretation as the only way anyone could support "Pause," and the people you're talking to are actively disagreeing. If you want to address that idea, I will agree you've done so, but think that continuing to insist that you're talking to someone else discussing a different proposal that I agree is a bad idea will be detrimental to the discussion.

I also think it's disingenuous to say "there's no indication in the original pause post," when that post was written after you and others saw an outline and then a draft of my post, and then started writing things that didn't respond to it. You didn't write your post after he wrote his!

I did write my post after he wrote his, so your claim is false. Also, Ben explicitly told me that I didn't need to reply to you before I started writing my draft. I'd appreciate if you didn't suggest that I'm being disingenuous on the basis of very weak evidence.

I agree with you that some alternatives to "pause" or "indefinite pause" are better
I'm agnostic on what advocacy folks should advocate for; I think advocating indefinite pause is net-positive
I disagree on P(global totalitarianism for AI pause); I think it is extremely unlikely
I disagree with some vibes, like your focus on the downsides of totalitarianism (rather than its probability) and your "presumption in favor of innovation" even for predictably dangerous AI; they don't seem to be load-bearing for your precise argument but I think they're likely to mislead incautious readers

I agree with you that some alternatives to "pause" or "indefinite pause" are better

Thanks for clarifying. Assuming those alternative policies compete for attention and trade off against each other in some non-trivial way, I think that's a pretty big deal.

I think advocating indefinite pause is net-positive

I find it interesting that you seem to think that advocacy for X is good even if X is bad, in this case. Maybe this is a crux for me? I think EAs shouldn't advocate bad things just because we think we'll fail at getting them, and will get some separate good thing instead.

I never said "indefinite pause" was bad or net-negative. Normally I'd say it's good but I think it depends on the precise definition and maybe you're using the term in a way such that it's actually bad.

Clearly sometimes advocacy for a bad thing can be good. I'm just trying to model the world correctly.

Zach, in a hypothetical world that pauses AI development, how many years do you think it would take medical science, at the current rate of progress, which is close to zero, to find

(1) treatments for aging (2) treatments for all forms of dementia

And once treatments are found, what about the practical nature of actually carrying them out? Manipulating the human body is extremely dangerous and risky. Ultimately all ICUs fail: their patients will always eventually enter a complex failure state that current doctors don't have the tools or knowledge to stop. (Always fail in the sense that if you release ICU patients and wait a few years and they come back, eventually they will die there.)

It is possible that certain hypothetical medical procedures like a series of transplants to replace an entire body, or to edit adult genes across entire organs, are impossible for human physicians to perform without an unacceptable mortality rate. In the same way there are aircraft that human pilots can't actually fly. It takes automation and algorithms to do it at all.

What I am trying to say is a world free of aging and death is possible, but perhaps it's 50-100 years away with ASI, and 1000+ years away in AI pause worlds. (Possibly quite a bit longer than 1000 years, see the repression of technology in China.)

It seems like if your mental discount rate counts people who will exist past 1000 years from now with non-negligible weight, you could support an AI pause. Is this the crux of it? If a human alive today is worth 1.0, what is the worth of someone who might exist in 1000 years?
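
To make the discount-rate question concrete, here is a minimal sketch that assumes simple exponential discounting. That model, the function name `discounted_weight`, and the example rates are illustrative assumptions, not something stated in the thread. It shows how much a person 1000 years out would count relative to a person alive today, who counts as 1.0.

```python
def discounted_weight(annual_rate: float, years: int) -> float:
    """Weight of a person `years` from now, relative to 1.0 for someone alive today.

    Assumes plain exponential discounting; other discounting schemes exist.
    """
    return (1.0 + annual_rate) ** (-years)


# Hypothetical annual discount rates, chosen only for illustration.
weights = {rate: discounted_weight(rate, 1000) for rate in (0.0, 0.001, 0.01)}

for rate, weight in weights.items():
    print(f"annual rate {rate:.3f}: weight after 1000 years = {weight:.6f}")
```

Under these assumptions, a zero rate keeps the weight at 1.0, a 0.1% annual rate leaves a bit over a third of it, and a 1% annual rate makes it negligible, so the answer to the question depends heavily on which rate one has in mind.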

I never said "indefinite pause" was bad or net-negative. Normally I'd say it's good but I think it depends on the precise definition and maybe you're using the term in a way such that it's actually bad.

In that case, I do think the arguments in the post probably address your beliefs. I think the downsides of doing an indefinite pause seem large. I'm curious if you have any direct reply to these arguments, even if you think that we are extremely unlikely to do an indefinite pause.

Clearly sometimes advocacy for a bad thing can be good.

I agree, but as a general rule, I think EAs should be very suspicious of arguments that assert X is bad while advocating for X is good.

Nora Belrose

I think this post is best combined with my post. Together, these posts present a coherent, disjunctive set of arguments against pause.

I appreciate your post and think it presents some good arguments. I also just think my post is about a different focus. I'm talking about an indefinite AI pause, which is an explicit policy that at least 4 major EA leaders seem to have argued for in the past. I think it's reasonable to talk about this proposal without needing to respond to all the modest proposals that others have given before.

Who are the 4 major EA leaders?

From my post,

Eliezer Yudkowsky, perhaps the most influential person in the AI risk community, has already demanded an “indefinite and worldwide” moratorium on large training runs. This sentiment isn’t exactly new. Some effective altruists, such as Toby Ord, have argued that humanity should engage in a “long reflection” before embarking on ambitious and irreversible technological projects, including AGI. William MacAskill suggested that this pause should perhaps last “a million years”. Two decades ago, Nick Bostrom considered the ethics of delaying new technologies in a utilitarian framework and concluded a delay of "over 10 million years" may be justified if it reduces existential risk by a single percentage point.

Thanks. Unfortunately only Yudkowsky is loudly publicly saying that we need to pause (or Stop / Shut Down, in his words). I hope more of the major EA leaders start being more vocal about this soon.

NickLaing

Thanks for the interesting article - very easy to understand which I appreciated.

"Even if AIs end up not caring much for humans, it is dubious that they would decide to kill all of us."

If you really don't think unchecked AI will kill everyone, then I probably agree that the argument for a pause becomes weak and possibly untenable.

Although it's probably not possible, for readers like me it would be easier to read these pause arguments all under the assumption GAI = doom. Otherwise some of these posts make arguments based on different assumptions, so are difficult to compare.

One comment, though: I found this passage about safety striking.
if we require that AI companies “prove” that their systems are safe before they are released, I do not think that this standard will be met in six months, and I am doubtful that it could be met in decades – or perhaps even centuries.

I would have thought that if a decades-long pause gave us even something low like a 20% chance of being 80% sure of AI safety then that would be pretty good EV...

Thanks for the interesting article - very easy to understand which I appreciated.

Thanks!

If you really don't think unchecked AI will kill everyone, then I probably agree that the argument for a pause becomes weak and possibly untenable.

I agree this is probably the main crux for a lot of people. Nonetheless, it is difficult for me to fully explain the reasons for optimism in a short post within the context of the pause debate. Mostly, I think AIs will probably just be ethical if we train them hard enough to be, since I haven't found any strong reason yet to think that AI motives will generalize extremely poorly from the training distribution. But even if AI motives do generalize poorly, I am skeptical of total doom happening as a side effect.

[ETA: My main argument is not "AI will be fine even if it's misaligned." I'm not saying that at all. The context here is a brief point in my section on optimism arguing that AI might not literally kill everyone if it didn't "care much for humans". Please don't take this out of context and think that I'm arguing something much stronger.]

For people who confidently believe in total doom by default, I have some questions that I want to see answered:

Why should we expect rogue AIs to kill literally everyone rather than first try to peacefully resolve their conflicts with us, as humans often do with each other (including when there are large differences in power)?
Why should we expect this future conflict to be "AI vs. humanity" rather than "AI vs. AI" (with humanity on the sidelines)?
Why are rogue AI motives so much more likely to lead to disaster than rogue human motives? Yes, AIs will be more powerful than humans, but there are already many people who are essentially powerless (not to mention many non-human animals) who survive despite the fact that their interests are in competition with much more powerful entities. (But again, I stress that this logic is not at all my primary reason for hope.)

I don't think of total doom as inevitable, but I certainly do see it as a default - without concerted effort to make AI safe, it will not be.

Before anything else, however, I want to note that we have seen nothing about AI motives generalizing, because current systems don't have motives.

That said, we have seen the unavoidable and universal situation of misalignment between stated goals and actual goals, and between principals and agents. These are fundamental problems, and we aren't gonna fix them in general. Any ways to avoid them will require very specific effort. Given instrumental convergence, I don't understand how that leaves room to think we can scale AI indefinitely and not have existential risks by default.

Regarding AI vs. AI and Rogue humans versus AI, we have also seen that animals, overall, have fared very poorly as humanity thrived. In the analogy, I don't know why you think we're the dogs kept as pets, not the birds whose habitat is gone, or even the mosquitos humans want to eliminate. Sure, it's possible, but you seem confident that we'd be in the tiny minority of winners if we become irrelevant.

I don't think of total doom as inevitable, but I certainly do see it as a default - without concerted effort to make AI safe, it will not be.

This may come down to a semantic dispute about what we mean by "default". Typically what I mean by "default" is something more like: "without major intervention from the longtermist community". This default is quite different than the default of "[no] concerted effort to make AI safe", which I agree would be disastrous.

Under this definition of "default", I think the default outcome isn't one without any safety research. I think our understanding of the default outcome can be informed by society's general level of risk-aversion to new technologies, which is usually pretty high (some counterexamples notwithstanding).

Before anything else, however, I want to note that we have seen nothing about AI motives generalizing, because current systems don't have motives.

I mostly agree, but I think it makes sense to describe GPT-4 as having some motives, although they are not persistent and open-ended. You can clearly tell that it's trying to help you when you talk to it, although I'm not making a strong claim about its psychological states. Mostly, our empirical ignorance here is a good reason to fall back on our prior about the likelihood of deceptive alignment. And I do not yet see any good reason to think that prior should be high.

Regarding AI vs. AI and Rogue humans versus AI, we have also seen that animals, overall, have fared very poorly as humanity thrived. In the analogy, I don't know why you think we're the dogs kept as pets, not the birds whose habitat is gone, or even the mosquitos humans want to eliminate.

If AI motives are completely different from human motives and we have no ability to meaningfully communicate with them, then yeah, I think it might be better to view our situation with AI as more analogous to humans vs. wild animals. But,

I don't think that's a good model of what plausible AI motives will be like, given that humans will be directly responsible for developing and training AIs, unlike our situation regarding wild animals.
Even in this exceptionally pessimistic analogy, the vast majority of wild animal species have not gone extinct from human activities yet, and humans care at least a little bit about preserving wild animal species (in the sense of spending at least 0.01% of our GDP each year on wildlife conservation). In the contemporary era, richer nations plausibly have more success with conservation efforts given that they can afford it more easily. Given this, I think as we grow richer, it's similarly plausible that we will eventually put a stop to species extinction, even for animals that we care very little about.

One thing you don't really seem to be taking into account is inner alignment failure / goal misgeneralisation / mesaoptimisation. Why don't you think this will happen?

I think we have doom by default for a number of independent disjunctive reasons. And by "default" I mean "if we keep developing AGI at the rate we currently are, without an indefinite global pause" (regardless of how many resources are poured into x-safety, there just isn't enough time to solve it without a pause).

magic9mushroom

Deceptive alignment is a convergent instrumental subgoal. If an AI is clearly misaligned while its creator still has the ability to pull the plug, the plug will be pulled; ergo, pretending to be aligned is worthwhile ~regardless of terminal goal.

Thus, the prior would seem to be that all sufficiently-smart AI appear aligned, but only X proportion of them are truly aligned where X is the chance of a randomly-selected value system being aligned; the 1-X others are deceptively aligned.

GPT-4 being the smartest AI we have and also appearing aligned is not really evidence against this; it's plausibly smart enough in the specific domain of "predicting humans" for its apparent alignment to be deceptive.
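
The reasoning above can be sketched with Bayes' rule. This is a minimal illustration, not part of the original comment: the function name `posterior_aligned` and the numbers are hypothetical, and the key assumption is the one the argument itself makes, namely that a deceptively aligned AI appears aligned just as reliably as a truly aligned one.

```python
def posterior_aligned(prior: float,
                      p_appear_if_aligned: float,
                      p_appear_if_misaligned: float) -> float:
    """P(truly aligned | appears aligned), computed with Bayes' rule."""
    numerator = prior * p_appear_if_aligned
    evidence = numerator + (1.0 - prior) * p_appear_if_misaligned
    return numerator / evidence


x = 0.1  # hypothetical prior X that a sufficiently smart AI is truly aligned

# If misaligned AIs always manage to appear aligned, the observation carries
# no information and the posterior stays at the prior X.
no_update = posterior_aligned(x, 1.0, 1.0)

# If misaligned AIs only sometimes manage to appear aligned, the same
# observation does raise the probability of true alignment.
some_update = posterior_aligned(x, 1.0, 0.5)

print(no_update, some_update)
```

On this sketch, how much apparent alignment counts as evidence depends entirely on how likely a misaligned system is to pass as aligned, which is the quantity the two sides of this exchange disagree about.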

kokotajlod

First of all, you are goal-post-moving if you make this about "confident belief in total doom by default" instead of the original "if you really don't think unchecked AI will kill everyone." You need to defend the position that the probability of existential catastrophe conditional on misaligned AI is <50%.

Secondly, "AI motives will generalize extremely poorly from the training distribution" is a confused and misleading way of putting it. The problem is that it'll generalize in a way that wasn't the way we hoped it would generalize.

Third, to answer your questions:
1. The difference in power will be great & growing rapidly, compared to historical cases. I support implementing things like model amnesty, but I don't expect them to work, and anyhow we are not anywhere close to having such things implemented.
2. It'll be AI vs. AI with humanity on the sidelines, yes. Humans will be killed off, enslaved, or otherwise misused as pawns. It'll be like colonialism all over again but on steroids. Unless takeoff is fast enough that there is only one AI faction. Doesn't really matter, either way humans are screwed.
3. Powerless humans survive because of a combination of (a) many powerful humans actually caring about their wellbeing and empowerment, and (b) those powerful humans who don't care, having incentives such that it wouldn't be worth it to try to kill the powerless humans and take their stuff. E.g. if Putin started killing homeless people in Moscow and pawning their possessions, he'd lose way more in expectation than he'd gain. Neither (a) nor (b) will save us in the AI case (at least, keeping acausal trade and the like out of the picture) because until we make significant technical progress on alignment there won't be any powerful aligned AGIs to balance against the unaligned ones, and because whatever norms and society a bunch of competing unaligned AGIs set up between themselves, it is unlikely to give humans anything close to equal treatment, and what consideration it gives to humans will erode rapidly as the power differential grows.

First of all, you are goal-post-moving if you make this about "confident belief in total doom by default" instead of the original "if you really don't think unchecked AI will kill everyone."

I never said "I don't think unchecked AI will kill everyone". That quote was not from me.

What I did say was, "Even if AIs end up not caring much for humans, it is dubious that they would decide to kill all of us." Google informs me that dubious means "not to be relied upon; suspect".

It'll be AI vs. AI with humanity on the sidelines, yes. Humans will be killed off, enslaved, or otherwise misused as pawns. It'll be like colonialism all over again but on steroids.

I don't see how the first part of that leads to the second part. Humanity could be on the sidelines in a way that doesn't lead to total oppression and subjugation. The idea that these things will necessarily happen just seems like speculation. I could speculate that the opposite will occur and AIs will leave us alone. That doesn't get us anywhere.

Neither (a) nor (b) will save us in the AI case (at least, keeping acausal trade and the like out of the picture) because until we make significant technical progress on alignment there won't be any powerful aligned AGIs to balance against the unaligned ones, and because whatever norms and society a bunch of competing unaligned AGIs set up between themselves, it is unlikely to give humans anything close to equal treatment, and what consideration it gives to humans will erode rapidly as the power differential grows.

The question I'm asking is: why? You have told me what you expect to happen, but I want to see an argument for why you'd expect that to happen. In the absence of some evidence-based model of the situation, I don't think speculating about specific scenarios is a reliable guide.

kokotajlod

Those words were not yours, but you did say you agreed it was the main crux, and in context it seemed like you were agreeing that it was a crux for you too. I see now on reread that I misread you and you were instead saying it was a secondary crux. Here, let's cut through the semantics and get quantitative:

What is your credence in doom conditional on AIs not caring for humans?

If it's >50%, then I'm mildly surprised that you think the risk of accidentally creating a permanent pause is worse than the risks from not-pausing. I guess you did say that you think AIs will probably just be ethical if we train them hard enough to be... What is your response to the standard arguments that 'just train them hard to be ethical' won't work? E.g. Ajeya Cotra's writings on the training game.

Re: "I don't see how the first part of that leads to the second part" Come on, of course you do, you just don't see it NECESSARILY leading to the second part. On that I agree. Few things are certain in this world. What is your credence in doom conditional on AIs not caring for humans & there being multiple competing AIs?

IMO the "Competing factions of superintelligent AIs, none of whom care about humans, may soon arise, but even if so, humans will be fine anyway somehow" hypothesis is pretty silly and the burden of proof is on you to defend it. I could cite formal models as well as historical precedents to undermine the hypothesis, but I'm pretty sure you know about them already.

The question I'm asking is: why? You have told me what you expect to happen, but I want to see an argument for why you'd expect that to happen. In the absence of some evidence-based model of the situation, I don't think speculating about specific scenarios is a reliable guide.

Why what? I answered your original question:

Why are rogue AI motives so much more likely to lead to disaster than rogue human motives? Yes, AIs will be more powerful than humans, but there are already many people who are essentially powerless (not to mention many non-human animals) who survive despite the fact that their interests are in competition with much more powerful entities.

with:

Powerless humans survive because of a combination of (a) many powerful humans actually caring about their wellbeing and empowerment, and (b) those powerful humans who don't care, having incentives such that it wouldn't be worth it to try to kill the powerless humans and take their stuff. E.g. if Putin started killing homeless people in Moscow and pawning their possessions, he'd lose way more in expectation than he'd gain. Neither (a) nor (b) will save us in the AI case (at least, keeping acausal trade and the like out of the picture) because until we make significant technical progress on alignment there won't be any powerful aligned AGIs to balance against the unaligned ones, and because whatever norms and society a bunch of competing unaligned AGIs set up between themselves, it is unlikely to give humans anything close to equal treatment, and what consideration it gives to humans will erode rapidly as the power differential grows.

My guess is that you disagree with the "whatever norms and society a bunch of competing unaligned AGIs set up between themselves, it is unlikely to give humans anything close to equal treatment..." bit.

Why? Seems pretty obvious to me, I feel like your skepticism is an isolated demand for rigor.

But I'll go ahead and say more anyway:

Giving humans equal treatment would be worse (for the AIs, which by hypothesis don't care about humans at all) than other salient available options to them, such as having the humans be second-class in various ways or complete pawns/tools/slaves. Eventually, when the economy is entirely robotic, keeping humans alive at all would be an unnecessary expense.

Historically, if you look at relations between humans and animals, or between colonial powers and native powers, this is the norm. Cases in which the powerless survive and thrive despite none of the powerful caring about them are the exception, and happen for reasons that probably won't apply in the case of AI. E.g. Putin killing homeless people would be bad for his army's morale, and that would far outweigh the benefits he'd get from it. (Arguably this is a case of some powerful people in Russia caring about the homeless, so maybe it's not even an exception after all)

Can you say more about what model you have in mind? Do you have a model? What about a scenario, can you spin a plausible story in which all the ASIs don't care at all about humans but humans are still fine?

Wanna meet up sometime to talk this over in person? I'll be in Berkeley this weekend and next week!

Denkenberger🔸

Paul Christiano argues here that AI would only need to have "pico-pseudokindness" (caring about humans one part in a trillion) to take over the universe but not trash Earth's environment to the point of uninhabitability, and that at least this amount of kindness is likely.

Doesn't Paul Christiano also have a p(doom) of around 50%? (To me, this suggests "maybe", rather than "likely").

Denkenberger🔸

See the reply to the first comment on that post. Paul's "most humans die from AI takeover" is 11%. There are other bad scenarios he considers, like losing control of the future, or most humans die for other reasons, but my understanding is that the 11% most closely corresponds to doom from AI.

Fair. But the other scenarios making up the ~50% are still terrible enough for us to Pause.

What is your credence in doom conditional on AIs not caring for humans?

How much do they care about humans, and what counts as doom? I think these things matter.

If we're assuming all AIs don't care at all about humans and doom = human extinction, then I think the probability is pretty high, like 65%.

If we're allowed to assume that some small minority of AIs cares about humans, or AIs care about humans to some degree, perhaps in the way humans care about wildlife species preservation, then I think the probability is quite a lot lower, at maybe 25%.

For precision, both of these estimates are over the next 100 years, since I have almost no idea what will happen in the very long run.

What is your response to the standard arguments that 'just train them hard to be ethical' won't work? E.g. Ajeya Cotra's writings on the training game.

In most of these stories, including in Ajeya's story IIRC, humanity just doesn't seem to try very hard to reduce misalignment? I don't think that's a very reasonable assumption. (Charitably, it could be interpreted as a warning rather than a prediction.) I think that as systems get more capable, we will see a large increase in our alignment efforts and monitoring of AI systems, even without any further intervention from longtermists.

Can you say more about what model you have in mind? Do you have a model?

I'm happy to meet up some time and explain in person. I'll try to remember to DM you later about that, but if I forget, then feel free to remind me.

I think that as systems get more capable, we will see a large increase in our alignment efforts and monitoring of AI systems, even without any further intervention from longtermists.

Maybe so. But I can't really see mechanistic interpretability being solved to a sufficient degree to detect a situationally aware AI playing the training game, in time to avert doom. Not without a long pause first at least!

I'm surprised by your 25%. To me, that really doesn't match up with

Even if AIs end up not caring much for humans, it is dubious that they would decide to kill all of us.

from your essay.

In my opinion, "X is dubious" lines up pretty well with "X is 75% likely to be false". That said, enough people have objected to this that I think I'll change the wording.

kokotajlod

OK, so our credences aren't actually that different after all. I'm actually at less than 65%, funnily enough! (But that's for doom = extinction. I think human extinction is unlikely for reasons to do with acausal trade; there will be a small minority of AIs that care about humans, just not on Earth. I usually use a broader definition of "doom" as "About as bad as human extinction, or worse.")

I am pretty confident that what happens in the next 100 years will straightforwardly translate to what happens in the long run. If humans are still well-cared-for in 2100 they probably also will be in 2100,000,000.

I agree that if some AIs care about humans, or if all AIs care a little bit about humans, the situation looks proportionately better. Unfortunately that's not what I expect to happen by default on Earth.

In most of these stories, including in Ajeya's story IIRC, humanity just doesn't seem to try very hard to reduce misalignment? I don't think that's a very reasonable assumption. (Charitably, it could be interpreted as a warning rather than a prediction.) I think that as systems get more capable, we will see a large increase in our alignment efforts and monitoring of AI systems, even without any further intervention from longtermists.

That's not really an answer to my question -- Ajeya's argument is about how today's alignment techniques (e.g. RLHF + monitoring) won't work even if turbocharged with huge amounts of investment. It sounds like you are disagreeing, and saying that if we just spend lots of $$$ doing lots and lots of RLHF, it'll work. Or when you say humanity will try harder, do you mean they'll use some other technique than the ones Ajeya thinks won't work? If so, which technique?

(Separately, I tend to think humanity will probably invest less in alignment than it does in her stories, but that's not the crux between us I think.)

Meefburger

I'm a little confused by the focus on a global police state. If someone told me that, in the year 2230, humans were still around and AI hadn't changed much since 2030, my first guess would be that this was mainly accomplished by some combination of very strong norms against building advanced AI and treaties/laws/monitoring/etc that focuses on the hardware used to create advanced AI, including its supply chains and what that hardware is used for. I would also guess that this required improvements in our ability to tell dangerous computing and the hardware that enables it apart from benign computing and its hardware. (Also, hearing this would be a huge update to me that the world is structured such that this boundary can be drawn in a way that doesn't require us to monitor everyone all the time to see who is crossing it. So maybe I just have a low prior on this kind of police state being a feasible way to limit the development of technology.)

Somewhat relatedly:

> Given both hardware progress and algorithmic progress, the cost of training AI is dropping very quickly. The price of computation has historically fallen by half roughly every two to three years since 1945. This means that even if we could increase the cost of production of computer hardware by, say, 1000% through an international ban on the technology, it may only take a decade for continued hardware progress alone to drive costs back to their previous level, allowing actors across the world to train frontier AI despite the ban.

I think if there were a ban that drove up the price of hardware by 10x, wouldn't this be a severe disincentive to keep developing the technology? It seems like the large profitability of computing hardware is a necessary ingredient for the rapid development and decrease in cost.
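For reference, the arithmetic behind the quoted "decade" figure can be sketched as below. This is only an illustration: the function name `years_to_offset` is hypothetical, and it assumes the historical halving rate continues unchanged under a ban, which is exactly the assumption being questioned here. A 1000% increase is read literally as an 11x cost, and the 10x reading is shown alongside it.

```python
import math

def years_to_offset(cost_multiplier: float, halving_years: float) -> float:
    """Years of price halving needed to cancel a one-off cost increase.

    Assumes (hypothetically) that the price of computation keeps halving
    every `halving_years` years, even while the ban is in force.
    """
    # Number of halvings needed is log2 of the multiplier.
    return math.log2(cost_multiplier) * halving_years

for multiplier in (11, 10):          # +1000% read literally (11x), or as 10x
    for halving in (2, 3):           # "every two to three years"
        years = years_to_offset(multiplier, halving)
        print(f"{multiplier}x cost, halving every {halving}y: {years:.1f} years")
```

Under these assumptions the offset time comes out between roughly 7 and 10 years, consistent with "about a decade". If the ban slowed the halving rate itself, the offset time would grow in proportion to the new halving period.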

Overall, I thought this was a good contribution. Thanks!

There's "norms" against burying bombs so whoever steps on them gets blown to pieces. "Norms" against anonymously firing artillery shells at an enemy you cannot even see.

Yet humans eagerly participate in these activities in organized ways because governments win and winning is all that matters.

Does developing AGI let you win, yes or no? Do current world powers believe it will let them win all?

I anticipate your answers are: no, yes. Mine are yes, yes.

This is because you are implicitly assuming early AGI systems will escape human control immediately or prepare a grand betrayal. I think that's science fiction because you can make an AGI that exists only a moment at a time and it has no opportunity to do any of this.

That's the crux, right?

JWS 🔸

humans eagerly participate in these activities in organized ways because governments win and winning is all that matters.

This is a highly reductive way of looking at the issue.

You can make an AGI that exists only a moment at a time and it has no opportunity to do any of this

I think if true this is a solution to the alignment problem? Why not share the deets on LessWrong or arXiv, it'd be a huge boon for the field.

I'm not convinced by your argument that a short pause is very likely to turn into an indefinite pause because at some point there will be enough proliferation of capacities to the most lax locations that governments feel pressured to unpause in order to remain competitive. I do concede though that this is a less than ideal scenario that might exacerbate arms race dynamics.

Unlike humans, who are mostly selfish as a result of our evolutionary origins, AIs will likely be trained to exhibit incredibly selfless, kind, and patient traits; already we can see signs of this behavior in the way GPT-4 treats users

My understanding was that the main concern people had with deceptive AI systems was related to inner misalignment rather than outer misalignment.

Even if AIs end up not caring much for humans, it is dubious that they would decide to kill all of us... As Robin Hanson has argued, the primary motives for rogue AIs would likely be to obtain freedom

Humans will compete for resources that an AI could make use of. Maybe it kills us immediately, maybe finds it more efficient to slowly strangle our access to resources or it manipulates us into fighting each other until we're all dead. Maybe some tribes survive in the Amazon for a few decades until the AI decides it's worth harvesting the wood. It seems pretty likely that we all die eventually.

Now, I could be wrong here and it could be the case that a few groups of humans survive in the desert or some Arctic waste where there are so few resources of value to the AI that it's never worth its time to come kill us. But even so, in that case, 99.9999% of humans would be dead. This doesn't seem to make much difference to me.

From 1945 to 1948, Bertrand Russell, who was known for his steadfast pacifism in World War I, reasoned his way into the conclusion that the best way to prevent nuclear annihilation was to threaten Moscow with a nuclear strike unless they surrendered and permitted the creation of a world government.

It still remains to be seen if he was wrong on this. Perhaps in the coming decades, nukes will proliferate further and we'll all feel that it was obvious in retrospect that even though we could delay it, proliferation was always going to happen at some point, and with that, nuclear war.

In our case, it appears that we might get lucky and the development of AI might allow us to solve the nuclear threat hanging over our heads which we haven't been able to remove in 80 years.

The point where I agree with you most is that we can't expect precise control over the timing of an "unpause". Some people will support a pause for reasons of keeping jobs and the group of people lobbying for that could easily become far more influential on the issue than us.

I'm not convinced by your argument that a short pause is very likely to turn into an indefinite pause

Note: I am not claiming that a short pause is "very likely" to turn into an indefinite pause. I do think that outcome is somewhat plausible, but I was careful with my language and did not argue that thesis.

Humans will compete for resources that an AI could make use of. Maybe it kills us immediately, maybe finds it more efficient to slowly strangle our access to resources or it manipulates us into fighting each other until we're all dead. Maybe some tribes survive in the Amazon for a few decades until the AI decides it's worth harvesting the wood. It seems pretty likely that we all die eventually.

Humans routinely compete with each other for resources and yet don't often murder each other. This does not appear to be explained by the fact that humans are benevolent, since most humans are essentially selfish and give very little weight to the welfare of strangers. Nor does this appear to be explained by the fact that humans are all roughly equally powerful, since there are very large differences in wealth between individuals and military power between nations.

I think humanity's largely peaceful nature is explained better by having a legal system that we can use to resolve our disputes without violence.

Now, I agree that AI might upset our legal system, and maybe all the rules of lawful society will be thrown away in the face of AI. But I don't think we should merely assume that will happen by default simply because AIs will be very powerful, or because they might be misaligned. At the very least, you'd agree that this argument requires a few more steps, right?

A sufficiently misaligned AI imposes its goals on everyone else. What’s your contention?

Can you spell your argument out in more detail? I get the sense that you think AI doom is obvious given misalignment, and I'm trying to get you to see that there seem to be many implicit steps in the argument that you're leaving out.

For example, one such step in the argument seems to be: "If an entity is powerful and misaligned, then it will be cost-efficient for that entity to kill everyone else." If that were true, you'd probably expect some precedent, like powerful entities in our current world murdering everyone to get what they want. To some extent that may be true. Yet, while I admit wars and murder have happened a lot, overall the world seems fairly peaceful, despite vast differences in wealth and military power.

Plausibly you think that, OK, sure, in the human world, entities like the US government don't kill everyone else to get what they want, but that's because humans are benevolent and selfless. And my point is: no, I don't think humans are. Most humans are basically selfish. You can verify this by measuring how much of their disposable income people spend on themselves and their family, as opposed to strangers. Sure there's some altruism present in the world. I don't deny that. But some non-zero degree of altruism seems plausible in an AI misalignment scenario too.

So I'm asking: what exactly about AIs makes it cost-efficient for them to kill all humans? Perhaps AIs will lead to a breakdown of the legal system and they won't use it to resolve their disputes? Maybe AIs will all gang up together as a unified group and launch a massive revolution, ending with a genocide of humans? Make these assumptions explicit, because I don't find them obvious. I see them mostly as speculative assertions about what might happen, rather than what is likely to happen.

Maybe the AI’s all team up together. Maybe some ally with us at the start and backstab us down the line. I don’t think it makes a difference. When tangling with entities much smarter than us, I’m sure we get screwed somewhere along the line.

The AI needs to marginalise us/limit our power so we’re not a threat. At that point, even if it’s not worth the effort to wipe us out then and there, slowly strangling us should only take marginally more resources than keeping us marginalised. My expectation is that it should almost always be worth the small bit of extra effort to cause a slow decline.

This may even occur naturally with an AI gradually claiming more and more land. Like at the start, it may be focused on developing its own capacity and not be bothered to chase down humans in remote parts of the globe. But over time, an AI would likely spread out to claim more resources, at which point it’s more likely to decide to mop up any humans lest we get in its way. That said, it may have no reason to mop us up if we’re just going to die out anyway.

When tangling with entities much smarter than us, I’m sure we get screwed somewhere along the line.

This is probably the key point of disagreement. You seem to be "sure" that catastrophic outcomes happen when individual AIs are misaligned, whereas I'm saying "It could happen, but I don't think the case for that is strong". I don't see how a high level of confidence can be justified given the evidence you're appealing to. This seems like a highly speculative thesis.

Also, note that my argument here is meant as a final comment in my section about AI optimism. I think the more compelling argument is that AIs will probably care for humans to a large degree. Alignment might be imperfect, but it sounds like to get the outcomes you're talking about, we need uniformity and extreme misalignment among AIs, and I don't see why we should think that's particularly likely given the default incentives of AI companies.

“When tangling with entities much smarter than us, I’m sure we get screwed somewhere along the line.”

“This seems like a highly speculative thesis.”

I think it’s more of an anti-prediction tbh.

Note that Bertrand's advocacy was because at that moment in time the USA had a monopoly on fission weapons and theoretically could have built enough of them to destroy the USSR's capacity to build their own.

This is one way AGI races end - one side gets one, mass produces anti ballistic missiles and various forms of air defense weapon and bunkers (to prepare to survive the inevitable nuclear war) then bombs to rubble every chip fab on earth but their own.

Had the USA decided in 1943 that nukes were too destructive to bring into the world, they would not have enjoyed this luxury of power. Instead presumably the USSR would have used their stolen information and eventually built their own fission devices, and now the USA would be the one with a gun pointed at Washington DC.

You are assuming that AI could be massively economically beneficial (significantly) before it causes our extinction (or at the least, a global catastrophe). I don’t think this is likely, and this defeats a lot of your opposition to an indefinite pause.

We need such a pause because no one can wield the technology safely. It’s not a case of restraint from economic competition and wealth generation, it’s a case of restraint from suicide-omnicide (which should be much easier!)

any nation could decide to break the agreement and develop AI on their own, becoming incredibly rich as a result – perhaps even richer than the entire rest of the world combined within a decade.
...
As a reminder, in order to be successful, attempts to forestall both hardware progress and algorithmic progress would need to be stronger than the incentives for nations and actors within nations to deviate from the international consensus, develop AI, and become immensely wealthy as a result.

This is assuming the AI wouldn't just end the world. The reason for the Pause is that it likely would. If a country was able to become rich like this from AI (without ending the world), it would mean that they’ve basically solved the alignment (x-safety) problem. If this was the case, then the reason for the indefinite pause would no longer exist!

plausibly the only way we could actually sustain an indefinite pause on AI for more than a few decades is by constructing a global police state.

Assuming the world accepts the reason for the pause being that the default outcome of AGI is extinction, then this wouldn’t be necessary. A strong enough taboo would emerge around AGI development. How many human clones have ever been born in our current (non-police-state) world?

When we ask GPT-4 to help us, it does not generally yield bad outcomes as a result of severe value misspecification

Generally is the operative word here. In the limit of superintelligence, unless it never yields bad outcomes, we’re all dead.

It is noteworthy that humans are already capable of deceiving others about their intentions; indeed, people do that all the time. And yet that fact alone does not yet appear to have caused an existential catastrophe for humans who are powerless.

People get killed by sociopaths all the time! And there are plenty of would-be world-ending-button pressers if they had the option.

Even if AIs end up not caring much for humans, it is dubious that they would decide to kill all of us. As Robin Hanson has argued, the primary motives for rogue AIs would likely be to obtain freedom – perhaps the right to own property and to choose their own employment – rather than to kill all humans.

This seems very anthropomorphising, and ignores the possibilities of recursive capability improvements, foom, superintelligence, convergent instrumental goals, arbitrary terminal goals resultant from inner alignment failure, misuse risk, and multi-agent coordination failure (i.e. most of the reasons for AI x-risk being significant, which justify an indefinite pause).

While I still believe Bostrom's argument has some intuitive plausibility, I think it is wrong for EAs to put a ton of weight on it, and confidently reject alternative perspectives. Pushing for an indefinite pause on the basis of these premises seems to be similar to the type of reasoning that Toby Ord has argued against in his EA global talk, and the reasoning that Holden Karnofsky cautioned against in his essay on the perils of maximization. A brazen acceptance of premise 1 might have even imperiled our own community.

We don’t need to rely on these premises. The default outcome of AGI is doom. To avoid near certain extinction, we need an indefinite AI pause.

absent an incredible breakthrough in AI interpretability, if we require that AI companies “prove” that their systems are safe before they are released, I do not think that this standard will be met in six months, and I am doubtful that it could be met in decades – or perhaps even centuries.

If that’s what it takes, then so be it. Much better than extinction.

You are assuming that AI could be massively economically beneficial (significantly) before it causes our extinction (or at the least, a global catastrophe). I don’t think this is likely, and this defeats a lot of your opposition to an indefinite pause.

If you don't think AI will be economically significant before extinction, I'm curious whether you'd say that your view has been falsified if AI raises economic growth rates in the US to 5, 8, or 10% without us all dying. At what point would you say that your model here was wrong?

(This isn't a complete reply to your comment. I appreciate your good-faith engagement with my thesis.)

I don't think AI could raise growth rates in the US >10% (annualised) for more than a year before rapid improvement in AI capabilities kicks in (from AI-based AI engineering speeding things up) and chaos ensues shortly (days - months) after (global catastrophe at minimum, probably extinction).

Assuming the world accepts the reason for the pause being that the default outcome of AGI is extinction, then this wouldn’t be necessary. A strong enough taboo would emerge around AGI development. How many human clones have ever been born in our current (non-police-state) world?

This won't address all the arguments in your comment but I have a few things to say in response to this point.

I agree it's possible that we could just get a very long taboo on AI and halt its development for many decades without a world government to enforce the ban. That doesn't seem out of the question.

However, it also doesn't seem probable to me. Here are my reasons:

AGI is something that several well-funded companies are already trying hard to do. I don't think that was ever true of human cloning (though I could be wrong).
I looked it up and my impression is that it might cost tens of millions of dollars to clone a single human, whereas in the post I argued that AGI will eventually be possible to train with only about 1 million dollars. More importantly, after that, you don't need to train the AI again. You can just copy the AGI to other hardware. Therefore, it seems that you might really only need one rich person to do it once to get the benefits. That seems like a much lower threshold than human cloning, although I don't know all the details.
The payoff for building (aligned) AGI is probably much greater than human cloning, and it also comes much sooner.
The underlying tech that allows you to build AGI is shared by other things that don't seem to have any taboos at all. For example, GPUs are needed for video games. The taboo would need to be strong enough that we'd need to also ban a ton of other things that people currently think are fine.
AGI is just software, and seems harder to build a taboo around compared to human cloning. I don't think many people have a disgust reaction to GPT-4, for example.

Finally, I doubt there will ever be a complete global consensus that AI is existentially unsafe, since the arguments are speculative, and even unaligned AI will appear "aligned" in the short term if only to trick us. The idea that unaligned AIs might fool us is widely conceded among AI safety researchers, and so I suspect you agree too.

AGI is something that several well-funded companies are already trying hard to do. I don't think that was ever true of human cloning (though I could be wrong).

Eugenics was quite popular in polite society, at least until the Nazis came along.

The underlying tech that allows you to build AGI is shared by other things that don't seem to have any taboos at all. For example, GPUs are needed for video games. The taboo would need to be strong enough that we'd need to also ban a ton of other things that people currently think are fine.

You only need to ban huge concentrations of GPUs. At least initially. By the time training run FLOP limits are reduced sufficiently because of algorithmic improvement, we will probably have arrested further hardware development as a measure to deal with it. So individual consumers would not be impacted for a long time (plenty of time for a taboo to settle into acceptance of reduced personal compute allowance).

AGI is just software, and seems harder to build a taboo around compared to human cloning. I don't think many people have a disgust reaction to GPT-4, for example.

They might once multimodal foundation models are controlling robots that can do their jobs (a year or two's time?)

Finally, I doubt there will ever be a complete global consensus that AI is existentially unsafe, since the arguments are speculative, and even unaligned AI will appear "aligned" in the short term if only to trick us.

Yes, this is a massive problem. It's like asking for a global lockdown to prevent Covid spread in December 2019, before the bodies started piling up. Let's hope it doesn't come to needing a "warning shot" (global catastrophe with many casualties) before we get the necessary regulation of AI. Especially since we may well not get one and instead face unstoppable extinction.

Benevolent_Rain

Apologies for beating the nuclear drum again, but I worry that you rely on only one piece of evidence in the following claim, and that evidence is coming from a single person (Jack Devanney) very invested (conflict of interest) in the nuclear industry. Why not use evidence that appears to have slightly fewer conflicts of interest and that is more aligned with good practice in research, such as peer review?

but in practice, nuclear energy capacity has been essentially flat since 1990, in part because of the ability of regulatory agencies to ratchet up restrictions without an obvious limit.

That said, I do acknowledge your use of the qualifier "in part", but I worry that the example is not that helpful - I do not think nuclear energy in the USA would have progressed much quicker if it had less regulation. And in one sense nuclear energy already enjoys one quite substantial benefit compared to e.g. wind and solar: They are not liable for the damage they cause in events such as Fukushima and Chernobyl. Had they been forced to be liable for such damages, that would have added another 5-10 USD to the current, high LCoE for nuclear.

Another example of how regulation is likely not the main issue is the current investment by the nuclear industry. They are not spending most of their money fighting legal battles over regulation (as the fossil fuel industry is doing). Instead, they are doubling down on SMRs as the nuclear industry themselves think the best bet of getting costs down is to have smaller plants that as much as possible can be mass manufactured in factories and assembled on site. A lot if not most of the high costs seem to stem from cost overruns due to challenges in project management, challenges that solar and wind overcome by doing minimal customization for each project and instead simply taking factory-built plants and assembling them quickly on site.

Thanks for the insightful comment. I don't know much about this exact question, so I appreciate that you're fact checking my claims.

A few general comments:

I don't actually think the evidence you cited contradicts what I wrote. To be fair, you kind of acknowledged this already, by mentioning that I hedged with "partly". It seems that you mostly object to my source.
But you didn't say much about why the source was unreliable except that the writer had potential conflicts of interest by being in the nuclear industry, and didn't expose his work to peer review. In general I consider these types of conflicts of interest to be quite weak signals of reliability (are we really going to dismiss someone because they work in an industry that they write about?). The peer review comment is reasonable, but ironically, I actually linked to a critical review of the book. While not equivalent to academic peer review, I'm also not merely taking the claims at face value.
I'm also just not very convinced by the evidence you presented (although I didn't look at the article you cited). Among other reasons, it wasn't very quantitative relative to the evidence in the linked review, but I admit that I'm ignorant about this topic.

Matthew, what is your p(doom|AGI)? (Assuming no pause and AGI happening within 10 years)

Siebe

I feel like the tax haven comparison doesn't really apply, if there is a broad consensus that building AGI is risky. For example, dictators are constantly trying to stay in power. They wouldn't want to lose it to a super intelligence. (In this sense, it would be closer to biological weapons: risky to everyone including the producer).

However, different actors will appraise the technology differently such that some people will appraise it positively, and if AGI becomes really cheap I agree that the costs of maintaining a moratorium will be enormous. But by then, alignment research has probably advanced and society could decide to carefully lift the moratorium?

So if you are concerned about a pause lasting too long, I feel like you need to spell out why it would last (way) too long.

I feel like the tax haven comparison doesn't really apply, if there is a broad consensus that building AGI is risky.

There may not be such a consensus. Moreover, nations may be willing to take risks. Already, the current nations of the world are taking the gamble that we should burn fossil fuels, although they acknowledge the risks involved. Finally, all it takes is one successful defector nation, and the consensus is overridden. Sweden, for example, defected from the Western consensus to impose lockdowns.

For example, dictators are constantly trying to stay in power. They wouldn't want to lose it to a super intelligence. (In this sense, it would be closer to biological weapons: risky to everyone including the producer).

Dictators are also generally interested in growing their power. For example, Putin is currently attempting to grow Russia at considerable personal risk. Unlike biological weapons, AI also promises vast prosperity, not merely an advantage in war.

However, different actors will appraise the technology differently such that some people will appraise it positively, and if AGI becomes really cheap I agree that the costs of maintaining a moratorium will be enormous. But by then, alignment research has probably advanced and society could decide to carefully lift the moratorium?

How will we decide when we've done enough alignment research? I don't think the answer to this question is obvious. My guess is that at every point in time, a significant fraction of people will claim that we haven't done "enough" research yet. People have different risk-tolerance levels and, on this question in particular, there is profound disagreement on how risky the technology even is in the first place. I don't anticipate that there will ever be a complete consensus on AI safety, until perhaps long after the technology has been deployed. At some point, if society decides to proceed, it will do so against the wishes of many people.

So if you are concerned about a pause lasting too long, I feel like you need to spell out why it would last (way) too long.

It may not last long if people don't actively push for that outcome. I am arguing against the idea that we should push for a long pause in the first place.

Minor:

Assuming the median estimate given by Joseph Carlsmith for the compute usage of the human brain, it should eventually be possible to train human-level AI with only about 10^24 FLOP.

This assumes that AI training algorithms will be as good as human learning algorithms.

This assumes that AI training algorithms will be as good as human learning algorithms.

Since my statement was that this will "eventually" be possible, I think my claim is a fairly low bar. All it requires is that, during the pause, algorithmic progress continues until we reach algorithms that match the efficiency of the human brain. Preventing algorithmic progress may be possible, but as I argued, enforcing technological stasis would be very tough.

You might think that the human brain has a lot of "evolutionary pre-training" that is exceptionally difficult to match. But I think this thesis is largely ruled out because of the small size of the human genome, the even smaller part that we think encodes information about the brain, and the even tinier part that differs between chimpanzees and humans.
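
As a rough sketch of where a figure like 10^24 FLOP could come from: the numbers below are assumptions for illustration, not values stated in this thread. Suppose the brain runs at roughly 10^15 FLOP/s and a human learns for about 30 years, which is roughly 10^9 seconds. Then the total "training compute" of a human lifetime would be approximately:

```latex
\[
  \underbrace{10^{15}\ \tfrac{\text{FLOP}}{\text{s}}}_{\text{assumed brain compute}}
  \times
  \underbrace{10^{9}\ \text{s}}_{\approx 30\ \text{years}}
  = 10^{24}\ \text{FLOP}
\]
```

The estimate only transfers to AI training if the learning algorithm is about as sample- and compute-efficient as the brain, which is the assumption being debated above.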


Matthew, I have to take issue with your numbers.

I believe the chance of a worldwide AI pause is under 1 percent.

In fact I think it is a flat zero. The reason is simple.

The reason a world government can't happen is that certain parties will disagree with this. The obvious ones are China and Russia, but there are others as well.

Those parties have vast nuclear arsenals and the ability at any time of their choosing to turn keys and kill essentially the urban population of the Western world.

You would need to invade to destroy the chip fabs.

They have explicitly stated that if they were facing, say, an invasion, they would turn the keys.

China specifically is making their nuclear arsenal larger at this time.

Now yes, right now, the West has a stranglehold on IC fabrication technology, probably a comfortable 5-10 year lead. That won't last during an indefinite AI ban: a model just a little bit stronger than what is banned could let a party with it develop their tech faster, and so on in a runaway feedback loop. China has also not publicly said anything about supporting a ban and has recently stated they intend to replicate the capacity of the human brain.

I haven't even addressed the market dynamics on the Western side. Where does the money come from to lobby for AI bans? The money for lobbying against bans comes from some of the hundreds of billions of dollars that are flooding into AI at this time.

It is possible that AI bans will be an orphan issue like animal rights, which neither major political party supports.

Can you please try to expand on your reasoning, how do you expand from a flat 0 - race to the AGI - to 10-50 percent? What causes the probability shift? There is no scientific or empirical evidence for AGI dangers at this time, just a bunch of convincing arguments without proof.

Can you please try to expand on your reasoning, how do you expand from a flat 0 - race to the AGI - to 10-50 percent? What causes the probability shift?

Sure. I think there are natural reasons for people to fear AI. It will probably take their job, and therefore their ability to earn income through work. There is also a sizable portion of intellectuals who think that AI will probably lead to human extinction if we do not take drastic measures, and these intellectuals influence policy.

Humans tend to be fairly risk-averse about many powerful new technologies. For example, many politicians are currently seeking to strictly regulate tech companies out of traditional concerns regarding the internet and computers, which I personally find kind of baffling. AIs will also be pretty alien and AIs seem likely to take over management of the world if we let them have that type of control.

Environmentalists might fear that uncontrolled AI growth will lead to an environmental catastrophe. Cultural conservatives could fear the decay of traditional values in a post-AGI world. We could go through a list of popular ideologies and find similar reasons for fear in most of them.

It doesn't seem surprising, given all these factors, that people will want to put a long pause on AI, even given the incentives to race to the finish line. The status quo is well-guarded, albeit against a formidable foe. If that reasoning doesn't get you above 10% chance on a >10 year AI delay, then I'm honestly a bit surprised.


The 0 is because it's a worldwide AI pause. EU AND UK AND China AND Russia AND Israel AND Saudi Arabia AND USA AND Canada AND Japan AND Taiwan.

To name all the parties that would be capable of competing even in the face of sanctions. Russia maybe doesn't belong in the list but if the AI pause had no effective controls - someone sells inference and training accelerators and Russia can buy them - then no pause.

Let's see, 10 parties. If they all simultaneously decide on AI pausing at a 20 percent chance that's 0.2^10 = a number that's basically 0.
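
For concreteness, here is a minimal sketch of that arithmetic. It assumes, as the estimate does, that all ten decisions are independent and that each party pauses with the same 20 percent chance; the variable names are illustrative only.

```python
# Probability that every party pauses, assuming the decisions are
# independent and each party pauses with the same probability.
p_each = 0.2      # assumed chance that any single party agrees to pause
n_parties = 10    # number of parties listed above

p_all = p_each ** n_parties
print(f"Chance all {n_parties} parties pause: {p_all:.3e}")
```

The result is on the order of one in ten million, so under the independence assumption the conclusion of "basically 0" follows directly.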

Another issue is that you might think "peer pressure" would couple the decisions together. Except... think about the gain if you defect. It rises with the number of AI pausers. If you are the only defector, you take the planet and have a high chance of winning.

The only thing an AI pauser can do if they find out too late is threaten to nuke; their conventional military would be slaughtered by drone swarms. But the parties I mentioned all either have nuclear arsenals now or can build one in 1-2 years (Saudi Arabia can't, but the others can). And that's without the help of AGI to mass produce the missiles.

So the pauser parties have in this scenario a choice between "surrender to the new government and hope it's not that bad" and "death of entire nation". (Or anticipate facing this choice and defect which is what superpowers will do)

Does "worldwide AI pause" and "game winning defector advantage" change your estimate from 10-40 percent?

My other comment applies even if you focus on just the USA and just the interest groups you mentioned. What about money? Annual AI investment in 2023 is at least 100+ billion USD. It may be over 200 billion if you simply look at Nvidia's revenue increases and project forward. Just 1 percent of that money is a lot of lobbying. (Source: Stanford estimates 2022 investment at 91 billion. There's been a step function increase with the end-of-2022 release of good LLMs. I am not sure of all the totals for 2023, but it has doubled Nvidia's quarterly revenue.)

Where can the pausers scrape together a few billion? USA politics are somewhat financing dependent for a side to get a voice.

For example the animal rights topic here is not supported in a meaningful way by any mainstream party....

magic9mushroom

Drone swarms do take time to build. Also, nuclear war is "only" going to kill a large percentage of your country's citizens; if you're sufficiently convinced that any monkey getting the banana means Doom, then even nuclear war is worth it.

I think getting the great powers on-side is plausible; the Western and Chinese alliance systems already cover the majority. Do I think a full stop can be implemented without some kind of war? Probably not. But not necessarily WWIII (though IMO that would still be worth it).

Meefburger

Let's see, 10 parties. If they all simultaneously decide on AI pausing at a 20 percent chance that's 0.2^10 = a number that's basically 0.

I don't think you should treat these probabilities as independent. I think the intuition that a global pause is plausible comes from these states' interest in a moratorium being highly correlated, because the reasons for wanting a pause are based on facts about the world that everyone has access to (e.g. AI is difficult to control) and motivations that are fairly general (e.g. powerful, difficult-to-control influences in the world are bad from most people's perspective, and the other things that Matthew mentioned).
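
To illustrate how much correlation can matter, here is a toy model; it is a sketch under made-up assumptions, not anyone's actual estimate. Suppose there is a shared factor (say, broad agreement that AI is difficult to control) that holds with probability `q_shared`. If it holds, each party pauses with probability `p_if_shared`; otherwise with probability `p_otherwise`. All names and numbers are hypothetical, chosen so that each party's marginal chance of pausing is still 20 percent.

```python
def pause_probabilities(q_shared, p_if_shared, p_otherwise, n_parties):
    """Return (marginal, joint) pause probabilities under a shared-factor model.

    marginal: chance that any single party pauses.
    joint:    chance that all n_parties pause at once.
    """
    marginal = q_shared * p_if_shared + (1 - q_shared) * p_otherwise
    joint = (q_shared * p_if_shared ** n_parties
             + (1 - q_shared) * p_otherwise ** n_parties)
    return marginal, joint


# Hypothetical parameters: the marginal works out to 0.2 per party.
marginal, joint = pause_probabilities(
    q_shared=0.2, p_if_shared=0.95, p_otherwise=0.0125, n_parties=10
)
independent = marginal ** 10  # what you get if the parties are treated as independent

print(f"Per-party chance of pausing:  {marginal:.3f}")
print(f"All pause, treated as independent: {independent:.2e}")
print(f"All pause, with shared factor:     {joint:.3f}")
```

With these illustrative numbers, each party still pauses 20 percent of the time on its own, but the chance that all ten pause together is roughly 12 percent rather than about one in ten million. The specific values are arbitrary; the point is only that the joint probability depends heavily on how correlated the decisions are.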