High-stakes uncertainty warrants caution and research


When I see confident dismissals of AI risk from other philosophers, it’s usually not clear whether our disagreement is ultimately empirical or decision-theoretic in nature. (Are they confident that there’s no non-negligible risk here, or do they think we should ignore the risk even though it’s non-negligible?) Either option seems pretty unreasonable to me, for the general reasons I previously outlined in X-Risk Agnosticism. But let me now take a stab at spelling out an ultra-minimal argument for worrying about AI safety in particular:

  1. It’s just a matter of time until humanity develops artificial superintelligence (ASI). There’s no in-principle barrier to such technology, nor should we by default expect sociopolitical barriers to automatically prevent the innovation.
    1. Indeed, we can’t even be confident that it’s more than a decade away.
    2. Reasonable uncertainty should allow at least a 1% chance that it occurs within 5 years (let alone 10).
  2. The stakes surrounding ASI are extremely high, to the point that we can’t be confident that humanity would long survive this development.
  3. Even on tamer timelines (with no “acute jumps in capabilities”), gradual disempowerment of humanity is a highly credible concern.
  4. We should not neglect credible near-term risks of human disempowerment or even extinction. Such risks warrant urgent further investigation and investment in precautionary measures.
    1. If there’s even a 1% chance that, within a decade, we’ll develop technology that we can’t be confident humanity would survive—that easily qualifies as a “credible near-term risk” for purposes of applying this principle.

Conclusion: AI risk warrants urgent further investigation and precautionary measures.[1]
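
To make the quantitative core of this argument concrete, here is a minimal back-of-the-envelope sketch. The 1% figure comes from premises 1.2 and 4.1; the conditional probability of catastrophe given ASI is purely illustrative (premise 2 only says we can't be confident of survival).

```python
# Minimal sketch of the argument's quantitative core (illustrative numbers only).

p_asi_within_decade = 0.01      # premises 1.2/4.1: at least a 1% chance of ASI within a decade
p_catastrophe_given_asi = 0.10  # assumed stand-in for premise 2 ("not confident of survival")

p_near_term_catastrophe = p_asi_within_decade * p_catastrophe_given_asi
print(f"Implied near-term existential risk: {p_near_term_catastrophe:.4f}")
# -> 0.0010, i.e. roughly a 1-in-1000 chance of an unrecoverable outcome within a decade,
# which premise 4 treats as clearing the bar for a "credible near-term risk".
```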

[Image caption: "Sufficient probability density in the danger zone?"]

My question for those who disagree with the conclusion: which premise(s) do you reject?


  1. ^

    Of course, there’s a lot of room for disagreement about what precise form this response should take. But resolving that requires further discussion. For now, I’m just focused on addressing those who claim not to view AI safety as worth discussing at all.

Comments

Since you ask for the viewpoint of those who disagree, here is a summary of my objections to your argument. It has two parts: first, my objection to your probability estimate for AI risk, and second, my objection to your conclusion.

  1. It’s just a matter of time until humanity develops artificial superintelligence (ASI). There’s no in-principle barrier to such technology, nor should we by default expect sociopolitical barriers to automatically prevent the innovation.
    1. Indeed, we can’t even be confident that it’s more than a decade away.
    2. Reasonable uncertainty should allow at least a 1% chance that it occurs within 5 years (let alone 10).

A reasonable prior is that we will not develop ASI in the near future: out of all possible decades, any single decade has a very small probability of ASI being developed, well under 1%. To overcome this prior, we would need evidence. However, there is little to no evidence suggesting that AGI/ASI technology is achievable in the near future.

It is clear that our current LLM tech is not sufficient for AGI, as it lacks several properties that an AGI would require, such as learning-planning[1]. Since current progress is not heading toward AGI, it does not count as good evidence that AGI technology will surface in the near future.
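
One way to make the shape of this disagreement explicit (a framing for illustration, not the commenter's own formulation) is as a Bayesian update on a low per-decade base rate; all numbers below are assumptions chosen for exposition.

```python
# Illustrative Bayesian framing of the base-rate argument above (assumed numbers).

p_prior = 0.005   # "well under 1%" prior that ASI arrives in this particular decade

# Likelihood ratio of the observed evidence (recent AI progress) under
# "ASI this decade" vs. "ASI not this decade". The claim above is that this
# ratio is close to 1 (LLM progress is not evidence of imminent AGI);
# the post's argument requires it to be substantially greater than 1.
likelihood_ratio = 1.0

prior_odds = p_prior / (1 - p_prior)
posterior_odds = prior_odds * likelihood_ratio
p_posterior = posterior_odds / (1 + posterior_odds)
print(f"Posterior P(ASI this decade): {p_posterior:.4f}")
# With likelihood_ratio = 1 the prior is unchanged; reaching the post's 1% threshold
# would require a likelihood ratio of roughly 2 on these assumptions.
```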

  1. We should not neglect credible near-term risks of human disempowerment or even extinction. Such risks warrant urgent further investigation and investment in precautionary measures.
    1. If there’s even a 1% chance that, within a decade, we’ll develop technology that we can’t be confident humanity would survive—that easily qualifies as a “credible near-term risk” for purposes of applying this principle.

I'm a firm believer in the neglectedness, tractability, and importance (ITN) framework when deciding on possible interventions. So if the question is whether we should neglect a risk, the first thing to ask is whether others are neglecting it. In the case of AI risk, the answer is, in my opinion, no. AI risk is not neglected. It is, in fact, taken very seriously by major AI companies, numerous other organizations, and even some governments. AI is researched in almost every university on the planet, and substantial funding goes to AI safety research. So I believe AI risk fails the neglectedness criterion.

But even more crucially, I think it also fails tractability. Because AGI technology does not exist, we cannot research it. Most so-called "AI safety research" focuses on unimportant sidetracks that have no measurable effect on AI risk. Similarly, it is very difficult to establish any governmental policy to limit AI development, since we do not even know what kind of technology we would need to regulate, short of a blanket ban on AI research. Most of our politicians correctly deem such a ban overreaching and harmful, since current AI tech is harmless from the X-risk viewpoint (and there would be no way out of the ban, since we cannot research the safety of technology that does not exist).

I do not believe AI risk is important, as there is no good reason to believe we will develop ASI in the near future. But even if we did believe so, AI risk fails the two other criteria of the ITN framework and thus would not be a good target for interventions.

  1. ^

    Learning-planning is what I call the ability to assess one's own abilities and to learn missing abilities efficiently and in a targeted way. Currently, machine learning algorithms are extremely inefficient, and models lack the introspection capabilities required to assess which abilities they are missing.

Thanks for explaining your view! 

On the first point: I think we should view ASI as disproportionately likely in decades that already feature (i) recent extraordinary progress in AI capabilities that surprises almost everyone, and (ii) a fair number of experts in the field who appear to take seriously the possibility that continued progress in this vein could soon result in ASI.

I'd then think we should view it as disproportionately unlikely that ASI will either be (a) achieved before any such initial signs of impressive progress, OR (b) achieved centuries after such initial progress. (If not achieved within a century, I'd think it more likely to be outright unachievable.)

I don't really know enough about AI safety research to comment on the latter disagreement. I'm curious to hear others' views.

While it might feel to you that AI progress has been rapid in the past decade, most of the innovations behind it, such as neural networks, gradient descent, backpropagation, and the concept of language models, are quite old. The only major innovation of the past decade is the Transformer architecture from 2017; almost everything else has been incremental progress and scaling to larger models and datasets. Thus, the pace of AI architecture development is very slow, and the probability that a groundbreaking new AGI architecture will surface is low.

ASI is the ultimate form of AI and, in some sense, of computer science as a whole. Claiming that we will reach it just because we have only recently gotten started in computer science seems premature, akin to claiming that physics will soon be solved just because we have made so much progress lately. Science (and AI in particular) is often compared to an infinite ladder: you can climb as many steps as you like, and there will still be infinitely many steps ahead. I don't believe there are literally infinitely many steps to ASI, but assuming there must be only a few steps ahead just because there are many steps behind is a fallacy.

I was recently reading Ada Lovelace's "Translator's Notes" from 1843, and came across this timeless quote (emphasis original):

It is desirable to guard against the possibility of exaggerated ideas that might arise as to the powers of the Analytical Engine. In considering any new subject, there is frequently a tendency, first, to overrate what we find to be already interesting or remarkable; and, secondly, by a sort of natural reaction, to undervalue the true state of the case, when we do discover that our notions have surpassed those that were really tenable.

This is a comment on the text by Luigi Menabrea that she was translating, in which he was hyping the idea that the "conceptions of intelligence" could be encoded into the instructions of the Analytical Engine[1]. Having a much better technical understanding of the machine than Menabrea, Lovelace was skeptical of his ideas and urged him to calm down.

The rest of their discussion is much more focused on concrete programs the machine could execute, but this short quote struck me as very reminiscent of our current discussion. There existed (some level of) scientific discussion of artificial intelligence in the 1840s, and their talking points seem so similar to ours, with some hyping and others being skeptical!

From the perspective of Lovelace and Menabrea, computer science was progressing incredibly fast. Babbage's Analytical Engine was a design for a working computer that was much more capable than earlier plans such as the Difference Engine. Designing complex programs became possible. I can feel their excitement while reading their texts. But despite this, it took a hundred years until ENIAC, the first general-purpose digital computer, was built in 1945. The fact that a field progresses fast in its early days does not mean much when predicting its future progress.

  1. ^

    The quote she was commenting: "Considered under the most general point of view, the essential object of the machine being to calculate, according to the laws dictated to it, the values of numerical coefficients which it is then to distribute appropriately on the columns which represent the variables, it follows that the interpretation of formulae and of results is beyond its province, unless indeed this very interpretation be itself susceptible to expression by means of the symbols which the machine employs. Thus, although it is not itself the being that reflects, it may yet be considered as the being which executes the conceptions of intelligence."

I really want to stress the difference between saying something will (definitely) happen and saying there's a credible chance (>1%) that it will happen. They're very different claims!

Lovelace and Menabrea probably should have regarded their time as disproportionately likely (compared to arbitrary decades) to see continued rapid progress. That's compatible with thinking it overwhelmingly likely (~99%) that they'd soon hit a hurdle.

As a heuristic, ask: if one were, at the end of history, to plot the 100 (or even just the 50) greatest breakthrough periods in computer science prior to ASI (of course there could always be more breakthroughs after that), should we expect our current period to make the cut? I think it would be incredible to deny it.

It is true that Lovelace and Menabrea should have assumed a credible chance of rapid progress. Who knows, maybe if they had had the right resources and people, we could have had computers much earlier than we ultimately did.

But when talking about ASI, we are not just talking about rapid progress; we are talking about the most extreme progress imaginable. Extraordinary claims require extraordinary evidence, and so forth. We do not know what breakthroughs ASI requires, nor do we know how far we are from it.

It all comes down to the question of whether the current tech is relevant for ASI or not. In my estimation, it is not – something else entirely is required. The probability of our discovering that something else just now is low.

"It all comes down to the question of whether the current tech is relevant for ASI or not. In my estimation, it is not – something else entirely is required. The probability for us discovering that something else just now is low." 

I think Richard's idea is that you shouldn't have *super-high* confidence in your estimation here, but should put some non-negligible credence on the idea that it is wrong and current progress is relevant. Why be close to certain about a question that you presumably think is hard and that other smart people disagree about? That, I take it, would be the reasoning. And once you open yourself up to a small chance that current progress is in fact relevant, it becomes at least somewhat unclear that you should be way below 1% on the chance of AGI in the relatively near term, or on current safety work being relevant. (Not necessarily endorsing the line of thought in this paragraph myself.)

Thanks for your answer.

other smart people disagree

I'm generally against this sort of appeal to authority. While I'm open to hearing the arguments of smart people, we should evaluate those arguments themselves and not the people giving them. So far, I've heard no argument that would change my opinion on this matter.

You seem to make a similar argument in your other comment:

[...] But when I ask myself what evidence I have for "there are not >20 similar sized jumps before AGI" I come up short. I don't necessarily think the burden of proof here is actually on people arguing that the chance of AGI in the next decade is non-negligible though: it's a goal of some serious people within the relevant science [...]

Again, I think that just because there are serious people with this goal, that doesn't mean it is a justified belief. As you say yourself, you can't find evidence for your view. Extraordinary claims require extraordinary evidence, and the burden of proof lies on the person making the claim. That some serious/smart people believe in it is not enough evidence.

1%

I want to stress that even if we gave AGI/ASI a 1% probability in the next decade, my other point still stands: AI safety work is neither tractable nor neglected, and it is thus not a good intervention for people in EA to focus on.

And it's not so much that I think I have zero evidence: I keep up with progress in AI to some degree, I have some idea of what the remaining gaps to general intelligence are, I've seen the speed at which capabilities have improved in recent years, etc. It's that how to evaluate that evidence is not obvious, and so simply presenting a skeptic with it probably won't move them, especially as the skeptic (in this case, you) probably already has most of the evidence I have anyway. If it were just some random person who had never heard of AI asking why I thought the chance of mildly-above-human-level AI in 10 years was not far under 1%, there are things I could say. It's just that you already know those things, probably, so there's not much point in my saying them to you.

"I'm generally against this sort of appeal to authority. While I'm open to hear the arguments of smart people, we should evaluate those arguments themselves and not the people giving them. So far, I've heard no argument that would change my opinion on this matter."

I think this attitude is just a mistake if your goal is to form the most accurate credences you can. Obviously, it is always good practice to ask people for their arguments rather than only taking what they say on trust. But your evaluation of other people's arguments is fallible, and you know it is fallible. So you should distribute some of your confidence to cases where your personal evaluations of credible people's arguments are just wrong. This isn't the same as failing to question purported experts. I can question an expert, and even disagree with them overall, and still move my credences somewhat towards theirs. (I'm much more confident about this general claim than I am about what credences in ASI in the next decade are or aren't reasonable, or how much credibility anyone saying ASI is coming in the next decade should get.) 


It seems like if you find it incredible to deny and he doesn't, it's very hard to make further progress :(  I'm on your side about the chance being over 1% in the next decade, I think, but I don't know how I'd prove it to a skeptic, except to gesture and say that capabilities have improved loads in a short time, and it doesn't seem like there are >20 similar sized jumps before AGI. But when I ask myself what evidence I have for "there are not >20 similar sized jumps before AGI" I come up short. I don't necessarily think the burden of proof here is actually on people arguing that the chance of AGI in the next decade is non-negligible though: it's a goal of some serious people within the relevant science, and they are not making zero progress, and some identifiable, quantifiable individual capabilities have improved very fast. Plus the extreme difficulty of forecasting technological breakthroughs over more than a couple of years cuts both ways.

Thanks for crossposting this, Richard!

The stakes surrounding ASI are extremely high, to the point that we can’t be confident that humanity would long survive this development

I guess the risk of human extinction over the next 10 years is 10^-7, and I am not aware of any quantitative empirical modelling suggesting otherwise. I do not think that is high enough to justify your conclusion that "AI risk warrants urgent further investigation and precautionary measures", even accounting for long-term effects. I very much agree there should be some investigation and precautionary measures, but I do not consider this "urgent" at the margin.
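
The gap between this estimate and the figure implied by the post's premises is worth making explicit; a rough comparison (with an assumed conditional probability standing in for premise 2) follows.

```python
# Quick comparison of the two estimates in play (the 10% conditional probability
# of catastrophe given ASI is an assumption for illustration, not a figure from the post).

p_comment_estimate = 1e-7        # this comment's guess for 10-year extinction risk
p_post_implied = 0.01 * 0.10     # post's >=1% ASI-within-a-decade times assumed 10% conditional

print(f"Post-implied risk: {p_post_implied:.0e}, comment's estimate: {p_comment_estimate:.0e}")
print(f"Ratio: {p_post_implied / p_comment_estimate:,.0f}x")  # ~10,000x
# A disagreement of roughly four orders of magnitude, suggesting the dispute is
# primarily empirical (how likely is the risk?) rather than decision-theoretic.
```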
