I want to know how people estimate the probability of an AI takeoff causing human extinction, and which details (such as humans' attitudes toward AI safety, how an AI gains physical access to the world, how good AI is at tricking humans...) people consider when making predictions. I can only find the estimation "results" on the EA Forum (mostly 2-10% in this century), but I don't know how you arrive at them. Did you use complex math models to calculate them? I know we should take these predictions with a pinch of salt, but I just want to know what people consider to be the important factors in AI risk.





I think most probabilistic estimates are subjective probability estimates. There are no complicated math models behind them usually.

Some people do make models, but then make subjective probability estimates. The math is typically not that complicated for these models, often just multiplying different probabilities together (which is imo not a good class of models for this kind of problem).

My guess would be that even some of the people who make models have different probability estimates for human extinction than the one that the model spits out, because they realize that their models have flaws and try to correct for that.

So the predictions experts make are all purely "subjective" predictions? I would think there is some logical reasoning or argument, or maybe something like a Fermi estimate, to explain how someone arrives at the number, unless it's mostly intuition.

[This article](https://slatestarcodex.com/2013/05/02/if-its-worth-doing-its-worth-doing-with-made-up-statistics/) explores why it is useful to work with subjective, "made-up" statistics.

My own view hinges on the following:

  • instrumental convergence: agents will tend to try and accumulate some kinds of resources like money, regardless of what their goals are;
  • value-capabilities orthogonality (often known as just "the orthogonality thesis"): regardless of their capabilities, agents might have pretty much any kind of goal;
  • the fact that most possible goals are incompatible with human thriving (we need a very specific set of conditions to survive, let alone thrive);
  • the fact that current AI capabilities are growing, the growth rate seems to be increasing, and that there are strong economic incentives to keep pushing them forward.

These factors lead me to think we have significantly worse than even odds (that is, <50%) of surviving this century.

I'm also quite interested in how these estimates are being made, so can I ask you for more detail about how you got your estimate? 

In particular, I'm interested in the "chain of events" involved. AI extinction involves several consecutive speculative events. What are your estimates for the following, conditional on the previous steps occurring? 

  1. at least one AGI is built this century
  2. at least one of these AGIs is motivated to conquer and wipe out humanity
  3. at least one of the rebellious AGIs successfully conquers and destroys humanity

Did your >50% estimate come from reasoning like this about each step? 
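As a minimal sketch of what "reasoning like this about each step" amounts to arithmetically: each step gets a probability conditional on the previous steps, and the overall estimate is their product. The function name `chain_probability` and the numbers below are hypothetical placeholders for illustration, not anyone's actual estimates.

```python
from math import prod


def chain_probability(conditionals):
    """Multiply conditional probabilities P(step k | steps 1..k-1).

    `conditionals` is a list of floats in [0, 1], ordered by step.
    """
    for p in conditionals:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return prod(conditionals)


# Placeholder inputs only (assumptions for illustration):
# step 1: AGI is built; step 2: it is motivated to wipe out humanity;
# step 3: it succeeds.
example_steps = [0.9, 0.5, 0.5]
print(chain_probability(example_steps))
```

Because every factor is at most 1, adding more steps to the chain can only lower the result, which is one reason the choice of steps matters as much as the numbers.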

I think 1 is >95% likely. We're in an arms race dynamic for at least some of the components of AGI. This is conditional on us not having been otherwise wiped out (by war, pandemic, asteroid, etc).

I think 2 and 3 are the wrong way to think about the question. Was humankind "motivated to conquer" the dodo? Or did we just have a better use for its habitat, and its extinction was just a whoopsie in the process?

When I say "motivated to", I don't mean that it would be its primary motivation. I mean that it has motivations that, at some point, would lead to it having "perform actions that would kill all of humanity" as a sub-goal. And in order to get to the point where we were dodos to it, it would have to disempower humanity somehow.

Would you prefer the following restatement, each conditional on the previous step:

  1. At least one AGI is built in our lifetimes
  2. At least one of these AGIs has motivations that include "disempower humanity" as a sub-goal
  3. At least one of these disempowerment attempts is successful

And then either:

4a: The process of disempowering humanity involves wiping out all of humanity

4b: After successfully disempowering humanity with some of humanity still intact, the AI ends up wiping out the rest of humanity anyway
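
One way to write down how these steps would combine, as a sketch rather than a settled model: let $p_k$ be the probability of step $k$ conditional on the previous steps, and assume (this is an assumption, not something stated above) that 4b is only considered in the case where 4a did not happen, so the two branches are mutually exclusive.

```latex
P(\text{extinction via AI})
  = p_1 \, p_2 \, p_3 \, \bigl( p_{4a} + (1 - p_{4a}) \, p_{4b} \bigr)
```

Here $p_{4a}$ is the chance that disempowerment itself wipes everyone out, and $p_{4b}$ is the chance that the remaining humans are wiped out afterwards, given that some survived the disempowerment.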