The median superforecaster gave a 0.38% risk of extinction due to AI by 2100, while the median AI domain expert gave a 3.9% risk of extinction.

Tetlock's previous results show that domain experts are not very good at making predictions, and that superforecasters are significantly better. We should all revise our views on AI x-risk.





I am a bit worried about a narrative of "the forecasters think x-risk is low" when I know a bunch of excellent forecasters who have much higher AI x-risk probabilities. 

For example, Samotsvety (who afaict have an excellent forecasting track record on domain-relevant questions) gave some estimates here (on 8 September 2022):

A few of the headline aggregate forecasts are:

  1. 25% chance of misaligned AI takeover by 2100, barring pre-APS-AI catastrophe
  2. 81% chance of Transformative AI (TAI) by 2100, barring pre-TAI catastrophe
  3. 32% chance of AGI being developed in the next 20 years

Conversely, the median estimate across all domain experts is probably lower than the 3.9% presented here. The sampling of experts is non-random: people who are already concerned about AI risk are more likely to take a voluntary survey. The sample here had ~40% of experts attending at least one AI meetup, which is not at all typical for AI experts as a group.

This could also be true of previous surveys, like the 2022 AI Impacts survey, which had a response rate of only 17%. I reckon that if you added in the other 83% of experts, the median estimate would drop by a fair margin.

"when I know a bunch of excellent forecasters..."

Perhaps your sampling techniques are better than Tetlock's then.

The Samotsvety track record does straightforwardly look better than what I expect the median superforecaster's track record to be (which I think is ~99th percentile in either the original Tetlock studies or on GJO), especially on AI. Though perhaps Tetlock's team also selected for better forecasters than the median superforecaster? It's unclear to me.

Last I checked, Tetlock's result on the efficacy of superforecasters vs. domain experts wasn't apples-to-apples: it was comparing individual domain expert forecasts vs. superforecaster forecasts that had been aggregated.

As this post explains, the main study that people cite when saying that "superforecasters are better than experts" comes from a competition where the aggregation methods for the two groups were different (Good Judgment Project's aggregation algorithm for the amateur forecasters versus a low-liquidity prediction market for the experts). Prediction markets for forecasters and experts had similar performance.
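To make the aggregation point concrete, here is a minimal sketch with entirely made-up forecasts (three hypothetical forecasters, four hypothetical binary questions). It uses a plain mean as the aggregate, which is an assumption for illustration and not Good Judgment Project's actual algorithm. Because squared error is convex, the averaged forecast can never have a worse Brier score than the average individual, so comparing an aggregate from one group against individuals from another favors the aggregated group.

```python
from statistics import mean


def brier(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    return mean((f - o) ** 2 for f, o in zip(forecasts, outcomes))


# Hypothetical resolved outcomes for four binary questions.
outcomes = [1, 0, 1, 0]

# Hypothetical probability forecasts from three individuals (one row each).
individuals = [
    [0.9, 0.4, 0.5, 0.2],
    [0.6, 0.1, 0.8, 0.5],
    [0.7, 0.3, 0.4, 0.1],
]

# Simple aggregate: the mean forecast on each question.
aggregate = [mean(column) for column in zip(*individuals)]

individual_scores = [brier(f, outcomes) for f in individuals]
avg_individual_score = mean(individual_scores)
aggregate_score = brier(aggregate, outcomes)

print("individual Brier scores:", individual_scores)
print("average individual score:", avg_individual_score)
print("aggregate Brier score:", aggregate_score)
```

An apples-to-apples comparison would score both groups the same way: either individual against individual, or aggregate against aggregate using the same aggregation method.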

Teddy - this is an important and fascinating paper, and I'd highly recommend that EAs read it.

I'm genuinely baffled about why the superforecasters are giving such an _extremely_ low risk of extinction compared to the AI domain experts, and I'd value any suggestions from others about this.

I don't think the superforecaster estimates being so low is a strong reason to significantly revise our views on X risk, until we have better insights into the huge discrepancy in risk estimates.

One possible explanation for the disparity is the sampling of participants: 42% of the domain experts had attended EA meetups, whereas only 9% of the superforecasters had (page 9 of report). This could have caused a systematic shift in opinion. 

Another explanation: anchoring bias. The general public changed their estimates of x-risk by nearly six orders of magnitude, from 5% to 1 in 15 million, when the question was phrased differently (page 29). Presumably at least some of this effect would persist for experts as well. Participants were given a list of previous predictions of AI x-risk which were mostly around the 5% range (page 132). I propose that the domain experts anchored to this value, whereas the superforecasters were more willing to deviate.
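As a quick arithmetic check on the sizes of these gaps, the sketch below uses only the figures quoted in this thread (5% versus 1 in 15 million for the public, and 3.9% versus 0.38% for the expert and superforecaster medians). The variable names are my own.

```python
import math

# Public estimates under the two question phrasings (figures quoted above).
public_percent_framing = 0.05
public_odds_framing = 1 / 15_000_000

# Median extinction-risk estimates from the post.
expert_median = 0.039
superforecaster_median = 0.0038

public_ratio = public_percent_framing / public_odds_framing
public_orders = math.log10(public_ratio)

expert_ratio = expert_median / superforecaster_median
expert_orders = math.log10(expert_ratio)

print(f"public framing gap: {public_ratio:,.0f}x ({public_orders:.2f} orders of magnitude)")
print(f"expert vs superforecaster gap: {expert_ratio:.1f}x ({expert_orders:.2f} orders of magnitude)")
```

The framing gap for the public is a factor of 750,000, just under six orders of magnitude, while the expert versus superforecaster gap is about one order of magnitude.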

titotal - thanks for these helpful observations. Both sound plausible!

There's a giant financial and status incentive for AI safety workers to inflate the dangers. It's also more likely that someone becomes an AI safety expert if they overestimate the risk.

This study wasn't recruiting AI safety workers? Rather, it had AI domain experts, many of whom appeared to have thought about AI x-risk not much more than I'd expect the median AI researcher to have. [EDIT 2023/07/16: I'm less sure that this is true]

There was a follow-up study with both superforecasters and people who have thought about or worked in AI safety (or adjacent fields). I was involved as a participant. That study had some more (though arguably still limited) engagement between the two camps, and I think there was more constructive dialogue and more useful updates in comparison.

John - yes, it is plausible that there could be selection effects, such that only people with a P(doom) over 1% even bother becoming AI safety researchers. 

But this cuts both ways: any 'forecasting experts' who think the P(doom) is over 1% might have already become AI safety researchers, rather than remaining general forecasting experts.

Also, I'm a bit baffled by this narrative that there are 'giant financial and status incentives' for AI safety researchers to inflate the dangers. 

If somebody wanted to become rich and famous, becoming an AI safety researcher wouldn't even make the Top 1000 list of good career strategies.

The entirety of their job security and status in society depends on the risk being high. You don't view that as a strong incentive to create the impression that the risk is high?

To explain my disagree-vote: this kind of explanation isn't a good one in isolation.

I could also say it benefits AI developers to downplay[1] risk, as that means their profits and status will be high, and society will have a more positive view of them as people who are developing fantastic technologies rather than raising existential risks.

And what makes this a bad explanation is that it is so easy to vary. Like above, you can flip the sign. I can also easily swap out the area for any other existential risk (e.g. Nuclear War or Climate Change), and the argument could run exactly the same.

Of course, I think motivated reasoning is something that exists and may play a role in explaining the gap between superforecasters and experts in this survey. But on the whole I don't find it convincing without further evidence.

  1. ^ consciously or not

I wouldn't expect a lot of scarcity mindset, because there's a lot of generically in-demand talent and experience among AI x-risk orgs. Status may be a more reasonable question, but job security doesn't really make sense.
