Metaculus Predicts Weak AGI in 2 Years and AGI in 10

Chris Leong

Just a quick update on predicted timelines. Obviously, there’s no guarantee that Metaculus is 100% reliable + you should look at other sources as well, but I find this concerning.

Weak AGI is now predicted in a little over two years:

https://www.metaculus.com/questions/3479/date-weakly-general-ai-is-publicly-known/

AGI is predicted in about 10 years: https://www.metaculus.com/questions/5121/date-of-artificial-general-intelligence/

Also, these are the predicted dates by which these systems become publicly known, not the dates by which they exist. Things are getting crazy.

Even though Eliezer claims that there was no fire alarm for AGI, perhaps this is the fire alarm?

Comments (13)

titotal · Mar 26 2023 · 18

Might as well make an alternate prediction here:

There will be no AGI in the next 10 years. There will be an AI bubble over the next couple of years as new applications for deep learning proliferate, creating a massive hype cycle similar to the dot-com boom.

This bubble will die down or burst when people realize the limitations of deep learning in domains that lack gargantuan datasets. It will fail to take hold in domains where errors cause serious damage (see the unexpected difficulty of self-driving cars). Like with the burst of the dot-com bubble, people will continue to use AI a lot for the applications that it is actually good at.

If AGI does occur, it will be decades away at least, and require further conceptual breakthroughs and/or several orders of magnitude higher computing power.

Yarrow Bouchard 🔸Nov 112

The jury's still out on the AI bubble, but two and a half years later I think this prediction still looks good!

Greg_Colbourn ⏸️ · Mar 25 2023 · 7

I think, in hindsight, the Fire Alarm first started ringing in a DeepMind building in 2017. Or perhaps an OpenAI building in 2020. It's certainly going off all over Microsoft now. It's also going off in many other places. To some of us it is already deafening. A huge, ominous distraction from our daily lives. I really want to do something to shut the damn thing off.

Guy Raveh · Mar 24 2023 · 6

This isn't personal, but I downvoted because I think Metaculus forecasts about this aren't more reliable than chance, and people shouldn't defer to them.

GabeM · Mar 24 2023 · 12

"aren't more reliable than chance"

Curious what you mean by this. One version of chance is "uniform prediction of AGI over future years" which obviously seems worse than Metaculus, but perhaps you meant a more specific baseline?

Personally, I think forecasts like these are rough averages of what informed individuals would think about these questions. Yes, you shouldn't defer to them, but it's also useful to recognize how that community's predictions have changed over time.

Vasco Grilo🔸 · Mar 24 2023 · 11

Hi Gabriel,

I am not sure how much to trust Metaculus in general, but I do not think it is obvious that their AI predictions should be ignored. For what it is worth, Epoch attributed a weight of 0.23 to Metaculus in the judgement-based forecasts of their AI Timelines review. Holden, Ajeya and AI Impacts got smaller weights, whereas Samotsvety got a higher one.

However, one comment I made here may illustrate what Guy presumably is referring to:

The mean Brier scores of Metaculus' predictions (and Metaculus' community predictions) are (from here):
- For all the questions:
  - At resolve time (N = 1,710), 0.087 (0.092).
  - For 1 month prior to resolve time (N = 1,463), 0.106 (0.112).
  - For 6 months (N = 777), 0.109 (0.127).
  - For 1 year (N = 334), 0.111 (0.145).
  - For 3 years (N = 57), 0.104 (0.133).
  - For 5 years (N = 8), 0.182 (0.278).
- For the questions of the category artificial intelligence:
  - At resolve time (N = 46), 0.128 (0.198).
  - For 1 month prior to resolve time (N = 40), 0.142 (0.205).
  - For 6 months (N = 21), 0.119 (0.240).
  - For 1 year (N = 13), 0.107 (0.254).
  - For 3 years (N = 1), 0.007 (0.292).
Note:
- For the questions of the category artificial intelligence:
  - Metaculus' community predictions made earlier than 6 months prior to resolve time perform as badly as or worse than always predicting 0.5, as their mean Brier score is similar to or higher than 0.25. [Maybe this is what Guy is pointing to.]
  - Metaculus' predictions perform significantly better than Metaculus' community predictions.
- Questions for which the Brier score can be assessed for a longer time prior to resolve, i.e. the ones with longer lifespans, tend to have lower base rates (I found a correlation of -0.129 among all questions). This means it is easier to achieve a lower Brier score:
  - Predicting 0.5 for a question whose base rate is 0.5 will lead to a Brier score of 0.25 (= 0.5*(0.5 - 1)^2 + (1 - 0.5)*(0.5 - 0)^2).
  - Predicting 0.1 for a question whose base rate is 0.1 will lead to a Brier score of 0.09 (= 0.1*(0.1 - 1)^2 + (1 - 0.1)*(0.1 - 0)^2).
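The base-rate arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `expected_brier` and its parameters are hypothetical, not anything from Metaculus, and it assumes binary questions that resolve positively with probability equal to the base rate.

```python
def expected_brier(prediction: float, base_rate: float) -> float:
    """Expected Brier score of always giving `prediction` on binary questions
    that resolve positively with probability `base_rate` (an assumption)."""
    # With probability base_rate the outcome is 1, otherwise it is 0.
    return (base_rate * (prediction - 1) ** 2
            + (1 - base_rate) * (prediction - 0) ** 2)


# The two cases worked out above, plus the "always predict 0.5" baseline
# applied to a question with a low base rate.
for prediction, base_rate in [(0.5, 0.5), (0.1, 0.1), (0.5, 0.1)]:
    score = expected_brier(prediction, base_rate)
    print(f"predict {prediction}, base rate {base_rate}: {score:.3f}")
```

Under these assumptions, always predicting 0.5 scores 0.25 whatever the base rate is, while predicting the base rate itself scores lower as the base rate moves away from 0.5. That is why a lower mean Brier score on low-base-rate questions is not, on its own, evidence of more skill.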

GabeM · Mar 25 2023 · 3

Agree that they shouldn't be ignored. By "you shouldn't defer to them," I just meant that it's useful to also form one's own inside view models alongside prediction markets (perhaps comparing to them afterwards).

Guy Raveh · Mar 25 2023 · 7

What I mean is "these forecasts give no more information than flipping a coin to decide whether AGI would come in time period A vs. time period B".

I have my own, rough, inside views about if and when AGI will come and what it would be able to do, and I don't find it helpful to quantify them into a specific probability distribution. And there's no "default distribution" here that I can think of either.

GabeM · Mar 27 2023 · 1

Gotcha, I think I still disagree with you for most decision-relevant time periods (e.g. I think they're likely better than chance on estimating AGI within 10 years vs 20 years)

[anonymous] · Mar 25 2023 · 4

Can someone please explain why we're still forecasting the weak AGI timeline? I thought the "sparks" of AGI that Microsoft claimed GPT-4 achieved should already exceed the level of intelligence implied by "weak".

Erich_Grunewald 🔸 · Mar 25 2023 · 6

The answer is that the question is not actually forecasting weak AGI; it is forecasting these specific resolution criteria:

For these purposes we will thus define "AI system" as a single unified software system that can satisfy the following criteria, all easily completable by a typical college-educated human.
Able to reliably pass a Turing test of the type that would win the Loebner Silver Prize.
Able to score 90% or more on a robust version of the Winograd Schema Challenge, e.g. the "Winogrande" challenge or comparable data set for which human performance is at 90+%
Be able to score 75th percentile (as compared to the corresponding year's human students; this was a score of 600 in 2016) on all the full mathematics section of a circa-2015-2020 standard SAT exam, using just images of the exam pages and having less than ten SAT exams as part of the training data. (Training on other corpuses of math problems is fair game as long as they are arguably distinct from SAT exams.)
Be able to learn the classic Atari game "Montezuma's revenge" (based on just visual inputs and standard controls) and explore all 24 rooms based on the equivalent of less than 100 hours of real-time play (see closely-related question.)

David Mathers🔸 · Mar 25 2023 · 3

Remember that AGI is a pretty vague term by itself, and some people are forecasting on the specific definition under the Metaculus questions. This matters because those definitions don't require anything inherently transformative like us being able to automate all labour, or scientific research. Rather they involve a bunch of technical benchmarks that aren't that important on their own, which are being presumed to correlate with the transformative stuff we actually care about.

Greg_Colbourn ⏸️ · Mar 25 2023 · 1

See also the recent Lex Fridman Twitter poll [H/T Max Ra].