Just a quick update on predicted timelines. Obviously, there's no guarantee that Metaculus is 100% reliable, and you should look at other sources as well, but I find this concerning.

Weak AGI is now predicted in a little over two years:

https://www.metaculus.com/questions/3479/date-weakly-general-ai-is-publicly-known/

AGI predicted in about 10: https://www.metaculus.com/questions/5121/date-of-artificial-general-intelligence/

Also, these are the predicted dates by which these systems will be publicly known, not the dates by which they will first exist. Things are getting crazy.

Even though Eliezer has argued that there's no fire alarm for AGI, perhaps this is the fire alarm?

Might as well make an alternate prediction here: 

There will be no AGI in the next 10 years. There will be an AI bubble over the next couple of years as new applications for deep learning proliferate, creating a massive hype cycle similar to the dot-com boom. 

This bubble will die down or burst when people realize the limitations of deep learning in domains that lack gargantuan datasets. It will fail to take hold in domains where errors cause serious damage (see the unexpected difficulty of self-driving cars). As with the burst of the dot-com bubble, people will continue to use AI heavily for the applications it is actually good at. 

If AGI does occur, it will be at least decades away and will require further conceptual breakthroughs and/or several orders of magnitude more computing power. 

I think, in hindsight, the Fire Alarm first started ringing in a DeepMind building in 2017. Or perhaps an OpenAI building in 2020. It's certainly going off all over Microsoft now. It's also going off in many other places. To some of us it is already deafening. A huge, ominous distraction from our daily lives. I really want to do something to shut the damn thing off.

This isn't personal, but I downvoted because I think Metaculus forecasts about this aren't more reliable than chance, and people shouldn't defer to them.

aren't more reliable than chance

Curious what you mean by this. One version of chance is "uniform prediction of AGI over future years", which obviously seems worse than Metaculus, but perhaps you meant a more specific baseline?

Personally, I think forecasts like these are rough averages of what informed individuals would think about these questions. Yes, you shouldn't defer to them, but it's also useful to recognize how that community's predictions have changed over time.

Hi Gabriel,

I am not sure how much to trust Metaculus in general, but I do not think it is obvious that their AI predictions should be ignored. For what it's worth, Epoch attributed a weight of 0.23 to Metaculus in the judgement-based forecasts of their AI Timelines review. Holden, Ajeya and AI Impacts got smaller weights, whereas Samotsvety got a higher one.

However, one comment I made here may illustrate what Guy is presumably referring to:

The mean Brier scores of Metaculus' predictions (and Metaculus' community predictions) are (from here):

  • For all the questions:
    • At resolve time (N = 1,710), 0.087 (0.092).
    • For 1 month prior to resolve time (N = 1,463), 0.106 (0.112).
    • For 6 months (N = 777), 0.109 (0.127).
    • For 1 year (N = 334), 0.111 (0.145).
    • For 3 years (N = 57), 0.104 (0.133).
    • For 5 years (N = 8), 0.182 (0.278).
  • For the questions of the category artificial intelligence:
    • At resolve time (N = 46), 0.128 (0.198).
    • For 1 month prior to resolve time (N = 40), 0.142 (0.205).
    • For 6 months (N = 21), 0.119 (0.240).
    • For 1 year (N = 13), 0.107 (0.254).
    • For 3 years (N = 1), 0.007 (0.292).

Note:

  • For the questions of the category artificial intelligence:
    • Metaculus' community predictions made earlier than 6 months prior to resolve time perform as badly as or worse than always predicting 0.5, as their mean Brier score is similar to or higher than 0.25. [Maybe this is what Guy is pointing to.]
    • Metaculus' predictions perform significantly better than Metaculus' community predictions.
  • Questions for which the Brier score can be assessed for a longer time prior to resolve, i.e. the ones with longer lifespans, tend to have lower base rates (I found a correlation of -0.129 among all questions). This means it is easier to achieve a lower Brier score:
    • Predicting 0.5 for a question whose base rate is 0.5 will lead to a Brier score of 0.25 (= 0.5*(0.5 - 1)^2 + (1 - 0.5)*(0.5 - 0)^2).
    • Predicting 0.1 for a question whose base rate is 0.1 will lead to a Brier score of 0.09 (= 0.1*(0.1 - 1)^2 + (1 - 0.1)*(0.1 - 0)^2).
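
To make the base-rate point above concrete, here is a minimal sketch (hypothetical Python, not part of the original comment) of the expected Brier score when a forecaster always predicts a question's base rate:

```python
def expected_brier(base_rate: float, forecast: float) -> float:
    # Outcome is 1 with probability `base_rate` and 0 otherwise;
    # the forecaster always predicts `forecast`.
    return base_rate * (forecast - 1) ** 2 + (1 - base_rate) * (forecast - 0) ** 2

# A calibrated constant forecast on a 50% base-rate question scores 0.25 ...
print(expected_brier(0.5, 0.5))  # 0.25
# ... while the same strategy on a 10% base-rate question scores 0.09,
# so questions with lower base rates make low Brier scores easier to achieve.
print(expected_brier(0.1, 0.1))  # 0.09
```

This is just the arithmetic from the two bullet points above, packaged so the comparison across base rates is easy to reproduce.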

Agree that they shouldn't be ignored. By "you shouldn't defer to them," I just meant that it's useful to also form one's own inside view models alongside prediction markets (perhaps comparing to them afterwards).

What I mean is "these forecasts give no more information than flipping a coin to decide whether AGI would come in time period A vs. time period B".

I have my own, rough, inside views about if and when AGI will come and what it would be able to do, and I don't find it helpful to quantify them into a specific probability distribution. And there's no "default distribution" here that I can think of either.

Gotcha, I think I still disagree with you for most decision-relevant time periods (e.g. I think they're likely better than chance at estimating AGI within 10 years vs. 20 years).

Can someone please explain why we're still forecasting the weak AGI timeline? I thought the "sparks" of AGI that Microsoft claimed GPT-4 achieved should already be more than the level of intelligence implied by "weak".

The answer is that the question in question is not actually forecasting weak AGI; it is forecasting these specific resolution criteria:

For these purposes we will thus define "AI system" as a single unified software system that can satisfy the following criteria, all easily completable by a typical college-educated human.

  • Able to reliably pass a Turing test of the type that would win the Loebner Silver Prize.
  • Able to score 90% or more on a robust version of the Winograd Schema Challenge, e.g. the "Winogrande" challenge or comparable data set for which human performance is at 90+%
  • Be able to score 75th percentile (as compared to the corresponding year's human students; this was a score of 600 in 2016) on all the full mathematics section of a circa-2015-2020 standard SAT exam, using just images of the exam pages and having less than ten SAT exams as part of the training data. (Training on other corpuses of math problems is fair game as long as they are arguably distinct from SAT exams.)
  • Be able to learn the classic Atari game "Montezuma's revenge" (based on just visual inputs and standard controls) and explore all 24 rooms based on the equivalent of less than 100 hours of real-time play (see closely-related question.)

Remember that AGI is a pretty vague term by itself, and some people are forecasting on the specific definitions under the Metaculus questions. This matters because those definitions don't require anything inherently transformative, like being able to automate all labour or scientific research. Rather, they involve a bunch of technical benchmarks that aren't that important on their own but are presumed to correlate with the transformative stuff we actually care about.

See also the recent Lex Fridman Twitter poll [H/T Max Ra].
