Artificial General Intelligence (AGI) poses an extinction risk to all known biological life. Given the stakes involved -- the whole world -- we should be looking at 10% chance-of-AGI-by timelines as the deadline for catastrophe prevention (a global treaty banning superintelligent AI), rather than 50% (median) chance-of-AGI-by timelines, which seem to be the default[1].
It’s way past crunch time already: 10% chance of AGI this year![2] AGI will be able to automate further AI development, leading to rapid recursive self-improvement to ASI (Artificial Superintelligence). Given that alignment/control is not going to be solved in 2026, and that if anyone builds it [ASI], everyone dies (or at the very least, the risk of doom is uncomfortably high by most estimates), a global Pause on AGI development is an urgent, immediate priority. This is an emergency. Thinking that we have years to prevent catastrophe is gambling with a huge number of current human lives, let alone all future generations and animals.
To borrow from Stuart Russell's analogy: if there were a 10% chance of aliens landing this year[3], humanity would be doing a lot more than we are currently doing[4]. AGI is akin to an alien species more intelligent than us that is unlikely to share our values.
[1] This is an updated version of this post of mine from 2022.
[2] In the answer under “Why 80% Confidence?” on the linked page, it says “there's roughly a 10% chance AGI arrives before [emphasis mine] the lower bound”, so before 2027, i.e. in 2026. See also the task time horizon trends from METR. You might want to argue that 10% is actually next year (2027), based on other forecasts such as this one, but that only makes things slightly less urgent: we're still in a crisis if we might only have 18 months.
[3] This is different from the original analogy, which was an email saying: "People of Earth: We will arrive on your planet in 50 years. Get ready." Say astronomers spotted something that looked like a spacecraft heading in our direction, and estimated there was a 10% chance that it was indeed an alien spacecraft.
[4] Although perhaps we wouldn't. Maybe people would endlessly argue about whether the evidence is strong enough to declare a 10%(+) probability. Or flatly deny it.

The conclusions of this post are based on a misunderstanding of the definition of AGI. The linked forecasts mainly contain bad indicators of AGI instead of a robust definition. None of these indicators implies that an "AGI" meeting them would be dangerous or catastrophic to humanity, so they do not merit the sensationalist tone of the text.
Indicators
The "Weak AGI" Metaculus question includes four indicators.
Aside from the Turing test, the three other criteria are simple narrow tasks that contain no element of learning[2]; there is nothing to indicate that such a system would be good at any other task. Since these tasks are not dangerous, a system able to perform them wouldn't be dangerous either, unless we take into account further assumptions, which the question does not mention. And since training a model on specific narrow tasks is much easier than creating a true AGI, a system that merely meets these criteria is likely not an AGI.
Nor is it only this "Weak AGI" question that has this problem. The "Strong AGI" question from Metaculus is also simply a list of indicators, none of which implies any sort of generality. Aside from an "adversarial" Turing test, it contains the tasks of assembling a model car from instructions, solving programming challenges, and answering multiple-choice questions, none of which requires the model to generalize beyond those tasks.
It would not surprise me if some AI lab specifically made a system that performs well on these indicators just to gain media attention for their supposed "AGI".
Turing Test
In addition to the narrow tasks, the other indicator used by these forecasts is the Turing test. While the Turing test is not a narrow task, it has plenty of other issues: the result is highly dependent on the people conducting the test (including the interrogator and the human interviewee), on their knowledge of the system, and on their knowledge of each other. While an ideal adversarial Turing test would be a very difficult task for an AI system, ensuring these ideal conditions is often not feasible. I therefore fully expect news that AI systems have passed some form of the adversarial test, but this should be taken only as limited evidence of a system's generality.
It puzzles me why they include a range of years. Since models are trained on vast datasets, it is very likely that they have seen most SAT exams from this range. It therefore makes no sense to use an old exam as a benchmark.
Montezuma's Revenge contains an element of "learning" the game in a restricted amount of time. However, the question fails to constrain this in any meaningful way: for example, training the model on very similar games and then fine-tuning it with less than 100 hours of Montezuma's Revenge would be enough to pass the criterion.
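As a purely illustrative sketch of that loophole (the environment id, the 60 fps frame-rate assumption, and the placeholder policy class are my own, not anything from the Metaculus question): the 100-hour cap reduces to a frame counter wrapped around an otherwise ordinary online fine-tuning loop, and nothing in it prevents the policy from having been pre-trained on similar platformers beforehand.

```python
# Hypothetical sketch only: a policy "pre-trained on similar games" gets its
# budgeted exposure to Montezuma's Revenge. Requires gymnasium + ale-py with
# the Atari ROMs installed; the policy below is a random-acting placeholder.
import gymnasium as gym

GAME_HOURS_BUDGET = 100.0
FRAMES_PER_SECOND = 60                        # assumed Atari emulation speed
FRAME_BUDGET = int(GAME_HOURS_BUDGET * 3600 * FRAMES_PER_SECOND)

class PretrainedPolicy:
    """Stand-in for a policy already trained on similar platformers."""
    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, obs):
        return self.action_space.sample()     # placeholder for a learned policy

    def update(self, obs, reward):
        pass                                  # placeholder online fine-tuning step

env = gym.make("ALE/MontezumaRevenge-v5")
policy = PretrainedPolicy(env.action_space)

frames_used = 0
obs, info = env.reset()
while frames_used < FRAME_BUDGET:             # the only thing the criterion limits
    action = policy.act(obs)
    obs, reward, terminated, truncated, info = env.step(action)
    policy.update(obs, reward)
    frames_used += 1
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```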
Thanks for pointing this out. There was indeed a reasoning step missing from the text. Namely: such an AGI would be able to automate further AI development, leading to rapid recursive self-improvement to ASI (Artificial Superintelligence). And it is ASI that will be lethal to humanity (/all biological life). I've amended the text.
The whole point of having the 4 disparate indicators is that they have to be done by a single unified system (not specifically trained for only those tasks)[1]. Such a system would implicitly be general enough to do many other tasks. Ditto with the Strong AGI question.
That is what both the Turing Test questions are all about! (Look at the success conditions in the fine print.)
Metaculus: "By "unified" we mean that the system is integrated enough that it can, for example, explain its reasoning on an SAT problem or Winograd schema question, or verbally report its progress and identify objects during videogame play. (This is not really meant to be an additional capability of "introspection" so much as a provision that the system not simply be cobbled together as a set of sub-systems specialized to tasks like the above, but rather a single system applicable to many problems.)"
If you read the formulation carefully, you'll notice that it actually doesn't say anything about the system not being trained specifically for those tasks. It only says that it must be a single unified system. It is entirely possible to train a single neural network on four separate tasks and have it perform well on all of them without generalizing well to other categories of tasks.
Amusingly they even exclude introspection from their definition even though that is a property that a real general intelligence should have. A system without some introspection couldn't know what tasks it cannot perform or identify flaws in its operation, and thus couldn't really learn new capabilities in a targeted way. They quite explicitly say that its reasoning or reports on its progress can be hallucinated.
Their conditions are really vague and leave a lot of practicalities out. There are a lot of footguns in conducting a Turing test. It is also unclear what passing a Turing test, even a rigorous one, would actually mean. It's not clear that this would imply the sort of dangerous consequences you talk about in your post.
Because the forecasts do not concern a kind of system that would be capable of recursive self-improvement (none of the indicators have anything to do with it), I don't see how this reasoning can work.
See the quote in the footnote: "a provision that the system not simply be cobbled together as a set of sub-systems specialized to tasks like the above, but rather a single system applicable to many problems."
The indicators are all about being human-level at ~every kind of work a human can do. That includes AI R&D. And AIs are already known to think (and act) much faster than humans, and that will only become more pronounced as the AGI improves itself; hence the "rapid recursive self-improvement".
Even if it takes a couple of years, we would probably cross a point of no return not long after AGI.
I do see the quote. It seems there is something unclear about its meaning. A single neural net trained on multiple tasks is not a "cobbled together set of sub-systems". Neural nets are unitary systems in the sense that you cannot separate them into multiple subsystems, as opposed to ensemble systems that do have clear subsystems.
Modern LLMs are a good example of such unitary neural nets. It is possible to train (or fine-tune) an LLM for certain tasks, and the same weights would perform all those tasks without any subsystems. Due to the generalization property of neural network training, the LLM might also be good at tasks resembling those in the training set. But this is quite limited: in fact, fine-tuning on one task probably makes the network worse at dissimilar tasks.
Quite concretely, it is imaginable that someone could take an existing LLM, GPT-5 for example, and fine-tune it to solve SAT math questions and Winogrande schemas and to play Montezuma's Revenge. The fine-tuned GPT-5 would be a unitary system: there wouldn't be a separate Montezuma subsystem identifiable within the network; the same weights would handle all of those tasks. And the system could do all the things they mention ("explain its reasoning on an SAT problem or Winograd schema question, or verbally report its progress and identify objects during videogame play").
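To make that concrete in miniature, here is a purely illustrative sketch (the toy tasks and the tiny character-level model are stand-ins of my own, not GPT-5 or the actual Metaculus tasks): one set of weights is trained on a mixed stream of disparate tasks, with no identifiable per-task sub-system and no guarantee of competence beyond them.

```python
# Illustrative sketch only: a single network, one set of weights, trained on a
# mixed stream of toy stand-ins for disparate tasks. No per-task sub-system
# exists, and nothing guarantees competence on tasks outside this set.
import random
import torch
import torch.nn as nn

torch.manual_seed(0)
random.seed(0)

# Toy (hypothetical) stand-ins for the indicators: maths, Winograd-style
# pronoun resolution, and "what do I do next in the game" commentary.
TASKS = {
    "sat_math": [("2+2=", "4"), ("7-3=", "4"), ("3*3=", "9")],
    "winograd": [("The cup fell off the table because it was unstable. it = ?", "table"),
                 ("The cup fell off the table because it was full. it = ?", "cup")],
    "game":     [("frame: ladder below, key to the left -> action?", "left"),
                 ("frame: skull ahead, gap to the right -> action?", "jump")],
}

pairs = [(p, t) for examples in TASKS.values() for (p, t) in examples]
labels = sorted({t for _, t in pairs})                 # one shared label space
chars = sorted({c for p, _ in pairs for c in p})       # one shared vocabulary
char_ix = {c: i + 1 for i, c in enumerate(chars)}      # 0 is reserved for padding
MAX_LEN = max(len(p) for p, _ in pairs)

def encode(prompt: str) -> torch.Tensor:
    ids = [char_ix.get(c, 0) for c in prompt[:MAX_LEN]]
    return torch.tensor(ids + [0] * (MAX_LEN - len(ids)))

class UnitaryModel(nn.Module):
    """A single network; there is no per-task branch or sub-module."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(len(char_ix) + 1, 32)
        self.rnn = nn.GRU(32, 64, batch_first=True)
        self.head = nn.Linear(64, len(labels))

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.head(h[:, -1])

model = UnitaryModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Train on a single mixed stream: the same weights are updated by the maths,
# Winograd and game examples alike.
for step in range(300):
    prompt, target = random.choice(pairs)
    loss = loss_fn(model(encode(prompt).unsqueeze(0)),
                   torch.tensor([labels.index(target)]))
    opt.zero_grad()
    loss.backward()
    opt.step()

# The one parameter set now handles every task in the list -- which says
# nothing about how it performs on anything outside it.
for prompt, target in pairs:
    pred = labels[model(encode(prompt).unsqueeze(0)).argmax().item()]
    print(f"{prompt} -> {pred} (target: {target})")
```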
My critique is based on how they have formulated their Metaculus question. Now, it is possible that some people interpret it differently than I do and assume things that are not explicitly said in the formulation. In that case the whole forecast becomes unreliable, since we cannot be sure that all forecasters share the same interpretation, in which case we couldn't use the forecast for argumentation at all.
I don't think the probability of AGI this year is above 10%, nor do I think that doom, given AGI, is above 10%.
But a lot of informed people do (i.e. an aggregation of forecasts). What would you do if you did believe both of those things?
I'm not sure. I definitely think there is a chance that if I earnestly believed those things, I would go fairly crazy. I empathize with those who, in my opinion, are overreacting.
I think this is pretty telling. I've also had a family member say a similar thing. If your reasoning is (at least partly) motivated by wanting to stay sane, you probably aren't engaging with the arguments impartially.
I would bet a decent amount of money that you would not, in fact, go crazy. Look to history to see how few people went crazy over the threat of nuclear annihilation during the Cold War (and all the other things C.S. Lewis refers to in the linked quote).
One option, if you want to do a lot more about it than you currently are, is Pause House. Another is donating to PauseAI (US, Global). In my experience, being proactive about the threat does help.
If you want to share this, especially to people outside the EA community, I've also posted it on X and Substack.