fergusq

While it might feel to you that AI progress has been rapid in the past decade, most of the ideas behind it, such as neural networks, gradient descent, backpropagation, and the concept of language models, are very old. The only major innovation of the past decade is the Transformer architecture from 2017; almost everything else is incremental progress and scaling to larger models and datasets. Thus, the pace of AI architecture development is very slow, and the idea that a groundbreaking new AGI architecture will surface has a low probability.

ASI is the ultimate form of AI and, in some sense, of computer science as a whole. Claiming that we will reach it just because we've only recently got started in computer science seems premature, akin to claiming that physics will soon be solved just because we've made so much progress recently. Science (and AI in particular) is often compared to an infinite ladder: you can take as many steps as you like, and there will still be infinitely many steps ahead. I don't believe there are literally infinite steps to ASI, but assuming there must be only a few steps ahead just because there are a lot of steps behind is a fallacy.

I was recently reading Ada Lovelace's "Translator's Notes" from 1843, and came across this timeless quote (emphasis original):

It is desirable to guard against the possibility of exaggerated ideas that might arise as to the powers of the Analytical Engine. In considering any new subject, there is frequently a tendency, first, to overrate what we find to be already interesting or remarkable; and, secondly, by a sort of natural reaction, to undervalue the true state of the case, when we do discover that our notions have surpassed those that were really tenable.

This is a comment on the text by Luigi Menabrea that she was translating, in which he was hyping that the "conceptions of intelligence" could be encoded into the instructions of the Analytical Engine[1]. Having a much better technical understanding of the machine than Menabrea, Lovelace was skeptical of his ideas and urged him to calm down.

The rest of their discussion is much more focused on concrete programs the machine could execute, but this short quote struck me as very reminiscent of our current discussion. There existed (some level of) scientific discussion of artificial intelligence in the 1840s, and their talking points seem so similar to ours, with some hyping and others being skeptical!

From the perspective of Lovelace and Menabrea, computer science was progressing incredibly fast. Babbage's Analytical Engine was a design for a working computer that was much better than earlier plans such as the Difference Engine. Designing complex programs became possible. I can feel their excitement while reading their texts. But despite this, it took a hundred years until ENIAC, the first general-purpose digital computer, was built in 1945. The fact that a field progresses fast in its early days does not mean much when predicting its future progress.

  1. ^

    The quote she was commenting on: "Considered under the most general point of view, the essential object of the machine being to calculate, according to the laws dictated to it, the values of numerical coefficients which it is then to distribute appropriately on the columns which represent the variables, it follows that the interpretation of formulae and of results is beyond its province, unless indeed this very interpretation be itself susceptible to expression by means of the symbols which the machine employs. Thus, although it is not itself the being that reflects, it may yet be considered as the being which executes the conceptions of intelligence."

Since you ask for the viewpoint of those who disagree, here is a summary of my objections to your argument. It consists of two parts: first, my objection to your probability of AI risk, and then my objection to your conclusion.

  1. It’s just a matter of time until humanity develops artificial superintelligence (ASI). There’s no in-principle barrier to such technology, nor should we by default expect sociopolitical barriers to automatically prevent the innovation.
    1. Indeed, we can’t even be confident that it’s more than a decade away.
    2. Reasonable uncertainty should allow at least a 1% chance that it occurs within 5 years (let alone 10).

A reasonable prior is that we will not develop ASI in the near future: out of all possible decades, any single decade has a very small probability of ASI being developed in it, well under 1%. To overcome this prior, we would need evidence. However, there is little to no evidence suggesting that any AGI/ASI technologies are possible in the near future.
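To make the shape of that argument concrete, here is a rough back-of-the-envelope version (the 0.5% base rate and the 10% target credence are purely illustrative assumptions, not estimates):

```latex
% Illustrative numbers only: 0.5% per-decade base rate, 10% target credence.
\text{prior odds} = \frac{P(\text{ASI this decade})}{P(\text{no ASI this decade})}
                  = \frac{0.005}{0.995} \approx \frac{1}{199},
\qquad
\text{posterior odds} = \text{likelihood ratio} \times \text{prior odds}.
```

To move from that prior to even a 10% credence (odds of 1:9), the evidence would have to be roughly 22 times more likely under "ASI this decade" than under its negation. I don't see current progress providing anything close to that.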

It is clear that our current LLM tech is not sufficient for AGI, as it lacks several properties that an AGI would require, such as learning-planning[1]. Since current progress is not heading towards AGI, it does not count as good evidence for AGI technology surfacing in the near future.

  1. We should not neglect credible near-term risks of human disempowerment or even extinction. Such risks warrant urgent further investigation and investment in precautionary measures.
    1. If there’s even a 1% chance that, within a decade, we’ll develop technology that we can’t be confident humanity would survive—that easily qualifies as a “credible near-term risk” for purposes of applying this principle.

I'm a firm believer in the neglectedness, tractability, and importance (ITN) framework when deciding on possible interventions. Therefore, if the question is whether we should neglect a risk, the first thing to ask is whether others neglect it. In the case of AI risk, the answer is, in my opinion, no. AI risk is not neglected. It is, in fact, taken very seriously by major AI companies, numerous other organizations, and even some governments. AI is researched in almost every university on our planet, and massive funding goes into AI safety research. So I believe AI risk fails the neglectedness criterion.

But even more crucially, I think it also fails tractability. Because AGI technology does not exist, we cannot research it. Most so-called "AI safety research" focuses on unimportant sidetracks that do not have any measurable effect on AI risk. Similarly, it is very difficult to establish any governmental policy to limit AI development, as we do not even know what kind of technology we need to regulate, aside from a blanket ban on AI research. Most of our politicians correctly deem such a ban an overreaching and harmful policy, since current AI tech is harmless from the X-risk viewpoint (and there would be no way out of the ban, since we cannot research the safety of non-existent tech).

I do not believe AI risk is important, as there is no good reason to believe we will develop ASI in the near future. But even if we believed so, it fails the two other criteria of the ITN framework and thus would not be a good target for interventions.

  1. ^

    Learning-planning is what I call the ability to assess one's own abilities and efficiently learn missing abilities in a targeted way. Currently, machine learning algorithms are extremely inefficient, and models lack the introspection capabilities required to assess missing abilities.

AGI is a pretty meaningless word, as people define it so differently (if they bother to define it at all). I think people should describe more precisely what they mean when they use it.

In your case, since automated AI research is what you care about, it would make the most sense to forecast that directly (or some proxy for it, assuming it is a good one). For automated research to be useful, it should produce significant and quantifiable breakthroughs. How exactly this should be defined is up for debate and would require a lot of work and careful thought, which sadly isn't given to the average Metaculus question.

To give an example of how difficult it is to define such a question properly, look at this Metaculus forecast that concerns AI systems that can design other AI systems. It has the following condition:

This question will resolve on the date when an AI system exists that could (if it chose to!) successfully comply with the request "build me a general-purpose programming system that can write from scratch a deep-learning system capable of transcribing human speech."

In the comment section, there are people arguing that this condition is already met. It is in fact not very difficult to train an AI system (it just requires a lot of compute). You can just pull top ASR datasets from Huggingface, use a <100 hundred line standard training script for a standard neural architecture, and you have your deep-learning system capable of transcribing human speech, completely "from scratch". Any modern coding LLM can write this program for you.
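To make concrete how mundane this is, here is a rough sketch of the kind of sub-100-line script I have in mind: a from-scratch CTC speech recognizer with a completely standard conv + BiLSTM architecture. The dataset name, hyperparameters, and training budget are placeholder assumptions; the point is only that nothing in it is novel.

```python
# A minimal, completely standard ASR training sketch. Illustrative assumptions:
# the "librispeech_asr" dataset is available on the Hugging Face Hub, and the
# hyperparameters are placeholders rather than tuned values.
import string
import torch
import torch.nn as nn
import torchaudio
from datasets import load_dataset

# Character vocabulary: CTC blank + space + lowercase letters + apostrophe.
VOCAB = ["<blank>", " "] + list(string.ascii_lowercase) + ["'"]
CHAR2ID = {c: i for i, c in enumerate(VOCAB)}

def encode_text(text):
    return [CHAR2ID[c] for c in text.lower() if c in CHAR2ID]

class SmallCTCModel(nn.Module):
    """A deliberately ordinary architecture: conv frontend + BiLSTM + linear head."""
    def __init__(self, n_mels=80, hidden=256, n_classes=len(VOCAB)):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, hidden, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(hidden, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, mels):                 # mels: (batch, n_mels, time)
        x = self.conv(mels).transpose(1, 2)  # (batch, time', hidden)
        x, _ = self.lstm(x)
        return self.out(x).log_softmax(-1)   # (batch, time', n_classes)

def train(num_steps=10_000):
    # Streaming avoids downloading the whole corpus up front.
    ds = load_dataset("librispeech_asr", "clean", split="train.100", streaming=True)
    mel = torchaudio.transforms.MelSpectrogram(sample_rate=16_000, n_mels=80)
    model = SmallCTCModel()
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    for step, ex in enumerate(ds):
        if step >= num_steps:
            break
        wav = torch.tensor(ex["audio"]["array"], dtype=torch.float32)
        feats = mel(wav).log1p().unsqueeze(0)          # (1, n_mels, time)
        targets = torch.tensor(encode_text(ex["text"]))
        log_probs = model(feats).transpose(0, 1)       # (time', 1, n_classes)
        loss = ctc(log_probs, targets.unsqueeze(0),
                   torch.tensor([log_probs.size(0)]),
                   torch.tensor([len(targets)]))
        opt.zero_grad()
        loss.backward()
        opt.step()

if __name__ == "__main__":
    train()
```

With enough compute and a proper data pipeline, something of this shape will transcribe speech, but it advances nothing.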

Adding the additional bootstrapping step of first training a coding model and then training the ASR model is no issue: just pull standard pretraining and coding datasets and use a similar procedure. (Training coding LLMs is not practical for most people, since it requires an enormous amount of compute, but this is not relevant to the resolution condition.)

Of course, none of this is really useful, because while it satisfies what the Metaculus question asks, all it can do is train subpar models with standard architectures. So I think some people interpret the question differently. Maybe they take "from scratch" to mean that the neural architecture should be novel, designed anew by the AI. That would indeed be much more reasonable, since that kind of system could be used to do research on possible new architectures. This is supported by the following paragraph in the background section (emphasis original):

If an AI/ML system could become competent enough at programming that it could design a system (to some specification) that can itself design other systems, then it would presumably be sophisticated enough that it could also design upgrades or superior alternatives to itself, leading to recursive self-improvement that could dramatically increase the system's capability on a potentially short timescale.

The logic in this paragraph does not work. It assumes that a system that can design a system to some specification (where that system could in turn design other systems...) can also design upgrades to itself, and that this would lead to recursive self-improvement. But I cannot see how it follows that being able to design a system based on a specification (e.g., a known architecture) leads to the ability to design a system without a specification.

Recursive self-improvement would also require that the newly designed system is better than the old system, but this is by default not the case. Indeed, it is very easy to produce randomized neural architectures that work but are simply bad; any modern coding LLM can write you code for a hallucinated architecture. The ability to design a system is not the same as the ability to design a "good" system, which is itself a very difficult thing to define.
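As a small illustration (an invented example, not a claim about any particular system), here is the kind of randomly assembled architecture I mean. It instantiates and runs a forward pass without complaint, yet nothing about it is "good":

```python
# Sampling a random but syntactically valid feed-forward architecture.
import random
import torch
import torch.nn as nn

def random_architecture(in_dim=784, out_dim=10, max_layers=8):
    layers, dim = [], in_dim
    for _ in range(random.randint(1, max_layers)):
        width = random.choice([16, 64, 256, 1024])
        layers += [nn.Linear(dim, width),
                   random.choice([nn.ReLU(), nn.Tanh(), nn.GELU()])]
        if random.random() < 0.3:
            layers.append(nn.Dropout(random.random() * 0.9))
        dim = width
    layers.append(nn.Linear(dim, out_dim))
    return nn.Sequential(*layers)

model = random_architecture()
print(model)                                  # a "novel" architecture, of sorts
print(model(torch.randn(4, 784)).shape)       # forward pass works: torch.Size([4, 10])
```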

The bottom line here is that this question is written with unstated assumptions. One of these assumptions seems to be that the system can design a system better than itself, but this is not included in the resolution condition. Since we can only guess what the original intention was, and there certainly seem to be multiple interpretations among the forecasters, this question as a whole doesn't really forecast anything. It would take a lot of work and effort to define these questions properly and avoid these issues.

I do see the quote. It seems there is something unclear about its meaning. A single neural net trained on multiple tasks is not a "cobbled together set of sub-systems". Neural nets are unitary systems in the sense that you cannot separate them into multiple subsystems, as opposed to ensemble systems that do have clear subsystems.

Modern LLMs are a good example of such unitary neural nets. It is possible to train (or fine-tune) an LLM for certain tasks, and the same weights will perform all those tasks without any subsystems. Due to the generalization property of neural network training, the LLM might also be good at tasks resembling those in the training set. But this is quite limited: in fact, fine-tuning on one task probably makes the network worse at dissimilar tasks.

Quite concretely, it is imaginable that someone could take an existing LLM, GPT-5 for example, and fine-tune it to solve SAT math questions and Winogrande schemas and to play Montezuma's Revenge. The fine-tuned GPT-5 would be a unitary system: there wouldn't be a separate Montezuma subsystem that could be identified within the network; the same weights would handle all of those tasks. And the system could do all the things they mention ("explain its reasoning on an SAT problem or Winograd schema question, or verbally report its progress and identify objects during videogame play").
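A minimal sketch of what I mean, using an open model as a stand-in (GPT-5 obviously can't be fine-tuned locally, and the prompt formats and toy examples below are invented for illustration):

```python
# One set of weights fine-tuned on several unrelated tasks mixed into one stream.
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # stand-in base model
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Each task is just a different prompt/answer format; the examples are placeholders.
task_data = [
    ("SAT math: If 3x + 5 = 20, what is x?", "x = 5"),
    ("Winograd: The trophy didn't fit in the suitcase because it was too big. "
     "What was too big?", "the trophy"),
    ("Montezuma's Revenge: frame 412, score 400. Next action?", "JUMP RIGHT"),
]

model.train()
for step in range(1000):
    prompt, answer = random.choice(task_data)
    batch = tokenizer(prompt + "\n" + answer, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss  # same weights for every task
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The resulting checkpoint is one network; there is no Montezuma module anywhere in it that you could cut out.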

My critique is based on how they have formulated their Metaculus question. Now, it is possible that some people interpret it differently than I do and assume things that are not explicitly said in the formulation. In that case, the whole forecast becomes unreliable, as we cannot be sure that all forecasters share the same interpretation, and then we couldn't use the forecast for argumentation at all.

The whole point of having the 4 disparate indicators is that they have to be done by a single unified system (not specifically trained for only those tasks)[1]. Such a system would implicitly be general enough to do many other tasks. Ditto with the Strong AGI question.

If you read the formulation carefully, you'll notice that it actually doesn't say anything about the system not being trained specifically for those tasks. It only says that it must be a single unified system. It is entirely possible to train a single neural network on four separate tasks and have it perform well on all of them without it generalizing well to other categories of tasks.

Amusingly, they even exclude introspection from their definition, even though that is a property a real general intelligence should have. A system without some introspection couldn't know which tasks it cannot perform or identify flaws in its own operation, and thus couldn't really learn new capabilities in a targeted way. They quite explicitly say that its reasoning or reports on its progress may be hallucinated.

That is what both the Turing Test questions are all about! (Look at the success conditions in the fine print.)

Their conditions are really vague and leave a lot of practicalities out. There are many footguns in conducting a Turing test. It is also unclear what passing a Turing test, even a genuinely rigorous one, would mean. It's not clear that it would imply the sort of dangerous consequences you talk about in your post.

Thanks of pointing this out. There is indeed a reasoning step missing from the text. Namely: such AGI would be able to automate further AI development, leading to rapid recursive self-improvement to ASI (Artificial Superintelligence). And it is ASI that will be lethally intelligent to humanity (/all biological life). I've amended the text.

Because the forecasts do not concern the kind of system that would be able to do recursive self-improvement (none of the indicators has anything to do with it), I don't see how this reasoning can work.

The conclusions of this post are based on a misunderstanding of the definition of AGI. The linked forecasts mainly contain bad indicators of AGI rather than a robust definition. None of these indicators actually implies that an "AGI" meeting them would be dangerous or catastrophic to humanity, and they do not merit the sensationalist tone of the text.

Indicators

The "Weak AGI" Metaculus question includes four indicators:

  1. passing a "silver" Turing test,
  2. achieving a human-level score on a "robust version" of the Winograd Schema Challenge,
  3. scoring in the 75th percentile on the mathematics section of a "circa-2015-2020" standard SAT exam,[1] and
  4. "learning" and completing a game of Montezuma's Revenge in less than 100 hours of real-time play.

Aside from the Turing test, the three other criteria are simple narrow tasks that contain no element of learning[2]; there is nothing to indicate that such a system would be good at any other task. Since these tasks are not dangerous, a system able to perform them wouldn't be dangerous either, unless we take into account further assumptions, which the question does not mention. Since training a model on specific narrow tasks is much easier than creating a true AGI, it is to be expected that if someone creates such a system, it is likely not an AGI.

And it is not only this "Weak AGI" question that is like this. The "Strong AGI" question from Metaculus is also simply a list of indicators, none of which implies any sort of generality. Aside from an "adversarial" Turing test, it contains the tasks of assembling a model car from instructions, solving programming challenges, and answering multiple-choice questions, none of which requires the model to generalize outside of these tasks.

It would not surprise me if some AI lab specifically made a system that performs well on these indicators just to gain media attention for their supposed "AGI".

Turing Test

In addition to the narrow tasks, the other indicator used by these forecasts is the Turing test. While the Turing test is not a narrow task, it has a lot of other issues: the result depends heavily on the people conducting the test (including the interrogator and the human interviewee) and on their knowledge of the system and of each other. While an ideal adversarial Turing test would be a very difficult task for an AI system, ensuring these ideal conditions is often not feasible. I therefore fully expect news that AI systems have passed some form of the adversarial test, but this should be taken only as limited evidence of a system's generality.

  1. ^

    It puzzles me why they include a range of years. Since models are trained on vast datasets, it is very likely that they have seen most SAT exams from this range. It therefore makes no sense to use an old exam as a benchmark.

  2. ^

    Montezuma's Revenge contains an element of "learning" the game in a restricted amount of time. However, the question fails to constrain this in any way: for example, training the model on very similar games and then fine-tuning it on less than 100 hours of Montezuma's Revenge would be enough to pass the criterion.

It seems to me that you are missing my point. I'm not trying to dismiss or debunk Aschenbrenner. My point is to call out that what he is doing is harmful to everyone, including those who believe AGI is imminent.

If you believe that AGI is coming soon, then shouldn't you try to convince other people of this? If so, shouldn't you be worried that people like Aschenbrenner undermine that effort by presenting themselves like conspiracy theorists?

We must engage at the object level. [...] We will have plenty of problems with the rest of the world doing its standard vibes-based thinking and policy-making. The EA community needs to do better.

Yes! That is why what Aschenbrenner is doing is so harmful: he is using an emotional or narrative argument instead of a real object-level argument. Like you say, we need to do better.

The author's object-level claim is that they don't think AGI is immanent. Why? How sure are you? How about we take some action or at least think about the possibility [...]

I have read the technical claims made by Aschenbrenner and many other AI optimists, and I'm not convinced. There is no evidence of any kind of general intelligence abilities surfacing in any of the current AI systems. People have been trying to achieve that for decades, and especially hard for the past couple of years, but there has been almost no progress on that front at all (in-context learning is one of the biggest advances I can think of, and it can hardly even be called learning). While I do think that some action can be taken, what Aschenbrenner suggests is, as I argue in my text, too much given our current evidence. Extraordinary claims require extraordinary evidence, as the saying goes.

Yeah, by "capability" I meant completely new capabilities (in Aschenbrenner's case, the relevant new capabilities would be general intelligence abilities such as the learning-planning ability), but I can see that, for example, object permanence could be called a new capability. I should perhaps have used a better word there. Basically, my argument is that while image generators have become better at generating images, they haven't gained anything that would take them nearer to AGI.

I'll grant you, as does he, that unhobbling is hand-wavy and hard to measure (although that by no means implies it isn't real).

I'm not claiming that unhobbling isn't real, and I think that the mentioned improvements, such as CoT and scaffolding, really do make models better. But do they make them exponentially better? Can we expect the gains to continue exponentially in the future? I'm going to say no. So I think it's unsubstantiated to measure them in orders of magnitude.

But we can certainly measure floating point operations! So accusing him of using "OOMs" as a unit, and one that is unmeasurable/detached from reality, surprises me.

Most of the time, when he says "OOM", he doesn't refer to FLOPs; he refers to the abstract OOMs that somehow encompass all three axes he mentions. So while some of it is measurable, as a whole it is not.

The problem is not what "order of magnitude" means in general. The problem is that the text leaves it unclear what is being measured. Order of magnitude of what? Compute? Effective compute? Capabilities?

What I meant by "made up" is that it's not a real, actual quantity that we can measure. It is not a technical or mathematical unit; it is a narrative unit. The narrative is that something (effective compute, capabilities, or some other ill-defined thing) grows exponentially. It is a story, not a real technical argument substantiated by real-life evidence. As I say in my text, many of the examples he gives are actually counterexamples to the argument he presents.

So "made up" means "exist inside the narrative" instead of "exist in the real world". I should have made this point clearer in my post, or figure out a better word than "made up".
