All of Greg_Colbourn ⏸️'s Comments + Replies

I don't think many people biased in such a way are going to even be particularly aware of it when making arguments, let alone admit to it. It's mostly a hidden bias: you really don't want it to be true because of how you think it would affect you if it were.

Thinking AI Risk is among the most important things to work on is one thing. Thinking your life depends on minimising it is another.

2
Marcus Abramovitch 🔸
I obviously can't prove I am not biased in such a way, but I don't think that is a fair assumption.

Just thinking: surely to be fair, we should be aggregating all the AI results into an "AI panel"? I wonder how much overlap there is between wrong answers amongst the AIs, and what the aggregate score would be? 

Right now, as things stand with the scoring, "AGI" in ARC-AGI-2 means "equivalent to the combined performance of a team of 400 humans", not "(average) human level".
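
To make the "AI panel" idea concrete, here is a minimal sketch of how such an aggregation could be scored, assuming per-task pass/fail results were available for each model (they generally aren't published at that granularity). A task counts as solved if at least one panel member solves it, loosely mirroring how the human panel is pooled; every name and number below is a placeholder, not real leaderboard data.

```python
# Hypothetical "AI panel" aggregation over ARC-AGI-2-style pass/fail results.
# Model names, task IDs, and the task count are illustrative placeholders.
model_results = {
    "model_a": {1, 4, 7, 12},
    "model_b": {2, 4, 9},
    "model_c": {1, 2, 11, 12},
}
num_tasks = 120  # placeholder evaluation-set size
all_tasks = set(range(1, num_tasks + 1))

solved_by_any = set().union(*model_results.values())   # panel-level successes
missed_by_all = all_tasks - solved_by_any               # overlap of wrong answers

panel_score = len(solved_by_any) / num_tasks
print(f"AI panel score: {panel_score:.1%}")
print(f"Tasks every model missed: {len(missed_by_all)} of {num_tasks}")
```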

3
Yarrow Bouchard 🔸
ARC-AGI-2 is not a test of whether a system is an AGI or not. Getting 100% on ARC-AGI-2 would not imply a system is AGI. I guess the name is potentially misleading in that regard. But Chollet et al. are very clear about this. The arxiv.org pre-print explains how the human testing worked. See the section "Human-facing calibration testing" on page 5. The human testers only had a maximum of 90 minutes: The median time spent attempting or solving each task was around 2 minutes: I'm still not entirely sure how the human test process worked from the description in the pre-print, but maybe rather than giving up and walking away, testers gave up on individual tasks in order to solve as many as possible in their allotted time. I think you're probably right about how they're defining "human panel", but I wish this were more clearly explained in the pre-print, on the website, or in the presentations they've done. ---------------------------------------- I can't respond to your comments in the other thread because of the downvoting, so I’ll reply here: 1) Metaculus and Manifold have a huge overlap with the EA community (I'm not familiar with Kashi) and, outside the EA community, people who are interested in AGI often far too easily accept the same sort of extremely flawed stuff that presents itself as way more serious and scientific than it really is (e.g. AI 2027, Situational Awareness, Yudkowsky/MIRI's stuff). 2) I think it's very difficult to know if one is engaging in motivated reasoning, or what other psychological biases are in play. People engage in wishful thinking to avoid unpleasant realities or possibilities, but people also invent unpleasant realities/possibilities, including various scenarios around civilizational collapse or the end of the world (e.g. a lot of doomsday preppers seem to believe in profoundly implausible, pseudoscientific, or fringe religious doomsday scenarios). People seem to be both biased toward believing pleasant and unpleasant things.

Ok, I take your point. But no one seems to be actually doing this (seems like it would be possible to do already, for this example; yet it hasn't been done.)

What do you think a good resolution criteria for judging a system as being AGI should be?

Most relevant to X-risk concerns would be the ability to do A(G)I R&D as good as top AGI company workers. But then of course we run into the problem of crossing the point of no return in order to resolve the prediction market. And we obviously shouldn't do that (unless superalignment/control is somehow solved).

8
fergusq
AGI is a pretty meaningless word as people define it so differently (if they bother to define it at all). I think people should more accurately describe what they mean when they use it. In your case, since automated AI research is what you care about, it would make most sense to forecast that directly (or some indicator, assuming it is a good indicator). For automated research to be useful, it should produce some significant and quantifiable breakthroughs. How exactly this should be defined is up for debate and would require a lot of work and careful thought, which sadly isn't given for an average Metaculus question. ---------------------------------------- To give an example of how difficult it is to define such a question properly, look at this Metaculus forecast that concerns AI systems that can design other AI systems. It has the following condition: In the comment section, there are people arguing that this condition is already met. It is in fact not very difficult to train an AI system (it just requires a lot of compute). You can just pull top ASR datasets from Huggingface, use a <100-line standard training script for a standard neural architecture, and you have your deep-learning system capable of transcribing human speech, completely "from scratch". Any modern coding LLM can write this program for you. Adding the additional bootstrapping step of first training a coding model and then training the ASR model is no issue: just pull standard pretraining and coding datasets and use a similar procedure. (Training coding LLMs is not practical for most people since it requires an enormous amount of compute, but this is not relevant for the resolution condition.) Of course, none of this is really useful, because while you can do what the Metaculus question asks, all this can do is train subpar models with standard architectures. So I think some people interpret the question differently. Maybe they take "from scratch" to mean that the neural architectu
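
To make the "<100-line standard training script" claim above concrete, here is a minimal sketch: a public speech dataset from Hugging Face plus a small CTC model. The dataset name, architecture, and hyperparameters are illustrative assumptions (not anything fergusq or the Metaculus question specifies), and a real run would still need a great deal of data, compute, and tuning.

```python
# Minimal sketch of a "from scratch" ASR training script: stream a public speech
# dataset from Hugging Face and train a small CTC model on it. Everything here
# (dataset choice, model size, hyperparameters) is illustrative, not a recommendation.
import torch
import torch.nn as nn
import torchaudio
from datasets import load_dataset

CHARS = " abcdefghijklmnopqrstuvwxyz'"            # index 0 is reserved for the CTC blank
char2id = {c: i + 1 for i, c in enumerate(CHARS)}

def encode(text: str) -> torch.Tensor:
    return torch.tensor([char2id[c] for c in text.lower() if c in char2id])

melspec = torchaudio.transforms.MelSpectrogram(sample_rate=16_000, n_mels=80)

class TinyCTC(nn.Module):
    """A deliberately small conv + BiLSTM acoustic model with a CTC head."""
    def __init__(self, n_mels: int = 80, vocab: int = len(CHARS) + 1):
        super().__init__()
        self.conv = nn.Conv1d(n_mels, 256, kernel_size=3, stride=2, padding=1)
        self.rnn = nn.LSTM(256, 256, num_layers=2, batch_first=True, bidirectional=True)
        self.out = nn.Linear(512, vocab)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:   # (batch, n_mels, time)
        x = torch.relu(self.conv(feats)).transpose(1, 2)       # (batch, time', 256)
        x, _ = self.rnn(x)
        return self.out(x).log_softmax(-1)                     # (batch, time', vocab)

model = TinyCTC()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

# Streaming avoids downloading the whole corpus up front.
dataset = load_dataset("librispeech_asr", "clean", split="train.100", streaming=True)

for step, example in enumerate(dataset):
    waveform = torch.tensor(example["audio"]["array"], dtype=torch.float32)
    feats = melspec(waveform).unsqueeze(0)           # (1, n_mels, time)
    target = encode(example["text"]).unsqueeze(0)    # (1, target_len)

    log_probs = model(feats)                         # (1, time', vocab)
    loss = ctc_loss(
        log_probs.transpose(0, 1),                   # CTC expects (time', batch, vocab)
        target,
        torch.tensor([log_probs.shape[1]]),
        torch.tensor([target.shape[1]]),
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(f"step {step}: CTC loss {loss.item():.3f}")
```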

The human testers were random people off the street who got paid $115-150 to show up and then an additional $5 per task they solved. I believe the ARC Prize Foundation’s explanation for the 40-point discrepancy is that many of the testers just didn’t feel that motivated to solve the tasks and gave up [my emphasis]. (I vaguely remember this being mentioned in a talk or interview somewhere.)

I'm sceptical of this when they were able to earn $5 for every couple of minutes' work (the time to solve a task): at roughly $5 per 2 minutes, that works out to about $150/hour, far above the average hourly wage.

100% is the

... (read more)

See the quote in the footnote: "a provision that the system not simply be cobbled together as a set of sub-systems specialized to tasks like the above, but rather a single system applicable to many problems."

the forecasts do not concern a kind of system that would be able to do recursive self-improvements (none of the indicators have anything to do with it)

The indicators are all about being human level at ~every kind of work a human can do. That includes AI R&D. And AIs are already known to think (and act) much faster than humans, and that will on... (read more)

9
fergusq
I do see the quote. It seems there is something unclear about its meaning. A single neural net trained on multiple tasks is not a "cobbled together set of sub-systems". Neural nets are unitary systems in the sense that you cannot separate them into multiple subsystems, as opposed to ensemble systems that do have clear subsystems. Modern LLMs are a good example of such unitary neural nets. It is possible to train (or fine-tune) an LLM for certain tasks, and the same weights would perform all those tasks without any subsystems. Due to the generalization property of neural network training, the LLM might also be good at tasks resembling the tasks in the training set. But this is quite limited: in fact, fine-tuning on one task probably makes the network worse at non-similar tasks. Quite concretely speaking, it is imaginable that someone could take an existing LLM, GPT-5 for example, and fine-tune it to solve SAT math questions, Winogrande schemas, and play Montezuma's Revenge. The fine-tuned GPT-5 would be a unitary system: there wouldn't be a separate Montezuma subsystem that could be identified from the network, the same weights would handle all of those tasks. And the system could do all the things they mention ("explain its reasoning on an SAT problem or Winograd schema question, or verbally report its progress and identify objects during videogame play"). My critique is based on how they have formulated their Metaculus question. Now, it is possible that some people interpret it differently than I and assume things that are not explicitly said in their formulation. In that case, the whole forecast becomes unreliable as we cannot agree that all forecasters have the same interpretation, in which case we couldn't use the forecast for argumentation at all.

None of these indicators actually imply that the "AGI" meeting them would be dangerous or catastrophic to humanity

Thanks for pointing this out. There was indeed a reasoning step missing from the text. Namely: such AGI would be able to automate further AI development, leading to rapid recursive self-improvement to ASI (Artificial Superintelligence). And it is ASI that will be lethally intelligent to humanity (/all biological life). I've amended the text.

there is nothing to indicate that such a system would be good at any other task

The whole point of having t... (read more)

7
fergusq
If you read the formulation carefully, you'll notice that it actually doesn't say anything about the system not being trained specifically for those tasks. It only says that it must be a single unified system. It is entirely possible to train a single neural network on four separate tasks and have it perform well on all of those without it generalizing well on other categories of tasks. Amusingly, they even exclude introspection from their definition even though that is a property that a real general intelligence should have. A system without some introspection couldn't know what tasks it cannot perform or identify flaws in its operation, and thus couldn't really learn new capabilities in a targeted way. They quite explicitly say that its reasoning or reports on its progress can be hallucinated. Their conditions are really vague and leave a lot of practicalities out. There are a lot of footguns in conducting a Turing test. It is also uncertain what passing a Turing test, even if it is indeed rigorous, would actually mean. It's not clear that this would imply the sort of dangerous consequences you talk about in your post. Because the forecasts do not concern a kind of system that would be able to do recursive self-improvements (none of the indicators have anything to do with it), I don't see how this reasoning can work.
7
Yarrow Bouchard 🔸
In footnote 2 on this post, I said I wouldn’t be surprised if, on January 1, 2026, the top score on ARC-AGI-2 was still below 60%. It did turn out to be below 60%, although only by 6%. (Elon Musk’s prediction of AGI in 2025 was wrong, obviously.) The score the ARC Prize Foundation ascribes to human performance is 100%, rather than 60%. 60% is the average for individual humans, but 100% is the score for a "human panel", i.e. a set of at least two humans. Note the large discrepancy between the average human and the average human panel. The human testers were random people off the street who got paid $115-150 to show up and then an additional $5 per task they solved. I believe the ARC Prize Foundation’s explanation for the 40-point discrepancy is that many of the testers just didn’t feel that motivated to solve the tasks and gave up. (I vaguely remember this being mentioned in a talk or interview somewhere.) ARC’s Grand Prize requires scoring 85% (and abiding by certain cost/compute efficiency limits). They say the 85% target score is "somewhat arbitrary". I decided to go with the 60% figure in this post to go easy on the LLMs. If you haven’t already, I recommend looking at some examples of ARC-AGI-2 tasks. Notice how simple they are. These are just little puzzles. They aren’t that complex. Anyone can do one in a few minutes, even a kid. It helps to see what we’re actually measuring here.  The computer scientist Melanie Mitchell has a great recent talk on this. The whole talk is worth watching, but the part about ARC-AGI-1 and ARC-AGI-2 starts at 21:50. She gives examples of the sort of mistakes LLMs (including o1-pro) make on ARC-AGI tasks and her team’s variations on them. These are really, really simple mistakes. I think you should really look at the example tasks and the example mistakes to get a sense of how rudimentary LLMs’ capabilities are. I am interested to watch when ARC-AGI-3 launches. ARC-AGI-3 is interactive and there is more variety in the tasks. J

One option, if you want to do a lot more about it than you currently are, is Pause House. Another is donating to PauseAI (US, Global). In my experience, being pro-active about the threat does help.

I have to think holding such a belief is incredibly distressing.

Have you considered that you might be engaging in motivated reasoning because you don't want to be distressed about this? Also, you get used to it. Humans are very adaptable.

The 10% comes from the linked aggregate of forecasts, from thousands of people's estimates/bets on Metaculus, Manifold and Kalshi; not the EA community.

I think this is pretty telling. I've also had a family member say a similar thing. If your reasoning is (at least partly) motivated by wanting to stay sane, you probably aren't engaging with the arguments impartially. 

I would bet a decent amount of money that you would not in fact, go crazy. Look to history to see how few people went crazy over the threat of nuclear annihilation in the Cold War (and all the other things C.S. Lewis refers to in the linked quote).

2
Marcus Abramovitch 🔸
I don't think my reasoning is around wanting to stay sane at all. Most of my reasoning revolves around base rates, examining current systems, and assessing current progress. I think I'm in this peculiar position where I have ~medium (maybe many reading this would consider them to be long) timelines and fairly low "p(doom)", and I still think AI risk is among the most important things to work on.

But a lot of informed people do (i.e. an aggregation of forecasts). What would you do if you did believe both of those things?

2
Marcus Abramovitch 🔸
I'm not sure. I definitely think there is a chance that if I earnestly believed those things, I would go fairly crazy. I empathize with those who, in my opinion, are overreacting.

If you want to share this, especially to people outside the EA community, I've also posted it on X and Substack.

See also (somewhat ironically), the AI roast:

its primary weakness is underexploring how individual rationalization might systematically lead safety-concerned researchers to converge on similar justifications for joining labs they believe pose existential threats.

That's possible, but the responses really aren't good. For example: 

some of the ethics (and decision-theory) can get complicated (see footnote for a bit more discussion[10]

And then there's a whole lot of moral philosophical-rationalist argument in the footnote. But he completely ignores an obvious option - working to oppose the potentially net-negative organisation. Or in this case: working towards getting an international treaty on AGI/ASI, that can rein in Anthropic and all the others engaged in the suicide race. I think Carlsmith could actually be ... (read more)


Meta note: it's odd that my comment has got way more disagree votes than agree votes (16 vs 3 as of writing), but the OP has also got more disagree votes than agree votes (6 vs 3). I guess it's different people? Or most of the people disagreeing with my comment can't quite get themselves to agree with the main post?

The content of your comment is not just "I disagree with the post". It's "I disagree with the post plus some implicature that the author is rationalizing an immoral decision". So it's quite reasonable for people to disagree with both the post and you.

Some choice quotes:

The first concern is that Anthropic as an institution is net negative for the world (one can imagine various reasons for thinking this, but a key one is that frontier AI companies, by default, are net negative for the world due to e.g. increasing race dynamics, accelerating timelines, and eventually developing/deploying AIs that risk destroying humanity – and Anthropic is no exception), and that one shouldn’t work at organizations like that.

...

Another argument against working for Anthropic (or for any other AI lab) comes from approaches

... (read more)

I think it's more like he disagrees with you about the relative strengths of the objections and responses. (fwiw, I'm inclined to agree with him, and I don't have any personal stake in the matter.)

This is incredible: it reads as a full justification for not working at Anthropic, yet the author concludes the opposite!


That's good to see, but the money, power and influence is critical here[1], and that seems to be far too corrupted by investments in Anthropic, or just plain wishful techno-utopian thinking.

  1. ^

    The poll respondents are not representative of where EA's money, power and influence lies. There is no one representing OpenPhil, CEA or 80k, no large donors, and only one top-25-karma account.

  • There is widespread discontent at the current trajectory of advanced AI development, with only 5% in support of the status quo of fast, unregulated development;
  • Almost two-thirds (64%) feel that superhuman AI should not be developed until it is proven safe and controllable, or should never be developed;
  • There is overwhelming support (73%) for robust regulation on AI. The fraction opposed to strong regulation is only 12%.

[Source]. I imagine global public opinion is similar. What we need to do now is mobilise a critical mass of that majority. If you agree, ple... (read more)

(I think if we’d gotten to human-level algorithmic efficiency at the Dartmouth conference, that would have been good, as compute build-out is intrinsically slower and more controllable than software progress (until we get nanotech). And if we’d scaled up compute + AI to 10% of the global economy decades ago, and maintained it at that level, that also would have been good, as then the frontier pace would be at the rate of compute-constrained algorithmic progress, rather than the rate we’re getting at the moment from both algorithmic progress AND compute sca

... (read more)

the better strategy of focusing on the easier wins

I feel that you are not really appreciating the point that such "easier wins" aren't in fact wins at all, in terms of keeping us all alive. They might make some people feel better, but they are very unlikely to reduce AI takeover risk to, say, a comfortable 0.1% (in fact, I don't think they will reduce it below 50%).

I think I’m particularly triggered by all this because of a conversation I had last year with someone who takes AI takeover risk very seriously and could double AI safety philanthropy if they

... (read more)

It just looks a lot like motivated reasoning to me - kind of like they started with the conclusion and worked backward. Those examples are pretty unreasonable as conditional probabilities. Do they explain why "algorithms for transformative AGI" are very unlikely to meaningfully speed up software and hardware R&D?

Saying they are conditional does not mean they are. For example, why is P(We invent a way for AGIs to learn faster than humans|We invent algorithms for transformative AGI) only 40%? Or P(AGI inference costs drop below $25/hr (per human equivalent)[1]|We invent algorithms for transformative AGI) only 16%!? These would be much more reasonable as unconditional probabilities. At the very least, "algorithms for transformative AGI" would be used to massively increase software and hardware R&D, even if expensive at first, such that inference costs would quick... (read more)

2
David Mathers🔸
I don't think you can possibly know whether they really are actually thinking of the unconditional probabilities or whether they just have very different opinions and instincts from you about the whole domain which make very different genuinely conditional probabilities seem reasonable. 

If they were already aware, they certainly didn't do anything to address it, given their conclusion is basically a result of falling for it.

It's more than just intuitions, it's grounded in current research and recent progress in (proto) AGI. To validate the opposing intuitions (long timelines) requires more in the way of leaps of faith (to say that things will suddenly stop working as they have been). Longer timelines intuitions have also been proven wrong consistently over the last few years (e.g. AI constantly doing things people predicted were "decades away" just a few years, or even months, before).

I found this paper which attempts a similar sort of exercise as the AI 2027 report and gets a very different result.

This is an example of the multiple stages fallacy (as pointed out here), where you can get arbitrarily low probabilities for anything by dividing it up enough and assuming things are uncorrelated.
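
To illustrate the mechanic with made-up numbers (not the paper's actual estimates): give each of several "stages" a modest-sounding probability and the product collapses, unless the conditional probabilities are genuinely assessed as conditionals, i.e. close to 1 once the earlier stages are granted.

```python
# Made-up per-stage probabilities, purely to illustrate the multiple stages fallacy.
stage_probs = [0.6, 0.5, 0.4, 0.6, 0.5, 0.4, 0.16]

joint = 1.0
for p in stage_probs:
    joint *= p   # multiplying as if each stage were an independent hurdle

print(f"{len(stage_probs)} stages -> joint probability {joint:.2%}")  # about 0.23%
```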

I don't find accusations of fallacy helpful here. The authors say in the abstract explicitly that they estimated the probability of each step conditional on the previous ones. So they are not making a simple, formal error like multiplying a bunch of unconditional probabilities whilst forgetting that only works if the probabilities are uncorrelated. Rather, you and Richard Ngo think that their estimates for the explicitly conditional probabilities are too low, and you are speculating that this is because they are still really thinking of the unconditional p... (read more)

1
Yarrow Bouchard 🔸
One of the authors responds to the comment you linked to and says he was already aware of the concept of the multiple stages fallacy when writing the paper. But the point I was making in my comment above is how easy it is for reasonable, informed people to generate different intuitions that form the fundamental inputs of a forecasting model like AI 2027. For example, the authors intuit that something would take years, not decades, to solve. Someone else could easily intuit it will take decades, not years. The same is true for all the different intuitions the model relies on to get to its thrilling conclusion. Since the model can only exist by using many such intuitions as inputs, ultimately the model is effectively a re-statement of these intuitions, and putting these intuitions into a model doesn’t make them any more correct. In 2-3 years, when it turns out the prediction of AGI in 2027 is wrong, it probably won’t be because of a math error in the model but rather because the intuitions the model is based on are wrong.

For what it's worth, I think you are woefully miscalibrated about what the right course of action is if you care about the people you love. Preventing ASI from being built for at least a few years should be a far bigger priority (and Mechanize's goal is ~the opposite of that). Would be interested to hear more re why you think violent AI takeover is unlikely.

Where I say "some of which I borrow against now (with 100% interest over 5 years)", I'm referring to the bet.

if you think the world is almost certainly doomed

I think it's maybe 60% doomed.

it seems crazy not to just spend it and figure out the reputational details on the slim chance we survive.

Even if I thought it was 90%+ doomed, it's this kind of attitude that has got us into this whole mess in the first place! People burning the commons for short term gain is directly leading to massive amounts of x-risk.

you couldn’t ask for someone better than Yann LeCun, no?

Really? I've never seen any substantive argument from LeCun. He mostly just presents very weak arguments (and ad hominem) on social media, that are falsified within months (e.g. his claims about LLMs not being able to world model). Please link to the best written one you know of.

6
Yarrow Bouchard 🔸
I don't think it's a good idea to engage with criticism of an idea in the form of meme videos from Reddit designed to dunk on the critic. Is that intellectually healthy? I don't think the person who made that video or other people who want to dunk on Yann LeCun for that quote understand what he was trying to say. (Benjamin Todd recently made the same mistake here.) I think people are interpreting this quote hyper-literally and missing the broader point LeCun was trying to make. Even today, in April 2025, models like GPT-4o and o3-mini don't have a robust understanding of things like time, causality, and the physics of everyday objects. They will routinely tell you absurd things like that an event that happened in 2024 was caused by an event in 2025, while listing the dates of the events. Why don't LLMs, still, in April 2025, consistently understand that causes precede effects and not vice versa? If anything, this makes what LeCun said in January 2022 seem prescient. Despite a tremendous amount of scaling of training data and training compute, and, more recently, significant scaling of test-time compute, the same fundamental flaw LeCun called out over 3 years ago remains a flaw in the latest LLMs. All that being said... I think even if LeCun had made the claim that I think people are mistakenly interpreting him as making and he had turned out to have been wrong about that, discrediting him based on him being wrong about that one thing would be ridiculously uncharitable.

Ilya's company website says "Superintelligence is within reach." I think it's reasonable to interpret that as having a short timeline. If not an even stronger claim that he thinks he knows how to actually build it.

The post gives a specific example of this: the “software intelligence explosion” concept.

Right, and doesn't address any of the meat in the methodology section.

1
Yarrow Bouchard 🔸
Looking at the methodology section you linked to, this really just confirms the accuracy of nostalgebraist's critique, for me. (nostalgebraist is the Tumblr blogger.) There are a lot of guesses and intuitions. Such as: Okay? I'm not necessarily saying this is an unreasonable opinion. I don't really know. But this is fundamentally a process of turning intuitions into numbers and turning numbers into a mathematical model. The mathematical model doesn't make the intuitions any more (or less) correct. Why not 2-15 months? Why not 20-150 years? Why not 4-30 years? It's ultimately about what the authors intuitively find plausible. Other well-informed people could reasonably find very different numbers plausible. And if you swap out more of the authors' intuitions for other people's intuitions, the end result might be AGI in 2047 or 2077 or 2177 instead of 2027. ---------------------------------------- Edit: While looking up something else, I found this paper which attempts a similar sort of exercise as the AI 2027 report and gets a very different result.

I don't think it's nitpicky at all. A trend showing small, increasing numbers, just above 0, is very different (qualitatively) to a trend that is all flat 0s, as Ben West points out.

I am curious to see what will happen in 5 years when there is no AGI.

If this happens, we will at least know a lot more about how AGI works (or doesn't). I'll be happy to admit I'm wrong (I mean, I'll be happy to still be around, for a start[1]).

  1. ^

     I think the most likely reason we won't have AGI in 5 years is that there will be a global moratorium on further development. Th

... (read more)
-1
Yarrow Bouchard 🔸
Then it's a good thing I didn't claim there was "a trend that is all flat 0s" in the comment you called "disingenuous". I said: This feels like such a small detail to focus on. It feels ridiculous.

I think Chollet has shifted the goal posts a bit from when he first developed ARC [ARC-AGI 1]. In his original paper from 2019, Chollet says:

"We argue that ARC [ARC-AGI 1] can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans."

And the original announcement (from June 2024) says:

A solution to ARC-AGI [1], at a minimum, opens up a completely new programming paradigm where programs can perfectly and reliably generalize from an arbitrary set of priors. We a

... (read more)
-1
Yarrow Bouchard 🔸
In his interview with Dwarkesh Patel in June 2024 to talk about the launch of the ARC Prize, Chollet emphasized how easy the ARC-AGI tasks were for humans, saying that even children could do them. This is not something he’s saying only now in retrospect that the ARC-AGI tasks have been mostly solved. That first quote, from the 2019 paper, is consistent with Chollet’s January 2025 Bluesky post. That second quote is not from Chollet, but from Mike Knoop. I don’t know what the first sentence is supposed to mean, but the second sentence is also consistent with the Bluesky post. In response to the graph… Just showing a graph go up does not amount to a “trajectory to automating AGI development”. The kinds of tasks AI systems can do today are very limited in their applicability to AGI research and development. That has only changed modestly between ChatGPT’s release in November 2022 and today. In 2018, you could have shown a graph of go performance increasing from 2015 to 2017 and that also would not have been evidence of a trajectory toward automating AGI development. Nor would AlphaZero’s tripling of the games a single AI system can master from go to go, chess, and shogi. Measuring improved performance on tasks only provides evidence for AGI progress if the tasks you are measuring test for general intelligence.

I was not being disingenuous and I find your use of the word "disingenuous" here to be unnecessarily hostile.

I was going off of the numbers in the recent blog post from March 24, 2025. The numbers I stated were accurate as of the blog post.

GPT-2 is not mentioned in the blog post. Nor is GPT-3. Or GPT3.5. Or GPT-4. Or even GPT-4o! You are writing 0.0% a lot for effect. In the actual blog post, there are only two 0.0% entries, for "gpt-4.5 (Pure LLM)", and "o3-mini-high (Single CoT)"; and note the limitations in parenthesis, which you also neglect to include... (read more)

1
Yarrow Bouchard 🔸
It seems like you are really zeroing in on nitpicky details that make barely any difference to the substance of what I said in order to accuse me of being intentionally deceptive. This is not a cool behaviour. I am curious to see what will happen in 5 years when there is no AGI. How will people react? Will they just kick their timelines 5 years down the road and repeat the cycle? Will some people attempt to resolve the discomfort by defining AGI as whatever exists in 5 years? Will some people be disillusioned and furious? I hope that some people engage in soul searching about why they believed AGI was imminent when it wasn’t. And near the top of the list of reasons why will be (I believe) intolerance of disagreement about AGI and hostility to criticism of short AGI timelines.

In another comment you accuse me of being "unnecessarily hostile". Yet to me, your whole paragraph in the OP here is unnecessarily hostile (somewhat triggering, even):

The community of people most focused on keeping up the drumbeat of near-term AGI predictions seems insular, intolerant of disagreement or intellectual or social non-conformity (relative to the group's norms), and closed-off to even reasonable, relatively gentle criticism (whether or not they pay lip service to listening to criticism or perform being open-minded). It doesn't feel like a scient

... (read more)
0
Yarrow Bouchard 🔸
I think you are misusing the concept of charity. Or maybe we just disagree on what it means to be charitable or uncharitable in this context because we strongly disagree on the subject matter. You linked to the website for Ilya Sutskever’s company as a citation for the claim that Ilya Sutskever has a relatively short AGI timeline. The website doesn’t mention a timeline and I can’t find an instance of Ilya Sutskever mentioning a specific timeline. Yoshua Bengio gave a timeline of 5 to 20 years in 2023, so that’s 3 to 18 years now. He says he’s 95% confident in this prediction. Okay. Geoffrey Hinton also says 5 to 20 years, but only with 50% confidence. Hmm. Well, 95% vs. 50% is a big discrepancy, right? Also, he’s been saying "5 to 20 years" since 2023, which, if we just take that at face value, means he’s actually been pushing back his timeline by about 1-2 years over the past 1-2 years. I think the person who wrote the Tumblr post is pretty clear on what their problem with the AI 2027 report is. To treat the report as an actual prediction about the future, it requires you to be on board with a lot of modelling assumptions. And if you’re not already on board with those modelling assumptions, the report doesn’t do much to try to convince you. The post gives a specific example of this: the “software intelligence explosion” concept.

This is somewhat disingenuous. o3-mini (high) is actually on 1.5%, and none of the other models are reasoning (CoT / RL / long inference time) models (oh, and GPT 4.5 is actually on 0.8%). The actual leaderboard looks like this:

Yes the scores are still very low, but it could just be a case of the models not yet "grokking" such puzzles. In a generation or two they might just grok them and then jump up to very high scores (many benchmarks have gone like this in the past few years).

2
Yarrow Bouchard 🔸
I was not being disingenuous and I find your use of the word "disingenuous" here to be unnecessarily hostile. I was going off of the numbers in the recent blog post from March 24, 2025. The numbers I stated were accurate as of the blog post. So that we don't miss the bigger point, I want to reiterate that ARC-AGI-2 is designed to be solved by near-term, sub-AGI AI models with some innovation on the status quo, not to stump them forever. This is François Chollet describing the previous version of the benchmark, ARC-AGI, in a post on Bluesky from January 6, 2025: To reiterate, ARC-AGI and ARC-AGI-2 are not tests of AGI. It is a test of whether a small, incremental amount of progress toward AGI has occurred. The idea is for ARC-AGI-2 to be solved, hopefully within the next few years and not, like, ten years from now, and then to move on to ARC-AGI-3 or whatever the next benchmark will be called. Also, ARC-AGI was not a perfectly designed benchmark (for example, Chollet said about half the tasks turned out to be flawed in a way that made them susceptible to "brute-force program search") and ARC-AGI-2 is not a perfectly designed benchmark, either. ARC-AGI-2 is worth talking about because most, if not all, of the commonly used AI benchmarks have very little usefulness for quantifying general intelligence or quantifying AGI progress. It's the problem of bad operationalization leading to distorted conclusions, as I discussed in my previous comment. I don't know of other attempts to benchmark general intelligence (or "fluid intelligence") or AGI progress with the same level of carefulness and thoughtfulness as ARC-AGI-2. I would love to hear if there are more benchmarks like this. One suggestion I've read is that a benchmark should be created with a greater diversity of tasks, since all of ARC-AGI-2 tasks are part of the same "puzzle game" (my words). There's a connection between frontier AI models' failures on a relatively simple "puzzle game" like ARC-AGI-2 and why

It seems like a group of people just saying increasingly small numbers to each other (10 years, 5 years, 3 years, 2 years), hyping each other up

This is very uncharitable. Especially in light of the recent AI 2027 report, which goes into a huge amount of detail (see also all the research supplements).

4
Yarrow Bouchard 🔸
There is a good post about the AI 2027 report here. I do not think I am being uncharitable.

No, just saying that without their massive injection of cash, Anthropic might not be where they are today. I think the counterfactual where there wasn't any "EA" investment into Anthropic would be significantly slower growth of the company (and, arguably, one fewer frontier AI company today).

Re Anthropic and (unpopular) parallels to FTX, just thinking that it's pretty remarkable that no one has brought up the fact that SBF, Caroline Ellison and FTX were major funders of Anthropic. Arguably Anthropic wouldn't be where they are today without their help! It's unfortunate the journalist didn't press them on this.

6
Cullen 🔸
Is your claim that somehow FTX investing in Anthropic has caused Anthropic to be FTX-like in the relevant ways? That seems implausible.

Anthropic leadership probably does lack the integrity needed to do complicated power-seeking stuff that has the potential to corrupt.

Yes. It's sad to see, but Anthropic is going the same way as OpenAI, despite being founded by a group that split from OpenAI over safety concerns. Power (and money) corrupts. How long until another group splits from Anthropic and the process repeats? Or actually, one can hope that such a group splitting from Anthropic might actually have integrity and instead work on trying to stop the race.

5
Jason
What surprises me about this whole situation is that people seem surprised that the executive leadership of a corporation worth an estimated $61.5B would engage in big-corporation PR-speak. The base rate for big-corporation execs engaging in such conduct in their official capacities seems awfully close to 100%. Hence, it does not feel like anything to update on for me. I'm getting the sense that a decent number of people assume that being "EA aligned" is somehow a strong inoculant against the temptations of money and power. Arguably the FTX scandal -- which after all involved multiple EAs, not just SBF -- should have already caused people to update on how effective said inoculant is, at least when billions of dollars were floating around.[1]

  1. ^

     This is not to suggest that most EAs would act in fraudulent ways if surrounded by billions of dollars, but it does provide evidence that EAs are not super-especially resistant to the corrosive effects of money and power at that level of concentration. FTX was only one cluster of people, but how many people have been EAs first and then been exposed to the amount of money/power that FTX or Anthropic had/have?

I was thinking less in terms of fraud/illegality, and more in terms of immorality/negative externalities (i.e. they are recklessly endangering everyone's lives).

No, but the main orgs in EA can still act in this regard. E.g. Anthropic shouldn't be welcome at EAG events. They shouldn't have their jobs listed on 80k. They shouldn't be collaborated with on research projects etc that allow them to "safety wash" their brand. In fact, they should be actively opposed and protested (as PauseAI have done).

Fair points. I was more thinking in broad terms of supporting something that will most likely turn out hugely negative. I think it's pretty clear already that Anthropic is massively negative expected value for the future of humanity. And we've already got the precedent of OpenAI and how that's gone (and Anthropic seems to be going the same way in broad terms - i.e. not caring about endangering 8 billion people's lives with reckless AGI/ASI development).

Downvoters note: there was actually far less publicly available information to update on FTX being bad in early 2022.

4
Davidmanheim
You should add an edit to clarify the claim, not just reply.
4
Holly Elmore ⏸️ 🔸
I think almost nobody had the info needed to predict FTX besides the perpetrators. I think we already know all we need to oppose Anthropic.
6
MichaelDickens
I just think Anthropic leaders being un-candid about their connection to EA is pretty weak evidence that they're doing fraud or something like what FTX did. (It's positive evidence, but weak.)

I don't think it is like being pro-FTX in early 2022

1) Back then hardly anyone knew about the FTX issues. Here we're discussing issues where there is a lot of public information.
2) SBF was hiding a massive fraud that was clearly both illegal and immoral. Here we are not discussing illegalities or fraud, but whether a company is being properly honest, transparent and safe.
3) SBF was a promoter of EA and to some degree held up on an EA pedestal. Here Anthropic is the opposite, trying to distance themselves from the movement.

Seems very different to me.


It appears that Anthropic has made a communications decision to distance itself from the EA community, likely because of negative associations the EA brand has in some circles.

This works both ways. EA should be distancing itself from Anthropic, given recent pronouncements by Dario about racing China and initiating recursive self-improvement. Not to mention their pushing of the capabilities frontier.

6
Davidmanheim
As always, and as I've said in other cases, I don't think it makes sense to ask a disparate movement to make pronouncements like this.

I am guessing you agree with this abstract point (but furthermore think that AI takeover risk is extremely high, and as such we should ~entirely focus on preventing it).

Yes (but also, I don't think the abstract point is adding anything, because of the risk actually being significant.)

Maybe I'm splitting hairs, but “x-risk could be high this century as a result of AI” is not the same claim as “x-risk from AI takeover is high this century”, and I read you as making the latter claim (obviously I can't speak for Wei Dai).

This does seem like splitting hairs. Mo... (read more)

before it is aligned

This is begging the question! My whole objection is that alignment of ASI hasn't been established to be possible.

as long as the AI is caught with non-negligible probability, the AI has to be very cautious, because it is way worse for the AI to be caught than to be successful or the game just ending.

So it will worry about being in a kind of panopticon? Seems pretty unlikely. Why should the AI care about being caught any more than it should about any given runtime instance of it being terminated?

3
Sharmake
A couple of things I'll say here: 1. You do not need a strong theory for why something must be possible in order to put non-trivial credence on it being possible, and if you hold a prior that the scientific difficulty of doing something is often overrated, especially if you believe in the idea that alignment is possibly automatable and that a lot of people overrate the difficulty of automating something, that's enough to cut p(doom) by a lot, arguably 1 OOM, but at the very least to nowhere near your 90% p(doom). That doesn't mean that we are going to make it out of ASI alive, but it does mean that even in situations where there is no established theory or plan to survive, you can still possibly do something. 2. If I wanted to make the case that ASI alignment is possible, I'd probably read these 3 posts by Joshua Clymer first on how automated alignment schemes could work (with some discussion by Habryka, Eliezer Yudkowsky and Jeremy Gillen in the comments, and Joshua Clymer's responses): https://www.lesswrong.com/posts/8vgi3fBWPFDLBBcAx/planning-for-extreme-ai-risks https://www.lesswrong.com/posts/TTFsKxQThrqgWeXYJ/how-might-we-safely-pass-the-buck-to-ai https://www.lesswrong.com/posts/5gmALpCetyjkSPEDr/training-ai-to-do-alignment-research-we-don-t-already-know The basic reason for this is that you can gain way more information on the AI once it has escaped, combined with the ability to use much more targeted countermeasures that are more effective once you have caught the AI red-handed. As a bonus, this can also eliminate threat models like sandbagging, if you have found a reproducible signal for when an AI will try to overthrow a lab. More discussion by Ryan Greenblatt and Buck here: https://www.lesswrong.com/posts/i2nmBfCXnadeGmhzW/catching-ais-red-handed

Most of the intelligence explosion (to the point where it becomes an unavoidable existential threat) happens in Joshua Clymer's original story. I quote it at length for this reason. My story is basically an alternative ending to his. One that I think is more realistic (I think the idea of 3% of humanity surviving is mostly "wishful thinking"; an ending that people can read and hope to be in that number, rather than just dead with no possible escape.)
