Red-teaming Holden Karnofsky's AI timelines

Vasco Grilo🔸; Simon Holm

Summary

This is a red teaming exercise on Holden Karnofsky's AI Timelines: Where the Arguments, and the "Experts," Stand (henceforth designated “HK's AI Timelines”), completed in the context of the Red Team Challenge by Training for Good^[1].

Our key conclusions are:

The predictions for the probability of transformative AI^[2] (TAI) presented in “HK's AI Timelines” seem reasonable in light of the forecasts presented in the linked technical reports, according to our own reading and numerical analysis (see 1. Prediction).
All but two of the technical reports of Open Philanthropy which informed Holden’s predictions were reviewed by what we judge to be credible experts (see 2. Reviewers of the technical reports).
The act of forecasting and discussing AI progress publicly may be an information hazard, due to the risk of shortening (real and predicted) timelines. However, the forecasts of “HK's AI Timelines” seem more likely to lengthen than to shorten AI timelines, since they predict longer ones than the Metaculus community prediction for AGI (see 3. Information hazards).

Our key recommendations are:

Defining “transformative AI” more clearly, such that the predictions of “HK's AI Timelines” could be better interpreted and verified against evidence (see Interpretation). This could involve listing several criteria (as in this Metaculus question), or a single condition (as in Bio anchors^[3]).
Explaining how the predictions were inferred from the sources mentioned in the “one-table summary” (see Inference). This could include explicitly stating the weight given to each of them (quantitatively or qualitatively), and describing the influence of other sources.
Investigating whether research on AI timelines carried out in China might have been overlooked due to language barriers (see Representativeness).

We welcome comments on our key conclusions and recommendations, as well as on reasoning transparency, strength of arguments, and red-teaming efforts.

Author contributions

The contributions by author are as follows:

Simon: argument mapping, complementary background research, discussion, and structural and semantic editing of the text.
Vasco: argument mapping, background research, technical analysis, and structuring and writing of all sections.

Acknowledgements

Thanks to:

For discussions which informed this text, Kasey Shibayama and Saksham Singhi.
For organising the Red Team Challenge, Training for Good.
For comments, Aaron Bergman, Anonymous Person, Cillian Crosson, Hasan Asim, Jac Liew, and Robert Praas.

Introduction

We have analysed Holden Karnofsky's blog post AI Timelines: Where the Arguments, and the "Experts," Stand with the goal of constructively criticising Holden's claims and the way they were communicated. In particular, we investigated:

1. Prediction: Holden's timelines for transformative AI.
2. Reviewers of the technical reports: the credibility of the reviewers of the technical reports which informed Holden's timelines.
3. Information hazards: the potential of Holden's timelines to be an information hazard.

The key motivations for red-teaming this particular article are:

AI timelines^[4] are relevant to understanding the extent to which positively shaping the development of artificial intelligence is one of the world’s most pressing problems (or the most pressing one).
“HK's AI Timelines” and Holden Karnofsky are arguably influential in informing views about AI timelines and prioritisation amongst longtermist causes.
Holden Karnofsky is arguably influential in informing views on other important matters related to improving the world, which makes it appealing to contribute to improving his ideas and writing.

1. Prediction

Holden Karnofsky estimates that:

There is more than a 10% chance we'll see transformative AI within 15 years (by 2036); a ~50% chance we'll see it within 40 years (by 2060); and a ~2/3 chance we'll see it this century (by 2100).

Karnofsky bases his forecast on a number of technical reports, and we analysed it by answering the following:

Interpretation: are the technical reports being accurately interpreted?
Inference: is the forecast consistent with the interpretations of the technical reports?
Representativeness: are the technical reports representative of the best available evidence?

The following sections deal with each of these questions. However, for the interpretation and inference, only 3 of the 9 in-depth pieces presented in the “one-table summary” of “HK's AI Timelines” are studied:

When Will AI Exceed Human Performance? Evidence from AI Experts from Katja Grace et al. (“AI experts”).
Forecasting TAI with biological anchors from Ajeya Cotra (“Bio anchors”).
Semi-informative Priors from Tom Davidson (“SIP”).

These seem to be the only in-depth pieces that provide quantitative forecasts for the year by which TAI will be seen, which facilitates comparisons. Nevertheless, they do not cover all the evidence on which Holden Karnofsky based his forecasts.

Interpretation

Are the technical reports being accurately interpreted?

We interpreted the numerical predictions made by the technical reports to be essentially in agreement with those made in “HK's AI Timelines”.

Our interpretation of the forecasts for the probability of TAI given in the aforementioned reports (see the tab “AI Timelines predictions” of this Sheets), together with the one presented in the “one-table summary” of “HK's AI Timelines”, is provided in the table below.

Report	Interpretation in “HK's AI Timelines”	Our interpretation
AI experts^[5]	~20 % by 2036. ~50 % by 2060. ~70 % by 2100.	25 % by 2036. 49 % by 2060. 69 % by 2100.
Bio anchors^[6]	> 10 % probability by 2036. ~ 50 % chance by 2055. ~ 80 % chance by 2100.	18 % by 2036. 50 % by 2050. 80 % by 2100.
SIP	8% by 2036. 13% by 2060. 20% by 2100.	8 % by 2036. 18 % by 2100.

For all the forecasts, our interpretation is in agreement with that of “HK's AI Timelines” (when rounded to one significant digit).

However, it is worth noting the extent to which the “most aggressive” and “most conservative” estimates of “Bio anchors” differ from the “best guess”^[7] (see Part 4 of the report), as illustrated in the table below (footnotes 8-11 are all quotations from Ajeya Cotra).

Forecast	Conservative estimate	Best guess	Aggressive estimate
Probability of TAI by 2036^[8]	2%	18%	45%
Probability of TAI by 2100^[9]	60%	80%	90%
Median year of TAI	2090^[10]	2050	2040^[11]

Indeed, the uncertainty of “Bio anchors” is acknowledged by Holden Karnofsky here.

There is also the question of whether the differing definitions of transformative AI in the technical reports are comparable, and whether Holden's interpretation of them justifies his overall estimate. We mostly agree with Holden's claim in one of the footnotes of “HK's AI Timelines” that:

In general, all of these [the reports’ predicted] probabilities refer to something at least as capable as PASTA, so they directionally should be underestimates of the probability of PASTA (though I don't think this is a major issue).^[12]

Regarding the first part of the above quote, the aforementioned probabilities refer:

In “AI experts”, to “high-level machine intelligence” (HLMI), which “is achieved when unaided machines can accomplish every task better and more cheaply than human workers”.
- Since HLMI would perform “all of the human activities needed to speed up scientific and technological advancement” unaided, we would have PASTA.
In “Bio anchors”, to “transformative AI”, which “must bring the [global] growth rate to 20%-30% per year if used everywhere it would be profitable to use”^[13].
- If sustained, this growth seems fast enough “to bring us into a new, qualitatively different future”, thus being in agreement with the definition provided in “HK's AI Timelines” for “transformative AI”.
- However, PASTA could conceivably be more capable than AGI as defined in the operationalisation of this Metaculus question. If this is the case, the probabilities of “Bio anchors” might not be “underestimates of the probability of PASTA”.
In “SIP”, to “artificial general intelligence” (AGI), i.e. “computer program(s) that can perform virtually any cognitive task as well as any human, for no more money than it would cost for a human to do it”.
- This definition is similar to that of HLMI, although the emphasis on physical machines is less clear. However, it still seems to encompass “all of the human activities needed to speed up scientific and technological advancement”, hence we think this would also bring PASTA.

A more concrete definition of TAI in “HK's AI Timelines” would have been useful to understand the extent to which its predictions are comparable with those of other sources.

Moreover, in the second half of the quotation above, Holden claims that he does not think it a “major issue” that the predicted probabilities should be “underestimates of the probability of PASTA”. We think a justification for this would be valuable, especially if the timelines for PASTA are materially shorter than those for TAI as defined in the technical reports (which could potentially be a major issue).

Inference

Is the forecast consistent with the interpretations of the technical reports?

We found Holden Karnofsky’s estimate to be consistent with our interpretation of the technical reports, even when accounting for the uncertainty of the forecasts of the individual reports.

Methodology

The inference depends not only on the point estimates of the technical reports (see Interpretation), but also on their uncertainty. With this in mind, lognormal and loguniform distributions for the year by which TAI will be seen were fitted to our interpretation of the forecasts of each technical report^[14] (rows 2-4 and 7-9 in the table below). Moreover, “mean” and “aggregated” distributions, which take all three reports into account, were calculated as follows:

Aggregated lognormal (5) (according to this):
- Mean^[15]: mean of the means of the fitted distributions weighted by the reciprocal of their variances.
- Standard deviation: square root of the reciprocal of the sum of the reciprocals of the variances of the fitted distributions.
Aggregated loguniform (10):
- Minimum: maximum of the minima of the fitted distributions.
- Maximum: minimum of the maxima of the fitted distributions.
Mean lognormal/loguniform (6 and 11):
- Cumulative distribution function (CDF) equal to the mean of the CDFs of the fitted lognormal/loguniform distributions weighted by the reciprocal of the variance of the fitted lognormal distributions.

The forecasts for 2036 and 2100 were used to estimate the parameters of these distributions, because the three reports and “HK's AI Timelines” all provide estimates for the probability of TAI by these years, which ensures consistency. The parameters of the derived distributions are presented in the tab “Derived distributions parameters”.
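The fitting step can be illustrated with a minimal Python sketch. It is not the implementation in the linked Sheets, and it rests on an assumption of ours: the fitted variable is the calendar year itself, so the lognormal is a normal distribution of ln(year), and the loguniform is uniform in ln(year) between a minimum and a maximum year. Two quantiles (our interpreted probabilities of TAI by 2036 and 2100) then determine both parameters of each distribution in closed form. All function and variable names are hypothetical.

```python
import math
from statistics import NormalDist

# Our interpretation of each report: (P(TAI by 2036), P(TAI by 2100)).
FORECASTS = {
    "AI experts": (0.25, 0.69),
    "Bio anchors": (0.18, 0.80),
    "SIP": (0.08, 0.18),
}
YEAR_1, YEAR_2 = 2036, 2100  # years with estimates in all three reports


def fit_lognormal(p1, p2, x1=YEAR_1, x2=YEAR_2):
    """Return (mu, sigma) of ln(year) such that CDF(x1) = p1 and CDF(x2) = p2."""
    z1 = NormalDist().inv_cdf(p1)
    z2 = NormalDist().inv_cdf(p2)
    sigma = (math.log(x2) - math.log(x1)) / (z2 - z1)
    mu = math.log(x1) - sigma * z1
    return mu, sigma


def lognormal_cdf(year, mu, sigma):
    """P(TAI by year) under a lognormal with log-mean mu and log-sd sigma."""
    return NormalDist(mu, sigma).cdf(math.log(year))


def fit_loguniform(p1, p2, x1=YEAR_1, x2=YEAR_2):
    """Return (minimum, maximum) years such that CDF(x1) = p1 and CDF(x2) = p2."""
    log_width = (math.log(x2) - math.log(x1)) / (p2 - p1)  # ln(max) - ln(min)
    log_min = math.log(x1) - p1 * log_width
    return math.exp(log_min), math.exp(log_min + log_width)


def loguniform_cdf(year, lo, hi):
    """P(TAI by year) under a loguniform on [lo, hi], clamped to [0, 1]."""
    frac = (math.log(year) - math.log(lo)) / (math.log(hi) - math.log(lo))
    return min(max(frac, 0.0), 1.0)


lognormal_fits = {name: fit_lognormal(*ps) for name, ps in FORECASTS.items()}
loguniform_fits = {name: fit_loguniform(*ps) for name, ps in FORECASTS.items()}

# Aggregated loguniform: largest of the minima and smallest of the maxima.
agg_lo = max(lo for lo, _ in loguniform_fits.values())
agg_hi = min(hi for _, hi in loguniform_fits.values())

for name in FORECASTS:
    p_ln = lognormal_cdf(2060, *lognormal_fits[name])
    p_lu = loguniform_cdf(2060, *loguniform_fits[name])
    print(f"{name}: P(TAI by 2060) = {p_ln:.0%} (lognormal), {p_lu:.0%} (loguniform)")
print(f"Aggregated loguniform: {loguniform_cdf(2060, agg_lo, agg_hi):.0%}")
```

In our check, the probabilities of TAI by 2060 printed by this sketch round to the values in rows 2-4 and 7-9 of the table below, and the aggregated loguniform coincides with the “Bio anchors” loguniform, since “Bio anchors” has both the largest minimum and the smallest maximum (hence rows 8 and 10 are identical).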

Results and discussion

The forecasts for the probability of TAI by 2036, 2060 and 2100 are presented in the table below. In addition, values for all the years from 2025 to 2100 are given in the tab “Derived distributions CDFs” for all the derived distributions.

Distribution	Probability of TAI by 2036 (%)	Probability of TAI by 2060 (%)	Probability of TAI by 2100 (%)
1. “HK's AI Timelines”	> 10	50	67
2. AI experts lognormal	25	41	69
3. Bio anchors lognormal	18	40	80
4. SIP lognormal	8	11	18
5. Aggregated lognormal	8	27	77
6. Mean lognormal	19	39	74
7. AI experts loguniform	25	42	69
8. Bio anchors loguniform	18	41	80
9. SIP loguniform	8	12	18
10. Aggregated loguniform	18	41	80
11. Mean loguniform	19	40	74
Range	8 - 25	11 - 42	18 - 80

The forecasts of “HK's AI Timelines” are aligned with those of the derived distributions^[16]. These predict that the probability of TAI is:

By 2036, 8 % to 25 %, which agrees with the probability of more than 10 % predicted in “HK's AI Timelines”.
By 2060, 11 % to 42 %, which is lower than the probability of 50 % predicted in “HK's AI Timelines”. However, this seems reasonable for the following reasons:
- The forecasts of “AI experts”, “Bio anchors” and “SIP” for 2060 were 49 %, > 50 %, and < 18 % (see Interpretation).
- Giving more weight to “AI experts” and “Bio anchors” (whose timelines are shorter) agrees with the relative weights estimated from the reciprocal of the variance of the fitted lognormal distributions (see E3:E5 of tab “Derived distributions parameters”):
  - 30 % for “AI experts”.
  - 65 % for “Bio anchors”.
  - 5 % for “SIP”.
- The forecasts of the derived distributions for 2060 are less reliable than those for years closer to 2036 or 2100, the years whose data points were used to determine the parameters of the fitted distributions.
By 2100, 18 % to 80 %, which contains the probability of roughly 70 % predicted in “HK's AI Timelines”. A forecast closer to the upper bound also seems reasonable:
- The range is 69 % to 80 % excluding the forecasts which only rely on data from “SIP” (rows 4 and 9), which should arguably have a lower weight according to the above.
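The relative weights above, and the aggregated and mean lognormal distributions defined in the Methodology, can be sketched in the same way. This is again a minimal sketch under our own assumptions rather than the Sheets' implementation: each lognormal is fitted to the 2036 and 2100 data points with the calendar year as the variable, and the variances used for weighting are those of ln(year), in line with the convention of footnote 15. All names are hypothetical.

```python
import math
from statistics import NormalDist

# Our interpretation of each report: (P(TAI by 2036), P(TAI by 2100)).
FORECASTS = {
    "AI experts": (0.25, 0.69),
    "Bio anchors": (0.18, 0.80),
    "SIP": (0.08, 0.18),
}
LOG_X1, LOG_X2 = math.log(2036), math.log(2100)


def fit_lognormal(p1, p2):
    """(mu, sigma) of ln(year) matching P(TAI by 2036) = p1 and P(TAI by 2100) = p2."""
    z1, z2 = NormalDist().inv_cdf(p1), NormalDist().inv_cdf(p2)
    sigma = (LOG_X2 - LOG_X1) / (z2 - z1)
    return LOG_X1 - sigma * z1, sigma


fits = {name: fit_lognormal(*ps) for name, ps in FORECASTS.items()}

# Precision = reciprocal of the variance of ln(year); weights are normalised precisions.
precision = {name: 1 / sigma**2 for name, (_, sigma) in fits.items()}
total_precision = sum(precision.values())
weights = {name: prec / total_precision for name, prec in precision.items()}

# Aggregated lognormal (row 5): precision-weighted mean of the log-means,
# and log-sd equal to the square root of the reciprocal of the total precision.
aggregated = NormalDist(
    sum(weights[name] * mu for name, (mu, _) in fits.items()),
    math.sqrt(1 / total_precision),
)


def aggregated_cdf(year):
    """P(TAI by year) under the aggregated lognormal."""
    return aggregated.cdf(math.log(year))


def mean_cdf(year):
    """Mean lognormal (row 6): precision-weighted average of the fitted CDFs."""
    return sum(
        weights[name] * NormalDist(mu, sigma).cdf(math.log(year))
        for name, (mu, sigma) in fits.items()
    )


for name, w in weights.items():
    print(f"Weight of {name}: {w:.0%}")
for year in (2036, 2060, 2100):
    print(f"{year}: aggregated {aggregated_cdf(year):.0%}, mean {mean_cdf(year):.0%}")
```

In our check, the weights come out at about 29 %, 66 % and 5 %, close to the rounded values listed above; the aggregated lognormal reproduces row 5 of the results table (8 %, 27 % and 77 %), and the mean lognormal agrees with row 6 to within one percentage point.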

Nevertheless, we think “HK's AI Timelines” would benefit from an explanation of how Holden’s forecasts were derived from the sources of the “one-table summary”, for example by explicitly stating the weight given to each of them (quantitatively or qualitatively).

Representativeness

Are the technical reports representative of the best available evidence?

We think there may be further sources Holden Karnofsky could have considered to make his base of evidence more representative, but that it seemingly strikes a good balance between representativeness and succinctness.

6 of the 9 pieces linked in the “one-table summary” of “HK's AI Timelines” are analyses from Open Philanthropy. This is noted in “HK's AI Timelines”^[17], and could reflect:

The value and overarching nature of Open Philanthropy’s analyses, which often cover multiple research fields.
The higher familiarity of Holden Karnofsky with such analyses.

These are valid reasons, but it would arguably be beneficial to include/consider other sources. For example:

Publications from MIRI related to AI forecasting besides “AI experts”, such as:
- The Asilomar Conference: A Case Study in Risk Mitigation.
- Formalizing Convergent Instrumental Goals.
AI timelines publications from AI Impacts.
Recent surveys about AI existential risk, such as this and this (but note that “HK's AI Timelines” forecasts TAI, not x-risk).
Metaculus forecasts for the questions of the series:
- Economic Impacts of Artificial General Intelligence.
- Will transformative AI come with a bang?

We did not, however, look into whether the conclusions of these publications would significantly update Holden's claims.

It would also be interesting to know whether:

Research on AI timelines carried out in China might have been overlooked due to language barriers.
Developments in AI research after the publication of “HK's AI Timelines” (see e.g. 1st paragraph of this post) might have updated Holden’s AI timelines.
“AI experts” is representative of other surveys (e.g. Gruetzemacher 2019).

That being said:

The “one-table summary” of “HK's AI Timelines” was meant to summarise the angles on AI forecasting discussed in The “most important century” series, rather than to collect all the relevant sources.
The predictions made in “HK's AI Timelines” arguably do not solely rely on the sources listed there.

All in all, we essentially agree with the interpretations of the technical reports, and think Holden Karnofsky’s predictions could justifiably be inferred from their results. In addition, the sources which informed the predictions appear representative of the best available evidence. Consequently, the forecasts for TAI of “HK's AI Timelines” seem reasonable.

2. Reviewers of the technical reports

We have not analysed the reviews of the technical reports from Open Philanthropy referred to by Holden Karnofsky. However, their reviewers seem credible. Brief descriptions are presented below:

“Bio anchors” reviews:
- Anonymous.
- Jennifer Lin: Senior Research Fellow at FHI.
- Julian Michael: PhD student at Paul G. Allen School of Computer Science & Engineering at the University of Washington.
- Marius Hobbhahn: received funding for independent research on AI safety from the Long-Term Future Fund.
- Roger Grosse: Assistant Professor of Computer Science at the University of Toronto, and a founding member of the Vector Institute.
- Rohin Shah: Research Scientist on the technical AGI safety team at DeepMind.
- Sam Bowman: visiting researcher at Anthropic.
“SIP” reviews:
- Alan Hajek: Professor of Philosophy at Australian National University.
- Jeremy Strasser: PhD student at the School of Philosophy of Australian National University.
- Robin Hanson: Associate Professor of Economics at George Mason University.
- Joe Halpern: Professor of Computer Science at Cornell University.
Explosive Growth reviews:
- Ben Jones: Professor of Entrepreneurship and Professor of Strategy at Northwestern University.
- Dietrich Vollrath: Professor and Chair of the Economics Department at the University of Houston.
- Paul Gaggl: Associate Professor of Economics at University of North Carolina at Charlotte.
- Leopold Aschenbrenner: Research Affiliate at GPI.
- Anton Korinek: Economics of AI Lead at Centre for the Governance of AI.
- Jakub Growiec: Professor at SGH Warsaw School of Economics.
- Phillip Trammell: Research Associate at GPI.
- Ben Garfinkel: Research Fellow at FHI.
Brain Computation:
- According to “HK's AI Timelines”, “Brain Computation was reviewed at an earlier time when we hadn't designed the process to result in publishing reviews, but over 20 conversations with experts that informed the report are available here”.
- We found 23 conversations here (searching for “Brain computation”) involving experts in neuroscience, AI, physics, neurosurgery, ophthalmology, neurobiology, biological science, computer science, biomedical engineering, integrated knowledge, psychiatry, and behavioural science.
Human Trajectory:
- According to “HK's AI Timelines”, “Human Trajectory hasn't been reviewed, although a lot of its analysis and conclusions feature in Explosive Growth, which has been”.
Past AI Forecasts:
- According to “HK's AI Timelines”, “Past AI Forecasts hasn't been reviewed”^[18].

For transparency, it seems worth mentioning why Past AI Forecasts has not been reviewed.

3. Information hazards

In the context of how to act in the absence of a robust expert consensus, “HK's AI Timelines” argues that, based on what is known now, the “most important century” hypothesis should be taken seriously until and unless a “field of AI forecasting” develops. The following reasons are presented:

“We don't have time to wait for a robust expert consensus”.
“Cunningham's Law (“the best way to get a right answer is to post a wrong answer”) may be our best hope for finding the flaw in these arguments”.
“Skepticism this general seems like a bad idea”.

Even if the above points are true, AI forecasting could be an information hazard. As noted in Forecasting AI progress: a research agenda from Ross Gruetzemacher et al., “high probability forecasts of short timelines to human-level AI might reduce investment in safety as actors scramble to deploy it first to gain a decisive strategic advantage”^[19] (see Superintelligence from Nick Bostrom).

That being said, the forecasts of “HK's AI Timelines” seem more likely to lengthen than to shorten AI timelines^[20]. On the one hand, it could be argued that they are shorter than those of most members of the public. On the other hand:

The median forecast for TAI of “HK's AI Timelines”, 2060, is later than the median Metaculus community prediction for AGI (see tab “Metaculus predictions” of this Sheets), which was 2055 on the date “HK's AI Timelines” was published (7 September 2021), and 2040 on 29 May 2022.
Holden Karnofsky's timelines are longer than those of the other three “Public Figure Predictions” linked to Metaculus’ question about the Date of Artificial General Intelligence^[21].

^[1]
We have not analysed in detail other posts from Cold Takes (Holden’s blog) related to AI forecasting. However, I (Vasco) read The Most Important Century in its entirety.
^[2]
“By “transformative AI”, I [Holden Karnofsky] mean “AI powerful enough to bring us into a new, qualitatively different future”. I specifically focus on what I'm calling PASTA: AI systems that can essentially automate all of the human activities needed to speed up scientific and technological advancement. I've argued that advanced AI could be sufficient to make this the most important century, via the potential for a productivity explosion as well as risks from misaligned AI”.
^[3]
For “Bio anchors”, see Part 1, section “Definitions for key abstractions used in the model”.
^[4]
AI timelines is an instance of AI forecasting which includes predicting when human-level AI will emerge.
^[5]
Forecasts for “high level machine intelligence (all human tasks)” taken from analysing Fig. 1 (see tab “AI experts”).
^[6]
The TAI forecasts are provided in Part 4 of the Bio anchors report.
^[7]
Note that Holden mentions here that: “Overall, my best guesses about transformative AI timelines are similar to those of Bio Anchors”.
^[8]
“I think a very broad range, from ~2% to ~45%, could potentially be defensible”.
^[9]
“Ultimately, I could see myself arriving at a view that assigns anywhere from ~60% to ~90% probability that TAI is developed this century; this view is even more tentative and subject to revision than my view about median TAI timelines. My best guess right now is about 80%”.
^[10]
“~2090 for my “most conservative plausible median””.
^[11]
“My “most aggressive plausible median” is ~2040”.
^[12]
PASTA qualifies as “transformative AI”, since it is an “AI powerful enough to bring us into a new, qualitatively different future”.
^[13]
Such a growth rate is predicted to coincide with the emergence of AGI according to this Metaculus question. As of 25 June 2022, the community prediction for the time between world real gross domestic product being 25% higher than in every previous year and the development of artificial general intelligence (AGI) was one month, which supports Ajeya Cotra’s definition (although we are wary of reverse causation).
^[14]
The approach followed to determine the parameters of the fitted distributions is explained here.
^[15]
Here, “mean” is written in italic whenever it refers to the mean of the logarithm. Likewise for other statistics.
^[16]
This method is primarily used as a sense check (i.e. “Is Karnofsky’s estimate reasonable?”), and is not intended to precisely quantify deviations.
^[17]
“For transparency, note that many of the technical reports are Open Philanthropy analyses, and I am co-CEO of Open Philanthropy”.
^[18]
This was subsequently added by Holden (in this footnote) to address a key recommendation of a previous version of this analysis: “mentioning the reviews of Past AI Forecasts, or the reasons for it not having been reviewed”.
^[19]
A relevant related post is Are you really in a race? The Cautionary Tales of Szilárd and Ellsberg.
^[20]
Both the prediction and realisation of TAI.
^[21]
These are the predictions of Ray Kurzweil, Eliezer Yudkowsky, and Bryan Caplan.

Peter Wildeford, Jun 25 2022

Thanks for putting this together! I think more scrutiny on these ideas is incredibly important so I'm delighted to see you approach it.

So meta to red team a red team, but some things I want to comment on:

Your median estimate for the conservative and aggressive bioanchor reports in your table are accidentally flipped (2090 is the conservative median, not the aggressive one - and vice versa for 2040).
Looking literally at Cotra's sheet, the median year is 2053. Though in Cotra's report, you're right that she rounds this to 2050 and reports this as her official median year. So I think the only difference between your interpretation and Holden's interpretation is just different rounding.
I do agree more precise definitions would be helpful.
I don't think it makes sense to deviate from Cotra's best guess and create a mean out of aggregating between the conservative and aggressive estimates. We shouldn't assume these estimates are symmetric where the mean lies in the middle using some aggregation method. Instead, I think we should take Cotra's report literally where the mean of the distribution is where she says it is (it is her distribution to define how she wants), which would be the "best guess". In particular, her aggressive vs. conservative range does not represent any sort of formal confidence interval so we can't interpret it that way. I have some unpublished work where I re-run a version of Cotra's model where the variables are defined by formal confidence intervals - I think that would be the next step for this analysis.
The "Representativeness" section is very interesting and I'd love to see more timelines analyzed concretely and included in aggregations. For more reviews and analysis that include AI timelines, you should also look to "Reviews of “Is power-seeking AI an existential risk?”". I also liked this LessWrong thread where multiple people stated their timelines.

Vasco Grilo🔸, Jun 25 2022

Thanks for commenting, Peter!

Your median estimate for the conservative and aggressive bioanchor reports in your table are accidentally flipped (2090 is the conservative median, not the aggressive one - and vice versa for 2040).

Corrected, thanks!

I don't think it makes sense to deviate from Cotra's best guess and create a mean out of aggregating between the conservative and aggressive estimates.

I agree. (Note the distribution we fitted to "Bio anchors" (row 4 of the 1st table of this section) only relies on Cotra's "best guesses" for the probability of TAI by 2036 (18 %) and 2100 (80 %).)

The "Representativeness" section is very interesting and I'd love to see more timelines analyzed concretely and included in aggregations.

Thanks for the sources! Regarding the aggregation of forecasts, I found this article quite interesting.

Effective Altruism Forum