
I previously summarized Ajeya Cotra’s “biological anchors” method for forecasting transformative AI, aka “Bio Anchors.” Here I want to try to clarify why I find this method so useful, even though I agree with most of the specific criticisms of it I’ve heard (sometimes from people who can’t see why I’d put any stock in it at all).

A couple of preliminaries:

  • This post is probably mostly of interest to skeptics of Bio Anchors, and/or to people who feel pretty confused/agnostic about its value and would like to see a reply to skeptics.
  • I don’t want to give the impression that I’m leveling new criticisms of “Bio Anchors” and pushing for a novel reinterpretation. I think the author of “Bio Anchors” mostly agrees with what I say both about the report’s weaknesses and about how to best use it (and I think the text of the report itself is consistent with this).

Summary of what the framework is about

Just to re-establish context, here are some key quotes from my main post about biological anchors:

The basic idea is:

  • Modern AI models can "learn" to do tasks via a (financially costly) process known as "training." You can think of training as a massive amount of trial-and-error. For example, voice recognition AI models are given an audio file of someone talking, take a guess at what the person is saying, then are given the right answer. By doing this millions of times, they "learn" to reliably translate speech to text. More: Training
  • The bigger an AI model and the more complex the task, the more the training process [or “training run”] costs. Some AI models are bigger than others; to date, none are anywhere near "as big as the human brain" (what this means will be elaborated below). More: Model size and task type
  • The biological anchors method asks: "Based on the usual patterns in how much training costs, how much would it cost to train an AI model as big as a human brain to perform the hardest tasks humans do? And when will this be cheap enough that we can expect someone to do it?" More: Estimating the expense

...The framework provides a way of thinking about how it could be simultaneously true that (a) the AI systems of a decade ago didn't seem very impressive at all; (b) the AI systems of today can do many impressive things but still feel far short of what humans are able to do; (c) the next few decades - or even the next 15 years - could easily see the development of transformative AI.

Additionally, I think it's worth noting a couple of high-level points from Bio Anchors that don't depend on quite so many estimates and assumptions:

  • In the coming decade or so, we're likely to see - for the first time - AI models with comparable "size" to the human brain.
  • If AI models continue to become larger and more efficient at the rates that Bio Anchors estimates, it will probably become affordable this century to hit some pretty extreme milestones - the "high end" of what Bio Anchors thinks might be necessary. These are hard to summarize, but see the "long horizon neural net" and "evolution anchor" frameworks in the report.
  • One way of thinking about this is that the next century will likely see us go from "not enough compute to run a human-sized model at all" to "extremely plentiful compute, as much as even quite conservative estimates of what we might need." Compute isn't the only factor in AI progress, but to the extent other factors (algorithms, training processes) become the new bottlenecks, there will likely be powerful incentives (and multiple decades) to resolve them.

Things I agree with about the framework’s weaknesses/limitations

Bio Anchors “acts as if” AI will be developed in a particular way, and it almost certainly won’t be

Bio Anchors, in some sense, “acts as if” transformative AI will be built in a particular way: simple brute-force trial-and-error of computationally intensive tasks (as outlined here). Its main forecasts are based on that picture: it estimates when there will be enough compute to run a certain amount of trial and error, and calls that the “estimate for when transformative AI will be developed.”

I think it’s unlikely that if and when transformative AI is developed, the way it’s developed will resemble this kind of blind trial-and-error of long-horizon tasks.

If I had to guess how transformative AI will be developed, my guess would be something more like:

  • First, narrow AI systems prove valuable at a limited set of tasks. (This is already happening, to a limited degree, with e.g. voice recognition, translation and search.)
  • This leads to (a) more attention and funding in AI; (b) more integration of AI into the economy, such that it becomes easier to collect data on how humans interact with AIs, which can then be used for further training; (c) increased general awareness of what it takes for AI to usefully automate key tasks, and hence increased awareness of (and attention to) the biggest blockers to AI being broader and more capable.
  • Different sorts of narrow AIs become integrated into different parts of the economy. Over time, the increased training data, funding and attention leads to AIs that are less and less narrow, taking on broader and broader parts of the tasks they’re doing. These changes don’t just happen via AI models (and training runs) getting bigger and bigger; they are also driven by innovations in how AIs are designed and trained.
  • At some point, some combination of AIs is able to automate enough of scientific and technological advancement to be transformative. There isn’t a single “master run” where a single AI is trained to do the very hardest, broadest tasks via blind trial-and-error.

Bio Anchors “acts as if” compute availability is the only major blocker to transformative AI development, and it probably isn’t

As noted in my earlier post:

Bio Anchors could be too aggressive due to its assumption that "computing power is the bottleneck":

  • It assumes that if one could pay for all the computing power to do the brute-force "training" described above for the key tasks (e.g., automating scientific work), transformative AI would (likely) follow.
  • Training an AI model doesn't just require purchasing computing power. It requires hiring researchers, running experiments, and perhaps most importantly, finding a way to set up the "trial and error" process so that the AI can get a huge number of "tries" at the key task. It may turn out that doing so is prohibitively difficult.

It is very easy to picture worlds where transformative AI takes much more or less time than Bio Anchors implies, for reasons that are essentially not modeled in Bio Anchors at all

As implied above, transformative AI could take a very long time for reasons like “it’s extremely hard to get training data and environments for some crucial tasks” or “some tasks simply aren’t learnable even by large amounts of trial-and-error.”

Transformative AI could also be developed much more quickly than Bio Anchors implies. For example, some breakthrough in how we design AI algorithms - perhaps inspired by neuroscience - could lead to AIs that are able to do ~everything human brains can, without needing the massive amount of trial-and-error that Bio Anchors estimates (based on extrapolation from today’s machine learning systems).

I’ve listed more considerations like these here.

Bio Anchors is not “pinpointing” the most likely year transformative AI will be developed

My understanding of climate change models is that they try to examine each major factor that could cause the temperature to be higher or lower in the future; produce a best-guess estimate for each; and put them all together into a prediction of where the temperature will be.

In some sense, you can think of them as “best-guess pinpointing” (or even “simulating”) the future temperature: while they aren’t certain or precise, they are identifying a particular, specific temperature based on all of the major factors that might push it up or down.

Many other cases where someone estimates something uncertain (e.g., the future population) have similar properties.

Bio Anchors isn’t like that. There are factors it ignores that are identifiable today and almost certain to be significant. So in some important sense, it isn’t “pinpointing” the most likely year for transformative AI to be developed.

(Not the focus of this piece) The estimates in Bio Anchors are very uncertain

Bio Anchors estimates some difficult-to-estimate things, such as:

  • How big an AI model would have to be to be “as big as the human brain” in some relevant sense. (For this it adapts Joe Carlsmith’s detailed report.)
  • How fast we should expect algorithmic efficiency, hardware efficiency, and “willingness to spend on AI” to increase in the future - all of which affect the question of “how big an AI training run will be affordable.” Its estimates here are very simple and I think there is lots of room for improvement, though I don’t expect the qualitative picture to change radically.
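
To make the compounding concrete, here's a toy sketch (with placeholder growth rates, not the report's actual estimates) of how these three factors combine into the size of the biggest affordable training run:

```python
def affordable_effective_flop(
    years: float,
    current_largest_run: float = 1e24,        # placeholder for today's largest training run, in FLOP
    hardware_doubling_years: float = 2.5,      # FLOP per dollar (placeholder rate)
    algorithmic_doubling_years: float = 3.0,   # effective compute per FLOP from better algorithms (placeholder)
    spending_doubling_years: float = 3.0,      # willingness to spend on a single run (placeholder)
) -> float:
    """Toy model: the three trends compound multiplicatively."""
    doublings_per_year = (
        1 / hardware_doubling_years
        + 1 / algorithmic_doubling_years
        + 1 / spending_doubling_years
    )
    return current_largest_run * 2 ** (doublings_per_year * years)

# Roughly 2036, 2060, and 2100 from the post's vantage point:
for horizon in (15, 40, 80):
    print(f"{horizon} years out: ~{affordable_effective_flop(horizon):.0e} effective FLOP")
```

Even this crude version shows why compounding across several independently growing factors moves the affordable training run by many orders of magnitude over a few decades; the report's own modeling is more careful (e.g., spending can't keep doubling forever).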

I acknowledge significant uncertainty in these estimates, and I acknowledge that (all else equal) uncertainty means we should be skeptical.

That said:

  • I think these estimates are probably reasonably close to the best we can do today with the information we have.
  • I think these estimates are good enough for the purposes of what I’ll be saying below about transformative AI timelines.

I don’t plan to defend this position more here, but may in the future if I get a lot of pushback on it.

Bio Anchors as a way of bounding AI timelines

With all of the above weaknesses acknowledged, here are some things I believe about AI timelines, largely based on the Bio Anchors analysis:

  • I would be at least mildly surprised if transformative AI weren’t developed by 2060. I put the probability of transformative AI by then at 50% (I explain below how the connection works between "mild surprise" and "50%"); I could be sympathetic to someone who said it was 25% or 75%, but would have a hard time seeing where someone was coming from if they went outside that range. More
  • I would be significantly surprised if transformative AI weren’t developed by 2100. I put the probability of transformative AI by then at 2 in 3; I could be sympathetic to someone who said it was 1 in 3 or 80-90%, but would have a hard time seeing where someone was coming from if they went outside that range. More
  • Transformative AI by 2036 seems plausible and concretely imaginable, but doesn’t seem like a good default expectation. I think the probability of transformative AI by then is at least 10%; I could be sympathetic to someone who said it was 40-50%, but would have a hard time seeing where someone was coming from if they said it was <10% or >50%. More

I’d be at least mildly surprised if transformative AI weren’t developed by 2060

This is mostly because, according to Bio Anchors, it will then be affordable to do some absurdly big training runs - arguably the biggest ones one could imagine needing to do, based on using AI models 10x the size of human brains and tasks that require massive numbers of computations to do even once. In some important sense, we’ll be “swimming in compute.” (More on this intuition at Fun with +12 OOMs of compute.)

But it also matters that 2060 is 40 years from now, which is 40 years to:

  • Develop ever more efficient AI algorithms, some of which could be big breakthroughs.
  • Increase the number of AI-centric companies and businesses, collecting data on human interaction and focusing increasing amounts of attention on the things that currently block broad applications.

Given the already-rising amount of investment, talent, and potential applications for today’s AI systems, 40 years seems like a pretty long time to make big progress on these fronts. For context, 40 years is around the amount of time that has elapsed between the Apple IIe release and now.

When it comes to translating my “sense of mild surprise” into a probability (see here for a sense of what I’m trying to do when talking about probabilities; I expect to write more on this topic in the future):

  • On most topics, I equate “I’d be mildly surprised if X didn’t happen” with something like a 60-65% chance of X. But on this topic, I do think there's a burden of proof (which I consider significant though not overwhelming), and I'm inclined to shade my estimates downward somewhat. So I am saying there's about a 50% chance of transformative AI by 2060.
  • I’d be sympathetic if someone said “40 years doesn’t seem like enough to me; I think it’s more like a 25% chance that we’ll see transformative AI by 2060.” But if someone put it at less than 25%, I’d start to think: “Really? Where are you getting that? Why think there’s a <25% chance that we’ll develop transformative AI by a year in which it looks like we’ll be swimming in compute, with enough for the largest needed runs according to our best estimates, with 40 years elapsed between today’s AI boom and 2060 to figure out a lot of the other blockers?”
  • On the flip side, I’d be sympathetic if someone said “This estimate seems way too conservative; 40 years should be easily enough; I think it’s more like a 75% chance we’ll have transformative AI by 2060.” But if someone put it at more than 75%, I’d start to think: “Really? Where are you getting that? Transformative AI doesn’t feel around the corner, so this seems like kind of a lot of confidence to have about a 40-year-out event.”

I would be significantly surprised if transformative AI weren’t developed by 2100

By 2100, Bio Anchors projects that it will be affordable not only to do almost comically large-seeming training runs (again based on the hypothesized size of the models and cost-per-try of the tasks), but to do as many computations as all animals in history combined, in order to re-create the progress that was made by natural selection.

In addition, 2100 is 80 years from now - longer than the time that has elapsed since programmable digital computers were developed in the first place. That’s a lot of time to find new approaches to AI algorithms, integrate AI into the economy, collect training data, tackle cases where the current AI systems don’t seem able to learn particular tasks, etc.

To me, it feels like 2100 is something like “About as far out as I could tell a reasonable-seeming story for, and then some.” Accordingly, I’d be significantly surprised if transformative AI weren’t developed by then, and I assign about a 2/3 chance that it will be. And:

  • I’d be sympathetic if someone said “Well, there’s a lot we don’t know, and a lot that needs to happen - I only think there’s a 50% chance we’ll see transformative AI by 2100.” I’d even be somewhat sympathetic if they gave it a 1 in 3 chance. But if someone put it at less than 1/3, I’d really have trouble seeing where they were coming from.
  • I’d be sympathetic if someone put the probability for “transformative AI by 2100” at more like 80-90%, but given the difficulty of forecasting this sort of thing, I’d really have trouble seeing where they were coming from if they went above 90%.

Transformative AI by 2036 seems plausible and concretely imaginable, but doesn’t seem like a good default expectation

Bio Anchors lays out concrete, plausible scenarios in which there is enough affordable compute to train transformative AI by 2036 (link). I know some AI researchers who feel these scenarios are more than plausible - their intuitions tell them that the giant training runs envisioned by Bio Anchors are unnecessary and that the more aggressive anchors in the report are being underrated.

I also think Bio Anchors understates the case for “transformative AI by 2036” a bit, because it’s hard to tell what consequences the current boom of AI investment and interest will have. If AI is about to become a noticeably bigger part of the economy (definitely an “if”, but compatible with recent market trends), this could result in rapid improvements along many possible dimensions. In particular, there could be a feedback loop in which new profitable AI applications spur more investment in AI, which in turn spurs faster-than-expected improvements in the efficiency of AI algorithms and compute, which in turn leads to more profitable applications … etc.

With all of this in mind, I think the probability of transformative AI by 2036 is at least 10%, and I don't have a lot of sympathy for someone saying it is less.

And that said, all of the above is a set of “coulds” and “mights” - every case I’ve heard for “transformative AI by 2036” seems to require a number of uncertain pieces to click into place.

  • If “long-horizon” tasks turn out to be important, Bio Anchors shows that it’s hard to imagine there will be enough compute by then for the needed training runs.
  • Even if there is plenty of compute, 15 years might not be enough time to resolve challenges like assembling the right training data and environments.
  • It’s certainly possible that some completely different paradigm will emerge - perhaps inspired by neuroscience - and transformative AI will be developed in ways that don’t require Bio-Anchors-like “training runs” at all. But I don’t see any particular reason to expect that to happen in the next 15 years.

So I also don’t have a lot of sympathy for people who think that there’s a >50% chance of transformative AI by 2036.

Bottom line

Bio Anchors is a bit different from the “usual” approach to estimating things. It doesn’t “pinpoint” likely dates for transformative AI; it doesn’t model all the key factors.

But I think it is very useful - in conjunction with informal reasoning about the factors it doesn’t model - for “bounding” transformative AI timelines: making a variety of statements along the lines of “It would be surprising if transformative AI weren’t developed by ___” or “You could defend a ___% probability by such a date, but I think a ___% probability would be hard to sympathize with.”

And that sort of “bounding” seems quite useful for the purpose I care most about: deciding how seriously to take the possibility of the most important century. My take is that this possibility is very serious, though far from a certainty, and Bio Anchors is an important part of that picture for me.

Comments

I'm going on record as someone who would be mildly surprised if we didn't have AI-PONR by 2036. :) (That is, an AI-induced point of no return.) I also think TAI would follow within 5 years of such an event, possibly within 5 seconds depending on how fast takeoff goes.

And that said, all of the above is a set of “coulds” and “mights” - every case I’ve heard for “transformative AI by 2036” seems to require a number of uncertain pieces to click into place.

  • If “long-horizon” tasks turn out to be important, Bio Anchors shows that it’s hard to imagine there will be enough compute by then for the needed training runs.
  • Even if there is plenty of compute, 15 years might not be enough time to resolve challenges like assembling the right training data and environments.
  • It’s certainly possible that some completely different paradigm will emerge - perhaps inspired by neuroscience - and transformative AI will be developed in ways that don’t require Bio-Anchors-like “training runs” at all. But I don’t see any particular reason to expect that to happen in the next 15 years.

So I also don’t have a lot of sympathy for people who think that there’s a >50% chance of transformative AI by 2036.

I think the case for AI-PONR by 2036 is more disjunctive than conjunctive, or to put it another way, every case I've heard for "No AI-PONR by 2036" seems to require a number of uncertain pieces to click into place. It's something like "All PONR-inducing tasks are long-horizon, AND we won't figure out a way to get human-level performance at any PONR-inducing task via generalization, AND we won't figure out a way to get human-level performance at any PONR-inducing task via decomposition, AND we won't have a new paradigm that is more data-efficient, AND we won't be able to significantly accelerate AI R&D despite having human-brain-sized NN's that are superhuman at every short-horizon task we have data for" (And note that each of those conjuncts was itself pretty conjunctive, e.g. AI-PONR doesn't even require AGI or agency.)

(There are some disjuncts, such as "OR maybe the scaling laws will break down soon, OR there'll be a massive world war or something that stifles AI progress." I'd be interested to hear a list of such disjuncts.)

I think the sense in which your case is disjunctive is mostly that there are multiple potential "PONR-inducing tasks," and multiple potential ways to get to each one (brute-force trial-and-error on the full task, generalization from easier-to-learn tasks, decomposition into easier-to-learn tasks, breakthrough new paradigm). But this sort of disjunctiveness seems like it was fundamentally there in 1970 and in 1990 - if it didn't predict transformative AI (or PONR AI) within 15 years then, what's different today?

I'm guessing your answer is something like "Today, we are close to being able to train human-brain-sized models, if only on small-number-of-timestep tasks." I do think that's relevant. But with GPT-3 having been out for more than a year, within 1000x of the "human brain size" threshold, and with seemingly nobody having found a way to get it to do something that seems all that much like a human doing some economically relevant task, this doesn't seem like enough to get over 50% probability by 2036.

Hmm, good point. I wonder if this comes down to how "meta" we like to be: My initial reaction to your question was

"What's different today?!?? Loads of things! We have AlphaStar and GPT-3 and scaling laws and transfer learning and image recognition and theory underpinning the scaling laws and almost-human-brain-sized-NNs ...  There are multiple PONR-inducing tasks that seem plausibly within reach now via multiple methods, whereas 15 years ago before the deep learning boom there was only one method: the 'maybe we'll have some huge unprecedented breakthrough' method!"

But I imagine you'd say: "Yes, those are all differences, but one can always find differences if one looks. In 2006 you would have been able to list a bunch of things that happened since 1991, for example. The meta-strategy you are employing of thinking about recent AI progress and then noting the various ways in which it brings us closer to AGI etc. is a bad one; it has consistently failed for half a century so we shouldn't expect it to work now."

The way to settle this would be to wipe my memory and take me back in time to 2006 and see if I am similarly bullish about AI timelines, i.e. see if I actually am using that meta-strategy. It doesn't feel like I am, but maybe I am self-deceived. (I had 20-30 year timelines up until about 2 years ago) Unfortunately we don't have the right equipment.

Anyhow, I dislike this meta stuff. I think it's better to reason on the object level, at least in cases like this where there aren't people with more expertise to defer to. And on the object level, it seems like there are now multiple plausible paths to AI-PONR whereas 15 years ago there were none, or maybe just the "maybe we'll have some unprecedented breakthrough" one. (This has a lot to do with the human brain size anchor, yes, but also with various other things like the scaling laws and the recent deep learning boom and GPT-3 etc.) That said, I wasn't thinking about these things 15 years ago and it's possible that if I was I'd have been raving about the impending singularity. :P So I admit you do have a point.

When would your skepticism cease? I feel like it'll always be true that APS-AI or AI-PONR will require a number of uncertain pieces to click into place, in some sense, until it's literally happening. What matters is the "in some sense." What sorts of signs and portents would convince you that there's a >50% chance of APS-AI or AI-PONR or TAI within 15 years?

To the point about GPT-3 being out for more than a year: One year is not a very long time and 1000x smaller than a human brain is not very big and I don't care about economic relevance primarily anyway (what I care about is AI-PONR, which I tentatively expect to come before GWP accelerates) and we are talking 10-year timelines not 2-year timelines. Suppose AI really will accelerate GWP 10 years from now. Does that confidently predict that GPT-3 would find massive economic application within 1 year? I don't think so. It's definitely evidence, but not strong evidence I think. 

ETA: I forgot to mention that my timelines are generated from Ajeya's model, not from reasoning about disjunctiveness and impressiveness. I just put different weights into the different anchors than she does. The reasons I put different weights come down to different interpretations of evidence like the scaling laws, transfer learning, etc. and different intuitions about how hard it'll be, etc. It's of course possible that I'm biased and 15 years ago would have put loads of weight on milestones that have already passed... but it's more plausible IMO that actually I wouldn't have; this human brain anchor stuff plausibly would have appealed to me then as now, for example.

Re: "When would your skepticism cease?", it is certainly hard to lay out hypothetical observations that would correspond to particular AI timelines! But I'll give a shot. Some example observations that seem like they should make my timelines a lot shorter than they are now, down to 15 years or shorter:

  • "There are cases of successful, impressive training runs that required key rewards to be very sparse (100,000+ timesteps each)." In this case, I'd doubt that compute was a bottleneck anymore, and think we were down to environment design.
  • "A large number of tasks have been trained without such sparse rewards; when looking at the list of tasks, I no longer find it plausible to think that there are key 'long-horizon' tasks that will need more compute than these tasks have needed, and it seems like a pretty good guess that it's affordable compute-wise to train any task." Similar to the previous case, with sparse-reward training turning out to be unnecessary for the kinds of training runs I would've guessed it'd be necessary for.
  • "People are actually running training runs that seem like they could theoretically generate an AGI." By default I expect that to be happening at least several years before we see actual AGI, as I expect it will take a while between "This sort of thing seems very concretely doable" and "This is working."
  • "We pretty much have a proof-of-concept for nearly every task an AI would need to do to be transformative/PONR; many are expensive and impractical and unreliable; people are engaged in massive data collection and environment design efforts to improve this." Again, I think we'd still be at least several years out by default here.
  • "Siri, Assistant, etc. can carry out a number of multi-step tasks about as well and reliably as a human virtual assistant would, and someone has made a decent case that their level of autonomy, creativity, etc. is on an upward trend that implies being able to do the kind of work top scientists do soon." I'm not currently aware of any analysis in such trends, and subjectively don't feel there's been impressive progress over the last few years.

I thought these up with Bio Anchors in mind, and accordingly, most explicitly involve some evidence that we don't still have orders of magnitude of compute-affordability to go. But there are probably lots of other configurations of in-the-lab-but-not-really-working-yet experiments, out-in-the-world performance, opinions from people closest to the work, etc. that would lead me to have shorter timelines (and there are probably things that some people would argue qualify as the above, that wouldn't).

More prosaically, I can keep comparing reality to the implied predictions of my preferred weightings for Bio Anchors. As the size of the biggest affordable training run gets bigger - and/or I see examples of successful training runs that seem like they should've required more compute, causing me to feel that the "effective" size of the biggest affordable training run has gotten bigger - I hope to update accordingly.

I do think it's possible that we'll get such a sudden jump that none of these sorts of things happens far in advance. I just don't think it's more than 50% likely.

I wouldn't guess you'd have had the same timelines in 2006, and I don't think I would have either. I think a lot has changed. But the basic fact that there are a lot of imaginable paths to AGI doesn't seem to have changed.

The fact that there are several that now seem "plausible" has changed to some degree, but looking over your list, those paths do all seem quite unlikely to get us all the way to "PONR-inducing AI" by 2036 (and they're not independent either). It might be interesting to try to specify the probabilities you see for each potential path.

Thanks for the thoughtful reply, that's a good list! I'll make a list of my own below.  Warning: Wall of text incoming, I won't be offended if you don't read it!

The fact that there are several that now seem "plausible" has changed to some degree, but looking over your list, those paths do all seem quite unlikely to get us all the way to "PONR-inducing AI" by 2036 (and they're not independent either). It might be interesting to try to specify the probabilities you see for each potential path.

This is the crux I guess, haha. Here's a stab:


 

Let's suppose it's 2030 and algorithmic and hardware progress have continued at the rates Ajeya projects and so has willingness-to-spend. Also let's suppose the scaling laws have continued to hold.

Here is a disjunctive list of paths-to-AI-PONR:

a. Some PONR-inducing task turns out to be short-horizon

b. Some PONR-inducing task turns out to work with smallish brains and medium horizons

c.  Some PONR-inducing task can be reached via generalization (in short-horizon-pre-trained human-size brains)

d. Some PONR-inducing task can be reached via task decomposition (e.g. bureaucracies of AIs of the aforementioned types)

e. New algorithmic advancements appear that make it possible to do long-horizon training a few OOMs more data-efficiently (I guess I mean this to also be the catch-all category for paradigm shifts and the like)

 

I should now say what the main PONR-inducing tasks are in my opinion. They are:

--APS-AI [EDIT: Advanced, Planning, Strategically aware. See this report.]

--Persuasion tools good enough to cause major ideological strife and/or major degradation of public epistemology

--R&D acceleration

--Unknown/catchall

Technically R&D acceleration isn't PONR-inducing but it would lead to something PONR-inducing pretty quickly so I include it.


 

Ok, credences:

a. I think APS-AI is probably not short-horizon, but persuasion and R&D acceleration and unknown might be. (Maybe if we did AlphaFold but bigger and for AI R&D it would make a kickass tool for designing new AI architectures. Input hyperparameters, it predicts what training curve and performance on benchmarks will be!)  Let's say 50% chance for persuasion, 25% for R&D acceleration, and 15% for unknown, and 65% for combined. 

b. I worry that maybe a small neural net trained long-horizon-style to be APS-AI might actually succeed at some PONR-inducing task even though it is smaller than the human brain. I don’t worry too much about this, but… think about how GPT-2 is able to write sensible English even though it’s 5 OOMs smaller than the human brain. Or how AlphaStar can go toe-to-toe with human experts despite being 7 OOMs smaller! Let’s say 20%.

c. I’m more worried about big pre-trained brains generalizing (perhaps with a bit of fine-tuning). I know there has been some research done into scaling laws for transfer, and Rohin extrapolated to calculate that this would only knock off 1.5 OOMs of cost from a hypothetical long-horizon training run… but I’m still nervous. Put it this way: Humans are FAR from optimal at long-horizon tasks anyway. There is no reason to think that we are as good as a human-brain-sized neural net trained on 10^14 data points, each one the length of a subjective human lifetime. There’s every reason to think that neural net would instead be dramatically better than us. What sorts of things does an AI need to do to be APS-AI? Planning, strategically aware… arguably GPT-3 can already do those things, it just can’t do them well. But once it’s bigger, and fine-tuned… maybe it’ll be able to go toe-to-toe with humans, while still being far from optimal. Or even if it can’t be APS-AI, maybe it can be smart enough to accelerate AI R&D. (One could also imagine making a brain bigger than the human brain, and then pre-training it, and then using it as an oracle… ask it to predict which AI architecture will yield the best results, etc.) I say 60%.

d.  I think bureaucracies of neural nets are pretty brittle and finicky now, but (a) that might change in the future as we get more practice with them, and (b) I get the impression that they do reasonably well when you can fine-tune them / retrain them into their new roles. See e.g. the recent OpenAI crawl-the-internet-and-do-research-with-which-to-answer-questions bot. I say 25%.

e: Let's suppose there have been 2 paradigm shifts in the last 60 years of AI research. Seems like the recent shift to deep learning was one. Seems very plausible that if we have a new shift that is to deep learning what deep learning was to the previous shitty stuff in the early 2000s, then we are going to get AI-PONR very shortly thereafter. So anyhow maybe this suggests something like a 33% chance of another such shift by 2030, going on base rates? Could go down if you think there have been fewer paradigm shifts in the past, could go up if you think there have been more. I'd love to see someone measure the recent increase in investment and calculate whether we are more likely to get paradigm shifts now than any time in the past, taking into account ideas-getting-harder-to-find effects. (Huh, you know, I don't think I realized how high the chance of paradigm shift is until now... I guess this means my timelines should be shorter...)

f. I’m not sure which category this fits in, but what about just scaling up EfficientZero? As far as I know its architecture is pretty damn general, not game-specific at all. You should be able to hook it up to a robot or a chatbot (perhaps with a pre-trained model like GPT-3 as a seed) and let rip. Napkin math time: Instead of spending 1 day training on hardware that costs $10,000, let's make a custom supercomputer that is 6 OOMs bigger. Cost: $10B. Run it for 100 days instead of 1. That gives us 8 OOMs more compute to work with than EfficientZero had. Use 5 OOMs to increase the subjective training time from 2 hours to 22 years. Use 3 OOMs to increase parameter count. Maybe this setup would work for something much more complex than Atari… I’m gonna say 20%.
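
A quick arithmetic check of the napkin math above (all baseline figures are the ones stated in the comment, not independently verified):

```python
import math

# Re-derive the stated figures from the napkin math above.
baseline_cost_per_day = 10_000   # dollars: "1 day training on hardware that costs $10,000"
hardware_scaleup_ooms = 6        # "a custom supercomputer that is 6 OOMs bigger"
days = 100                       # "run it for 100 days instead of 1"

print(baseline_cost_per_day * 10**hardware_scaleup_ooms)  # 10000000000 -> the stated $10B

extra_ooms = hardware_scaleup_ooms + math.log10(days)
print(extra_ooms)                                         # 8.0 -> the stated "8 OOMs more compute"

# Spend 5 of those OOMs on subjective training time, 3 on parameter count:
subjective_hours = 2 * 10**5     # 2 hours scaled up by 5 OOMs
print(subjective_hours / (24 * 365))                      # ~22.8 -> the stated "22 years"
```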

Anyhow, all of this is off the cuff, out of my ass, etc. but it really does feel like it adds up to significantly more than 50% to me, more like 80% or so. So then why aren’t my timelines 80% by 2030? Well, remember all of this was conditioning on “algorithmic and hardware progress have continued at the rates Ajeya projects and so has willingness-to-spend. Also let's suppose the scaling laws have continued to hold.” Also I wish to be humble etc. and defer to people like yourself and Ajeya and Paul at least a little bit.
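
For concreteness, here is what the path credences above (a-f) add up to if you naively treat them as independent disjuncts; they're clearly correlated, so this overstates the total, and the ~80% in the text already discounts for that and for some deference to others:

```python
# Naive disjunctive combination of the path credences listed above (a-f),
# treating them as independent. They aren't, so this is an overstatement;
# it's only meant to show why the disjunction feels like well over 50%.
paths = {
    "a_short_horizon": 0.65,
    "b_small_brain_medium_horizon": 0.20,
    "c_generalization": 0.60,
    "d_task_decomposition": 0.25,
    "e_new_algorithms_or_paradigm": 0.33,
    "f_scaled_up_EfficientZero": 0.20,
}

p_none = 1.0
for p in paths.values():
    p_none *= 1 - p

print(f"P(at least one path works | naive independence) = {1 - p_none:.0%}")  # ~95%
```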


My promised list: Here are some example observations that would go a long way towards lengthening my timelines, e.g. to 20-30 years instead of 10:

1. AI winter. Progress slows, investment dries up. People generally agree that the amount of compute used for the largest training runs will stop growing for the next decade or so, rather than grow by a couple OOMs as is currently expected.

2. Roadblock that doesn't quickly fall: My brief (5-year) experience watching AI progress is a story of many repeated instances of purported roadblocks being smashed through almost as soon as I hear about them. E.g. transfer learning, imperfect-information games, common sense understanding, reasoning, real-time games, sim-2-real, ... the list goes on. Most recently people I respect a lot (Ajeya, Paul, etc.) taught me about horizon lengths and data inefficiency and I came to believe that modern AI methods were fundamentally less data-efficient than the human brain... but then along came EfficientZero! So, I'd lengthen my timelines if someone clearly articulates a major roadblock to all important milestones (AGI/TAI/APS-AI/etc.), DeepMind and OpenAI etc. throw themselves at overcoming it for a few years, and fail. (Maybe this has already happened and I haven't heard about it because of publication bias?) (Also it's important that the roadblock plausibly block us from AGI/TAI/APS-AI/etc. Data-efficiency is on thin ice by this metric because plausibly even if AI is dramatically less data-efficient than humans there might still be a way to make AGI/TAI/APS-AI/etc. out of it. Causal reasoning and common sense and imperfect-information games do much better by this metric; too bad we smashed through them so easily.)

3. Solid evidence that human intelligence comes from "special sauce" that needs to either be painstakingly imitated via much greater knowledge of neuroscience, or brute-force rediscovered via at least genome-anchor-like levels of artificial evolution. As far as I know there isn't really any solid evidence for the special sauce hypothesis; if AGI is actually really easy and there is no special sauce whatsoever, my brain would still look exactly the way it does. (To date there has been no experiment along the lines of “make a 100T parameter dense model and train it for a billion time steps,” not even close.) The best piece of evidence I know of is along the lines of "If there's no special sauce, then we should be able to make AIs as smart as animal brains of similar size, and we can't." Except that so far it seems like we can actually? We can make image recognizers better than bee brains, for example, as OpenPhil's investigation showed. I haven't yet heard of an intellectual task tiny-brained animals can do that we know current AI methods can't also do.

4. People trying to build AGI with a track record of success change their minds and start disagreeing with me about timelines: My impression is that the people actually trying to build AGI, especially the ones at the cutting edge with the best track records, tend to have even shorter timelines than me!

Interesting, thanks! Yep, those probabilities definitely seem too high to me :) How much would you shade them down for 5 years instead of 15? It seems like if your 5-year probabilities are anywhere near your 15-year probabilities, then the next 5 years have a lot of potential to update you one way or the other (e.g., if none of the "paths to PONR" you're describing work out in that time, that seems like it should be a significant update).

I'm not going to comment comprehensively on the paths you laid out, but a few things:

  • I think EfficientZero is sample-efficient but not compute-efficient: it's compensating for its small number of data points by simulating a large number, and I don't think there are big surprises on how much compute it's using to do that. This doesn't seem to be competing with human "efficiency" in the most important (e.g., compute costs) sense.

  • I don't know what you mean by APS-AI.

  • I'm pretty skeptical that "Persuasion tools good enough to cause major ideological strife and/or major degradation of public epistemology" is a serious PONR candidate. (There's already a lot of ideological strife and public confusion ...) I think the level of persuasiveness needed here would need to be incredibly extreme - far beyond "can build a QAnon-like following" and more like "Can get more than half the population to take whatever actions one wants them to take." This probably requires reasoning about neuroscience or something, and doesn't seem to me to be adding much in the way of independent possibility relative to the R&D possibility.

Gaah, sorry, I keep forgetting to put links in -- APS-AI means Advanced, Planning, Strategically Aware AI -- the thing the Carlsmith report talks about. I'll edit to put links in retroactively.

I've written a short story about what I expect the next 5 years to look like. Insofar as AI progress is systematically slower and less impressive than what is depicted in that story, I'll update towards longer timelines, yeah. 

I'm currently at something like 20% that AI-PONR will be crossed in the next 5 years, and so insofar as that doesn't seem to have happened 5 years from now then that'll be a 20%-sized blow to my timelines in the usual Bayesian way. It's important to note that this won't necessarily lengthen my timelines all things considered, because what happens in those 5 years might be more than a 20% blow to 20+year timelines. (For example, and this is what I actually think is most likely, 5 years from now the world could look like it does at the end of my short story, in which case I'd have become more confident that the point of no return will come sometime between 2026 and 2036 than I am now, not less, because things would be more on track towards that outcome than they currently seem to be.)
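
A toy version of the "usual Bayesian way" update described above, with made-up numbers (only the 20%-in-the-next-5-years figure comes from the comment; the 65% prior below is purely illustrative):

```python
# Toy illustration of the "20%-sized blow" described above. Only the 20% figure
# comes from the comment; the 65% prior for PONR-by-2036 is made up for illustration.
p_ponr_by_2036 = 0.65    # hypothetical prior credence
p_ponr_within_5y = 0.20  # stated credence for PONR in the next 5 years

# Conditioning only on "no PONR in the next 5 years" renormalizes the remaining mass:
posterior = (p_ponr_by_2036 - p_ponr_within_5y) / (1 - p_ponr_within_5y)
print(f"{posterior:.0%}")  # ~56%, down from 65%

# As the comment notes, what is actually observed during those 5 years could matter
# more than this mechanical renormalization, and could even push the estimate up.
```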

Re: persuasion tools: You seem to have a different model of how persuasion tools cause PONR than I do. What I have in mind is mundane, not exotic--I'm not imagining AIs building QAnon-like cult followings, I'm imagining the cost of censorship/propaganda* continuing to drop rapidly and the effectiveness continuing to increase rapidly, and (given a few years for society to catch up) ideological strife to intensify in general. This in turn isn't an x-risk by itself but it's certainly a risk factor, and insofar as our impact comes from convincing key parts of society (e.g. government, tech companies) to recognize and navigate a tricky novel problem (AI risk) it seems plausible to me that our probability of success diminishes rapidly as ideological strife in those parts of society intensifies. So when you say "there's already a lot of ideological strife and public confusion" my response is "yeah exactly, and isn't it already causing big problems and e.g. making our collective handling of COVID worse? Now imagine that said strife and confusion gets a lot worse in the next five years, and worse still in the five years after that."

*I mean these terms in a broad sense. I'm talking about the main ways in which ideologies strengthen their hold on existing hosts and spread themselves to new ones. For more on this see the aforementioned story, this post, and this comment.

Re: EfficientZero: Fair, I need to think about that more... I guess it would be really helpful to have examples of EfficientZero being done on more complex environments than Atari, such as e.g. real-world robot control or Starcraft or text prediction.

Sorry for the long delay, I let a lot of comments I needed to respond to pile up!

APS seems like a category of systems that includes some of the others you listed (“Advanced capability: they outperform the best humans on some set of tasks which when performed at advanced levels grant significant power in today’s world (tasks like scientific research, business/military/political strategy, engineering, and persuasion/manipulation) … “). I still don’t feel clear on what you have in mind here in terms of specific transformative capabilities. If we condition on not having extreme capabilities for persuasion or research/engineering, I’m quite skeptical that something in the "business/military/political strategy" category is a great candidate to have transformative impact on its own.

Thanks for the links re: persuasion! This seems like a major theme for you and a big place where we currently disagree. I'm not sure what to make of your take, and I think I'd have to think a lot more to have stable views on it, but here are quick reactions:

  • If we made a chart of some number capturing "how easy it is to convince key parts of society to recognize and navigate a tricky novel problem" (which I'll abbreviate as "epistemic responsiveness") since the dawn of civilization, what would that chart look like? My guess is that it would be pretty chaotic; that it would sometimes go quite low and sometimes go quite high; and that it would be very hard to predict the impact of a given technology or other development on epistemic responsiveness. Maybe there have been one-off points in history when epistemic responsiveness was very high; maybe it is much lower today compared to peak, such that someone could already claim we have passed the "point of no return"; maybe "persuasion AI" will drive it lower or higher, depending partly on who you think will have access to the biggest and best persuasion AIs and how they will use them. So I think even if we grant a lot of your views about how much AI could change the "memetic environment," it's not clear how this relates to the "point of no return."
  • I think I feel a lot less impressed/scared than you with respect to today's "persuasion techniques."
    • I'd be interested in seeing literature on how big an effect size you can get out of things like focus groups and A/B testing. My guess is that going from completely incompetent at persuasion (e.g., basically modeling your audience as yourself, which is where most people start) to "empirically understanding and incorporating your audience's different-from-you characteristics" causes a big jump from a very low level of effectiveness, but that things flatten out quickly after that, and that pouring more effort into focus groups and testing leads to only moderate effects, such that "doubling effectiveness" on the margin shouldn't be a very impressive/scary idea.
    • I think most media is optimizing for engagement rather than persuasion, and that it's natural for things to continue this way as AI advances. Engagement is dramatically easier to measure than persuasion, so data-hungry AI should help more with engagement than persuasion; targeting engagement is in some sense "self-reinforcing" and "self-funding" in a way that targeting persuasion isn't (so persuasion targeters need some sort of subsidy to compete with engagement targeters); and there are norms against targeting persuasion as well. I do expect some people and institutions to invest a lot in persuasion targeting (as they do today), but my modal expectation does not involve it becoming pervasive on nearly all websites, the way yours seems to.
    • I feel like a lot of today's "persuasion" is either (a) extremely immersive (someone is raised in a social setting that is very committed to some set of views or practices); or (b) involves persuading previously-close-to-indifferent people to believe things that call for low-cost actions (in many cases this means voting and social media posting; in some cases it can mean more consequential, but still ultimately not-super-high-personal-cost, actions). (b) can lead over time to shifting coalitions and identities, but the transition from (b) to (a) seems long.
    • I particularly don't feel that today's "persuaders" have much ability to accomplish the things that you're pointing to with "chatbots," "coaches," "Imperius curses" and "drugs." (Are there cases of drugs being used to systematically cause people to make durable, sustained, action-relevant changes to their views, especially when not accompanied by broader social immersion?)
  • I'm not really all that sure what the special role of AI is here, if we assume (for the sake of your argument that AI need not do other things to be transformative or PONR-y) a lack of scientific/engineering ability. What has/had higher ex ante probability of leading to a dramatic change in the memetic environment: further development of AI language models that could be used to write more propaganda, or the recent (last 20 years) explosion in communication channels and data, or many other changes over the last few hundred years such as the advent of radio and television, or the change in business models for media that we're living through now? This comparison is intended to be an argument both that "your kind of reasoning would've led us to expect many previous persuasion-related PONRs without needing special AI advances" and that "if we condition on persuasion-related PONRs being the big thing to think about, we shouldn't necessarily be all that focused on AI."

I liked the story you wrote! A lot of it seems reasonably likely to be reasonably on point to me - I especially liked your bits about AIs confusing people when asked about their internal lives. However:

  • I think the story is missing a kind of quantification or "quantified attitude" that seems important if we want to be talking about whether this story playing out "would mean we're probably looking at transformative/PONR-AI in the following five years." For example, I do expect progress in digital assistants, but it matters an awful lot how much progress and economic impact there is. Same goes for just how effective the "pervasive persuasion targeting" is. I think this story could be consistent with worlds in which I've updated a lot toward shorter transformative AI timelines, and with worlds in which I haven't at all (or have updated toward longer ones.)
  • As my comments probably indicate, I'm not sold on this section.
    • I'll be pretty surprised if e.g. the NYT is using a lot of persuasion targeting, as opposed to engagement targeting.
    • I do expect "People who still remember 2021 think of it as the golden days, when conformism and censorship and polarization were noticeably less than they are now" will be true, but that's primarily because (a) I think people are just really quick to hallucinate declinist dynamics and call past times "golden ages"; (b) 2021 does seem to have extremely little conformism and censorship (and basically normal polarization) by historical standards, and actually does kinda seem like a sort of epistemic golden age to me.
      • For people who are strongly and genuinely interested in understanding the world, I think we are in the midst of an explosion in useful websites, tools, and blogs that will someday be seen nostalgically;* a number of these websites/tools/blogs are remarkably influential among powerful people; and while most people are taking a lot less advantage than they could and seem to have pretty poorly epistemically grounded views, I'm extremely unconvinced that things looked better on this front in the past - here's one post on that topic.

I do generally think that persuasion is an underexplored topic, and could have many implications for transformative AI strategy. Such implications could include something like "Today's data explosion is already causing dramatic improvements in the ability of websites and other media to convince people of arbitrary things; we should assign a reasonably high probability that language models will further speed this in a way that transforms the world." That just isn't my guess at the moment.

*To be clear, I don't think this will be because websites/tools/blogs will be less useful in the future. I just think people will be more impressed with those of our time, which are picking a lot of low-hanging fruit in terms of improving on the status quo, so they'll feel impressive to read while knowing that the points they were making were novel at the time.

I'm a fan of lengthy asynchronous intellectual exchanges like this one, so no need to apologize for the delay. I hope you don't mind my delay either? As usual, no need to reply to this message.

If we condition on not having extreme capabilities for persuasion or research/engineering, I’m quite skeptical that something in the "business/military/political strategy" category is a great candidate to have transformative impact on its own.

I think I agree with this.

Re: quantification: I agree; currently I don't have good metrics to forecast on, much less good forecasts, for persuasion stuff and AI-PONR stuff. I am working on fixing that problem. :)

Re persuasion: For the past two years I have agreed with the claims made in "The misinformation problem seems like misinformation."(!!!) The problem isn't lack of access to information; information is more available than it ever was before. Nor is the problem "fake news" or other falsehoods. (Most propaganda is true.) Being politically polarized and extremist correlates positively with being well-informed, not negatively! (Anecdotally, my grad school friends with the craziest/most-extreme/most-dangerous/least-epistemically-virtuous political beliefs were generally the people best informed about politics. Analogous to how 9/11 truthers will probably know a lot more about 9/11 than you or me.) This is indeed an epistemic golden age... for people who are able to resist the temptations of various filter bubbles and the propaganda of various ideologies. (And everyone thinks themself one such person, so everyone thinks this is an epistemic golden age for them.)

I do disagree with your claim that this is currently an epistemic golden age. I think it's important to distinguish between ways in which it is and isn't. I mentioned above a way that it is.

If we made a chart of some number capturing "how easy it is to convince key parts of society to recognize and navigate a tricky novel problem" ... since the dawn of civilization, what would that chart look like? My guess is that it would be pretty chaotic; that it would sometimes go quite low and sometimes go quite high

Agreed. I argued this, in fact.

 and that it would be very hard to predict the impact of a given technology or other development on epistemic responsiveness. 

Disagree. I mean, I don't know, maybe this is true. But I feel like we shouldn't just throw our hands up in the air here, we haven't even tried! I've sketched an argument for why we should expect epistemic responsiveness to decrease in the near future (propaganda and censorship are bad for epistemic responsiveness & they are getting a lot cheaper and more effective & no pro-epistemic-responsiveness force seems to be rising to counter it).

Maybe there have been one-off points in history when epistemic responsiveness was very high; maybe it is much lower today compared to peak, such that someone could already claim we have passed the "point of no return"; maybe "persuasion AI" will drive it lower or higher, depending partly on who you think will have access to the biggest and best persuasion AIs and how they will use them. 

Agreed. I argued this, in fact. (Note: "point of no return" is a relative notion; it may be that relative to us in 2010 the point of no return was e.g. the founding of OpenAI, and nevertheless relative to us now the point of no return is still years in the future.)

So I think even if we grant a lot of your views about how much AI could change the "memetic environment," it's not clear how this relates to the "point of no return."

The conclusion I built was "We should direct more research effort at understanding and forecasting this stuff because it seems important." I think that conclusion is supported by the above claims about the possible effects of persuasion tools.

What has/had higher ex ante probability of leading to a dramatic change in the memetic environment: further development of AI language models that could be used to write more propaganda, or the recent (last 20 years) explosion in communication channels and data, or many other changes over the last few hundred years such as the advent of radio and television, or the change in business models for media that we're living through now? This comparison is intended to be an argument both that "your kind of reasoning would've led us to expect many previous persuasion-related PONRs without needing special AI advances" and that "if we condition on persuasion-related PONRs being the big thing to think about, we shouldn't necessarily be all that focused on AI."

Good argument. To hazard a guess:
1. Explosion in communication channels and data (i.e. the Internet + Big Data)
2. AI language models useful for propaganda and censorship
3. Advent of radio and television
4. Change in business models for media

However I'm pretty uncertain about this, I could easily see the order being different. Note that from what I've heard the advent of radio and television DID have a big effect on public epistemology; e.g. it partly enabled totalitarianism. Prior to that, the printing press is argued to have also had disruptive effects.

This is why I emphasized elsewhere that I'm not arguing for anything unprecedented. Public epistemology / epistemic responsiveness has waxed and waned over time and has occasionally gotten extremely bad (e.g. in totalitarian regimes and the freer societies that went totalitarian) and so we shouldn't be surprised if it happens again and if someone has an argument that it might be about to happen again it should be taken seriously and investigated. (I'm not saying you yourself need to investigate this, you probably have better things to do.) Also I totally agree that we shouldn't just be focused on AI; in fact I'd go further and say that most of the improvements in propaganda+censorship will come from non-AI stuff like Big Data. But AI will help too; it seems to make censorship a lot cheaper for example.

I'd be interested in seeing literature on how big an effect size you can get out of things like focus groups and A/B testing. My guess is that going from completely incompetent at persuasion (e.g., basically modeling your audience as yourself, which is where most people start) to "empirically understanding and incorporating your audience's different-from-you characteristics" causes a big jump from a very low level of effectiveness, but that things flatten out quickly after that, and that pouring more effort into focus groups and testing leads to only moderate effects, such that "doubling effectiveness" on the margin shouldn't be a very impressive/scary idea.

  • I think most media is optimizing for engagement rather than persuasion, and that it's natural for things to continue this way as AI advances. Engagement is dramatically easier to measure than persuasion, so data-hungry AI should help more with engagement than persuasion; targeting engagement is in some sense "self-reinforcing" and "self-funding" in a way that targeting persuasion isn't (so persuasion targeters need some sort of subsidy to compete with engagement targeters); and there are norms against targeting persuasion as well. I do expect some people and institutions to invest a lot in persuasion targeting (as they do today), but my modal expectation does not involve it becoming pervasive on nearly all websites, the way yours seems to.
  • I feel like a lot of today's "persuasion" is either (a) extremely immersive (someone is raised in a social setting that is very committed to some set of views or practices); or (b) involves persuading previously-close-to-indifferent people to believe things that call for low-cost actions (in many cases this means voting and social media posting; in some cases it can mean more consequential, but still ultimately not-super-high-personal-cost, actions). (b) can lead over time to shifting coalitions and identities, but the transition from (b) to (a) seems long.
  • I particularly don't feel that today's "persuaders" have much ability to accomplish the things that you're pointing to with "chatbots," "coaches," "Imperius curses" and "drugs." (Are there cases of drugs being used to systematically cause people to make durable, sustained, action-relevant changes to their views, especially when not accompanied by broader social immersion?)

These are all good points. This is exactly the sort of thing I wish there was more research into, and that I'm considering doing more research on myself.

Re: pervasiveness on almost all websites: Currently propaganda and censorship both seem pretty widespread and also seem to be on a trend of becoming more so. (The list of things that get censored is growing, not shrinking, for example.) This is despite the fact that censorship is costly and so theoretically platforms that do it should be outcompeted by platforms that just maximize engagement. Also, IIRC facebook uses large language models to do the censoring more efficiently and cheaply, and I assume the other companies do too. As far as I know they aren't measuring user opinions and directly using that as a feedback signal, thank goodness, but... is it that much of a stretch to think that they might? It's only been two years since GPT-3.
