
This is a post on OpenAI’s “AI and Compute” piece, as well as excellent responses by Ryan Carey and Ben Garfinkel, Research Fellows at the Future of Humanity Institute. (Crossposted on Less Wrong)

Intro: AI and Compute

Last May, OpenAI released an analysis of AI progress that blew me away. The key takeaway is this: the computing power used in the biggest AI research projects has been doubling every 3.5 months since 2012. That means that more recent projects like AlphaZero have tens of thousands of times as much “compute” behind them as something like AlexNet did in 2012.

When I first saw this, it seemed like evidence that powerful AI is closer than we think. Moore’s Law has doubled generally available compute roughly every 18 months to two years, and that doubling has powered the most impressive technological achievements of the last half century. Personal computers, mobile phones, the Internet...in all likelihood, none of these would exist without the remorseless progress of constantly shrinking, ever-cheaper computer chips, powered by the mysterious straight line of Moore’s Law.

So with a doubling cycle for AI compute that’s more than five times faster (let’s call it AI Moore’s Law), we should expect to see huge advances in AI in the relative blink of an eye...or so I thought. But OpenAI’s analysis has led some people to the exact opposite view.[1]

Interpreting the Evidence

Ryan Carey points out that while the compute used in these projects is doubling every 3.5 months, the compute you can buy per dollar is growing around 4-12 times more slowly. The trend is being driven by firms investing more money, not (for the most part) inventing better technology, at least on the hardware side. This means that the growing cost of projects will keep even Google and Amazon-sized companies from sustaining AI Moore’s Law for more than roughly 2.5 years. And that’s likely an upper bound, not a lower one; companies may try to keep their research budgets relatively constant. This means that increased funding for AI research would have to displace other R&D, which firms will be reluctant to do.[2] But for lack of good data, for the rest of the post I’ll assume we’ve more or less been following the trend since the publication of “AI and Compute”.[3]

While Carey thinks that we’ll pass some interesting compute milestones during this time, which might be promising for research, Ben Garfinkel is much more pessimistic. His argument is that we’ve seen a certain amount of progress in AI research recently, so realizing that this progress has been driven by huge increases in compute should make us reconsider how much adding still more compute will advance the field. He adds that this also means AI advances at the current pace are unsustainable, agreeing with Carey. Both of their views are somewhat simplified here, and worth reading in full.

Thoughts on Garfinkel

To address Garfinkel’s argument, it helps to be a bit more explicit. We can think of the capability of an AI system (or of a human brain) as the product of the raw compute available to it and the effectiveness of its algorithms, which is unknown for both humans and AI systems. The basic equation is something like: Capability = Compute * Algorithms. Once AI’s Capability reaches a certain threshold - call it “Human Brain” - we get human-level AI. We can observe the level of Capability that AI systems have reached so far (with some uncertainty), and we have now measured their Compute.

My initial reaction to reading OpenAI’s piece was the optimistic one - Capability must be higher than we thought, since Compute is so much higher! Garfinkel seems to think that Algorithms must be lower than we thought, since Capability hasn’t changed. In other words, Garfinkel and I disagree on how precisely we can observe Capability: we can avoid revising Algorithms downward only to the extent that our observation of Capability is imprecise and has room for revision. I think he’s probably right that the default should be to revise Algorithms downward, though there’s some leeway to revise Capability upward.
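As a toy illustration of how this update works (the numbers and the simple product form are just placeholders, not anything measured):

```python
# Toy version of the Capability = Compute * Algorithms model.
# All numbers are made up for illustration; only the relative reasoning matters.

def implied_algorithms(observed_capability, measured_compute):
    """Back out the algorithmic-efficiency term implied by the simple product model."""
    return observed_capability / measured_compute

observed_capability = 1.0   # normalize the capability we think current systems have
compute_old_guess   = 1.0   # what we previously assumed top projects were using
compute_measured    = 1e3   # hypothetical: measured compute turns out 1,000x higher

print(implied_algorithms(observed_capability, compute_old_guess))  # 1.0
print(implied_algorithms(observed_capability, compute_measured))   # 0.001

# Garfinkel's reading: hold observed_capability fixed, so Algorithms falls 1,000x.
# The optimistic reading revises observed_capability upward instead; how much room
# there is for that depends on how precisely Capability can really be observed.
```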

Much of Garfinkel’s pessimism about the implications of “AI and Compute” comes from the realization that its trend will soon stop - an important point. But what if, by that time, the Compute in AI systems will have surpassed the brain’s?

Thoughts on Carey

Carey thinks that one important milestone for AI progress is when projects have compute equal to running a human brain for 18 years. At that point we could expect AI systems to match an 18-year-old human’s cognitive abilities, if their algorithms successfully imitated a brain or otherwise performed at its level. AI Impacts has collected various estimates of how much compute this might require - by the end of AI Moore’s Law, the largest projects should comfortably reach and exceed it.

Another useful marker is the 300-year AlphaGo Zero milestone. The idea here is that AI systems might learn much more slowly than humans - it would take a person about 300 years to play as many Go games as AlphaGo Zero did before beating its previous version, which itself beat a top-ranked human Go player. A similar ratio might apply to learning to perform other tasks at a human-equivalent level (although AlphaGo Zero’s performance was superhuman). Finally, there is the brain-evolution milestone: the compute it would take to simulate the evolution of a nervous system as complex as the human brain. Only this last milestone is outside the scope of AI Moore’s Law.[4] I tend to agree with Carey that the compute needed to reach human-level AI lies somewhere around the 18 and 300-year milestones.

But I believe his analysis likely overestimates the difficulty of reaching these computational milestones. The FLOPS-per-brain estimates he cites are concerned with simulating a physical brain, rather than with estimating how much useful computation the brain performs. The level of detail of the simulations seems to be the main source of variance among these higher estimates, and it is irrelevant for our purposes - we just want to know how well a brain can compute things. So I think we should take the lower estimates as more relevant - Moravec’s 10^13 FLOPS and Kurzweil’s 10^16 FLOPS (page 114) are good places to start,[5] though far from perfect. These estimates are calculated by comparing areas of the brain responsible for discrete tasks, like vision, to specialized computer systems - they represent something nearer the minimum amount of computation needed to equal the human brain than other estimates do. If accurate, this reduction of two orders of magnitude in the required computation has significant implications for our AI milestones. Using the estimates Kurzweil cites, we’ll comfortably pass the milestones for both 18 and 300-year human-equivalent compute by the time AI Moore’s Law has finished in roughly 2.5 years.[6]

There’s also some reason to think that AI systems’ learning abilities are improving, in the sense that they don’t require as much data to make the same inferences. DeepMind certainly seems to be saying that AlphaZero is better at searching a more limited set of promising moves than Stockfish, a traditional chess engine (unfortunately they don’t compare it to earlier versions of AlphaGo on this metric). On the other hand, board games like chess and Go are probably the ideal case for reinforcement learning algorithms, as they can play against themselves rapidly to improve. It’s unclear how current approaches could transfer to situations where this kind of self-play isn’t possible.
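To make the milestone claim above concrete, here’s a rough sketch of the arithmetic, following the figures in footnote 6 (a 10^16 FLOP/s brain, AlphaGo Zero’s ~10^23 FLOP of total training compute in October 2017, and the 3.5-month doubling time); all of these are approximations:

```python
import math

# Back-of-the-envelope milestone arithmetic (all figures approximate).
BRAIN_FLOPS = 1e16           # assumed useful compute of a human brain, FLOP/s
SECONDS_PER_YEAR = 3.15e7
ALPHAGO_ZERO_FLOP = 1e23     # ~1,000 petaflop/s-days of total training compute (Oct 2017)
DOUBLING_MONTHS = 3.5        # the "AI and Compute" trend

def months_until(target_flop, start_flop=ALPHAGO_ZERO_FLOP):
    """Months of trend growth needed to scale from start_flop to target_flop."""
    return math.log2(target_flop / start_flop) * DOUBLING_MONTHS

milestone_18yr = BRAIN_FLOPS * SECONDS_PER_YEAR * 18     # ~6e24 FLOP
milestone_300yr = BRAIN_FLOPS * SECONDS_PER_YEAR * 300   # ~1e26 FLOP

print(f"18-year milestone:  ~{months_until(milestone_18yr):.0f} months after Oct 2017")
print(f"300-year milestone: ~{months_until(milestone_300yr):.0f} months after Oct 2017")
# With these inputs the 18-year milestone lands roughly 20 months after October 2017
# (mid-2019) and the 300-year milestone roughly 35 months after (late 2020) - both
# before the trend's plausible end ~2.5 years after early 2019.
```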

Final Thoughts

So - what can we conclude? I don’t agree with Garfinkel that OpenAI’s analysis should make us more pessimistic about human-level AI timelines. While it makes sense to revise our estimate of AI algorithms downward, it doesn’t follow that we should do the same for our estimate of overall progress in AI. By cortical neuron count, systems like AlphaZero are at about the same level as a blackbird (albeit one that lives for 18 years),[7] so there’s a clear case for future advances being more impressive than current ones as we approach the human level. I’ve also given some reasons to think that level isn’t as high as the estimates Carey cites.

However, we don’t have good data on how recent projects fit AI Moore’s Law. It could be that we’ve already diverged from the trend, as firms may be conservative about drastically changing their R&D budgets. There’s also a big question mark hovering over our current level of progress in the algorithms that power AI systems. Today’s techniques may prove completely unable to learn generally in more complex environments, though we shouldn’t assume that they will fail.[8]

If AI Moore’s Law does continue, we’ll pass the 18 and 300-year human milestones in the next two years. I expect to see an 18-year-equivalent project in the next five, even if it slows down. After these milestones, we’ll have some level of hardware overhang[9] and be left waiting on algorithmic advances to get human-level AI systems. Governments and large firms will be able to compete to develop such systems, and costs will halve roughly every 4 years,[10] slowly widening the pool of actors. Eventually the relevant breakthroughs will be made. That they will likely be software rather than hardware should worry AI safety experts, as these will be harder to monitor and foresee.[11] And once software lets computers approach a human level in a given domain, we can quickly find ourselves completely outmatched. AlphaZero went from a bundle of blank learning algorithms to stronger than the best human chess players in history...in less than two hours.


  1. It’s important to note that while Moore’s Law resulted in cheaper computers (even as the scale and complexity of the factories that make them increased), the AI compute trend doesn’t seem to be doing the same for AI chips. It’s possible that AI chips will also decrease in cost after attracting more R&D funding and becoming commercially available, but without a huge consumer market, it seems more likely that these firms will mostly have to eat the costs of their investments. ↩︎

  2. This assumes corporate bureaucracy will slow reallocation of resources, and could be wrong if firms prove willing to keep ratcheting up total R&D budgets. Both Amazon and Google are doing so at the moment. ↩︎

  3. Information about the cost and compute of AI projects since then would be very helpful for evaluating the continuation of the trend. ↩︎

  4. Cost and computation figures take AlphaGo Zero as the last available data point in the trend, since it’s the last AI system for which OpenAI has calculated compute. AlphaGo Zero was released in October 2017, but I’m plotting how things will go from now, March 2019, assuming that trends in cost and compute have continued. These estimates are therefore 1.5 years shorter than Carey’s, apart from our use of different estimates of the brain’s computation. ↩︎

  5. Moravec does his estimate by comparing the number of calculations machine vision software makes to the retina, and extrapolating to the size of the rest of the brain. This isn’t ideal, but at least it’s based on a comparison of machine and human capability, not simulation of a physical brain. Kurzweil cites Moravec’s estimate as well as a similar one by Lloyd Watts based on comparisons between the human auditory system and teleconferencing software, and finally one by the University of Texas replicating the functions of a small area of the cerebellum. These latter estimates come to 10^17 and 10^15 FLOPS for the brain. I know people are wary of Kurzweil, but he does seem to be on fairly solid ground here. ↩︎

  6. The 18-year milestone would be reached in under a year and the 300-year milestone in slightly over another. If the brain performs about 10^16 operations per second, 18 years’ worth would be roughly 10^25 FLOP. AlphaGo Zero used about 10^23 FLOP in October 2017 (1,000 petaflop/s-days; 1 petaflop/s-day is roughly 10^20 operations). If the trend is holding, Compute is increasing roughly an order of magnitude per year. It’s worth noting that this would be roughly a $700M project in late 2019 (scaling AlphaZero up 100x and halving costs every 4 years), and something like $2-3B if hardware costs weren’t spread across multiple projects. Google has an R&D budget over $20B, so this is feasible, though significant. The AlphaGo Zero games milestone would take about 14 months more of AI Moore’s Law to reach, or a few decades of cost decreases if it ends. ↩︎

  7. This is relative to 10^16 FLOPS estimates of the human brain’s computation and assuming computation is largely based on cortical neuron count - a blackbird would be at about 10^14 FLOPS by this measure. ↩︎

  8. An illustration of this point is found here, expressed by Richard Sutton, one of the inventors of reinforcement learning. He examines the history of AI breakthroughs and concludes that fairly simple search and learning algorithms have powered the most successful efforts, driven by increasing compute over time. Attempts to use models that take advantage of human expertise have largely failed. ↩︎

  9. This argument fails if the piece’s cited estimates of a human brain’s compute are too optimistic. If more than a couple extra orders of magnitude are needed to get brain-equivalent compute, we could be many decades away from having the necessary hardware. AI Moore’s Law can’t continue much longer than 2.5 years, so we’d have to wait for long-term trends in cost decreases to run more capable projects. ↩︎

  10. AI Impacts’ cost estimates, which find that compute costs have recently fallen by an order of magnitude roughly every 10-16 years (i.e. halving about every 3-5 years). ↩︎

  11. If the final breakthroughs depend on software, we’re left with a wide range of possible human-level AI timelines - but one that likely rules out timelines centuries in the future. We could theoretically be months away from such a system if current algorithms with more compute are sufficient. See this article, particularly the graphic on exponential computing growth. This completely violates my intuitions of AI progress but seems like a legitimate position. ↩︎

Comments

Thanks for the interesting post!

By cortical neuron count, systems like AlphaZero are at about the same level as a blackbird (albeit one that lives for 18 years)

That comparison makes me think AI algorithms need a lot of work, because blackbirds seem vastly more impressive to me than AlphaZero. Some reasons:

  1. Blackbirds can operate in the real world with a huge action space, rather than a simple toy world with a limited number of possible moves.
  2. Blackbirds don't need to play millions of rounds of games to figure things out. Indeed, they only have one shot to figure the most important things out or else they die. (One could argue that evolution has been playing millions/trillions/etc of rounds of the game over time, with most animals failing and dying, but it's questionable how much of that information can be transmitted to future generations through a limited number of genes.)
  3. Blackbirds seem to have "common sense" when solving problems, in the sense of figuring things out directly rather than stumbling upon them through huge amounts of trial and error. (This is similar to point 2.) Here's a random example of what I have in mind by common sense: "One researcher reported seeing a raven carry away a large block of frozen suet by using his beak to carve a circle around the entire chunk he wanted." Presumably the raven didn't have to randomly peck around on thousands of previous chunks of ice in order to discover how to do that.

Perhaps one could argue that if we have the hardware for it, relatively dumb trial and error can also get to AGI as long as it works, whether or not it has common sense. But this gets back to point #1: I'm skeptical that dumb trial and error of the type that works for AlphaZero would scale to a world as complex as a blackbird's. (Plus, we don't have realistic simulation environments in which to train such AIs.)

All of that said, I acknowledge there's a lot of uncertainty on these issues, and nobody really knows how long it will take to get the right algorithms.

My pleasure!

Yeah, I agree - I'd rather have a blackbird than AlphaZero. For one thing, it'd make our current level of progress in AI much clearer. But on your second and third points, I think of ML training as somewhat analogous to evolution, and the trained agent as analogous to an animal. Both the training process and evolution are basically blind but goal-directed processes with a ton of iterations (I'm bullish on evolution's ability to transmit information through generations) that result in well-adapted agents.

If that's the right analogy, then we can compare AlphaZero's superhuman board game abilities with a blackbird's subhuman-but-general performance. If we're not meaningfully compute-constrained, then the question is: what kinds of problems will we soon be able to train AI systems to solve? AI research might be one such problem. There are a lot of different training techniques out in the wild, and many of the more impressive recent developments have come from combining multiple techniques in novel ways (with lots of compute). That strikes me as the kind of search space that an AI system might be able to explore much faster than human teams.

DeepMind certainly seems to be saying that AlphaZero is better at searching a more limited set of promising moves than Stockfish, a traditional chess engine (unfortunately they don’t compare it to earlier versions of AlphaGo on this metric).

Only at test time. AlphaZero has much more experience gained from its training phase. (Stockfish has no training phase, though you could think of all of the human domain knowledge encoded in it as a form of "training".)

AlphaZero went from a bundle of blank learning algorithms to stronger than the best human chess players in history...in less than two hours.

Humans are extremely poorly optimized for playing chess.

I don’t agree with Garfinkel that OpenAI’s analysis should make us more pessimistic about human-level AI timelines. While it makes sense to revise our estimate of AI algorithms downward, it doesn’t follow that we should do the same for our estimate of overall progress in AI. By cortical neuron count, systems like AlphaZero are at about the same level as a blackbird (albeit one that lives for 18 years),[7] so there’s a clear case for future advances being more impressive than current ones as we approach the human level.

Sounds like you are using a model where (our understanding of) current capabilities and rates of progress of AI are not very relevant for determining future capabilities, because we don't know the absolute quantitative capability corresponding to "human-level AI". Instead, you model it primarily on the absolute amount of compute needed.

Suppose you did know the absolute capability corresponding to "human-level AI", e.g. you can say something like "once we are able to solve Atari benchmarks using only 10k samples from the environment, we will have human-level AI", and you found that metric much more persuasive than the compute used by a human brain. Would you then agree with Garfinkel's point?

Thanks for the comment! In order:

I think that its performance at test time is one of the more relevant measures - I take grandmasters' considering fewer moves during a game as evidence that they've learned something more of the 'essence' of chess than AlphaZero, and I think AlphaZero's learning was similarly superior to Stockfish's relatively blind approach. Training time is also an important measure - but that's why Carey brings up the 300-year AlphaGo Zero milestone.

Indeed we are. And it's not clear to me that we're much better optimized for general cognition. We're extremely bad at doing math that pocket calculators have no problem with, yet it took us a while to build a good chess and Go-playing AI. I worry we have very little idea how hard different cognitive tasks will be to something with a brain-equivalent amount of compute.

I'm focusing on compute partly because it's the easiest to measure. My understanding (and I think everyone else's) of AI capabilities is largely shaped by how impressive the results of major papers intuitively seem. And when AI can use something like the amount of compute a human brain has, we should eventually get a similar level of capability, so I think compute is a good yardstick.

I'm not sure I fully understand how the metric would work. For the Atari example, it seems clear to me that we could easily reach it without making a generalizable AI system, or vice versa. I'm not sure what metric could be appropriate - I think we'd have to know a lot more about intelligence. And I don't know if we'll need a completely different computing paradigm from ML to learn in a more general way. There might not be a relevant capability level for ML systems that would correspond to human-level AI.

But let's say that we could come up with a relevant metric. Then I'd agree with Garfinkel, as long as people in the community had known roughly the current state of AI in relation to it and the rate of advance toward it before the release of "AI and Compute".

Mostly agree with all of this; some nitpicks:

My understanding (and I think everyone else's) of AI capabilities is largely shaped by how impressive the results of major papers intuitively seem.

I claim that this is not how I think about AI capabilities, and it is not how many AI researchers think about AI capabilities. For a particularly extreme example, the Go-explore paper out of Uber had a very nominally impressive result on Montezuma's Revenge, but much of the AI community didn't find it compelling because of the assumptions that their algorithm used.

I'm not sure I fully understand how the metric would work. For the Atari example, it seems clear to me that we could easily reach it without making a generalizable AI system, or vice versa.

Tbc, I definitely did not intend for that to be an actual metric.

But let's say that we could come up with a relevant metric. Then I'd agree with Garfinkel, as long as people in the community had known roughly the current state of AI in relation to it and the rate of advance toward it before the release of "AI and Compute".

I would say that I have a set of intuitions and impressions that function as a very weak prediction of what AI will look like in the future, along the lines of that sort of metric. I trust timelines based on extrapolation of progress using these intuitions more than timelines based solely on compute. To the extent that you hear timeline estimates from people like me who do this sort of "progress extrapolation" who also did not know about how compute has been scaling, you would want to lengthen their timeline estimates. I'm not sure how timeline predictions break down on this axis.

I claim that this is not how I think about AI capabilities, and it is not how many AI researchers think about AI capabilities. For a particularly extreme example, the Go-explore paper out of Uber had a very nominally impressive result on Montezuma's Revenge, but much of the AI community didn't find it compelling because of the assumptions that their algorithm used.

Sorry, I meant the results in light of which methods were used, implications for other research, etc. The sentence would better read, "My understanding (and I think everyone else's) of AI capabilities is largely shaped by how impressive major papers seem."

Tbc, I definitely did not intend for that to be an actual metric.

Yeah, totally got that - I just think that making a relevant metric would be hard, and we'd have to know a lot that we don't know now, including whether current ML techniques can ever lead to AGI.

I would say that I have a set of intuitions and impressions that function as a very weak prediction of what AI will look like in the future, along the lines of that sort of metric. I trust timelines based on extrapolation of progress using these intuitions more than timelines based solely on compute.

Interesting. Yeah, I don't much trust my own intuitions on our current progress. I'd love to have a better understanding of how to evaluate the implications of new developments, but I really can't do much better than, "GPT-2 impressed me a lot more than AlphaStar." And just to be 100% clear - I tend to think that the necessary amount of compute is somewhere in the 18-to-300-year range. After we reach it, I'm stuck using my intuition to guess when we'll have the right algorithms to create AGI.

Nice post! I don't think we should assume that AI Moore's law would be capped by the regular Moore's law of total compute. If there is a new application of processors that is willing to pay a lot of money for a huge number of processors, I think we would build more chip fabs to keep up with demand. This does not necessarily accelerate the original Moore's law (transistors per chip), but it would accelerate total compute. This would be consistent with Robin Hanson's vision of doubling of value (and I believe ~compute) roughly every month. It's not even clear to me that the chips would be more expensive in such a scenario assuming we actually planned well for it, because generally we have learning where the cost per unit decreases with the cumulative production.

Thanks! Yeah, it might have been a bad idea to take general chip cost decreases as super relevant for specialized AI chips' cost efficiency. I read Carey's estimates for cost decreases as applying to AI chips, when upon closer inspection he was referring to general chips. Probably we'll see faster gains in AI chips' cost efficiency for a while as the low-hanging fruit is picked.

My point was something like, "Development costs to make AI chips will largely be borne by leading AI companies. If this is right, then they won't be able to take advantage of cheaper, better chips in the same way that consumers have with Moore's Law - i.e. passively benefiting from the results without investing their own capital into R&D". I didn't mean for it to sound like I was focusing on chip production capacity - I think cost efficiency is the key metric.

But I don't have a sense of how much money will be spent on development costs for a certain increase in chips' cost efficiency. It might be that early on, unit costs swamp development costs.

Frankly, I'm starting to think that my ideas about development costs may not be accurate. It looks like traditional chip companies are entering the AI chip business in force, although they could be 10% of the market or 90% for all I know. That could change things from the perspective of how much compute leading AI firms could afford to buy. This coupled with the aforementioned difference in cost efficiency rates between general chips and AI chips means I may have underestimated future increases in the cost efficiency of AI chips.

Without commenting on your wider message, I want to pick on two specific factual claims that you are making.

AlphaZero went from a bundle of blank learning algorithms to stronger than the best human chess players in history...in less than two hours.

Training time of the final program is a deeply misleading metric, as these programs have been through endless reruns and tests to get the setup right. I think it is most honest to count total engineering time.

I know people are wary of Kurzweil, but he does seem to be on fairly solid ground here.

Extrapolating FLOPS is inherently fraught, as is the very idea of FLOPS being a useful unit. The problem is best illustrated by the following CS proverb: "A supercomputer is a device for turning computational complexity into communication complexity." In particular, estimates for the complexity of imitating a small, mostly separate, part of a brain don't linearly scale to estimates of imitating the much more interconnected whole.

I don't think I quite follow your criticism of FLOP/s; can you say more about why you think it's not a useful unit? It seems like you're saying that a linear extrapolation of FLOP/s isn't accurate to estimate the compute requirements of larger models. (I know there are a variety of criticisms that can be made, but I'm interested in better understanding your point above)

The issue is that FLOPS cannot accurately represent computing power across different computing architectures, in particular between single CPUs versus computing clusters. As an example, let's compare 1 computer of 100 MFLOPS with a cluster of 1000 computers of 1 MFLOPS each. The latter option has 10 times as many FLOPS, but there is a wide variety of computational problems in which the former will always be much faster. This means that FLOPS don't meaningfully tell you which option is better, it will always depend on how well the problem you want to solve maps onto your hardware.

In large-scale computing, the bottleneck is often the communication speed in the network. If the calculations you have to do don't neatly fall apart into roughly separate tasks, the different computers have to communicate a lot, which slows everything down. Adding more FLOPS (computers) won't prevent that in the slightest.

You can not extrapolate FLOPS estimates without justifying why the communication overhead doesn't make the estimated quantity meaningless on parallel hardware.
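A crude toy model of the point (the workload, node speeds, and the fixed communication cost are all invented for illustration):

```python
# Crude Amdahl-style toy model: the parallelizable compute scales across nodes,
# but time spent waiting on inter-node communication does not.

def run_time(total_flop, flops_per_node, nodes, comm_seconds):
    """Seconds to finish: fixed communication cost plus perfectly-parallel compute."""
    return comm_seconds + total_flop / (flops_per_node * nodes)

WORKLOAD_FLOP = 1e9  # pretend the job needs 10^9 floating-point operations

# 1 machine at 100 MFLOPS, no network to wait on:
single = run_time(WORKLOAD_FLOP, flops_per_node=100e6, nodes=1, comm_seconds=0.0)

# 1000 machines at 1 MFLOPS each (10x the aggregate FLOPS), but the problem
# doesn't decompose cleanly, so the nodes spend 60 s coordinating:
cluster = run_time(WORKLOAD_FLOP, flops_per_node=1e6, nodes=1000, comm_seconds=60.0)

print(f"single fast machine: {single:.0f} s")  # 10 s
print(f"cluster:             {cluster:.0f} s") # 61 s - more FLOPS, yet slower
```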

I remember looking into communication speed, but unfortunately I can't find the sources I found last time! As I recall, when I checked the communication figures weren't meaningfully different from processing speed figures.

Edit: found it! AI Impacts on TEPS (traversed edges per second): https://aiimpacts.org/brain-performance-in-teps/

Yeah, basically computers are closer in communication speed to a human brain than they are in processing speed. Which makes intuitive sense - they can transfer information at the speed of light, while brains are stuck sending chemical signals in many (all?) cases.

2nd edit: On your earlier point about training time vs. total engineering time..."Most honest" isn't really the issue. It's what you care about - training time illustrates that human-level performance can be quickly surpassed by an AI system's capabilities once it's built. Then the AI will keep improving, leaving us in the dust (although the applicability of current algorithms to more complex tasks is unclear). Total engineering time would show that these are massive projects which take time to develop...which is also true.
