Here's a blogpost by Arjun Ramani and Zhengdong Wang rounding up some reasons for skepticism about transformative AI or explosive economic growth arriving in the next few decades. It's pretty shallow, but most of the lines of thought in it are worth thinking through in my opinion:

#### A collection of the best technical, social, and economic arguments

Humans have a good track record of innovation. The mechanization of agriculture, steam engines, electricity, modern medicine, computers, and the internet—these technologies radically changed the world. Still, the trend growth rate of GDP per capita in the world's frontier economy has [never exceeded three percent per year]().

It is of course possible for growth to accelerate.[^davidson] There was a time before [growth began](), or at least when it was [far closer to zero](). But the fact that past game-changing technologies have yet to break the three percent threshold gives us a baseline. Only strong evidence should cause us to expect something hugely different.

Yet many people are optimistic that artificial intelligence is up to the job. AI is different from prior technologies, they say, because it is *generally capable*—able to perform a much wider range of tasks than previous technologies, including the process of innovation itself. Some think it could lead to a “[Moore’s Law for everything]()”, or even risks [on par with those of pandemics and nuclear war](). Sam Altman shocked investors when he said that OpenAI would become profitable by first inventing general AI, and then [asking it how to make money](). Demis Hassabis described DeepMind’s mission at Britain’s Royal Academy four years ago in two steps: “[1. Solve Intelligence. 2. Use it to solve everything else.]()”

This order of operations has powerful appeal.

Should AI be set apart from other great inventions in history? Could it, as the great academics John von Neumann and I. J. Good speculated, one day self-improve, cause an intelligence explosion, and lead to an economic growth singularity?

Neither this essay nor the economic growth literature rules out this possibility. Instead, our aim is to simply temper your expectations. We think AI can be “transformative” in the same way the internet was, raising productivity and changing habits. But many daunting hurdles lie on the way to the accelerating growth rates predicted by some.

In this essay we assemble the best arguments that we have encountered for why transformative AI is hard to achieve. To avoid lengthening an already long piece, we often refer to the original sources instead of reiterating their arguments in depth. We are far from the first to suggest these points. Our contribution is to organize a well-researched, multidisciplinary set of ideas others first advanced into a single integrated case. Here is a brief outline of our argument:

1. The transformative potential of AI is constrained by its hardest problems
2. Despite rapid progress in some AI subfields, major technical hurdles remain
3. Even if technical AI progress continues, social and economic hurdles may limit its impact

## 1. The transformative potential of AI is constrained by its hardest problems

Visions of transformative AI start with a system that is as good as or better than humans at all economically valuable tasks. A review from Harvard’s Carr Center for Human Rights Policy notes that many top AI labs [explicitly have this goal](). Yet measuring AI’s performance on a predetermined set of tasks is risky—what if real world impact requires doing tasks we are not even aware of?

Thus, we define transformative AI in terms of its observed economic impact. Productivity growth almost definitionally captures when a new technology efficiently performs useful work. A powerful AI could one day perform all productive cognitive and physical labor. If it could automate the process of innovation itself, [some economic growth models]() predict that GDP growth would not just break three percent per capita per year—it would accelerate.

Such a world is hard to achieve. As the economist William Baumol [first]() [noted]() in the 1960s, productivity growth that is unbalanced may be constrained by the weakest sector. To illustrate this, consider a simple economy with two sectors, writing think-pieces and constructing buildings. Imagine that AI speeds up writing but not construction. Productivity increases and the economy grows. However, a think-piece is not a good substitute for a new building. So if the economy still demands what AI does not improve, like construction, those sectors become relatively more valuable and eat into the gains from writing. A 100x boost to writing speed may only lead to a 2x boost to the size of the economy.[^elasticity]
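The 100x-to-2x arithmetic can be reproduced with a standard CES aggregator. A minimal sketch (our illustration, not a model from the essay; the equal sector shares and the elasticity of substitution of 0.5 are assumptions chosen purely to make the numbers concrete):

```python
def ces_output(writing, construction, sigma=0.5, share=0.5):
    """CES aggregate of a two-sector economy; sigma < 1 means the
    sectors are poor substitutes (complements)."""
    rho = (sigma - 1) / sigma
    return (share * writing**rho + (1 - share) * construction**rho) ** (1 / rho)

baseline = ces_output(1.0, 1.0)    # both sectors at productivity 1
boosted = ces_output(100.0, 1.0)   # AI makes writing 100x more productive
print(boosted / baseline)          # ~1.98: roughly a 2x, not 100x, gain
```

Because the two outputs are poor substitutes here (elasticity below one), the unimproved sector dominates: even a 1000x writing boost leaves aggregate output below 2x in this toy setup.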

This toy example is not all that different from the broad pattern of productivity growth over the past several decades. Eric Helland and Alex Tabarrok wield Baumol in their book [*Why Are the Prices So Damn High?*]() to explain how technology has boosted the productivity of sectors like manufacturing and agriculture, driving down the relative price of their outputs, like TVs and food, and raising average wages. Yet TVs and food are not good substitutes for labor-intensive services like healthcare and education. Such services have remained important, just like constructing buildings, but have proven hard to make more efficient. So their relative prices have grown, taking up a larger share of our income and weighing on growth. [Acemoglu, Autor, and Patterson]() confirm using historical US economic data that uneven innovation across sectors has indeed slowed down aggregate productivity growth.[^bottlenecking]

<figcaption>The Baumol effect, visualized. American Enterprise Institute (2022)</figcaption>

[Aghion, Jones, and Jones]() explain that the production of ideas itself has steps which are vulnerable to bottlenecks.[^gustafson] Automating *most* tasks has very different effects on growth than automating *all* tasks:

> *...economic growth may be constrained not by what we do well but rather by what is essential and yet hard to improve... When applied to a model in which AI automates the production of ideas, these same considerations can prevent explosive growth.*

Consider a two-step innovation process that consists of summarizing papers on arXiv and pipetting fluids into test tubes. Each step depends on the other. Even if AI automates summarizing papers, humans would still have to pipette fluids to write the next paper. (And in the real world, we would also need to wait for the IRB to approve our grants.) In “[What if we could automate invention](),” Matt Clancy provides a final dose of intuition:

> *Invention has started to resemble a class project where each student is responsible for a different part of the project and the teacher won’t let anyone leave until everyone is done... if we cannot automate everything, then the results are quite different. We don’t get acceleration at merely a slower rate*—*we get no acceleration at all.*

Our point is that the idea of bottlenecking—featured everywhere from Baumol in the sixties to Matt Clancy today—deserves more airtime.[^clancy] It makes clear why the hurdles to AI progress are *stronger together than they are apart*. AI must transform all essential economic sectors and steps of the innovation process, not just some of them. Otherwise, the chance that we should view AI as similar to past inventions goes up.
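Clancy's class-project intuition can be sketched with an Amdahl's-law-style calculation (our illustration, not the authors'; the 90% automated share is an arbitrary assumption):

```python
def overall_speedup(automated_share, ai_speedup):
    """Amdahl-style bound: steps left to humans cap the total gain,
    no matter how fast the automated steps become."""
    return 1 / ((1 - automated_share) + automated_share / ai_speedup)

print(overall_speedup(0.9, 10))    # ~5.3x
print(overall_speedup(0.9, 1e9))   # ~10x even with effectively infinite AI speed
print(overall_speedup(1.0, 1e9))   # the cap vanishes only at full automation
```

However fast the automated steps run, the human-gated 10% caps the overall speedup at 10x, a one-time level effect rather than acceleration; only automating everything removes the cap.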

Perhaps the discourse has lacked specific illustrations of hard-to-improve steps in production and innovation. Fortunately many examples exist.

## 2. Despite rapid progress in some AI subfields, major technical hurdles remain

**Progress in fine motor control has hugely lagged progress in neural language models.** [Robotics workshops ponder]() what to do when "just a few cubicles away, progress in generative modeling feels qualitatively even more impressive." [Moravec's paradox]() and [Steven Pinker's 1994 observation]() remain relevant: "The main lesson of thirty-five years of AI research is that the hard problems are easy and the easy problems are hard." The hardest "easy" problems, like tying one's shoelaces, remain. Do breakthroughs in robotics easily follow those in generative modeling? That OpenAI [disbanded its robotics team]() is not a strong signal.

It seems highly unlikely to us that growth could greatly accelerate without progress in manipulating the physical world. Many current economic bottlenecks, from housing and healthcare to manufacturing and transportation, all have a sizable physical-world component.

**The list of open research problems relevant to transformative AI continues.** Learning a causal model is one. [Ortega et al.]() show a naive case where a sequence model that takes actions can experience delusions without access to a causal model.[^causality] Embodiment is another. [Murray Shanahan views]() cognition and having a body as inseparable: cognition exists for the body to survive and thrive, continually adjusts within a body's sensorimotor loop, and is itself founded in physical affordances. Watching LeBron James on the court, we are inclined to agree. [François Chollet believes]() efficiency is central, since "unlimited priors or experience can produce systems with little-to-no generalization power." [Cremer and Whittlestone]() list even more problems on which technical experts do not agree.

More resources are not guaranteed to help. Ari Allyn-Feuer and Ted Sanders suggest in "[Transformative AGI by 2043 is <1% likely]()" that walking and wriggling (neurological simulation of worms) are simple but still intractable indicator tasks: "And while worms are not a large market... we’ve comprehensively failed to make AI walkers, AI drivers, or AI radiologists despite massive effort. This must be taken as a bearish signal."

We may not need to solve some or even all of these open problems. And we could certainly make more breakthroughs (one of us is directly working on some of these problems). But equally, we cannot yet definitively dismiss them, and so they add to our list of bottlenecks. Until AI systems gain these missing capabilities, some of which even children have, it may be better to view them as tools that imitate and transmit culture, rather than as general intelligences, as [Yiu, Kosoy, and Gopnik]() propose.

**Current methods may also not be enough.** Their limits may soon be upon us. Scaling compute another order of magnitude would require [hundreds of billions of dollars]() more spending on hardware. According to SemiAnalysis: "This is not practical, and it is also likely that models cannot scale to this scale, given current error rates and quantization estimates." The continued falling cost of computation could help. But we may have exhausted the low-hanging fruit in hardware optimization and are [now entering an era of deceleration](). Moore's Law has [persisted under various guises](), but the critical factor for transformative AI may be [whether we will reach it before Moore's Law stops]().

Next look at data. [Villalobos et al.]() warn that high quality language data may run out by 2026. The team suggests data efficiency and synthetic data as ways out, but so far these are far from complete solutions, as [Shumailov et al.]() show.

In algorithms, our understanding of what current architectures can and *cannot* do is improving. [Delétang et al.]() and [Dziri et al.]() identify particularly hard problems for the Transformer architecture. Some say that so-called emergent abilities of large language models could still surprise us. Not necessarily. [Schaeffer et al.]() argue that emergence appears "due the researcher’s choice of metric rather than due to fundamental changes in model behavior with scale." We must be careful when making claims about the irregularity of future capabilities. It is telling that OpenAI [will not train GPT-5 for some time](). Perhaps they realize that good old-fashioned human tinkering is more appetizing than a free lunch of scale.

<figcaption>Scaling up would be expensive. SemiAnalysis, "The AI Brick Wall—A Practical Limit For Scaling Dense Transformer Models, and How GPT 4 Will Break Past It" (2023)</figcaption>

**Humans remain a limiting factor in development.** Human feedback makes AI outputs more helpful. Insofar as AI development requires human input, humans will constrain productivity. [Millions of humans]() currently annotate data to train models. Their humanity, especially their expert knowledge and creative spark, becomes more valuable by the day. *The Verge* reports: "One engineer told me about buying examples of Socratic dialogues for up to $300 a pop."

That is unlikely to change anytime soon. Geoffrey Irving and Amanda Askell [advocate for a bigger role for humans](): "Since we are trying to behave in accord with people’s values, the most important data will be data from humans about their values." Constitutional AI, a state-of-the-art alignment technique that has even [reached the steps of Capitol Hill](), also [does not aim to remove humans]() from the process at all: "rather than removing human supervision, in the longer term our goal is to make human supervision as efficacious as possible." Even longer-term scalable alignment proposals, such as [running AI debates with human judges](), entrench rather than remove human experts. Both technical experts and the public seem to want to keep humans in the loop.

<figcaption>Intelligence, embodied. Source: Morri Gash, AP.</figcaption>

**A big share of human knowledge is tacit, unrecorded, and diffuse.** As [Friedrich Hayek declared](), "To assume all the knowledge to be given to a single mind... is to assume the problem away and to disregard everything that is important and significant in the real world." [Michael Polanyi argued]() "that we can know more than we can tell." [Carlo Ginzburg concurred](): "Nobody learns how to be a connoisseur or a diagnostician simply by applying the rules. With this kind of knowledge there are factors in play which cannot be measured: a whiff, a glance, an intuition." Finally, [Dan Wang](), concretely:

> *Process knowledge is the kind of knowledge that’s hard to write down as an instruction. You can give someone a well-equipped kitchen and an extraordinarily detailed recipe, but unless he already has some cooking experience, we shouldn’t expect him to prepare a great dish.*

Ilya Sutskever [recently suggested asking an AI]() "What would a person with great insight, wisdom, and capability do?" to surpass human performance. Tacit knowledge is why we think this is unlikely to work out-of-the-box in many important settings. It is why we may need to deploy AI in the real world where it can learn by doing. Yet it is hard for us to imagine this happening in several cases, especially high-stakes ones like running a multinational firm or [teaching a child to swim]().

We are constantly surprised in our day jobs as a journalist and AI researcher by how many questions do not have good answers on the internet or in books, but where *some* expert has a solid answer that they had not bothered to record. And in some cases, as with a master chef or LeBron James, they may not even be capable of making legible how they do what they do.

The idea that diffuse tacit knowledge is pervasive supports the hypothesis that there are [diminishing]() [returns]() to pure, centralized, cerebral intelligence. Some problems, like [escaping game-theoretic quagmires]() or [predicting the future](), might be just too hard for brains alone, whether biological or artificial.

**We could be headed off in the wrong direction altogether.** If even some of our hurdles prove insurmountable, then we may be far from the critical path to AI that can do all that humans can. Melanie Mitchell quotes Stuart Dreyfus in "[Why AI is Harder Than We Think]()": “It was like claiming that the first monkey that climbed a tree was making progress towards landing on the moon.”

We still struggle to concretely specify what we are trying to build. We have little understanding of the nature of intelligence or humanity. Relevant philosophical problems, such as the grounds of moral status, qualia, and personal identity, have stumped humans for thousands of years. Just days before this writing, neuroscientist Christof Koch [lost a quarter-century bet]() to philosopher David Chalmers that we would have discovered how the brain achieves consciousness by now.

Thus, we are throwing dice into the dark, betting on our best hunches, which some believe produce only [stochastic parrots](). Of course, these hunches are still worth pursuing; Matt Botvinick explores in depth [what current progress can tell us about ourselves](). But our lack of understanding should again moderate our expectations. In a [prescient opinion a decade ago](), David Deutsch stressed the importance of specifying the exact functionality we want:

> *The very term "AGI" is an example of one such rationalization, for the field used to be called "AI"—artificial intelligence. But AI was gradually appropriated to describe all sorts of unrelated computer programs such as game players, search engines and chatbots, until the G for "general" was added to make it possible to refer to the real thing again, but now with the implication that an AGI is just a smarter species of chatbot.*

A decade ago!

## 3. Even if technical AI progress continues, social and economic hurdles may limit its impact

**The history of economic transformation is one of contingency.** Many factors must come together all at once, rather than one factor outweighing all else. Individual technologies only matter to the extent that institutions permit their adoption, incentivize their widespread deployment, and allow for broad-scale social reorganization around the new technology.

A whole subfield studies the Great Divergence, how Europe overcame pre-modern growth constraints. Technological progress is just one factor. Kenneth Pomeranz, in his influential [eponymous book](), argues also for luck, including a stockpile of coal and convenient geography. Taisu Zhang emphasizes social hierarchies in [*The Laws and Economics of Confucianism*](). Jürgen Osterhammel in [*The Transformation of the World*]() attributes growth in the 19th century to mobility, imperial systems, networks, and much more beyond mere industrialization: "it would be unduly reductionist to present [the organization of production and the creation of wealth] as independent variables and as the only sources of dynamism propelling the age as a whole... it is time to decenter the Industrial Revolution."

All agree that history is not inevitable. We think this applies to AI as well. Just as we should be skeptical of a Great Man theory of history, we should not be so quick to jump to a Great Technology theory of growth with AI.

And important factors may not be on AI’s side. Major drivers of growth, including [demographics]() and [globalization](), are going backwards. AI progress may even be [accelerating the decoupling]() of the US and China, reducing the flow of people and ideas.

**AI may not be able to automate precisely the sectors most in need of automation.** We already “know” how to overcome many major constraints to growth, and have the technology to do so. Yet social and political barriers slow down technology adoption, and sometimes halt it entirely. The same could happen with AI.

[Comin and Mestieri]() observe that cross-country variation in the intensity of use for new technologies explains a large portion of the variation in incomes in the twentieth century. Despite the [dream in 1954]() that nuclear power would cause electricity to be "too cheap to meter," nuclear's share of global primary energy consumption [has been stagnant since the 90s](). Commercial supersonic flight is [outright banned in US airspace](). Callum Williams [provides more visceral examples]():

> *Train drivers on London’s publicly run Underground network are paid close to twice the national median, even though the technology to partially or wholly replace them has existed for decades. Government agencies require you to fill in paper forms providing your personal information again and again. In San Francisco, the global center of the AI surge, real-life cops are still employed to direct traffic during rush hour.*

<figcaption>King Charles operating the London tube. Source: The Independent</figcaption>

Marc Andreessen, hardly a techno-pessimist, [puts it bluntly](): “I don’t even think the standard arguments are needed... AI is already illegal for most of the economy, and will be for virtually all of the economy. How do I know that? Because technology is already illegal in most of the economy, and that is becoming steadily more true over time.” [Matt Yglesias]() and [Eli Dourado]() are skeptical that AI will lead to a growth revolution, pointing to regulation and complex physical processes in sectors including housing, energy, transportation, and healthcare. These happen to be our current growth bottlenecks, and together they [make up over a third of US GDP]().

AI may even decrease productivity. One of its current largest use cases, recommender systems for social media, is [hardly a productivity windfall](). [Callum Williams continues]():

> *GPT-4 is a godsend for a NIMBY facing a planning application. In five minutes he can produce a well written 1,000-page objection. Someone then has to respond to it... lawyers will multiply. "In the 1970s you could do a multi-million-dollar deal on 15 pages because retyping was a pain in the ass," says Preston Byrne of Brown Rudnick, a law firm. "AI will allow us to cover the 1,000 most likely edge cases in the first draft and then the parties will argue over it for weeks."*

**Automation alone is not enough for transformative economic growth.** History is littered with so-so technologies that have had little transformative impact, as Daron Acemoglu and Simon Johnson note in their new book [*Power and Progress*](). Fast-food kiosks are hardly a game-changer compared to human employees. Nobel laureate Robert Fogel documented that in the same way, [railroads had little impact]() on growth because they were only a bit better than their substitutes, canals and roads. Many immediate applications of large language models, from customer service to writing marketing copy, appear similar.[^brynjolfsson]

OpenAI’s [own economists estimate]() that about "19% of jobs have at least 50% of their tasks exposed" to GPT-4 and the various applications that may be built upon it. Some view this as game-changing. We would reframe it: over 80% of workers would have less than 50% of their tasks affected, hardly close to full automation. And their methodology suggests that areas where reliability is essential will remain unaffected for some time.

<figcaption>The long tail. James Bridle, “Autonomous trap 001” (2017)</figcaption>

It is telling that though the investment services sector is digitized, data is ubiquitous, and many individual tasks are automated, [overall employment has increased](). Similarly, despite [predictions that AI will replace radiologists]() (Hinton: "stop training radiologists now"), radiology job postings [hit a record high in 2021]() and are projected to increase even more. Allyn-Feuer and Sanders [reviewed 31 predictions]() of self-driving by industry insiders since 1960. The 27 resolved predictions were all wrong. Eight were by Elon Musk. In all these cases, AI faces the challenge of automating the “long tail” of tasks that are not present in the training data, not always legible, or too high-stakes to deploy.

**A big share of the economy may already consist of producing output that is profoundly social in nature.** Even if AI can automate all production, we must still decide what to produce, which is a social process. As [Hayek once implied](), central planning is hard not only because of its computational cost, but also due to a "lack of *access* to information... the information does not exist." A possible implication is that humans must actively participate in business, politics, and society to determine how they want society to look.

Education may be largely about motivating students, and teaching them to [interact socially](), rather than just transmitting facts. Much of the value of art comes from its [social context](). Healthcare combines emotional support with more functional diagnoses and prescriptions. Superhuman AI can hardly claim full credit for [the]() [resurgence]() [of]() [chess](). And business is about framing goals and negotiating with, managing, and motivating humans. Maybe our jobs today are already not that different from figuring out what prompts to ask and how to ask them.

There is a deeper point here. GDP is a made-up measure of how much some humans value what others produce, a big chunk of which involves doing social things amongst each other. [As one of us recently wrote](), we may value human-produced outputs precisely because they are scarce. As long as AI-produced outputs cannot substitute for that which is social, and therefore scarce, human-produced outputs will command a growing “human premium”, producing Baumol-style effects that weigh on growth.

## How should we consider AI in light of these hurdles?

AI progress is bound to continue and we are only starting to feel its impacts. We are hopeful for further breakthroughs, from more reliable algorithms to better policy. AI has certainly surprised us before.

Yet as this essay has outlined, myriad hurdles stand in the way of widespread transformative impact. These hurdles should be viewed *collectively*. Solving a subset may not be enough. Solving them all is a combinatorially harder problem. Until then, we cannot look to AI to clear hurdles we do not know how to clear ourselves. We should also not take future breakthroughs as guaranteed—we may get them tomorrow, or not for a very long time.

The most common reply we have heard to our arguments is that AI research itself could soon be automated. AI progress would then explode, begetting a powerful intelligence that would solve the other hurdles we have laid out.

But that is a narrow path to tread. Though AI research has made remarkable strides of late, many of our hurdles to transformation at large apply to the process of automating AI research itself. And even if we develop highly-intelligent machines, that is hardly all that is needed to automate the entirety of research and development, let alone the entire economy. To build an intelligence that can solve everything else, we may need to solve that same everything else in the first place.

So the case that AI will be an invention elevated far above the rest is not closed. Perhaps we should best think of it as a "prosaic" history-altering technology, one that catalyzes growth on the order of great inventions that have come before. We return to the excellent [Aghion, Jones, and Jones]():

> *…we model A.I. as the latest form in a process of automation that has been ongoing for at least 200 years. From the spinning jenny to the steam engine to electricity to computer chips, the automation of aspects of production has been a key feature of economic growth since the Industrial Revolution.*

Recall, the steam engine is general, too. You may not think it is as general as a large language model. But one can imagine how turning (then seemingly infinite) bits of coal into energy would [prompt a nineteenth-century industrialist]() to flirt with the end of history.

The steam engine certainly increased growth and made the world an unrecognizable place. We want to stress that AI ending up like the steam engine, rather than qualitatively surpassing it, is still an important and exciting outcome! What then to make of AI?

**The most salient risks of AI are likely to be those of a prosaic powerful technology.** Scenarios where AI grows to an autonomous, uncontrollable, and incomprehensible existential threat must clear the same difficult hurdles an economic transformation must. Thus, we believe AI's most pressing harms are those that already exist or are likely in the near future, such as bias and misuse.

**Do not over-index future expectations of growth on progress in one domain.** The theory of bottlenecks suggests casting a wide net, tracking progress across many domains of innovation, not just progress in AI's star subfield. Markets agree. If transformative AI were coming soon, real interest rates would rise in line with expectations of great future wealth or risk. Yet [Chow, Halperin, and Mazlish]() test exactly this theory and find that 10-, 30-, and 50-year real interest rates are low.

<figcaption>Short bonds now if we are wrong. Chow, Trevor, Basil Halperin and J. Zachary Mazlish, “AGI and the EMH: markets are not expecting aligned or unaligned AI in the next 30 years” (2023)</figcaption>

**Accordingly, invest in the hardest problems across innovation and society.** Pause before jumping to the most flashy recent development in AI. From technical research challenges currently not in vogue to the puzzles of human relations that have persisted for generations, broad swaths of society will require first-rate human ingenuity to realize the promise of AI.

*The authors: [Arjun Ramani]() is the global business and economics correspondent at The Economist. [Zhengdong Wang]() is a research engineer at Google DeepMind. Views are our own and not those of our employers.*

*We thank Hugh Zhang, Will Arnesen, Mike Webb, Basil Halperin, Tom McGrath, and Nathalie Bussemaker for reading drafts, and many others for helpful discussions.*

[^davidson]: See [the discussion]() between Tom Davidson and Ben Jones in Jones’s review of [Davidson](), which we note is probably the best countercase to our arguments.

[^elasticity]: Baumol and the economist William Bowen famously used the Schubert string quartet as an example of a stagnant sector for which labor remained essential. The exact numbers for how much growth in one sector contributes to aggregate growth depends on the price and income elasticities of demand for the outputs of both sectors.

[^bottlenecking]: Acemoglu, Autor, and Patterson provide numerous historical examples of bottlenecking: “breakthroughs in automotive technology cannot be achieved solely with improvements in engine management software and safety sensors, but will also require complementary improvements in energy storage, drivetrains, and tire adhesion... when some of those innovations, say batteries, do not keep pace with the rest, we may simultaneously observe rapid technological progress in a subset of inputs and yet slow productivity growth in the aggregate.”

[^gustafson]: A similar concept exists in computer science. Amdahl's law shows that sequential tasks set theoretical limits on the gains from parallel computing ([Gustafson's law]() is the related result that one can of course use the leftover parallelizable resources to complete other tasks).

[^causality]: A caveat here. The problem arises when sequence models condition on their past outputs to produce new outputs, such as imitating an expert policy when learning from offline data. Online reinforcement learning treats agent actions as causal interventions and is one way to solve this problem. But this returns us to the sample efficiency and deployment challenges of reinforcement learning. At the moment it seems we can fully reap the benefits of only one of massive pre-training and interactive decision-making. Integrating causality into AI research is a project [whose champions include Yoshua Bengio]().

[^brynjolfsson]: Erik Brynjolfsson makes a related point in his essay “[The Turing Trap]().” If the ancient Greeks had built automatons that could perform all work-related tasks from herding sheep to making clay pottery, labor productivity would certainly go up. But the living standards of ancient Greeks would still be nowhere near where they are today. That requires technology that can perform tasks humans were never able to do in the first place. It is certainly plausible that AI does this (in fact it already has in some domains). But the thought experiment does suggest that producing “human-like” AI may not by itself radically boost productivity growth.

[^clancy]: Matt Clancy most recently discusses the idea of bottlenecking in a [debate]( with Tamay Besiroglu in *Asterisk Magazine*.






The article doesn't seem to have a comment section so I'm putting some thoughts here. 

  • Economic growth: I don't feel I know enough about historical economic growth to comment on how much to weigh the "the trend growth rate of GDP per capita in the world's frontier economy has never exceeded three percent per year." I'll note that I think the framing here is quite different than that of Christiano's Hyperbolic Growth, despite them looking at roughly the same data as far as I can tell. 
  • Scaling current methods: the article seems to cherrypick the evidence pretty significantly and makes the weak claim that "Current methods may also not be enough." It is obvious that my subjective probability that current methods are enough should be <1, but I have yet to come across arguments that push that credence below say 50%. 
    • "Scaling compute another order of magnitude would require hundreds of billions of dollars more spending on hardware." This is straightforwardly false. The table included in the article, from the Chinchilla paper with additions, is a bit confusing because it doesn't include where we are now, and because it lists only model size rather than total training compute (FLOP). Based on Epoch's database of models, PaLM 2 is trained with about 7.34e24 FLOP, and GPT-4 is estimated at 2.10e25 (note these are not official numbers). This corresponds to being around the 280B param (9.9e24 FLOP) or 520B param (3.43e25 FLOP) rows in the table. In this range, tens of millions of dollars are being spent on compute for the biggest training runs now. It should be obvious that you can get a couple more orders of magnitude of compute before hitting hundreds of billions of dollars. In fact, the 10 Trillion param row in the table, listed at $28 billion, corresponds to a total training compute of 1.3e28 FLOP, which is more than 2 orders of magnitude above the estimated compute of the biggest publicly-known models. I agree that cost may soon become a limiting factor, but the claim that an order of magnitude would push us into hundreds of billions is clearly wrong given that current costs are in the tens of millions. 
    • Re cherrypicking data, I guess one of the most important points that seems to be missing from this section is the rate of algorithmic improvement. I would point to Epoch's work here. 
  • "Constitutional AI, a state-of-the-art alignment technique that has even reached the steps of Capitol Hill, also does not aim to remove humans from the process at all: "rather than removing human supervision, in the longer term our goal is to make human supervision as efficacious as possible."" This seems to me like a misunderstanding of Constitutional AI, for which a main component is "RL from AI Feedback." Constitutional AI is all about removing humans from the loop in order to get high quality data more efficiently. There's a politics thing where developers don't want to say they're removing human supervision, and it's also true that human supervision will probably play a role in data generation in the future, but the human:total (AI+human) contribution to data ratio is surely going to go down. For examples of research using AIs where we used to use humans, see Anthropic's paper Model Written Evaluations and the AI-labeled MACHIAVELLI benchmark. More generally, I would bet the trend toward automating datasets and benchmarks will continue, even if humans remain in the loop somewhat; insofar as humans are a limiting factor, developers will try to make them less necessary, and we already have AIs that perform very similarly to human raters at some tasks. 
  • "We are constantly surprised in our day jobs as a journalist and AI researcher by how many questions do not have good answers on the internet or in books, but where some expert has a solid answer that they had not bothered to record. And in some cases, as with a master chef or LeBron James, they may not even be capable of making legible how they do what they do." Not a disagreement, but I do wonder how much of this is a result of information being diffuse and just hard to properly find, a kind of task I expect AIs to be good at. For instance, 2025 language models equipped with search might be similarly useful to having a panel of relevant experts you could ask questions of. 
  • Noting that section 3: "Even if technical AI progress continues, social and economic hurdles may limit its impact" matters for some outcomes and not for others. It matters given the authors define "transformative AI in terms of its observed economic impact." It matters for many outcomes I care about like human well-being, that are related to economic impacts. It applies less to worries around existential risk and human disempowerment, for which powerful AIs may pose risks even while not causing large economic impacts ahead of time (e.g., bioterrorism doesn't require first creating a bunch of economic growth). 
    • Overall I think the claim of section 3 is likely to be right. A point pushing the other direction is that there may be a regulatory race to the bottom where countries want to enable local economic growth from AI and so relax regulations, think medical tourism for all kinds of services. 
  • "Yet as this essay has outlined, myriad hurdles stand in the way of widespread transformative impact. These hurdles should be viewed collectively. Solving a subset may not be enough." I definitely don't find the hurdles discussed here to be sufficient to make this claim. It feels like there's a motte and bailey, where the easy to defend claim is "these 3+ hurdles might exist, and we don't have enough evidence to discount any of them", and the harder to defend claim is "these hurdles disjunctively prevent transformative AI in the short term, so all of them must be conquered to get such AI." I expect this shift isn't intended by the authors, but I'm noting that I think it's a leap. 
  • "Scenarios where AI grows to an autonomous, uncontrollable, and incomprehensible existential threat must clear the same difficult hurdles an economic transformation must." I don't think this is the case. For example, section 3 seems to not apply as I mentioned earlier. I think it's worth noting that AI safety researcher Eliezer Yudkowsky has made a similar argument to what you make in section 3, and he also thinks existential catastrophe in the near term is likely. I think the point you're making here is directionally right, however, that AI which poses existential risk is likely to be transformative in the sense you're describing. That is, it's not necessary for such AI to be economically transformative, and there are a couple other ways catastrophically-dangerous AI can bypass the hurdles you lay out, but I think it's overall a good bet that existentially dangerous AIs are also capable of being economically transformative, so the general picture of hurdles, insofar as they are real, will affect such risks as well [I could easily see myself changing my mind about this with more thought]. I welcome more discussion on this point and have some thoughts myself, but I'm tired and won't include them in this comment; happy to chat privately about where "economically transformative" and "capable of posing catastrophic risks" lie on various spectrums. 
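As a sanity check on the arithmetic in the scaling bullet above, here is a quick calculation using the unofficial FLOP estimates quoted there; the ~$50M figure for current frontier runs is an assumed round number for illustration, not a sourced cost:

```python
import math

# Unofficial training-compute estimates quoted in the bullet above (FLOP).
gpt4_flop = 2.10e25      # estimated GPT-4 training compute (Epoch, unofficial)
row_10t_flop = 1.3e28    # the table's 10-trillion-param ($28B) row

# Gap in orders of magnitude between today's biggest known runs
# and the table's hundreds-of-billions regime.
oom_gap = math.log10(row_10t_flop / gpt4_flop)
print(f"gap: {oom_gap:.2f} orders of magnitude")  # gap: 2.79 orders of magnitude

# Assuming cost scales roughly linearly with FLOP and the biggest current
# runs cost on the order of $50M (assumed), one more order of magnitude of
# compute lands in the hundreds of millions, not hundreds of billions.
assumed_current_cost_usd = 50e6
print(f"+1 OOM: ~${assumed_current_cost_usd * 10 / 1e6:.0f}M")  # +1 OOM: ~$500M
```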

While my comment has been negative and focused on criticism, I am quite glad this article was written. Feel free to check out a piece I wrote, laying out some of my thinking around powerful AI coming soon, which is mostly orthogonal to this article. This comment was written sloppily, partially as my off-the-cuff notes while reading, sorry for any mistakes and impolite tone. 

Hey Aaron, thanks for your thorough comment. While we still disagree (explained a bit below), I'm also quite glad to read your comment :)

Re scaling current methods: The hundreds of billions figure we quoted does require more context not in our piece; SemiAnalysis explains in a bit more detail how they get to that number (eg assuming training in 3mo instead of 2 years). We don't want to haggle over the exact scale before it becomes infeasible, though---even if we get another 2 OOM in, we wanted to emphasize with our argument that 'the current method route' 1) requires regular scientific breakthroughs of the pre-TAI sort, and 2) even if we get there doesn't guarantee capabilities that look like magic compared to what we have now, depending on how much you believe in emergence. Both would be bottlenecks. We're pretty sure that current capabilities can be economically useful with more people, more fine-tuning. Just skeptical of the sudden emergence of the exact capabilities we need for transformative growth.

On Epoch's work on algorithmic progress specifically, we think it's important to note that:

1) They do this by measuring progress on computer vision benchmarks, which isn't a good indicator of progress in either algorithms for control (the physical world being important for TAI) or even language---it might be cheeky to say, little algorithmic progress there; just scale ;) Computer vision is also the exact example Schaeffer et al. give for the subfield where emergent abilities do not arise---until you induce them by intentionally crafting the evaluations.

2) That there even is a well-defined benchmark is a good sign for beating that benchmark. AI benefits from quantifiable evaluation (beating a world champion, CASP scores) when it measures what we want. But we'd say for really powerful AI we don't know what we want (see our wrong direction / philosophy hurdle), plus at some point the quantifiable metrics we do have stop measuring what we really want. (Is there really a difference between models that get 91.0 and 91.1 top-1 accuracy on ImageNet? Do people really look at MMLU over qualitative experience when they choose which language model to play with?)

3) We don't discount algorithmic progress at all! In fact we cite SemiAnalysis and the Epoch team's suggestions on where to research next. But again, these require human breakthroughs, bottlenecked on human research timescales---we don't have a step-by-step process we can just follow to improve a metric to TAI, so hard-won past breakthroughs don't guarantee future ones happen at the same clip.

Re Constitutional AI: We agree that researchers will continue searching for ways to use human feedback more efficiently. But under our Baumol framework, the important step is going from one to zero, not n to one. And there we find it hard to believe that in high stakes situations (say, judging AI debates), safety researchers are willing to hand over the reins. We'd also really contest that 'performing very similarly to human raters' is enough---it'd be surprising if we already have a free lunch, no information lost, way to simulate humans well enough to make better AI.

Re 2025 language models equipped with search: For this to be as useful as a panel of experts, the models need to be searching an index where what the experts know is recorded, in some sense, which 1) doesn't happen (experts are busy being experts) 2) is sometimes impossible (chef, LeBron) 3) may be less likely in the future when an LLM is going to just hoover up your hard-won expertise? I know you mentioned you don't disagree with our point here though.

Re motte and bailey: We agree that our hurdles may have overlap. But the point of our Baumol framework is that any valid hurdle, where we don't know if it's fundamentally the same problem that causes other hurdles, each has the potential to bottleneck transformative growth. And we allude to several cases where for one reason or another a promising invention did not meet expectations precisely because they could not clear them all.

Hope this clarifies our view. Not conclusive, of course, we're happy, like your piece, to also be going for intuition pumps to temper expectations.

Thanks for your response. I'll just respond to a couple things. 

Re Constitutional AI: I agree normatively that it seems bad to hand over judging AI debates to AIs[1]. I also think this will happen. To quote from the original AI Safety via Debate paper, 

Human time is expensive: We may lack enough human time to judge every debate, which we can address by training ML models to predict human reward as in Christiano et al. [2017]. Most debates can be judged by the reward predictor rather than by the humans themselves. Critically, the reward predictors do not need to be as smart as the agents by our assumption that judging debates is easier than debating, so they can be trained with less data. We can measure how closely a reward predictor matches a human by showing the same debate to both.


We'd also really contest that 'performing very similarly to human raters' is enough---it'd be surprising if we already have a free lunch, no information lost, way to simulate humans well enough to make better AI. 

I also find this surprising, or at least I did the first 3 times I came across medium-quality evidence pointing this direction. I don't find it as surprising any more because I've updated my understanding of the world to "welp, I guess 2023 AIs actually are that good on some tasks." Rather than making arguments to try and convince you, I'll just link some of the evidence that I have found compelling, maybe you will too, maybe not: Model Written Evals, MACHIAVELLI benchmark, Alpaca (maybe the most significant for my thinking), this database, Constitutional AI

I'm far from certain that this trend, of LLMs being useful for making better LLMs and for replacing human feedback, continues rather than hitting a wall in the next 2 years, but it does seem more likely than not to me, based on my read of the evidence. Some important decisions in my life rely on how soon this AI stuff is happening (for instance if we have 20+ years I should probably aim to do policy work), so I'm pretty interested in having correct views. Currently, LLMs improving the next generation of AIs via more and better training data is one of the key factors in how I'm thinking about this. If you don't find these particular pieces of evidence compelling and are able to explain why, that would be useful to me! 

  1. ^

    I'm actually unsure here. I expect there are some times where it's fine to have no humans in the loop and other times where it's critical. It generally gives me the ick to take humans out of the loop, but I expect there are some times where I would think it's correct. 

Makes sense that this would be a big factor in what to do with our time, and AI timelines. And we're surprised too by how AI can overperform expectations, like in the sources you cited.

We'd still say the best way of characterizing the problem of creating synthetic data is that it's a wide open problem, rather than high confidence that naive approaches using current LMs will just work. How about a general intuition instead of parsing individual sources? We wouldn't expect making the dataset bigger by just repeating the same example over and over to work. We generate data by having 'models' of the original data generators, humans. If we knew what exactly made human data 'good,' we could optimize directly for it and simplify massively (this runs into the well-defined eval problem again---we can craft datasets to beat benchmarks of course).

An analogy (a disputed one, to be fair) is Ted Chiang's lossy compression. So for every case of synthetic data working, there are also cases where it fails, like Shumailov et al., which we cited. If we knew exactly what made human data 'good,' we'd argue you wouldn't see labs continue to ramp up hiring contractors specifically to generate high-quality data in expert domains, like programming.

A fun exercise---take a very small open-source dataset, train your own very small LM, and have it augment (double!) its own dataset. Try different prompts, plot n-gram distributions vs the original data. Can you get one behavior out of the next generation that looks like magic compared to the previous, or does improvement plateau? You may have nitpicks with this experiment, but I don't think it's that different from what's happening at large scale.
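A minimal sketch of that exercise, with a toy bigram model standing in for the "very small LM" and an invented twenty-token corpus for the dataset (both purely illustrative):

```python
import random
from collections import Counter, defaultdict

random.seed(0)

# Toy stand-ins: a tiny corpus for the "very small open-source dataset"
# and a bigram model for the "very small LM".
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat saw the dog .").split()

def train_bigram(tokens):
    """Count next-token frequencies: our toy 'LM'."""
    model = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        model[a][b] += 1
    return model

def sample(model, start, n):
    """Generate up to n tokens from the bigram model."""
    out, tok = [start], start
    for _ in range(n - 1):
        nxt = model.get(tok)
        if not nxt:
            break
        tok = random.choices(list(nxt), weights=list(nxt.values()))[0]
        out.append(tok)
    return out

model = train_bigram(corpus)
synthetic = sample(model, "the", len(corpus))  # "double" the dataset
augmented = corpus + synthetic

# Compare bigram inventories: generation 2's data can only contain
# transitions generation 1 already saw, so nothing new appears.
orig_bigrams = set(zip(corpus, corpus[1:]))
aug_bigrams = set(zip(augmented, augmented[1:]))
novel = aug_bigrams - orig_bigrams
print(f"novel bigrams from augmentation: {len(novel)}")  # → 0
```

Swapping in a real small LM and varied prompts is the interesting version; the toy just makes the plateau mechanism visible, since the next generation can only recombine what the previous one already knew.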

Re scaling current methods: The hundreds of billions figure we quoted does require more context not in our piece; SemiAnalysis explains in a bit more detail how they get to that number (eg assuming training in 3mo instead of 2 years).

That's hundreds of billions with current hardware. (Actually, not even current hardware, but the A100 which is last-gen; the H100 should already do substantially better.) But HW price-performance currently doubles every ~2 years. Yes, Moore's Law may be slowing, but I'd be surprised if we don't get another OOM improvement in price-performance during the next decade, especially given the insatiable demand for effective compute these days.
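The back-of-envelope arithmetic behind that expectation, with the ~2-year doubling time as the assumed input:

```python
import math

doubling_time_years = 2.0   # assumed price-performance doubling time
horizon_years = 10.0        # "the next decade"

# Cumulative hardware price-performance improvement over the horizon.
improvement = 2 ** (horizon_years / doubling_time_years)
oom = math.log10(improvement)
print(f"{improvement:.0f}x, i.e. {oom:.2f} orders of magnitude")
# → 32x, i.e. 1.51 orders of magnitude
```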

We don't want to haggle over the exact scale before it becomes infeasible, though---even if we get another 2 OOM in, we wanted to emphasize with our argument that 'the current method route' 1) requires regular scientific breakthroughs of the pre-TAI sort, and 2) even if we get there doesn't guarantee capabilities that look like magic compared to what we have now, depending on how much you believe in emergence. Both would be bottlenecks.

Yeah, I agree things would be a lot slower without algorithmic breakthroughs. Those do seem to be happening at a pretty good pace though (not just looking at ImageNet, but also looking at ML research subjectively). I'd assume they'll keep happening at the same rate so long as the number of people (and later, possibly AIs) focused on finding them keeps growing at the same rate.

I really appreciate that you took the time to provide such a detailed response to these arguments. I want to say this pretty often when on the forum, and maybe I should do it more often! 

Thanks for posting David! Better title than ours :) Would you mind pasting the text of the article for ease of the reader?

Done. Sorry I was slow. 
