There's a toy model of AI development where it's pretty easy to jump into cutting-edge research and be successful: all you need is a decent dataset, a decent algorithm, and lots of compute. In theory, all these things are achievable with money.

In practice, I assume it's more complicated, and the top labs today are accumulating resources that are hard to replicate: things like know-how, organizational practices, internal technical tools, and relationships with key external orgs. These things are harder to quantify, and might not be as externally visible, but could provide a serious barrier to new entrants.

So, how much do these intangibles matter? Could new orgs easily become competitive with OpenAI/DeepMind if they had lots of money to throw at the problem? Which intangibles matter most for keeping early labs ahead of their competitors?

I'd love to get a take from people with relevant domain knowledge. 

  • Gwern's scaling hypothesis post mentions this dynamic, but it's hard to tell how important he thinks it is. He says "all of this hypothetically can be replicated relatively easily (never underestimate the amount of tweaking and special sauce it takes), [but] competitors lack the most important thing," which is belief in the scaling hypothesis. Well, Google and Microsoft have jumped into the large language models game now; I'm guessing that many orgs will follow them in the coming decade, including some with lots of money. So how much does the special sauce actually matter?
You can get an estimate based on how many authors there are on the papers (it's often quite a lot, e.g. 20-40). Though this will probably become less reliable in the future, as such organizations develop more infrastructure that no longer qualifies as "getting you on the paper" but is nonetheless important and not publicly available.

One problem with this estimate is that you don’t end up learning how long the authors spent on the project, or how important their contributions were. My sense is that contributors to industry publications often spent relatively little time on the project compared to academic contributors.

Rohin Shah
Yeah, good point.

Interesting, thanks! Any thoughts on how we should think about the relative contributions and specialization level of these different authors? I.e., a world of maximally important intangibles might be one where each author was responsible for tweaking a separate, important piece of the training process.

My rough guess is that it's more like 2-5 subteams working on somewhat specialized things, with some teams being moderately more important and/or more specialized than others. 

Does that framing make sense, and if so, what do you think?

Rohin Shah
I haven't looked into it much, but the PaLM paper has a list of contributions in Appendix A that would be a good starting point.

Anthropic took less than a year to set up large model training infrastructure from scratch but with the benefit of experience. This indicates that infrastructure isn’t currently extremely hard to replicate.

EleutherAI has succeeded at training some fairly large models (the biggest has about 20B params, compared to 540B in PaLM) while basically just being talented amateurs (and also not really having much money). These models introduced a simple but novel tweak to the transformer architecture that PaLM later adopted (parallel attention and MLP layers). This suggests that experience also isn't totally crucial.
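To make the "parallel layers" tweak concrete, here is a minimal sketch in numpy contrasting the standard sequential transformer block with the parallel formulation. The attention and MLP here are stand-in linear maps (not real implementations), chosen only to show the change in data flow: in the parallel version both sublayers read the same normalized input, rather than the MLP consuming the attention output.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Per-token layer normalization over the feature dimension.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
d = 8
W_attn = rng.normal(size=(d, d)) * 0.1  # stand-in for self-attention
W_mlp = rng.normal(size=(d, d)) * 0.1   # stand-in for the MLP

def attn(x):
    return x @ W_attn

def mlp(x):
    return x @ W_mlp

def sequential_block(x):
    # Standard transformer block: the MLP sees the attention output.
    x = x + attn(layer_norm(x))
    x = x + mlp(layer_norm(x))
    return x

def parallel_block(x):
    # Parallel formulation: attention and MLP both read the same
    # normalized input, so their matmuls can be fused or overlapped.
    y = layer_norm(x)
    return x + attn(y) + mlp(y)

x = rng.normal(size=(4, d))
print(sequential_block(x).shape, parallel_block(x).shape)
```

The two blocks compute slightly different functions, but at large scale the quality difference reportedly becomes small while the parallel form trains faster, which is the appeal.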

I think that the importance of ML experience for success is relatively low compared to other domains of software engineering.

My guess is that entrenched labs will have bigger advantages as time goes on and as ML gets more complicated.