Investigating how technology-focused academic fields become self-sustaining

Ben Snodin; Megan Kinniment

Comments 4

kierangreig🔸

This was really cool! Thanks a bunch for writing it up :)

For those interested, it somewhat reminded me of “Some Case Studies in Early Field Growth” and “Establishing a research field in the natural sciences”.

One quick observation that is probably a small thing or not right:

For the 8 fields that reached establishment, the median time between a field’s origin year and establishment year[3] was 18 years, with the quickest field (Genetic Circuits) becoming established after 5 years, and the slowest (Clean Meat) becoming established after 63 years (the full list of times to establishment for the 8 fields, in years, is: 5, 16, 16, 17, 19, 26, 26, 63).

For Clean meat it looks like you use something like the date of postulation as the initial time point to measure the length of time to field establishment.

I don’t have a great understanding, but I have a feeling that for Genetic Circuits something like the date of postulation maybe isn’t the initial time point used when measuring the length of time to field establishment?

If so, that might be doing most of the work in setting Genetic Circuits as the quickest and Clean Meat as the slowest.

Ben Snodin

Nice, thanks for those links, great to have those linked here since we didn't point to them in the report. I've seen the Open Phil one but I don't think I'd seen the Animal Ethics study, it looks very interesting.

Thanks for raising the point about speed of establishment for Clean Meat and Genetic Circuits! Our definition for the "origin year" (from here) is "The year that the technology or area is purposefully explored for the first time." So it's supposed to be when someone starts working on it, not when someone first has the idea. We think that Willem van Eelen started working on developing clean meat in the 1950s, so we set the origin year to be around then, whereas as far as we're aware no one was working on genetic circuits until much later.

At the moment I'm not sure whether the supplementary notes say anywhere that we think van Eelen was working on developing clean meat in the 50s; I think Megan is going to update the notes to make this clearer.

kierangreig🔸

Thanks, Ben! :)

Ofer

We define a self-sustaining field as “an academic research field that is capable of attracting the necessary funds and expertise for future work without reliance on a particular small group of people or funding sources” (see the next subsection for more on this definition).

I think that another aspect that is important to consider here is the goals of the funders. For example, an academic field may get huge amounts of funding from several industry actors that try to influence/bias researchers in certain ways (e.g. for the purpose of avoiding regulation). Such an academic field may satisfy the criterion above for being "self-sustaining" while having lower EV than what it would have if it had only a small group of EA-aligned funders.

Notes

See this Google Doc for a list of the factors and their definitions. ↩︎
For example, maybe the two non-self-sustaining fields we chose reflect our exposure to the EA, rationalist, and futurist communities, making these “atypical” in an important sense. See this section for more discussion. ↩︎
Roughly: “from when the field started existing to when it became self-sustaining”; see this section for more detailed definitions of “origin year” and “establishment year”. ↩︎
Technology Forecasting, Science and Technology Studies, and Innovation Studies seemed potentially relevant but we weren’t able to find much literature in these areas addressing the question of how academic fields emerge. A commenter on an earlier draft also mentioned Management, as well as Science Policy (example papers: “An economic evaluation of the war on cancer”; “The relation between R&D spending and patents: The moderating effect of collaboration networks”), which seems potentially relevant but which we didn’t investigate closely. ↩︎
Relevant work in this area includes “Identifying emerging topics in science and technology”, “A novel approach to predicting exceptional growth in research”, and “Machine Intelligence for Scientific Discovery and Engineering Invention”. ↩︎
I use “binary” here to indicate that each variable at a given time could take one of only two values (i.e. it could either be “on” or “off”), as opposed to, say, being graded on a scale of 1 to 5. ↩︎
An example might help the reader get a better feel for the origin year: for DNA Nanotechnology we judged the origin year to be 1982, the year Ned Seeman published his paper on the possibility of lattices made of DNA (Nucleic acid junctions and lattices), later considered to be the “founding paper” of the field. ↩︎
Although in some cases the factors had a negative effect: for example, we think a “dogged champion” for Strategies for Engineered Negligible Senescence may have on balance harmed the progress of the field in the 1960s and 70s. ↩︎
Here is the definition we used: At this time, there was in total more than $1 million per year (1990 USD equivalent) of risk tolerant funding going towards the development of this area or technology, where risk tolerant funding is funding for which it is accepted that failure is at least somewhat likely and that this is an acceptable outcome. Note that $1 million in 1990 USD is roughly equivalent to $2 million in 2021 USD, adjusting for inflation. ↩︎
My impression is that in the early stages of a field, the amount of “risk tolerant” funding, by our definition, would be similar to the total amount of funding of any kind, though we didn’t explicitly track this. ↩︎
The confidence intervals are calculated assuming independent observations (but the observations aren’t independent) and can’t account for selection effects. This will tend to mean they underestimate the uncertainty. On the other hand, they also don’t incorporate information such as “the probability will tend to increase with increasing factor count” or “the probability will tend not to vary too quickly with respect to factor count”, which will tend to mean they overestimate the uncertainty. So it’s not even clear whether the “true” confidence intervals should be bigger or smaller in any particular case. The confidence intervals are the Clopper-Pearson confidence intervals for binomial proportions computed with a Python package called statsmodels. ↩︎
This result seems to be an example of “very simple metrics working much better than they have any apparent right to”. I guess a good source on this might be The robust beauty of improper linear models in decision making (although I haven’t read it). ↩︎
It wasn’t obvious to me that we would find a relationship this strong: While some of the factors seem likely to be directly related to whether a field is thriving (e.g. “risk tolerant funding”), in many cases it’s not clear how closely related they are (e.g. “ongoing innovations in neighbouring fields”, “no widespread political or ethical baggage”). ↩︎
One issue is that we only have 8 observations of establishment years, whereas we have hundreds of observations of years in which a field is self-sustaining. ↩︎
See here for a more detailed view, showing which fields had which factors at each time slice. ↩︎
Of course, there might also be an effect here from certain factors being harder to find evidence for than others, but I doubt this is an important part of the story here. ↩︎
We defined this factor as “relevant advances in fabrication within the last 10 years”, so it’s not obvious exactly when the fabrication advances corresponding to this peak occurred. ↩︎
See the Appendix section called Investigating factor importance with decision trees for more details. ↩︎
Note that the breakout period includes both the origin year and the establishment year. ↩︎
The confidence interval shown for the 8-first-occurrences-in-the-breakout-period case is consistent with a range of values for the simulated probability of zero instances for that case; but it turns out that the simulated probability of zero instances is around 96%, so observing one instance of 8-first-occurrences-in-the-breakout-period is indeed very unlikely to happen by chance. ↩︎
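
The note above on confidence intervals says they are Clopper-Pearson intervals for binomial proportions, computed with statsmodels. As a rough illustration of what that interval is, here is a minimal pure-Python sketch. It is not the authors' code: the function names and the example counts are hypothetical, and the bounds are found by simple bisection on the exact binomial tail probabilities. In statsmodels itself, the corresponding call is, to my knowledge, `statsmodels.stats.proportion.proportion_confint(count, nobs, alpha=0.05, method="beta")`.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_decreasing(f, lo=0.0, hi=1.0, iters=100):
    """Approximate the root of a function that decreases on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Two-sided Clopper-Pearson interval for k successes in n trials.

    Lower bound: the p at which P(X >= k) equals alpha / 2.
    Upper bound: the p at which P(X <= k) equals alpha / 2.
    """
    if k == 0:
        lower = 0.0
    else:
        lower = _bisect_decreasing(
            lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p))
        )
    if k == n:
        upper = 1.0
    else:
        upper = _bisect_decreasing(lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical counts, purely for illustration: 3 "successes" in 8 trials.
    low, high = clopper_pearson(3, 8)
    print(f"95% interval: ({low:.3f}, {high:.3f})")
```

Like the intervals in the post, this treats the observations as independent draws, so the caveats in the note about non-independence and selection effects apply equally to the sketch.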

Summary

Guide to the rest of the post

Research question, context, and motivation

Defining “self-sustaining”

Motivation for the project

The surrounding academic literature

The data we collected

Possible sources of bias and other issues

Data collection

Technology selection bias and generalisability

Factors that are switched on when a field is self-sustaining (almost) by definition

Prediction takeaways

Differential technological development takeaways

Brief opinionated stories about field establishment

Detailed results

Factor count is strongly related to whether a field is self-sustaining

Factor count is positively related to whether / when a field becomes self-sustaining

When the different factors tend to occur

Factor occurrences in the run-up to establishment

Factor occurrences in the establishment year compared to the non-self-sustaining fields in 2020

Results from statistical measures of factor importance

The length of time between field origin and establishment

Opinionated stories about field establishment

AI

Clean Meat

DNA Nanotechnology

Fusion Power

Genetic Circuits

Quantum Computing

RNA Vaccines

Solid State Batteries

Themes from reading about field histories

More details

Final thoughts

Directions for further work

Acknowledgements

Appendix: more results

Plotting the logistic regression R-squareds against factor lag arguably provides weak evidence that our establishment year judgements are biased

Sensitivity analyses suggest that the logistic regression of establishment year against factor count is robust to exact choice of fields and establishment years

The distribution of factor first occurrences within the breakout period isn’t just due to noise

Factor count dynamics

Investigating factor importance with decision trees

Correlations between factors

Top pairwise correlations

Negative average correlation

We initially tried to develop a quantitative field establishment measure, but eventually decided not to

Paper counts

Notes