Executive summary

  • The h-index, which is used to track the performance of researchers, discourages novel research.
  • Reallocation of resources towards novel research would yield a net benefit to humanity through additional research on existential risks, longer lifespans, and higher economic growth.
  • The h-index could be improved by complementing it with textual analysis and/or secondary citations.
  • The issue appears to be tractable, with realistic paths forward.

The problem - current incentives discourage novel research

A working paper by Bhattacharya and Packalen argues that

  • Productivity in scientific research has been declining (more input is needed for the same output)
  • A major cause of this problem is the lack of incentives for producing novel research
  • This can be explained by the career relevance of KPIs such as the "h-index", whose main components are the number of publications and the number of direct citations of those publications. More novel research is less likely to perform well on those indicators.
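For readers unfamiliar with the metric, the h-index is the largest number h such that h of a researcher's papers have each been cited at least h times. Below is a minimal sketch of that definition. It assumes the per-paper citation counts are already available (for example, from a citation database). The function name `h_index` and the example numbers are hypothetical.

```python
def h_index(citation_counts: list[int]) -> int:
    """Return the largest h such that at least h papers have >= h direct citations each."""
    # Rank papers from most to least cited.
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        # The paper at position `rank` must have at least `rank` citations.
        if cites >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    # Hypothetical researcher with six papers.
    print(h_index([10, 8, 5, 4, 3, 0]))  # 4
    # One landmark paper vs. many moderately cited ones.
    print(h_index([1000]))  # 1
    print(h_index([5, 5, 5, 5, 5]))  # 5
```

The last two calls show why the number of publications matters so much. A researcher whose single paper is cited 1,000 times scores h = 1, while a researcher with five papers cited five times each scores h = 5.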

This thesis strikes me as plausible. Beyond the evidence presented in the paper, the ambitious researchers I know, as well as the researchers they know, appear to agree with it; you would expect them to be aware of the incentives they are operating under.

Supposing that we accept the problem, what if anything should be done about it?

Should improving productivity in research be an EA goal?

The upside 

  • Research on emerging existential risk areas is more likely to be seen as novel. The first person who wants to address a speculative risk scenario currently has strong incentives to work in a more established area instead. Therefore, a reallocation of resources towards more novel research would also be a reallocation of resources towards more existential risk research.
  • The same reasoning applies to other "neglected" cause areas, whether or not they are linked to an existential risk.
  • If we are to believe the paper's claims, more novel research is expected to lead to more breakthrough research, which usually leads to breakthrough technological innovation. Certain technological innovations may directly increase total human welfare (the most obvious example being an increase in the human lifespan).
  • Technological innovation also increases GDP growth, which reduces poverty and increases welfare for present as well as future generations.

The downside

  • Certain technological innovations may themselves present an existential risk and/or a risk of inducing large amounts of suffering.

On balance

I would argue the positives outweigh the negatives. Given how unlikely it is that we will simply stop trying to improve science and technology, or that we will improve human wisdom in some significant way in coming decades, it is better to increase the output of novel research than to decrease it or keep it as it is. We might create more problems for ourselves, but we should also:

  • Spot them earlier.
  • Identify solutions sooner.

Can we improve research incentives?

Ways to complement and/or supplant the h-index might include:

  • Textual analysis. This is the approach favoured in the Bhattacharya and Packalen paper: "By indexing the words and word sequences that appear in each scientific paper, we can construct first a list of the ideas that each scientific contribution builds upon and advances. The vintage of each idea can then be determined based on how long ago the idea first appeared in the literature. Having determined the ideas that appear in each paper and the vintage of each idea, we can then identify which research papers try out and advance relatively new ideas." Other text-based approaches might include measuring the degree to which a paper combines ideas that are not usually found together.
  • Secondary citations. I was struck by the following quote in the article: "a vanishingly small fraction of published biochemical papers today that involve DNA cite the seminal paper by James Watson and Francis Crick on its molecular structure." Perhaps I am mistaken, but I would expect many papers to cite papers which themselves cite Watson and Crick. In other words, secondary citations. A paper with more secondary citations could be said to be more "generative"[1]. More novel research is more likely to be generative in that sense. Yet to my knowledge, the h-index as it currently stands only takes direct citations into account.
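Both proposals can be made concrete with a toy sketch. This is not the paper's method. It relies on several assumptions. Word bigrams stand in for "ideas", whereas the paper indexes words and word sequences, probably with far more care. A paper's novelty is the share of its ideas whose vintage falls within a hypothetical `window` of years. A secondary citation is a paper that cites one of the target's direct citers without citing the target itself. All function names, parameters and data below are hypothetical.

```python
import re
from collections import defaultdict

Idea = tuple[str, str]


def ideas(text: str) -> set[Idea]:
    """Hypothetical proxy: treat each adjacent word pair (bigram) as an 'idea'."""
    words = re.findall(r"[a-z]+", text.lower())
    return set(zip(words, words[1:]))


def novelty_scores(papers: dict[str, tuple[int, str]], window: int = 2) -> dict[str, float]:
    """Share of each paper's ideas whose vintage is at most `window` years old."""
    # Vintage of an idea = earliest publication year in which it appears in the corpus.
    vintage: dict[Idea, int] = {}
    for year, text in papers.values():
        for idea in ideas(text):
            vintage[idea] = min(year, vintage.get(idea, year))
    scores: dict[str, float] = {}
    for pid, (year, text) in papers.items():
        found = ideas(text)
        recent = sum(1 for idea in found if year - vintage[idea] <= window)
        scores[pid] = recent / len(found) if found else 0.0
    return scores


def secondary_citations(cites: dict[str, set[str]]) -> dict[str, int]:
    """For each paper, count papers citing one of its direct citers but not the paper itself."""
    # Invert the graph: paper -> set of papers that cite it directly.
    cited_by: dict[str, set[str]] = defaultdict(set)
    for citer, refs in cites.items():
        for ref in refs:
            cited_by[ref].add(citer)
    counts: dict[str, int] = {}
    for target in set(cites) | set(cited_by):
        direct = cited_by[target]
        indirect: set[str] = set()
        for citer in direct:
            indirect |= cited_by[citer]
        # Exclude direct citers (what the h-index already counts) and the target itself.
        counts[target] = len(indirect - direct - {target})
    return counts


if __name__ == "__main__":
    # Hypothetical toy corpus: paper id -> (publication year, text).
    papers = {
        "A": (2000, "gene expression in yeast"),
        "B": (2010, "gene expression in yeast cells"),
        "C": (2010, "crispr screening of yeast"),
    }
    print(novelty_scores(papers))  # {'A': 1.0, 'B': 0.25, 'C': 1.0}

    # Hypothetical citation graph: paper id -> set of papers it cites.
    cites = {
        "WC": set(),
        "P1": {"WC"},
        "P2": {"WC"},
        "Q1": {"P1"},
        "Q2": {"P1", "P2"},
        "Q3": {"P2", "WC"},
    }
    print(secondary_citations(cites)["WC"])  # 2 (Q1 and Q2; Q3 cites WC directly)
```

In this toy graph, "WC" has three direct citations and two secondary ones. Paper A scores 1.0 only because it is the earliest paper in the corpus, so every one of its ideas looks new. A realistic version would need a long historical baseline and some filtering of uninformative word pairs such as "in yeast".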

Turning the idea into reality

The issue appears to be highly tractable. We are, after all, talking about improving a 17-year-old performance indicator; there are much harder tasks before us.

Two possible paths forward for those who might be interested in pursuing those ideas further:

  • The entrepreneurial path: Create your own index/scorecard, score the output of current researchers on those indicators, publish the results on a website, and email them to universities or to the scientists directly: "According to these measures, you are the Nth most influential researcher in (this area)". Attention-grabbing, perhaps, but also effective?
  • The institutional path: Create an institution devoted to standardizing measures of scientific performance, assemble a panel of well-respected scientists and have them reach a consensus on what those measures should be.
  • Of course there could be others!

Is anyone interested in running with it? I'm probably not the right person to bring any of those ideas about, though I would be interested in helping others who might be.

[1] One could even go further and compare a paper's citation input to its output, though I would be more skeptical of this, as creativity tends to be combinatorial: truly novel work need not start from scratch.



Thanks, this is interesting! 2 questions and a comment:

1) Would a novelty-focused metric trade off against replication work?

2) Would resource constraints matter for the choice of metric? I'm thinking that some metrics are computationally/logistically easier to gather and maintain (e.g. those built on pre-existing citation databases), and the cost and effort of performing in-depth textual analysis on the large volume of relevant literature might be non-negligible.

My impression from reading some Wikipedia articles (https://en.wikipedia.org/wiki/H-index , https://en.wikipedia.org/wiki/Citation_impact , https://en.wikipedia.org/wiki/Citation_analysis ) is that there are lots of proposals for different metrics, but a common criticism is the difficulty of comparing between disciplines, where field-dependent factors are critical to a metric being meaningful/useful. If this is the case, maybe a smaller version of this project would be to pick a field of particular importance to EAs and see whether targeted analysis/work can produce a more relevant metric for it.

PCO Moore

Good questions!

1) I agree that replication work is also vital. It seems to me that a better equilibrium would incentivize both novelty and replication more than current incentives do. Perhaps this entails two different metrics, both forming part of an overall "scorecard" (the sabermetrics approach favoured by the cited paper).
2) This is likely to be one of the constraints, yes. Would this also apply to secondary citations?

As for your comment: indeed, starting with a specific field would be a reasonable first step. Any idea which EA-relevant field suffers the most from a lack of novel research? Biorisks perhaps? (I would assume AI research suffers less from h-index incentives due to economic incentives, though I may be wrong.)

I think secondary citations would be easier, as you say. And you wouldn't have to stop there: once you have the citation data, you could probably do a lot of creative things analysing the resulting graphs (graphs in the mathematical sense). I expect the logistical worries arise where the input data is harder to reach and scrape (like full text).

Yeah, I don't know! I'm sure there are some folks who have thought about meta-science/improving science etc. who might have good ideas.

As an academic, I appreciate people thinking about these things. However, I think Goodhart’s law would bite most new measures hard (just as it bites the current ones). This would apply most clearly to textual analysis.