How can you estimate the value of research output? You could use pairwise comparisons, e.g., asking specialists how much more valuable Darwin's On the Origin of Species is than Dembski's Intelligent Design. You can then use these relative valuations to estimate absolute valuations.
Summary
- Estimating values is hard. One way to elicit value estimates is to ask researchers to compare two different items $A$ and $B$, asking how much better $A$ is than $B$. This makes the problem more concrete than just asking "what is the value of $A$?". The Quantified Uncertainty Research Institute has made an app for doing this kind of thing, described here.
- Nuño Sempere has a post about eliciting comparisons of research value from effective altruism researchers, and a more recent post about AI risk that uses distributions instead of point estimates.
- This post proposes some technical solutions to problems introduced to me in Nuño's post. In particular, it includes principled ways to
  - estimate subjective values,
  - measure consistency in pairwise value judgments,
  - measure agreement between the raters,
  - aggregate subjective values.

  The first code sketch below illustrates these steps.
- I also propose using weighted least squares when the raters supply distributions instead of numbers (see the second sketch below). It is not clear to me that asking for distributions is worth it in these kinds of questions, though, as your uncertainty level can be modelled implicitly by comparing different pairwise comparisons.
- I use these methods on the data from the 6 researchers post.
I'm assuming you have read the 6 researchers post recently. I think this post will be hard to read if you haven't.
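To make the pipeline concrete, here is a minimal sketch in Python of one standard way to carry out these steps: treat each answer "item $i$ is $r$ times better than item $j$" as a noisy observation of a log-ratio of values, solve for the log-values by least squares, use the RMS residual as a rough inconsistency measure, measure agreement as the correlation between raters' log-value vectors, and aggregate by geometric mean. The data, the gauge choice (fixing the first item's value to 1), and these particular measures are illustrative assumptions, not necessarily the exact choices made in this post.

```python
import numpy as np

def estimate_log_values(comparisons, n_items):
    """Least-squares log-values from (i, j, ratio) triples, ratio ~ v_i / v_j.

    Fixes log v_0 = 0 (values are only identified up to a common scale)
    and returns the RMS residual as a crude inconsistency measure.
    """
    A = np.zeros((len(comparisons), n_items))
    b = np.zeros(len(comparisons))
    for row, (i, j, ratio) in enumerate(comparisons):
        A[row, i], A[row, j], b[row] = 1.0, -1.0, np.log(ratio)
    sol, *_ = np.linalg.lstsq(A[:, 1:], b, rcond=None)  # drop column 0: gauge fix
    log_v = np.concatenate([[0.0], sol])
    rms = np.sqrt(np.mean((A @ log_v - b) ** 2))
    return log_v, rms

# Two hypothetical raters comparing three items; the numbers are made up.
raters = {
    "rater 1": [(1, 0, 4.0), (2, 1, 2.5), (2, 0, 9.0)],
    "rater 2": [(1, 0, 6.0), (2, 1, 2.0), (2, 0, 14.0)],
}

logs = []
for name, data in raters.items():
    log_v, rms = estimate_log_values(data, n_items=3)
    logs.append(log_v)
    print(name, "values:", np.exp(log_v).round(2), "RMS inconsistency:", round(rms, 3))

# Agreement between raters: correlation of their log-value vectors.
print("agreement:", np.corrcoef(logs[0], logs[1])[0, 1].round(3))
# Aggregation: geometric mean of the raters' value estimates.
print("aggregated:", np.exp(np.mean(logs, axis=0)).round(2))
```

Working on the log scale makes the ratio answers additive, and the gauge fix is harmless because the elicited values carry no absolute scale to begin with.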
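For the distributional variant, one natural reading of weighted least squares is to summarize each stated distribution by a mean and standard deviation on the log scale and weight each comparison by its inverse variance, so that more confident answers count for more. A sketch under those assumptions:

```python
import numpy as np

def weighted_log_values(comparisons, n_items):
    """Weighted least squares from (i, j, mu, sigma) with mu = E[log(v_i / v_j)].

    Weights are 1 / sigma^2; the gauge is fixed by setting log v_0 = 0.
    """
    A = np.zeros((len(comparisons), n_items))
    b = np.zeros(len(comparisons))
    w = np.zeros(len(comparisons))
    for row, (i, j, mu, sigma) in enumerate(comparisons):
        A[row, i], A[row, j], b[row], w[row] = 1.0, -1.0, mu, 1.0 / sigma**2
    sw = np.sqrt(w)                      # scale rows by sqrt(weight) to get WLS
    sol, *_ = np.linalg.lstsq((sw[:, None] * A)[:, 1:], sw * b, rcond=None)
    return np.concatenate([[0.0], sol])

# Hypothetical distributional answers: (i, j, mean log-ratio, sd of log-ratio).
data = [(1, 0, np.log(4.0), 0.2), (2, 1, np.log(2.5), 0.8), (2, 0, np.log(9.0), 0.3)]
print("values:", np.exp(weighted_log_values(data, n_items=3)).round(2))
```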
Thank you for telling me about this! In economics, the discrete choice model is used to estimate a scale-free utility function in a similar way. It is used in health research for estimating QALYs, among other things; see e.g. this review paper.
But discrete choice / the Schulze method should probably not be used by themselves, as they cannot give us information about scale, only ordering. A possibility I find promising is to combine the methods. Say that I have ten items $I_0, \ldots, I_9$ that I want you to rate. Then I can ask "Do you prefer $I_i$ to $I_j$?" for some pairs and "How many times better is $I_i$ than $I_j$?" for other pairs, hopefully chosen in an optimal way. Then we would lessen the cognitive load on the study participants and make it easier to scale this kind of thing up. A sketch of such a combined estimator follows below.
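As one illustration of what the combination could look like (the likelihoods and noise scale here are my assumptions, not a worked-out design): model "Do you prefer $I_i$ to $I_j$?" answers with a logistic, Bradley-Terry-style likelihood on log-value differences, model "How many times better?" answers as Gaussian observations of the log-ratio, and maximize the joint likelihood.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical answers over three items I_0, I_1, I_2.
prefs = [(1, 0, 1), (2, 1, 1), (2, 0, 1)]   # (i, j, 1 if I_i preferred to I_j)
ratios = [(2, 0, 9.0)]                       # (i, j, "I_i is r times better")
N_ITEMS, NOISE_SD = 3, 0.5                   # Gaussian noise scale is an assumption

def neg_log_lik(free):
    u = np.concatenate([[0.0], free])        # gauge: fix the log-value of I_0 to 0
    nll = 0.0
    for i, j, y in prefs:                    # logistic (Bradley-Terry) choice term
        nll += np.log1p(np.exp(-(2 * y - 1) * (u[i] - u[j])))
    for i, j, r in ratios:                   # Gaussian log-ratio term
        nll += 0.5 * ((u[i] - u[j] - np.log(r)) / NOISE_SD) ** 2
    return nll

res = minimize(neg_log_lik, x0=np.zeros(N_ITEMS - 1))
values = np.exp(np.concatenate([[0.0], res.x]))
print("estimated values:", values.round(2))
```

In this toy example the single ratio question anchors the scale of the estimated values, which is exactly the information the purely ordinal answers cannot supply.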
(The cognitive load of using distributions is the main reason why I'm skeptical about having participants use them in place of point estimates when doing pairwise comparisons.)