Hide table of contents

Comment Permalink

Great post! Seems like the predictability questions is impt given how much power laws surface in discussion of EA stuff.

More precisely, future citations as well as awards (e.g. Nobel Prize) are predicted by past citations in a range of disciplines

I want to argue that things which look like predicting future citations from past citations are at least partially "uninteresting" in their predictability, in a certain important sense.

(I think this is related to other comments, and have not read your google doc, so apologies if I'm restating. But I think its worth drawing out this distinction)

In many cases I can think of wanting good ex-ante prediction of heavy-tailed outcomes, I want to make these predictions about a collection which is in an "early stage". For example, I might want to predict which EAs will be successful academics, or which of 10 startups seed rounds I should invest in.

Having better predictive performance at earlier stages gives you a massive multiplier in heavy-tailed domains: investing in a Series C is dramatically more expensive than a seed investment.

Given this, I would really love to have a function which takes in the intrinsic characteristics of an object, and outputs a good prediction of performance.

Citations are not intrinsic characteristics.

When someone is choosing who to cite, they look at - among other things- how many citations they have. All else equal, a paper/author with more citations will get cited more than a paper with less citations. Given the limited attention span of academics (myself as case in point) the more highly cited paper will tend to get cited even if the alternative paper is objectively better.

(Ed Boyden at MIT has this idea of "hidden gems" in the literature which are extremely undercited papers with great ideas: I believe the original idea for PCR, a molecular bio technique, had been languishing for at least 5 years with very little attention before later rediscover. This is evidence for the failure of citations to track quality.)

Domains in which "the rich get richer" are known to follow heavy-tailed distributions (with an extra condition or two) by this story of preferential attachment.

In domains dominated by this effect we can predict ex-ante that the earliest settlers in a given "niche" are most likely to end up dominating the upper tail of the power law. But if the niche is empty, and we are asked to predict which of a set would be able to set up shop in the niche--based on intrinsic characteristics--we should be more skeptical of our predictive ability, it seems to me.

Besides citations, I'd argue that many/most other prestige-driven enterprises have at least a non-negligible component of their variance explained by preferential attachment. I don't think it's a coincidence that the oldest Universities in a geography also seem to be more prestigious, for example. This dynamic is also present in links on the interwebs and lots of other interesting places.

I'm currently most interested in how predictable heavy-tailed outcomes are before you have seen the citation-count analogue, because it seems like a lot of potentially valuable EA work is in niches which don't exist yet.

That doesn't mean the other type of predictability is useless, though. It seems like maybe on the margin we should actually be happier defaulting to making a bet on whichever option has accumulated the most "citations" to date instead of trusting our judgement of the intrinsic characteristics.

Anyhoo- thanks again for looking into this!

Showing 3 of 13 replies (Click to show all)

ecaMar 30 20211

Yeah this is great; I think Ed probably called them sleeping beauties and I was just misremembering :)

Thanks for the references!

Linch

Mar 29 2021

I have indeed made that comment somewhere. It was one of the more insightful/memorable comments she made when I interviewed her, but tragically I didn't end up writing down that question in the final document (maybe due to my own lack of researcher taste? :P) That said, human memory is fallible etc so maybe it'd be worthwhile to circle back to Liv and ask if she still endorses this, and/or ask other poker players how much they agree with it.

alex lawsen

Mar 29 2021

I've been much less successful than LivB but would endorse it, though I'd note that there are substantially better objective metrics than cash prizes for many kinds of online play, and I'd have a harder time arguing that those were less reliable than subjective judgements of other good players. It somewhat depends on sample though, at the highest stakes the combination of v small playerpool and fairly small samples make this quite believable.

See in context

How much does performance differ between people?

by Max_Daniel, Benjamin_Todd

Mar 25 20217 min read 77