
I think that if someone is proposing multiple answers to a question, an answer appearing among their first guesses is some evidence that this answer is correct. If the set of possible answers is large, the amount of evidence might be substantial. I describe a thought experiment to illustrate this, and an actual experiment that might test it.

I'm not totally sure my example holds up.

Here's a thought experiment. I'm going to suggest that it might be single comparisons, not multiple comparisons, that merit special treatment.

Suppose some economists publish a study that shows that people whose third grade teacher was called "Margaret" earn \$1000 more each year on average than people whose third grade teacher was not called "Margaret", p=0.001 (they had a big sample). It sounds like nonsense, but on further investigation you discover that

• this study was pre-registered
• the research group pre-registers all of their studies
• the only income association they ever investigated was the one they found: between having a third grade teacher called "Margaret" and earnings in later life

It seems to me that I would then be inclined to conclude that this association is real, and that if I am trying to forecast someone's earnings it might be a little bit helpful to ask them the name of their third grade teacher. I can even think of explanations - "Margarets" are, perhaps, more likely to work at affluent schools.

Now imagine exactly the same situation, except the research team investigated 10 000 different associations, none of which they had pre-specified as especially plausible. Now it seems clear that the association is most likely an artefact of multiple comparisons.
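To put rough numbers on the second scenario, here is a minimal sketch in Python. It assumes, purely for illustration, that none of the associations is real, that the tests are independent, and that p = 0.001 is treated as a significance threshold. The function names are my own.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of tests that pass the threshold by chance alone,
    assuming every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float) -> float:
    """Probability that at least one of n independent null tests
    passes the threshold by chance."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Illustrative values taken from the two scenarios in the text.
ALPHA = 0.001
single = prob_at_least_one(1, ALPHA)        # one pre-registered test
many = prob_at_least_one(10_000, ALPHA)     # 10 000 associations examined
chance_hits = expected_false_positives(10_000, ALPHA)

print(f"single test: chance of a spurious hit = {single:.4f}")
print(f"10 000 tests: chance of at least one spurious hit = {many:.4f}")
print(f"10 000 tests: expected spurious hits = {chance_hits:.1f}")
```

Under these assumptions, about 10 of the 10 000 associations would clear the threshold by chance alone, whereas a single pre-registered test clears it by chance only one time in a thousand.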

There's a standard answer for how to deal with this second case from a Bayesian perspective, and that is to say (roughly) "my prior probability on having a 'Margaret' for third grade being strongly associated with earnings is too low for this evidence to rescue it". This position would also oblige you to reject the association in the first case: it's the same evidence about the same hypothesis, so if your conclusion depends only on this evidence and your prior probability, it must be the same conclusion. I think priors are important (if it were a study that reported evidence for precognition, I'd say "too low on priors" in both cases), but I don't think the two cases are identical in this example.

We could run a test of whether people do treat these two scenarios differently. Present the first scenario to one group of people and ask them how likely they think it is that the association is real, and present the second scenario to another group of people and ask them the same question (I'm speaking loosely here; I think the language would need to be clearer to actually run a test). My guess is that the first group would think the association is quite likely real, and the second group would think it is not very likely to be real, but I'm pretty unsure about this. Of course, whether people do treat these scenarios differently and whether people should treat them differently are different questions.

If there is an asymmetry, that asymmetry must be due to one of two things:

1. In the first case the hypothesis of a substantial association between a third grade "Margaret" and pay was one of the first things the researchers thought of, and in the second case it was not
2. In the second case, there is additional evidence that most of the prospective associations considered by the researchers are not large

We could distinguish the two by positing a third scenario: we have the first scenario play out exactly as described, then the same researchers follow up with a second paper examining 9 999 other prospective associations with income, and finding that only 1 in 100 is stronger than the "3rd grade Margaret" association, and the vast majority of the stronger associations make more intuitive sense. In this case, do people revise their initial assessment of the reality of the "3rd grade Margaret" association? I think probably not too much, but again I'm pretty unsure. My main reason for thinking this is that I already expect most prospective associations to be small, and I expect the large ones to be more plausible than "3rd grade Margaret".

So it looks to me like I might be inclined in this case to take the authors' prioritisation to be substantial evidence (maybe a Bayes factor of 10) in favour of the prioritised hypothesis.
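To see what a Bayes factor of 10 would do in practice, here is a small sketch of the odds form of Bayes' rule. The 1% prior is a made-up number for illustration, not something I'm claiming about the "Margaret" hypothesis, and the function name is my own.

```python
def update_with_bayes_factor(prior_prob: float, bayes_factor: float) -> float:
    """Return the posterior probability after multiplying the prior odds
    by a Bayes factor (posterior odds = prior odds * Bayes factor)."""
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must be strictly between 0 and 1")
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical prior of 1%, updated by the Bayes factor of 10 suggested above.
posterior = update_with_bayes_factor(prior_prob=0.01, bayes_factor=10.0)
print(f"posterior probability: {posterior:.3f}")
```

With a 1% prior, a Bayes factor of 10 lifts the probability to roughly 9%: a real shift, but one that still leaves plenty of work for the study's own evidence to do.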

I think one of the unusual features of this example is that the following two things usually go together, but I've tried here to separate them:

• By my own lights, I think some hypothesis is a priori reasonably likely
• Some researcher thinks the same hypothesis is a priori reasonably likely

A rough theory of what's going on

Here's a rough theory of what might be going on:

• When we're trying to answer a question, we (or economics researchers) can propose rough answers before we've fully worked it out[1]
• The likelihood of an answer being proposed is proportional to its weight in some prior distribution over answers, along with a constraint that it's sufficiently different to answers already considered
• As a result, early answers tend to have higher prior probability than later ones
• By something like Aumann's agreement theorem, if we know that someone else assigns higher probability to X, we should probably do so too
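The first three points can be checked with a toy simulation. This is only a sketch under assumptions of my own: the prior is a made-up distribution over 50 answers with weights proportional to 1/rank, and the "sufficiently different" constraint is simplified to "no answer is proposed twice". All names are hypothetical.

```python
import random


def propose_answers(weights, k, rng):
    """Draw k distinct answer indices, one at a time, each with probability
    proportional to its prior weight among the answers not yet proposed."""
    remaining = list(range(len(weights)))
    order = []
    for _ in range(k):
        remaining_weights = [weights[i] for i in remaining]
        pick = rng.choices(remaining, weights=remaining_weights, k=1)[0]
        order.append(pick)
        remaining.remove(pick)
    return order


def mean_prior_by_position(weights, k, trials, seed=0):
    """Average prior weight of the answer proposed 1st, 2nd, ..., k-th."""
    rng = random.Random(seed)
    totals = [0.0] * k
    for _ in range(trials):
        for position, index in enumerate(propose_answers(weights, k, rng)):
            totals[position] += weights[index]
    return [total / trials for total in totals]


# Hypothetical prior: 50 candidate answers, weight proportional to 1/rank.
raw = [1.0 / (rank + 1) for rank in range(50)]
normaliser = sum(raw)
weights = [r / normaliser for r in raw]

means = mean_prior_by_position(weights, k=5, trials=2000)
for position, mean_weight in enumerate(means, start=1):
    print(f"proposal {position}: mean prior weight = {mean_weight:.4f}")
```

In this toy setup the average prior weight falls as we move from the first proposal to the fifth, which is the pattern the third point describes. It says nothing about whether real researchers generate hypotheses this way.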

This might be related to "anchoring bias", where people's numerical estimates can depend on obviously unreliable pieces of information presented just before the estimates are elicited. Unlike in the classic anchoring bias experiments, I think that taking the authors' opinions seriously in my example is reasonable. The restrictions I specified mean that the authors' choice of topic is a fairly honest representation of how plausible they think it is relative to other things they could have investigated. The setup of anchoring bias experiments, where these initial guesses are unreliable, might be unusual in some sense, though it seems implausible that people usually represent their first guesses reliably. As a case in point, I had to go out of my way to specify that the researchers didn't have some large body of hidden research in addition to the headline result for my first example.

Conclusion

I'm not sure what the practical implications of this are. I know that I don't typically make any adjustments for authors' beliefs when reading studies, and given that I rarely know how many other hypotheses they've tested, I'm unlikely to change this policy much.

This seems potentially significant when the space of possibilities is very large, so that a uniform prior would assign a vanishingly small probability to any particular possibility. I attempted to suggest this interpretation in my example; I picked a seemingly arbitrary association to discuss, and there are many other equally arbitrary-seeming associations we could think of. For another example, if we are considering the space of mathematical models of some phenomenon, someone proposing a model is perhaps quite strong evidence for this model being useful for the purpose. Thus, if it's unclear how to choose an appropriate model for some problem, we could consider any model that actually gets proposed to have a modest probability of being appropriate, all else being equal (though everything else usually isn't equal).

1. ^

I'm agnostic about whether the generation process is brainstorming or something more involved, though this theory does not work if considerably more cumulative work informs later answers than earlier ones. Basically, whatever the process is, I'm assuming every answer gets the same amount of consideration.