Yes, this is the main difference compared to forecasters being randomly assigned to a question.
I don't think you can learn much from observational data like this about the causal effect of the number of forecasters on performance. Do you have any natural experiments that you could exploit? (I.e., some 'random' factor affecting the number of forecasters that's not correlated with forecaster skill.) Or can you run a randomized experiment?
It sounds like you're doing subsampling. Bootstrapping is random sampling with replacement.
If, for example, we kept increasing the size of the sample we draw, then eventually the variance would be guaranteed to go to zero (when the sample size equals the total number of forecasters and there is only one possible sample we can draw).
With bootstrapping, there are $n^n$ possible ordered draws when the bootstrap sample size is equal to the actual sample size $n$. (And you could choose a bootstrap sample size $m > n$.)
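To make the subsampling vs. bootstrapping point concrete, here is a minimal sketch. The forecast values, the helper name `resample_means`, and the choice of the mean as the aggregate are all illustrative assumptions on my part, not your actual data or method. It draws samples of size equal to the full pool, once without replacement and once with replacement, and compares the variance of the resulting means.

```python
import random
import statistics

def resample_means(values, size, replace, n_draws=2000, seed=0):
    """Return the mean of each of n_draws random samples of the given size."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_draws):
        if replace:
            draw = rng.choices(values, k=size)  # bootstrap: with replacement
        else:
            draw = rng.sample(values, size)     # subsampling: without replacement
        means.append(statistics.fmean(draw))
    return means

# Hypothetical pool of 20 forecasts on one question (made-up numbers).
data_rng = random.Random(42)
forecasts = [data_rng.gauss(0.6, 0.15) for _ in range(20)]
n = len(forecasts)

# Sample size equal to the pool size: only one subsample exists,
# so the variance across subsamples collapses to zero.
subsample_var = statistics.pvariance(resample_means(forecasts, n, replace=False))

# The bootstrap still varies at the same sample size.
bootstrap_var = statistics.pvariance(resample_means(forecasts, n, replace=True))

print("subsampling variance at size n:", subsample_var)
print("bootstrap variance at size n:  ", bootstrap_var)
```

If you plot the variance against sample size, the subsampling curve is forced down to zero as the size approaches the pool size, which is an artifact of the procedure rather than evidence about forecaster count.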
Imagine two cities. In one, it is safe for women to walk around at night and in the second it is not. I think the former city is better even if women don’t want to walk around at night, because I think that option is valuable to people even if they do not take it. Preference-satisfaction approaches miss this.
Don't people also have preferences for having more options?
I'm surprised the Nigerian business plan competition was not included. (Chris Blattman writeup from 2015 here: "Is this the most effective development program in history?".)
I say "They were arguably right, ex ante, to advocate for and participate in a project to deter the Nazi use of nuclear weapons." Actions in 1939-42 or around 1957-1959 are defensible.
Given this, is it accurate to call Einstein's letter a 'tragedy'? The tragic part was continuing the nuclear program after the German program was shut down.
I suppose sprints start out as jogs.
- 2 August 1939: Einstein-Szilárd letter to Roosevelt advocates for setting up a Manhattan Project. [...]
- June 1942: Hitler decides against an atomic program for practical reasons.
Is it accurate to say that the US and Germany were in a nuclear weapons race until 1942? So perhaps the takeaway is "if you're in a race, make sure to keep checking that the race is still on".
What is the 'policy relevance' of answering the title question? I.e., if the answer is "yes, forecaster count strongly increases accuracy", how would you go about increasing the number of forecasters?