We do track whether predictions have a positive ("good thing will happen") or negative ("bad thing will happen") framing, so testing for optimism/pessimism bias is definitely possible. However, only 2% of predictions have a negative framing, so the sample size is too small to say anything conclusive about this yet.
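For concreteness, here's a minimal sketch of what such a test could look like, assuming each prediction is stored with its framing, stated probability, and resolved outcome (the file and column names below are placeholders, not our actual schema):

```python
import pandas as pd

# Placeholder file and column names, not our actual schema:
# "framing" is "positive" or "negative", "prob" is the stated probability
# that the framed event happens, and "outcome" is 1 if it happened, else 0.
df = pd.read_csv("predictions.csv")

# Optimism bias would show up as a positive bias on positively framed
# predictions (mean forecast above the realized hit rate) and/or a
# negative bias on negatively framed ones.
for framing, group in df.groupby("framing"):
    bias = (group["prob"] - group["outcome"]).mean()
    print(
        f"{framing}: n={len(group)}, "
        f"mean forecast={group['prob'].mean():.3f}, "
        f"hit rate={group['outcome'].mean():.3f}, "
        f"bias={bias:+.3f}"
    )
```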
Enriching our database with base rates and categories would be fantastic, but my hunch is that, given the nature and phrasing of our questions, this would be impossible to do at scale. I'm much more bullish on per-predictor analyses, and that's more or less what we're doing with the individual dashboards.
Very good point!
I see a few ways of assessing "global overconfidence":
Interesting, thanks for sharing that trick!
Our forecasting questions are indeed maximally uncertain in some absolute sense, because our base rate is ~50%, but it may also be the case that they're particularly uncertain to the person making the prediction, as you suggest.
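To make "maximally uncertain in an absolute sense" concrete: for a binary question with base rate p, the best constant forecast is p itself, with expected Brier score p(1 - p), which peaks at 0.25 when p = 0.5. So a ~50% base rate is exactly where naive forecasting is hardest.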
A related issue is that people may be more comfortable making predictions about less important aspects of the project, since the consequences of being wrong are lower.
I'm actually concerned about the same thing, but for exactly the opposite reason: because the consequences of being wrong (a hit to one's Brier score) are the same regardless of a prediction's importance, people might allocate the same time and effort to every prediction, including the more important ones that perhaps warrant closer examination.
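Concretely: each prediction contributes (f - o)^2 to the Brier score, where f is the forecast and o ∈ {0, 1} is the outcome, so a confident miss at 90% costs (0.9 - 0)^2 = 0.81 whether it concerned a minor milestone or the project's headline goal.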
We're currently trialing some of what you suggest about bringing in other people to propose predictions. This might be an improvement, but it's too early to say, and scaling it up wouldn't be easy for a few reasons:
Thanks for doing these analyses!
I recently had to dive into the Metaculus data for a report I'm writing, and I produced the following plot along the way. I'm posting it here because, although it didn't make it into the final report, I felt it was worth sharing anyway.
Each dot is the Brier score of the community prediction on a non-ambiguously resolved question, plotted as a function of time horizon (i.e. the time remaining until resolution when the prediction was made). There are up to 101 predictions per question, for the reasons you describe in the post. The red line is a moving average, and the shaded area is a (t-distributed) 95% confidence interval around the mean.
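In case anyone wants to reproduce something similar, here's a sketch of how such a moving average and t-interval can be computed; the file and column names are placeholders, and the window size is an arbitrary choice:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Placeholder file and column names: one row per (question, prediction time)
# pair, with "horizon_days" = time to resolution when the prediction was made
# and "brier" = Brier score of the community prediction at that time.
df = pd.read_csv("community_briers.csv").sort_values("horizon_days")

window = 200  # moving-average window, in number of points; arbitrary choice
roll = df["brier"].rolling(window, center=True)
mean = roll.mean()
sem = roll.std() / np.sqrt(window)  # standard error of the windowed mean

# t-distributed 95% confidence interval around the moving average
t_crit = stats.t.ppf(0.975, df=window - 1)

plt.scatter(df["horizon_days"], df["brier"], s=4, alpha=0.2)
plt.plot(df["horizon_days"], mean, color="red", label="moving average")
plt.fill_between(df["horizon_days"], mean - t_crit * sem,
                 mean + t_crit * sem, color="red", alpha=0.2,
                 label="95% CI")
plt.xlabel("Time to resolution (days)")
plt.ylabel("Brier score (community prediction)")
plt.legend()
plt.show()
```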