Katherine Milkman notes on Twitter how far off the epidemiological expert forecasts in the linked sample were:
They gave an average estimate of 20,000 cases. The actual U.S. outcome by the stated date was 122,653, so the expert average was off by a factor of 122,653 / 20,000 ≈ 6.13.
I was curious how this compares to the Metaculus community forecast (note: the simple median prediction, not the machine-learning-weighted Metaculus prediction). Unfortunately the interface doesn't show the full distribution at a given date; it only reports what the median was at the time. If the expert central tendency was off by a factor of 6.13, how far off was Metaculus?
I looked into it in this document:
Sadly a direct comparison is not really feasible, since we weren't predicting the same questions. But suppose all predictions of importance were entered into platforms such as Good Judgment Open or Metaculus. Then comparisons between groups would be trivial and continuous. This isn't even "experts versus non-experts". The relevant comparison is at the platform level: "untrackable, unworkable one-off PDFs of somebody's projections" versus proper scoring and aggregation over time. Since Metaculus accounts can be entirely anonymous, why wouldn't we want every expert to enter their forecasts into a track record? That would make it possible to find out whether a given expert is a dart-throwing chimp. You should assume half of them are.
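To make "proper scoring and aggregation over time" concrete, here is a minimal Python sketch using the logarithmic scoring rule, one standard proper scoring rule for binary forecasts. The forecast lists are made-up numbers, not real expert or Metaculus data; the point is only that once forecasts live in a common, trackable format, comparing track records is a few lines of arithmetic.

```python
import math

def log_score(p: float, outcome: bool) -> float:
    """Log score of a binary forecast: 0 is perfect, more negative is worse."""
    return math.log(p if outcome else 1.0 - p)

# Hypothetical track records as (probability assigned, what actually happened).
# These numbers are invented purely to illustrate the comparison.
expert_forecasts = [(0.90, False), (0.60, True), (0.80, False)]
platform_forecasts = [(0.30, False), (0.70, True), (0.40, False)]

def mean_log_score(forecasts):
    """Average log score across a track record; closer to 0 is better."""
    return sum(log_score(p, outcome) for p, outcome in forecasts) / len(forecasts)

print(f"experts:  {mean_log_score(expert_forecasts):.3f}")
print(f"platform: {mean_log_score(platform_forecasts):.3f}")
```

With real data you would also restrict the comparison to questions both groups actually forecast, which is exactly the comparability problem noted above.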