https://www.openphilanthropy.org/blog/how-feasible-long-range-forecasting (a)
The opening:
How accurate do long-range (≥10yr) forecasts tend to be, and how much should we rely on them?
As an initial exploration of this question, I sought to study the track record of long-range forecasting exercises from the past. Unfortunately, my key finding so far is that it is difficult to learn much of value from those exercises, for the following reasons:
1. Long-range forecasts are often stated too imprecisely to be judged for accuracy. [More]
2. Even if a forecast is stated precisely, it might be difficult to find the information needed to check the forecast for accuracy. [More]
3. Degrees of confidence for long-range forecasts are rarely quantified. [More]
4. In most cases, no comparison to a “baseline method” or “null model” is possible, which makes it difficult to assess how easy or difficult the original forecasts were. [More]
5. Incentives for forecaster accuracy are usually unclear or weak. [More]
6. Very few studies have been designed so as to allow confident inference about which factors contributed to forecasting accuracy. [More]
7. It’s difficult to know how comparable past forecasting exercises are to the forecasting we do for grantmaking purposes, e.g. because the forecasts we make are of a different type, and because the forecasting training and methods we use are different. [More]
Happy to see this focus. I still find it quite strange how little attention the general issue has gotten from other groups and how few decent studies exist.
I feel like one significant distinction for these discussions is that of calibration vs. resolution. This was mentioned in the footnotes (with a useful table) but I think it may deserve more attention here.
If long-term calibration is expected to be reasonable, then I would assume we could get much of the important information about forecasting ability from the resolution numbers. If forecasters are confident in predictions over a 5-20+ year time frame, this would show up as correspondingly high-resolution forecasts. If we want to compare these to baselines, we could set the baselines up now and compare resolution numbers.
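To make the calibration-vs-resolution split concrete, here's a minimal sketch using the standard Murphy decomposition of the Brier score, which separates calibration error (reliability) from resolution. The data below is made up purely for illustration, not drawn from the post or spreadsheet:

```python
# Murphy decomposition of the mean Brier score:
#   BS = reliability - resolution + uncertainty
# Reliability measures calibration error (lower is better);
# resolution measures how far forecasts pull away from the base rate
# (higher is better). Illustrative data only.
from collections import defaultdict

def brier_decomposition(forecasts, outcomes):
    """Group forecasts by their stated probability and return
    (reliability, resolution, uncertainty)."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[f].append(o)
    reliability = sum(len(os) * (f - sum(os) / len(os)) ** 2
                      for f, os in bins.items()) / n
    resolution = sum(len(os) * (sum(os) / len(os) - base_rate) ** 2
                     for f, os in bins.items()) / n
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty

# A forecaster who is perfectly calibrated but says nothing beyond
# the base rate (zero resolution)...
flat = [0.5] * 10
flat_outcomes = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
# ...versus one who confidently separates the cases (high resolution,
# small calibration error).
sharp = [0.9] * 5 + [0.1] * 5
sharp_outcomes = [1, 1, 1, 1, 0] + [0, 0, 0, 0, 1]

print(brier_decomposition(flat, flat_outcomes))    # (0.0, 0.0, 0.25)
print(brier_decomposition(sharp, sharp_outcomes))  # (~0.01, ~0.09, 0.25)
```

Both hypothetical forecasters here are roughly calibrated, but only the second one is informative, which is the sense in which resolution numbers carry the value-relevant information once calibration is trusted.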
We could also have forecasters do meta-forecasts: forecasts about forecasts. I believe the straightforward resolution numbers should provide the main important data, but there could be other things you may be interested in. For example: "What average level of resolution could we get on this set of questions if we were to spend X resources forecasting them?" If the forecasters were decently calibrated, the main way this could go poorly is if the predictions for these questions turned out to be low resolution, but if so, that would become apparent quickly.
The much trickier thing seems to be calibration. If we cannot trust our forecasters to be calibrated over long time horizons, then the resolution of their forecasts is likely to be misleading, possibly in a highly systematic and deceptive way.
However, long-term calibration seems like a relatively constrained question to me, and one with a possibly pretty positive outlook. My impression from the table and spreadsheet is that, in general, calibration was quite similar for short- and long-term forecasts. Also, it's not clear to me why calibration would be dramatically worse on long-term questions than on specific short-term questions that we could test cheaply. For instance, if we expected forecasters to be poorly calibrated on long-term questions because the incentives are poor, we could try having them forecast very short-term questions with similarly poor incentives. I recall reading Anthony Aguirre speculating that he didn't expect Metaculus forecasters' incentives to change much for long-term questions, but I forget where this was mentioned (it may have been a podcast).
Having some long-term studies seems quite safe as well, but I'm not sure how much extra benefit they would give us compared to more rapid short-term studies combined with large sets of long-term predictions by calibrated forecasters (which would come with resolution numbers).
Separately, I missed the footnotes on my first read-through, but I think they may have been my favorite part. The link is a bit small (though clicking on the citation numbers brings them up).