Let’s say you want to know how likely it is that an innovative new product will succeed, or that China will invade Taiwan in the next decade, or that a global pandemic will sweep the world — basically any question for which you can’t just use “predictive analytics,” because you don’t have a giant dataset to plug into statistical models, the way (say) Amazon does when predicting when your package will arrive.

Is it possible to produce reliable, accurate forecasts for such questions?

Somewhat amazingly, the answer appears to be “yes, if you do it right.”

Prediction markets are one promising method for doing this, but they’re mostly illegal in the US, and various implementation problems hinder their accuracy for now. Fortunately, there is also the “superforecasting” method, which is completely legal and very effective.

How does it work? The basic idea is very simple. The steps are:

  1. First, bother to measure forecasting accuracy at all. Some industries care a lot about their forecasting accuracy and therefore measure it, for example hedge funds. But most forecasting-heavy industries do not even bother to measure their forecasting accuracy,[1] for example the US intelligence community or philanthropy.[2]

  2. Second, identify the people who are consistently more accurate than everyone else — say, those in the top 0.1% for accuracy, for multiple years in a row. These are your “superforecasters.”

  3. Finally, pose your forecasting questions to the superforecasters, and use an aggregate of their predictions.

Technically, the usual method is a bit more complicated than that,[3] but these three simple steps are the core of the superforecasting method.
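
For readers who want to see the mechanics, here is a minimal sketch in Python of those three steps, using made-up forecast data: score each forecaster’s past predictions with Brier scores, keep the most accurate ones, and take the median of their probabilities on a new question. The names, numbers, and cutoff are purely illustrative assumptions, not the actual procedure used in the research described below.

```python
from statistics import median

# Toy track records on past, resolved questions: (predicted probability, outcome)
# pairs, where outcome is 1 if the event happened and 0 if it did not.
# (Invented data for illustration only.)
track_records = {
    "alice": [(0.9, 1), (0.2, 0), (0.7, 1)],
    "bob":   [(0.6, 1), (0.5, 0), (0.5, 1)],
    "carol": [(0.8, 1), (0.1, 0), (0.9, 1)],
}

def brier_score(record):
    """Mean squared error between stated probabilities and outcomes (lower is better)."""
    return sum((p - outcome) ** 2 for p, outcome in record) / len(record)

# Step 1: actually measure forecasting accuracy.
scores = {name: brier_score(record) for name, record in track_records.items()}

# Step 2: keep the consistently most accurate forecasters ("superforecasters").
# Here we just take the k lowest Brier scores; in practice you'd require sustained
# top-tier accuracy across many questions and multiple years.
k = 2
superforecasters = sorted(scores, key=scores.get)[:k]

# Step 3: pose a new question to them and aggregate their answers (here, a median).
new_question = {"alice": 0.12, "bob": 0.35, "carol": 0.08}
aggregate = median(new_question[name] for name in superforecasters)
print(f"Selected: {superforecasters}; aggregate forecast: {aggregate:.2f}")
```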

So, how well does this work?

A few years ago, the US intelligence community tested this method in a massive, rigorous forecasting tournament that included multiple randomized controlled trials and produced over a million forecasts on >500 geopolitical forecasting questions such as “Will there be a violent incident in the South China Sea in 2013 that kills at least one person?” This study found that:

  1. This method produced forecasts that were very well-calibrated, in the sense that forecasts made with 20% confidence came true 20% of the time, forecasts made with 80% confidence came true 80% of the time, and so on. The method is not a crystal ball; it can’t tell you for sure whether China will invade Taiwan in the next decade, but if it tells you there’s a 10% chance, then you can be pretty confident the odds really are pretty close to 10%, and decide what policy is appropriate given that level of risk.[4] (A sketch of how such a calibration check works appears after this list.)

  2. This method produced forecasts that were far more accurate than those of a typical forecaster or other approaches that were tried, and ~30% more accurate than intelligence community analysts who (unlike the superforecasters[5]) had access to expensively-collected classified information and years of training in the geopolitical issues they were making forecasts about.[6] Those are pretty amazing results! And from an unusually careful and rigorous study, no less![7]
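
As promised above, here is a small sketch (again in Python, with invented data) of how the calibration claim in point 1 is typically checked: group resolved forecasts into bins by their stated probability and compare each bin’s probability to the fraction of its forecasts that actually came true.

```python
from collections import defaultdict

# Resolved forecasts as (stated probability, outcome) pairs, where outcome is 1
# if the predicted event occurred. (Invented data for illustration only.)
forecasts = [
    (0.2, 0), (0.2, 1), (0.2, 0), (0.2, 0), (0.2, 0),
    (0.8, 1), (0.8, 1), (0.8, 0), (0.8, 1), (0.8, 1),
]

def calibration_table(forecasts):
    """Group forecasts by stated probability (rounded to the nearest 10%) and
    report how often the forecasted events actually occurred in each group."""
    bins = defaultdict(list)
    for p, outcome in forecasts:
        bins[round(p, 1)].append(outcome)
    return {p: (sum(o) / len(o), len(o)) for p, o in sorted(bins.items())}

# Well-calibrated forecasts have observed frequencies close to the stated
# probability, e.g. roughly 20% of 20%-confidence forecasts come true.
for p, (freq, n) in calibration_table(forecasts).items():
    print(f"stated ~{p:.0%}: came true {freq:.0%} of the time (n={n})")
```

Computed over hundreds of questions and many forecasters, a table like this is what “well-calibrated” means in practice.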

So you might think the US intelligence community has eagerly adopted the superforecasting method, especially since the study was funded by the intelligence community, specifically for the purpose of discovering ways to improve the accuracy of US intelligence estimates used by policymakers to make tough decisions. Unfortunately, in my experience, very few people in the US intelligence and national security communities have even heard of these results, or even the term “superforecasting.”[8]

A large organization such as the CIA or the Department of Defense has enough people, and makes enough forecasts, that it could implement all steps of the superforecasting method itself, if it wanted to. Smaller organizations, fortunately, can just contract already-verified superforecasters to make well-calibrated forecasts about the questions of greatest importance to their decision-making. In particular:

  • The superforecasters who out-predicted intelligence community analysts in the forecasting tournament described above are available to be contracted through Good Judgment Inc.

  • Another company, Hypermind, offers aggregated forecasts from “champion forecasters,” i.e. those with the highest accuracy across thousands of forecasting questions posed by corporate clients over (in some cases) almost two decades.[9]

  • Several other projects, for example Metaculus, are also beginning to identify forecasters with unusually high accuracy across hundreds of questions.

These companies each have their own strengths and weaknesses, and Open Philanthropy has commissioned forecasts from all three in the past couple years. If you work for a small organization that regularly makes important decisions based on what you expect to happen in the future, including what you expect to happen if you make one decision vs. another, I suggest you try them out. (All three offer “conditional” questions, e.g. “What’s the probability of outcome X if I make decision A, and what’s the probability of that same outcome if I instead make decision B?”)

If you work for an organization that is very large and/or works with highly sensitive information, for example the CIA, you should consider implementing the entire superforecasting process internally. (Though contracting one or more of the above organizations might be a good way to test the model cheaply before going all-in.)


  1. Except to the extent they’re able to use predictive analytics for particular questions for which they have rich data sets, which isn’t the subject of this post. I’m focused here on “general-purpose” forecasting methods, i.e. methods that can generate forecasts for any reasonably well-specified forecasting question, and not just for those conducive to predictive analytics. ↩︎

  2. In both example industries, there are a few exceptions, for example the intelligence community prediction market in the US intelligence community, or Open Philanthropy in philanthropy. ↩︎

  3. E.g. for higher accuracy you might want to “team” the superforecasters in a certain way. See Superforecasting for details. ↩︎

  4. By saying the odds “really are” close to 10%, I just mean that the 10%-confident predictions from this process are well-calibrated; I don’t mean to imply an interpretation of probability other than standard subjective Bayesianism. ↩︎

  5. A few superforecasters had a geopolitics background of some kind, but most did not. ↩︎

  6. For various accuracy comparisons, see Superforecasting, Mellers et al. (2014), and Goldstein et al. (2015). For high-level summaries of some of these results, see this page from Good Judgment Inc. and also AI Impacts (2019). ↩︎

  7. One limitation of the currently available evidence is that we don’t know how effective superforecasting (or really, any judgment-based forecasting technique) is on longer-range forecasting questions (see here). I have a hunch that superforecasting is capable of producing forecasts on well-specified long-range questions that are well-calibrated even if they’re not very strong on “resolution” (explained here), but that’s just a hunch. ↩︎

  8. For example, economist Tyler Cowen recently asked John Brennan (CIA Director until 2017): “You’re familiar with Philip Tetlock’s superforecasters project?” Brennan was not familiar. ↩︎

  9. Technically, Hypermind’s usual aggregation algorithm also includes forecasts from other forecasters, but it gives much greater weight to the forecasts of the “champion forecasters.” ↩︎


