[Cause Exploration Prizes] Training experts to be forecasters

Sam Abbott; nikos

This essay was submitted to Open Philanthropy's Cause Exploration Prizes contest.

If you're seeing this in summer 2022, we'll be posting many submissions in a short period. If you want to stop seeing them so often, apply a filter for the appropriate tag!

Summary

Improving decision-making through forecasting requires both knowledge of how to forecast, and domain expertise to develop the forecast question, provide guidance for sources of evidence and their synthesis, and help facilitate the use of the results of the forecasting process. We argue that one, currently neglected, strategy of making forecasting more useful is to focus on making domain experts better forecasters. The current dominant strategy to make forecasting useful relies on the use of aggregate forecasts from a pool of self-selecting forecasters, providing feedback and incentives, identifying high-performing forecasters from this pool, and focussing on forecasts from this subset. We refer to this approach as forecast consulting and suggest the model is similar to other forms of consultancy. Significant resources are being spent on training and identifying high-performing forecasters with some of this focus being to give these forecasters the time, space, and training needed to gain relevant domain knowledge. Currently little is done to encourage existing domain experts to become better forecasters themselves or to identify those with innate potential. We believe that encouraging and helping experts to become better forecasters has potential benefits that far exceed what is feasible through the current dominant strategy of employing generalist forecasters as consultants alone.

Context on forecasting

There is some consensus in the EA community that forecasting, as commonly practised in the community, is important and helpful. Forecasting is “big if true”: In principle, it is an excellent way to summarise existing knowledge. It can help guide and improve relevant and important decisions by making implicit intuitions and assessments explicit, and by revealing uncertainty and variation in opinions. Forecasting can be understood as a thought process, that can unmask blind spots, help make disagreements clear, and aid in synthesising evidence. For many, forecasts are also a form of “news source” that helps them interpret and contextualise current events.

In the following, we focus on forecasting that is explicitly used to inform decision-making. In this context, forecasting can provide value if:

We can select a useful forecast target.
The forecasters can forecast the target quantity meaningfully. This means both that it is possible to forecast the target, and that the forecasters have the required expertise to understand the target and make a forecast.
The forecasts are “good” or “good enough” to inform the decision being made.
If practised by decision-makers, the exercise of forecasting, rather than the output of any given forecast, may be beneficial by helping to structure and formalise reasoning, synthesise evidence, calibrate expectations, and foster useful debate.
Relevant people trust good forecasts and use them as part of their decision-making process.

In the following, we argue that we can achieve our goal of improving decision-making through forecasting if more focus is given to increasing the use of forecasting by domain experts and improving their forecasting practice. Note: we’re using the term “expert” very broadly here. An expert could be an academic, but could also be a policy maker or someone with personal experience and deep knowledge of the subject.

The current relationship between forecasting and domain expertise

Today, the overlap between forecasters, domain experts, and policymakers does not appear to be very large. This is despite many years of research indicating that forecasting may lead to improved decision-making, which ultimately both experts and policymakers are interested in. Most forecasters on public platforms like Metaculus or Manifold Markets are not initially experts in the areas they forecast, and as far as we are aware, this appears to be generally true across the wider space (though we have heard of niche efforts in some domain areas, for example in the US defence community, where those with expert knowledge are encouraged to forecast, and sporadic efforts in infectious disease epidemiology). The dominant strategy to explicitly combine forecasting and decision-making is what we call “forecasting as consulting”. Policymakers and domain experts determine what they would need forecasts on; forecasters consult on what a good operationalisation of the question could look like and then provide predictions. This approach is easy to operationalise and generalisable across domains, but in our eyes ultimately provides limited value compared to the alternatives we explore in this text.

“Forecasting as consulting” usually works by either eliciting forecasts from a large crowd, or by employing a smaller group of experienced general forecasters. On both approaches, significant amounts of money are spent:

The prize pool for tournaments currently open at Metaculus is about $50k
Manifold Markets have received a $1m grant from the FTX Future Fund to further develop their platform (which is mostly aimed at a non-specialist audience)
Metaculus recently was awarded a $5.5m grant from Open Phil to advance forecasting as a social good. Some of the money will be used to pay professional forecasters.
In February, Metaculus asked professional forecasters to predict nuclear risk in the context of the Ukraine war.
The Swift Centre for applied forecasting (https://www.swiftcentre.org/) received a $2m grant from the FTX Future Fund to hire professional forecasters (among other things)

The implicit assumption behind the “forecasting as consulting” approach is that people with a specific aptitude for and interest in forecasting, given enough training, experience, and time to synthesise sources of evidence, can make useful contributions regardless of whether they have significant domain expertise on the forecast target. This may be true, in a similar way that consultants in other fields are thought to make useful contributions.

Generalist forecasters usually acquire and benefit from domain expertise in the areas they forecast in. Making a good forecast usually requires knowledge of past trends, current developments, and factors that might influence the forecast target in the future. Much of this domain expertise can likely be learnt quickly for many forecast questions. Being able to do this well, combined with understanding what expertise is required for a given target, may be one of the generalist forecaster skills that is linked to good performance. Some successful forecasters are known to spend significant amounts of time on research and evidence synthesis for their forecasts, potentially accumulating significant subject matter expertise. Most forecasters, however, are likely unable to allocate the amount of time this research requires and fewer are likely able to dedicate sufficient time across multiple domains.

The current approach to making forecasting impactful could be summarised as follows:

Develop a large community of forecasters, with a particular focus on identifying and training top-performing forecasters.
Work closely with decision-makers and domain experts to identify promising forecast targets and provide useful insights.
Encourage forecasters to gain the subject area expertise required to understand the forecast target, identify relevant sources of evidence, and synthesise this evidence into a forecast.
Develop a pool of skilled forecasters with the hope they apply these skills more widely in impactful organisations.

Turning experts into forecasters

We argue that we should pay more attention to the opposite approach: encouraging existing domain experts to become better forecasters, both in public and in the organisations they operate within.

We can get better forecasts

It seems likely that a domain expert with the same aptitude for forecasting and the same training as a generalist forecaster would outperform that generalist on questions in their domain. It also appears likely that, for forecast targets in rapidly evolving areas of interest (for example, zoonotic spill-over events), trained domain experts will initially outperform generalists regardless of forecaster aptitude, as the generalist forecaster requires time to understand the forecast target, gather sources of evidence, and synthesise them.

We base these conclusions on the following. Domain experts can draw on rich experience in their areas of interest that may help them contextualise the forecast target, distinguish signal from noise, and identify key details that non-experts might overlook. They are also likely to already be aware of the various sources of evidence (for example other domain experts) and their limitations. We think it is likely that domain experts could be trained in the same way that we currently train forecasters. We also think it is likely that experts could be encouraged to be active forecasters given the right communication and incentives. We are not aware of any evidence supporting our claims but think that our prior on this being beneficial is sufficiently strong to make a good case that this kind of evidence should be collected. One potential flaw in our argument is that those selected via open access platforms may have an inherently higher potential to be good forecasters, but we consider this unlikely given the limited populations of these platforms and the generally weak evidence that forecasting ability is an innate rather than learnable skill.

Some forecast targets require nuanced domain expertise in a way that may prevent non-experts from making useful predictions without a large time commitment to acquire additional expertise. Sometimes, questions are important, but require a large amount of context and domain knowledge. One example is this Metaculus question on the generation interval for Omicron (authored by us), which asks about a parameter that is important in infectious disease modelling and that influenced, at least in the UK, decision makers’ response to the Omicron variant in late 2021 and early 2022. The definition and estimation of that parameter, however, are complex and prone to several biases that need to be disentangled. Given the minimal community response to this question, the level of uncertainty in the forecast, and the lack of updates over time as new evidence became available, we are unsure what proportion of those forecasting on it grasped the forecast target fully and, if they did, whether or not they were able to sufficiently weigh the available literature. This means that, at least for this forecast question, we, as domain experts in this area, would not feel able to recommend this forecast to decision-makers in our networks. If others feel similarly, this would render the effort spent outlining the question, and the time spent by consulting forecasters, meaningless (aside from any training benefits).

Similarly, existing domain knowledge is indispensable in very fast-moving situations in which generalist forecasters have not had the time to acquire expertise or to find sources of evidence from which to “borrow” expertise. For the recent monkeypox outbreak, generalist forecasters struggled to produce forecasts that weren’t immediately identifiable as incorrect by domain experts. For example, endemic cases were initially ignored, leading to spuriously low initial forecasts, and then time-varying ascertainment bias was not accounted for, leading to overestimation of the growth rate. For this forecast question, there was also some uncertainty as to whether the target was infections or reported cases, and this led to changes of several orders of magnitude in the forecast. We are currently working on evaluating these early monkeypox forecasts to understand whether they would have added utility over a generic domain area baseline. We note that experts were invited to contribute to these forecasts but that there were zero responses to the invitation from the last author of this study.
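
To make the ascertainment point concrete, here is a minimal sketch in Python of how a rising reporting fraction can inflate an estimated growth rate. All numbers (the true growth rate, the starting and final ascertainment, the 30-day window) are hypothetical and chosen only for illustration; they are not estimates for monkeypox or any real outbreak, and the simple log-linear fit stands in for whatever method a forecaster might actually use.

```python
import math

def fitted_growth_rate(counts):
    """Least-squares slope of log(counts) against day index."""
    n = len(counts)
    xs = range(n)
    ys = [math.log(c) for c in counts]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

# Hypothetical values, chosen only for illustration.
TRUE_GROWTH = 0.05  # assumed true daily exponential growth rate of infections
DAYS = 30
infections = [100 * math.exp(TRUE_GROWTH * t) for t in range(DAYS)]

# Assumed ascertainment: the fraction of infections that get reported rises
# exponentially from 10% to 40% over the window as awareness improves.
ASCERTAINMENT_GROWTH = math.log(4) / (DAYS - 1)
ascertainment = [0.10 * math.exp(ASCERTAINMENT_GROWTH * t) for t in range(DAYS)]
reported = [i * a for i, a in zip(infections, ascertainment)]

true_rate = fitted_growth_rate(infections)
naive_rate = fitted_growth_rate(reported)
print(f"growth rate of infections:     {true_rate:.3f} per day")
print(f"growth rate of reported cases: {naive_rate:.3f} per day")
```

In this toy setup the growth rate fitted to reported cases is the sum of the true growth rate and the growth rate of ascertainment, so a forecaster who treats reported cases as if they were a fixed share of infections would overestimate how fast the outbreak is growing.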

Current incentive mechanisms don’t strongly encourage forecasters to acquire a deep understanding, or to collect information ahead of time, such that it is immediately useful when an unforeseen situation occurs, and predictions are urgently needed. Ege Erdil recently published a piece in which he used earthquakes as an example. The idea is that you can make a reasonable forecast about earthquakes without any deep understanding of plate tectonics based on past distributions and recent trends. Conversely, a deep understanding of plate tectonics wouldn’t immediately translate into a good forecast. For generalist forecasters with little prior domain expertise, a key skill is being able to rapidly identify which sources of evidence are of most use and focus on synthesising these. Conversely, for domain experts, who already have large amounts of domain knowledge, the skill is to instead learn how to identify what areas of their expertise are required to forecast a given question and which are not. It is important to realise that the concept of parsimonious models, mental or otherwise, is already well known and practised in many domain areas and so it is likely that many domain experts already possess this skill to varying degrees.
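
As an illustration of the kind of parsimonious, base-rate forecast described above, the following Python sketch estimates the chance of at least one event in the coming year from past counts alone. The yearly counts are made up for this example, and the assumption that events follow a Poisson process with a constant rate is ours, not something taken from the earthquake piece.

```python
import math

# Hypothetical yearly counts of large earthquakes in some region
# (illustrative numbers, not real data).
yearly_counts = [1, 0, 2, 1, 0, 1, 3, 0, 1, 1]

# Base rate: average number of events per year in the historical record.
rate = sum(yearly_counts) / len(yearly_counts)

def prob_at_least_one(rate_per_year, years):
    """P(at least one event), assuming a Poisson process with constant rate."""
    return 1 - math.exp(-rate_per_year * years)

p_next_year = prob_at_least_one(rate, 1)
print(f"base rate: {rate:.1f} events per year")
print(f"P(at least one event next year) = {p_next_year:.2f}")
```

Nothing in this calculation requires knowledge of plate tectonics. A domain expert's contribution would be in judging when the constant-rate assumption is likely to fail, which is exactly the skill of deciding which parts of their expertise matter for a given question.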

Domain experts are also most likely in a position where they can actively improve predictions in the future by developing new forecasting methods or by identifying new data that should be collected. For example, one of the authors (SA) authored an initial estimate for the generation time of the Omicron variant which was used by decision-makers in late 2021 and led to the development of the related Metaculus question. Along with other researchers, he then authored a follow-up study exploring these initial estimates for bias.

The most relevant forecasts are not made by members of the forecasting community

Even if trained domain experts did not perform any better than professional forecasters, we still see value in training experts in forecasting. This is because most forecasting is informal and implicit and happens in boards, committees, and conversations between decision-makers. Professional forecasters simply do not have access to these areas as it stands, though this may change in the future if domains recognise the value of generalist forecasters within decision-making loops (and this value is shown conclusively). In addition, the information needed to inform the most relevant decisions is often confidential (or entails information hazards) and cannot easily be shared with external forecasters. In many instances, this may be less of an issue when employing professional forecasters who can sign non-disclosure agreements, but this may be difficult to organise over short periods and requires significant belief on the part of decision-makers that the forecasts produced will be worth the effort of facilitating this.

Unless the forecasting community can position trained forecasters into key decision-making positions, the relevant circles are difficult to access for the forecasting community. The current model of “forecasting as consulting” is one way in which we increase the use of forecasting by decision-makers. It is hard to quantify how much success this approach has had in the past (and it would be good to collect more formal evidence on this) or how much success we should expect given the significant resources being allocated to this aim. Even if the current model is very successful, it can only capture a small fraction of potential forecasting questions that can be formalised within the timelines required for critical day-to-day decision-making.

In contrast to “forecasting as consulting”-type models of collaboration, teaching experts how to forecast may have a lasting effect, even in situations where no formal forecasts are elicited, by improving the way experts and decision-makers reason about uncertainty and approach decisions. Lastly, training experts as forecasters may also serve to establish a valuable point of contact in the relevant circles, potentially increasing the uptake and utility of traditional forecasting as consulting.

Traditional experts can make their forecasts heard

Forecasts, even very good ones, will only be able to influence decision-making if decision-makers take them seriously. Similarly, they can only be used to justify a decision if they are viewed as a respectable source by the public; otherwise, decision-makers will likely need to obfuscate their usage. This is much easier for traditional experts, as they have prior work in the area with which to justify their views, they have an established network that likely includes the decision-makers they seek to influence, and credentials are in general still widely respected across domains and by the public.

In addition to this, domain experts are more likely to be able to contextualise and explain their forecasts, or forecasts from the wider community, within the wider domain area. This is particularly key for forecast targets that may be sensitive or in areas where quantitative forecasts are unusual. In some circumstances, this may be critical for encouraging the uptake of forecasts. Similarly, as experts are typically operating with a much greater depth of domain knowledge it is more likely they will be able to identify the implications of forecasts.

Potential solutions

This is a complex area and, clearly, more research is needed to understand the potential trade-offs of proven generalist forecasters gaining domain area expertise versus domain experts gaining forecasting expertise. We believe that it is worthwhile to encourage domain experts to become better forecasters and provide some suggestions for how this could be approached. The overall goals we have in mind are to get domain experts interested in forecasting, to provide incentives for them to interact with forecasting, and to make it easy to train them in how to forecast. Our list is by no means complete and we do not think the suggestions are necessarily optimal.

Provide easy mechanisms for domain experts to ask questions of interest to them.
Make it easy for experts to identify other experts interested and active in forecasting.
Make it easy for domain experts to share their forecasts and comparative track records with others.
Make it easy for experts to obtain meaningful forecast outputs, including
- Full forecasting data in convenient formats
- Making it possible to filter by forecaster domain expertise
Provide free training for domain experts and policymakers in forecasting and incentivise its uptake.
Create titles/positions that confer money/prestige to experts willing to engage in public forecasting. For example, letting experts become part of a selective task force. This is needed as public forecasting has a large downside for domain experts (i.e. if they are incorrect) and little upside, which is generally not the case for generalist forecasters.
Encourage more group forecasting, with hybrid groups of generalist forecasters and domain experts.
To support our assumptions on the value of domain knowledge, unbiased research is needed to compare generalist and domain expert forecasters after adjusting for time spent forecasting. In addition, research is needed to explore our assumption that time spent forecasting on a given domain increases performance in this domain, as this is key to several of our arguments.
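
As a rough sketch of what filtering by forecaster domain expertise and comparing groups could look like in practice, the Python example below scores binary forecasts with the Brier score and averages by group. The records, the `is_domain_expert` flag, and the choice of the Brier score are all assumptions made for illustration; the numbers are invented and say nothing about how experts and generalists actually compare, and a real comparison would also need to adjust for time spent forecasting.

```python
from collections import defaultdict

# Hypothetical forecast records: (forecaster, is_domain_expert,
# forecast probability, outcome as 0 or 1). Not real data.
forecasts = [
    ("a", True, 0.8, 1),
    ("a", True, 0.3, 0),
    ("b", False, 0.6, 1),
    ("b", False, 0.5, 0),
    ("c", True, 0.9, 1),
    ("d", False, 0.2, 1),
]

def brier_score(probability, outcome):
    """Squared error of a binary probability forecast (lower is better)."""
    return (probability - outcome) ** 2

def mean_brier_by_group(records):
    """Mean Brier score for domain experts and for generalists."""
    scores = defaultdict(list)
    for _name, is_expert, probability, outcome in records:
        group = "expert" if is_expert else "generalist"
        scores[group].append(brier_score(probability, outcome))
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}

summary = mean_brier_by_group(forecasts)
for group, score in sorted(summary.items()):
    print(f"{group}: mean Brier score {score:.3f}")
```

A platform that exported full forecast data with an expertise flag of this kind would let decision-makers run such comparisons themselves, question by question.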

Conflict of interest note

One of the authors, NB, works for Metaculus and would in that capacity potentially profit from more resources being allocated to causes related to forecasting. Both authors qualify as “experts” in epidemiology and might benefit from more resources being attributed to domain experts who engage in forecasting.

About the Authors

Nikos Bosse is currently pursuing a PhD in Epidemiology at the London School of Hygiene & Tropical Medicine, focusing on evaluating infectious disease forecasts. He also works as a research coordinator for the prediction platform Metaculus and is training to be a medical doctor. His past work includes developing an application to elicit forecasts of COVID-19 case and death numbers, which was used to compare human and model-based predictions of COVID-19 in Poland and Germany. He also developed an R package, scoringutils, that facilitates the evaluation and comparison of forecasts from different models. In his capacity as research coordinator for Metaculus as well as PhD student he has worked on forecasting in epidemiological settings and believes that forecasting by experts as well as generalists can be very valuable. To help foster the forecasting community by providing information for forecasters and forecasting researchers he founded the Forecasting Wiki. He has been involved in the EA community for several years and is trying to help make forecasting a more useful tool for decision-making. He has done a fair share of forecasting himself but is not a power user of any of the large forecasting platforms.

Sam Abbott is a researcher at the London School of Hygiene and Tropical Medicine with a background in mathematical modelling. Since early 2020 he has worked as part of the academic response to the COVID-19 pandemic focussing on real-time analyses, nowcasting, and short-term forecasting. He provided short-term forecasts and estimates of the effective reproduction number to government advisory bodies which were then aggregated along with those from other research groups and used to help inform government policy-makers. He has co-authored multiple other studies used to inform national and international decision-making throughout the pandemic. Methods he has developed are used within public health agencies as part of their decision-making processes as well as by other researchers. For more on his research see here and here. His interest in human forecasts is mainly driven by an interest in synthesising expert opinion, and the importance of evaluating forecasts from decision makers. He is sceptical of the value of aggregated forecasts from generalists but open to the idea that some people are better able to rapidly synthesise evidence than others and that this is a useful skill to encourage. He does not identify as an effective altruist. He is a sporadic user of Metaculus, mainly forecasting on questions linked to his domain expertise.

Sam Abbott, Aug 26 2022

Since we wrote this submission I have come across this nice forum piece looking at some of the evidence comparing generalist forecasters and domain experts. From my (biased) perspective it generally confirms some of the statements we make in this piece but also really highlights that more work is needed. I'd definitely suggest giving it a read.

https://forum.effectivealtruism.org/posts/qZqvBLvR5hX9sEkjR/comparing-top-forecasters-and-domain-experts

Nathan Young, Aug 27 2022

The link doesn’t work. Can you press space after it so that it does?

Sam Abbott, Aug 27 2022

Thanks!

Misha_Yagudin, Sep 10 2022

A few considerations against, tied together by a “generalism enables scale” theme:

(1) There are a lot of domains where one can become an expert: it feels infeasible to train and select very capable forecasters in all of them. Being a generally thoughtful person/forecaster allows one to go somewhat successfully into areas outside one's immediate expertise.

Training/selecting experts in a few especially important niches (e.g., AI, biosecurity, and certain topics in geopolitics) seems good and feasible.

(2) But at times of crisis, experts' time is much more valuable than generalist's time. Even now, it's often the case that competent forecasters are quite busy with their main jobs — it's not unlikely that competent forecaster-experts should be doing something different from forecasting.

Sam Abbott, Oct 26 2022

For (2), how important do you think forecasting is if those best suited to it (assuming experts are) shouldn't be spending their time on it?

In infectious disease (ID) settings, early outbreak forecasts can be critical, and the decisions made are often informed by local and international expert teams.

Misha_Yagudin, Oct 26 2022

I don't think your argument reflects much on the importance of forecasting. E.g., it might be the case that forecasting is much more important than whatever experts are doing (in absolute terms), but nonetheless, experts should do their things because no one else can substitute for them. (To be clear, this is a hypothetical against the structure of the argument.)

I think it's best to assess the value of information you can get from forecasting directly.

Hopefully, we can make forecasts credible and communicate them to sympathetic experts on such teams.

Nathan Young, Aug 27 2022

I agree. I’m often surprised given the amount of money going into forecasting as a space that we don’t have a tool for orgs to easily record and share forecasts.

JoshuaBlake, Aug 31 2022

This forum post is an interesting example in nuclear war of how experts can improve and critique outputs from generalist forecasters.

https://forum.effectivealtruism.org/posts/W8dpCJGkwrwn7BfLk/nuclear-expert-comment-on-samotsvety-nuclear-risk-forecast-2

Sam Abbott, Aug 31 2022

Thanks that was a useful read.
