
Tl;dr: At Czech Priorities, an EA-aligned think tank based in Prague, we recently ran a complex forecasting tournament together with Metaculus, funded by EAIF, to support Czech policymaking. Today, we are publishing a write-up of our findings from the tournament, as well as supplementary materials to help any teams or groups of forecasters looking to do the same.
______

From October 2022 to March 2023, we ran a forecasting tournament with a total of 54 questions. In March, we discussed some of our preliminary findings in this post.

Almost all of our forecasting questions were developed in cooperation with 16 different public institutions and ministerial departments. Each institution or department defined its most useful forecasting topics, participated in a workshop to define specific questions with us, and was later provided with the results. This was intended as a proof of concept of one possible approach to incorporating forecasting in public decision-making.

Once defined, our forecasting questions were posted on a private Metaculus sub-domain (in Czech), where an average of 72 forecasters had the opportunity to address them as they would any other question on Metaculus (a median of 18 predictions per user). Throughout the tournament, we produced 16 reports detailing the rationales and forecasts for use by the cooperating institutions. The institutions and their topics are listed in our previous post.
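To make the mechanics concrete: Metaculus computes its own community prediction, but the sketch below illustrates one simple way individual predictions can be aggregated into a single crowd forecast. The recency-based median and all data here are our own simplification for illustration, not the platform's actual algorithm.

```python
import numpy as np

def community_forecast(predictions_by_user):
    """Aggregate per-user prediction histories into one crowd forecast.

    `predictions_by_user` maps a user id to a list of (timestamp,
    probability) pairs on a single binary question. We keep only each
    user's most recent prediction and take the median across users --
    a deliberately simple stand-in for the platform's real aggregation.
    """
    latest = [max(history)[1] for history in predictions_by_user.values()]
    return float(np.median(latest))

# Hypothetical question with three forecasters, two of whom updated:
history = {
    "user_a": [(1.0, 0.30), (2.0, 0.45)],
    "user_b": [(1.5, 0.60)],
    "user_c": [(0.5, 0.20), (3.0, 0.35)],
}
print(community_forecast(history))  # -> 0.45
```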

This approach, in combination with multiple media appearances, also allowed us to strengthen our position as one of the leading Czech institutions with expertise in foresight methods.

We've created a write-up detailing our steps, both in working with the institutions and in managing the specifics of the tournament. Here are our five overarching takeaways for groups of forecasters who want to make an impact through policymaking, followed by some more specific lessons learned:

General takeaways

  1. Develop partnerships with policymakers across various policy areas. Finding policymakers eager to try analytical innovations is usually not the main bottleneck.
  2. Diversify your portfolio of foresight methods. Particularly in the EU, policymakers are becoming increasingly aware of the importance of foresight. Their familiarity with methods such as scenario planning or horizon scanning can, if leveraged well, open more inroads for forecasting into the policy process.
  3. Expect to deal with complex, long-term forecasting questions. They are requested often and can have larger impacts if used to adjust long-term plans and strategies.
  4. Even among promising leads, expect only around 25% to see the process through to the end. Forecasting is not yet a priority for many policymakers, and there are many steps on the way to delivering impact at which the process can fail.
  5. Don't expect clearly measurable large impacts. Large impacts are usually difficult to trace back to individual data points (as they require more data and negotiation), while measurable uses are usually based on individual decisions, producing smaller impacts.

Practical lessons learned - Policy

  • Be prepared to be in the driver’s seat. While public institutions might be largely supportive of the idea of forecasting, their capacity to cooperate closely for the entire duration of a months-long forecasting tournament is limited. Keep in mind that their primary function is usually not to experiment with and discuss forecasting questions and findings.
  • Don't get locked in. Even if an institution is receptive to forecasting, you may eventually find that developing feasible questions for their topics of interest is not possible (due to data availability, time horizons, and the like). In this case, do not feel obliged to submit sub-optimal questions to forecasters. You are the partner who knows what good forecasting inputs look like.
  • Find the sweet spot. Policymakers will not want to include probabilistic forecasts in just one chapter of a larger policy document; it would look inconsistent. Aim for policy issues that are likely to have their own standalone discussions and outputs, where forecasting can really stand out.
  • Give them something to think about. We received very positive feedback on including forecaster rationales and other contextualizing information in supplemental materials provided alongside the pure probabilistic information. Aim for anywhere between 3 and 10 pages for a handful of questions.

Practical lessons learned - Forecasting tournaments

  • Help them help you. Scoring rules determine the feedback forecasters get on their predictions. They need to properly understand how scores are calculated and what the scores mean, so that scoring informs and motivates them (see the sketch after this list).
  • Mind the gap. There are numerous factors that will make significant drop-offs inevitable in forecasting tournaments (cognitive and time demands, primarily online activity, etc.). Keep this in mind when planning your forecaster recruitment strategy and goals.
  • Variety is the spice of life. Forecasters strongly favored a wide range of topics covered in the tournament. There is a balance to strike between greater diversity and the greater time investment (for research) it demands.
  • Can't win them all. At the start, there are three important objectives: improving public understanding and acceptance of forecasting; identifying and developing top forecasters (and generally keeping forecasters engaged); and crowdsourcing forecasts useful for public policy. At various times these objectives may temporarily clash. Know which is your priority.
  • Rationales don't compete. We offered additional rewards for well-thought-out rationales. While this improved the base quality of contributions, we did not observe explicit competition between forecasters in writing outstanding rationales.
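On the scoring point above, here is a minimal sketch of how a proper scoring rule turns resolved questions into the feedback forecasters receive. We use the Brier score as a standard illustration; it is not necessarily the exact rule our tournament (or Metaculus) used.

```python
def brier_score(forecast, outcome):
    """Brier score for a binary question: 0 is perfect, 1 is worst.

    `forecast` is the predicted probability of YES; `outcome` is 1 if
    the question resolved YES, else 0. As a proper scoring rule, it is
    minimized in expectation by reporting one's true belief.
    """
    return (forecast - outcome) ** 2

def average_score(track_record):
    """Mean Brier score across a forecaster's resolved questions."""
    return sum(brier_score(p, o) for p, o in track_record) / len(track_record)

# A confident correct forecast scores far better than a hedged one:
print(brier_score(0.90, 1))  # 0.01
print(brier_score(0.50, 1))  # 0.25
print(average_score([(0.90, 1), (0.20, 0), (0.70, 0)]))  # ~0.18
```

Whatever rule you choose, walking forecasters through small worked examples like these pays off in both motivation and calibration.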

Impacts

In our case, a handful of our partners have already acted on the information and judgments presented in our reports. Examples include the national foreclosure issue (some 6% of the total population have debts in arrears), where the debt-relief process is being redesigned amid strong lobbying and insufficient staff capacity; and the probabilities of outlier scenarios for European macroeconomic development, which the Slovak Ministry of Finance requested to help calibrate its existing judgments.

Other partners claim to have incorporated this knowledge into the larger cycle of their policymaking process, but we haven't yet seen concrete evidence of it. Our experience and candid discussions with policymakers and forecasters alike also gave us some pointers on which pitfalls are intensified when tournaments focus on policy questions (such as the incentives and motivations of forecasters).

In general, it seems useful to explore various approaches to grow the number of policymakers with personal experience and skills in forecasting. In our case, we found curiosity and willingness to try forecasting even in unexpected institutional locations (e.g. the Czech R&I funding body). This makes us more confident that the “external forecasts” approach (as compared to building internal prediction tournaments or focusing on advancing the forecasting skills of public servants) is worth investigating further, precisely because it allows us to detect and draw on this interest irrespective of institutional and seniority distinctions and resource constraints.

While we hope that any readers with an interest in forecasting will find our experience useful, we also expect this and any future projects of ours to make it easier for other teams to work towards similar goals. To that end, the write-up also contains an Annex of “Methodological Guidelines,” where we outline in more explicit terms the questions and decisions we found important to tackle when running the project, and what they may entail.

Access our full report HERE.

Comments



Thanks for writing this up! As someone who recently ran a forecasting event at a UK Government department for my MSc research project, I fully appreciate some of your challenges (e.g. around attrition and creating a variety of questions).

In your experience, how strong did participants feel the link was between the forecasts they were making and any decisions being made on the area/topic? Did they feel the forecasts would influence or be integrated effectively when a decision on the relevant area was being made? If so, did you notice any improvement in forecasting accuracy? My reason for asking is that an issue typically raised around forecasting is that it lacks decision-relevance, and that even if forecasts are elicited they have limited influence on the final decision. It'd be interesting to know if you found that perception as well, and if not, whether there were any incentive benefits (i.e. if forecasters felt their predictions would inform decisions, did they become more accurate/try harder).

Out of interest, was there any training provided to participants, before, during, or after the tournament?

Thanks for the questions - your experience certainly sounds interesting as well (coming from someone with a smidgeon of past experience in the UK)!

As for the link between decision-relevance and forecaster activity: I think it bears repeating just how actively we had to manage our partnerships to not end up with virtually every question being long-term, which:

a) while not automatically eliminating decision relevance, is at least heuristically tied to it (insofar as there are by default fewer incentives to act on information about the distant future than on more immediate datapoints); and b) presents a fundamental obstacle both to evaluating forecast accuracy itself (as the questions simply linger unresolved) and to the tournament model, which seeks to reward this accuracy or a proxy thereof.

That being said, from the discussions we had I feel at least somewhat confident in making two claims: a) forecasters definitely cared about who would use the predictions and to what effect, though there didn't seem to be significant variance in turnout or accuracy (insofar as we can measure it), bar a few outlier questions (which were duds on our part); b) as a result, and based on our exit interviews with the top forecasters, I would think about decision-relevance as a binary or categorical variable rather than a continuous one. If the forecasting body continuously builds credibility by presenting questions and relaying feedback from the institutions, it activates forecasters' "I'm not shouting into the void" mode and delivers whatever benefits that mode might have.

At the same time, it is possible that none of our questions rose to the level of a truly immediate, high-stakes question ("Is Bin Laden hiding in the compound..."), where a threshold would be crossed and an even more desirable mode of thinking and evaluating evidence suddenly activated. It's questionable, however, whether, even if such a threshold exists, a sustainable forecasting ecosystem can be built on the other side of it (though this would be the dream scenario, of course).

As for training: in the previous tournament we ran, there was a compulsory training course on basics such as base rates, Fermi estimation, etc. Given that many participants in FORPOL had already taken part in its predecessor, and that our sign-ups indicated most were familiar with these concepts from having read Superforecasting or from forecasting elsewhere, we kept an updated version of the short training course available, but no longer compulsory. There was no directed training after the tournament, as we did not observe demand for it.
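(For readers unfamiliar with that training content, a minimal sketch of the kind of base-rate reasoning such a course covers; the scenario and numbers are hypothetical, purely for illustration.)

```python
def bayes_update(base_rate, likelihood_ratio):
    """Update a base rate given one piece of evidence, in odds form.

    `likelihood_ratio` = P(evidence | event) / P(evidence | no event).
    """
    prior_odds = base_rate / (1 - base_rate)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Hypothetical: a policy passes in ~20% of comparable past cases (the
# base rate); a strong ministerial endorsement is assumed to be three
# times as likely if passage is coming than if it is not.
print(bayes_update(0.20, 3.0))  # -> ~0.43
```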

Lastly, perhaps one nugget of personal experience you might find relevant/relatable: when working with the institutions, it definitely was not rare to feel like the causal inference aspects (and even just eliciting cognitive models of how the policy variables interact) might have deserved a whole project to themselves.
