The purpose of this article (cross-posted from fp21.org) is to share our model for forecasting implementation and gain community input on the substance of the piece. We’d be excited about any feedback.

Background on us: fp21 is a nonprofit think tank dedicated to transforming the processes and institutions of U.S. foreign policy. We research and advocate for changes to the government’s composition, organization, and function to improve preparation and response in existential scenarios and bolster general welfare.

Applying Forecasting to Policy

There is little publicly available guidance about how to integrate forecasting into the policy process. Some proponents view it as merely an analytical tool, like intelligence analysis. For example, Phil Tetlock suggests that forecasting “should be seen as a complement to expert analysis, not a substitute for it.”

The implicit theory for many in the forecasting community is supply-focused: simply generating accurate forecasts for policymakers will improve their understanding of the world and improve decision quality. Other forecasting advocates are more radical: they argue that all policy decisions are inherently acts of prediction, a view that invites deeper changes to the policy process.

The first step for testing the promise of forecasting is to develop models for how it might integrate into the actual foreign policy decision-making process. Here we posit four models for how this might be done, and speculate on the strengths and weaknesses of each model.

Figure 1: Strengths and weaknesses of the four models for integrating forecasting in policymaking.

1. The analytical model

Accurate forecasts about certain events would be made available to policymakers. No other process changes would be implemented. The development of this model is already mature. For example, the image below is from the popular forecasting platform Metaculus.

Figure 2: Metaculus platform example questions

Strength: This is the least disruptive for existing organizational models because forecasting (much of which could be conducted externally) is just another input into a policymaker’s existing decision process. It exposes the logic of forecasting to policymakers in a low-stakes environment. Forecasts here would likely be most impactful when they draw attention to counter-intuitive emergent trends. 

Weakness: We are skeptical about the efficacy of this model for achieving direct impact. Policymakers using existing mental models and decision-making heuristics are unlikely to have their beliefs meaningfully changed by the added precision that good forecasting can provide. One could imagine a policymaker cherry-picking only the forecasts that support their beliefs and ignoring those that challenge their priors. Further, policymakers would not be involved in the learning process at all in this model.

2. The early warning model

In this model, forecasts could be embedded into early warning systems focused on discrete, high-priority issue areas. Early warning systems for mass atrocities, political instability, and nuclear war already exist and provide examples of this approach, in which forecasters proactively scan the globe for warning signs. Forecasting in this model remains an analytical product intended to focus the attention of a policymaker without dictating a particular response. Other relevant issue areas might include future pandemics, political instability, environmental conflict, arms races, or trade wars.

Strength: Embedding forecasting within discrete issue areas makes bureaucratic sense. One could easily imagine various bureaus within the State Department hosting their own forecasting teams to build expertise and attention for their issue area. This approach might offer a useful foothold within the bureaucracy from which to expand if the techniques are found to be useful.

Weakness: This model would suffer some of the same weaknesses as the first model. If policymakers aren’t involved in the generation of forecasts, it may have little impact on their thinking. Further, constraining forecasting to discrete issue areas limits its applicability.

Estimated risk of new mass killing, 2022–23. From the Early Warning Project.

3. The policy evaluation model

This model asks forecasters to evaluate the likelihood of success for discrete policy options in order to highlight the policy interventions most likely to succeed. For instance, three potential interventions could be proposed by decision-makers, and forecasters would estimate the likelihood of success for each option. The options could also be compared with baseline forecasts of the status quo.

All policy decisions can be viewed as acts of prediction: the decision-maker is betting that their chosen intervention will change the status quo in a way that will benefit their goals. These are often called conditional forecasts, which can be phrased as if/then statements: if the United States conducts intervention X, then we predict that the world will diverge from the status quo outcome Y and we will instead observe outcome Z.
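To make the if/then structure concrete, here is a minimal Python sketch that compares conditional forecasts for several options against a status quo baseline. The class name, function name, option labels, and all probabilities are hypothetical placeholders chosen for illustration; they are not real forecasts and not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionalForecast:
    intervention: str  # the "if" part: the policy option under consideration
    p_goal: float      # forecast probability that the goal is met, given the intervention


def rank_by_lift(baseline_p, options):
    """Sort options by how much they raise the goal's probability over the status quo."""
    lifts = [(o.intervention, round(o.p_goal - baseline_p, 4)) for o in options]
    return sorted(lifts, key=lambda pair: pair[1], reverse=True)


# Placeholder numbers for illustration only; none of these are real forecasts.
status_quo = 0.20
options = [
    ConditionalForecast("intervention X", 0.35),
    ConditionalForecast("intervention Y", 0.15),
    ConditionalForecast("intervention Z", 0.50),
]

for name, lift in rank_by_lift(status_quo, options):
    print(f"{name}: {lift:+.2f} relative to status quo")
```

A negative lift flags an option that is forecast to do worse than the status quo, which is the kind of baseline comparison described above.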

Strength: This seems like a plausible approach for integrating forecasting into decision-making to improve the quality of policymaking. While policymakers would not be required to comply with forecaster recommendations, generating discrete probabilities about the likely success of certain policy tools would incentivize decision-makers to engage with the logic underlying relevant forecasts. Conditional forecasting would also require policymakers to identify discrete and falsifiable goals of their policies, which would already be a major process improvement. Such rigorous approaches might shift the center of gravity of the policy debate toward evidence and away from ideology and turf. It would also provide the foundations for more active learning in the policymaking sphere: policymakers would be able to improve their policymaking skills by studying which of their interventions succeeded and failed.

Weakness: Decision-makers will likely resist the meddling of forecasters on their sacred turf. This intervention would insert forecasting into highly politicized spaces, jeopardize the objectivity of forecasting, and potentially threaten the authority of policymakers. It would also likely require more resources and slow down the decision-making process. Finally, conditional forecasting uses the same logic as the status quo, but its efficacy has not been rigorously studied.

4. The decision-making model

This model would be a big change to the status quo whereby forecasting methods supplant the existing decision-making process altogether. All policy decisions would be presented as testable forecasting questions requiring analysis. Ethical and political judgment would still have a prominent role in the decision-making process, but all empirical claims about likely policy outcomes would be subjected to probabilistic estimates.

Strength: Integrating forecasting at every level of the policy process would create the conditions for a much more intense focus on the impact of foreign policy interventions. It could also drive a much-needed discussion about the merit of decision-makers, which is rather ambiguous without an understanding of the effectiveness of their policy judgment. Such an approach has the potential to have a transformative impact on the quality of foreign policy.

Weakness: The resource requirements to implement forecasting at this scale, and the depth of the organizational changes on which it would depend, make this proposal unviable for the foreseeable future. Much more research and policy experience would be needed to develop the evidence needed to advance such a grand vision.

Alternative Perspectives on Decision-Making

We want to expand on this last “decision-making” model. While it is the least viable of the four, it deserves further exploration.

The best way to think about integrating forecasting and decision-making is to break every policy proposal into causal mechanisms and then treat each component as a forecasting question. This requires carefully considering each action/change/step that would need to occur for the policy to achieve success.

Let’s say the policy solution under consideration is “the US will send weapons to Ukraine in order to push back against the Russian invasion.” One could break this strategy down into components: a) the US will need to send the weapons to Ukraine; b) Ukrainians will need to be effectively trained on the weapons; c) the new weapons will need to be deployed in battle at a sufficient rate; d) the new weapons will have to achieve measurable improvements in battlefield effectiveness; e) the effectiveness of the Russian military strategy will be meaningfully undermined; and f) Ukraine will win the war.

Such an approach encourages precise thinking from policymakers. It also allows each step of the policy proposal to be framed as a forecasting question:

  • How many weapons will the US actually be able to send to Ukraine (given financial, production, and transportation challenges)?
  • How many Ukrainian units will get trained to use American weapons?
  • How many battles will prominently feature US weapons? How many enemy forces will be killed with US weapons?
  • Given the above… How likely is it that Russia will withdraw its military outside of Ukrainian lines by the end of 2023?
  • Given the above… How likely is it that Russia will escalate, including the use of nuclear weapons?
  • Given status quo conditions… How likely is it that Russia will retreat? Use nuclear weapons?
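One way to see why this decomposition matters is to treat steps a) through f) as a chain, where each step's probability is conditional on all earlier steps having happened, and multiply them together. The Python sketch below does this. Every probability is a made-up placeholder, not a real forecast, and the function names are hypothetical. It also scores only this one causal pathway; the goal could in principle be reached by other routes.

```python
import math

# Steps a) to f) from the example above. Each value is a PLACEHOLDER for
# P(step | all earlier steps succeeded); none of these are real forecasts.
chain = [
    ("weapons are sent", 0.9),
    ("units are effectively trained", 0.8),
    ("weapons are deployed at a sufficient rate", 0.8),
    ("battlefield effectiveness measurably improves", 0.7),
    ("Russian strategy is meaningfully undermined", 0.5),
    ("Ukraine wins the war", 0.5),
]


def chain_probability(steps):
    """Probability that every step in the chain occurs (product of conditionals)."""
    return math.prod(p for _, p in steps)


def weakest_link(steps):
    """Return the step with the lowest conditional probability."""
    return min(steps, key=lambda step: step[1])


print(f"Whole pathway: {chain_probability(chain):.4f}")
print(f"Weakest link: {weakest_link(chain)[0]}")
```

With these placeholder values the whole pathway comes out at about 0.10, even though no single step is rated below 0.5. That compounding is easy to miss when a strategy is discussed as a single claim.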

The policymaking team might proceed according to the following script:

  1. Clearly identify the problem statement or opportunity (e.g. Russia invades Ukraine)
  2. Clearly describe goal(s). Goals must be presented as falsifiable end-states (e.g. A complete withdrawal of Russian troops from Ukraine, avoidance of nuclear escalation, etc.). If there are multiple or competing goals, they need to be presented in priority order.
  3. Generate an array of strategies/policies that could achieve the goal(s) (e.g. send NATO troops to Ukraine).
  4. Break these strategies down into component parts.
  5. Turn each component claim into its own forecasting question that can be passed to a prediction market.
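The script above could be captured in a small data structure so that forecasting questions fall out mechanically. The following Python sketch is one possible shape, not an established format: the class names, the `forecasting_questions` helper, the question wording, and the example components and deadline are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Strategy:
    name: str
    components: list[str]  # step 4: claims that must hold for the strategy to work


@dataclass
class Proposal:
    problem: str           # step 1: problem statement or opportunity
    goals: list[str]       # step 2: falsifiable end-states, in priority order
    strategies: list[Strategy] = field(default_factory=list)  # step 3


def forecasting_questions(proposal, resolve_by):
    """Step 5: one conditional question per component and per goal, for each strategy."""
    questions = []
    for strategy in proposal.strategies:
        for claim in strategy.components:
            questions.append(f"If '{strategy.name}' is adopted, will {claim} by {resolve_by}?")
        for goal in proposal.goals:
            questions.append(
                f"If '{strategy.name}' is adopted, will this goal be met by {resolve_by}: {goal}?"
            )
    return questions


# Illustrative content only; the components and deadline are invented for the sketch.
proposal = Proposal(
    problem="Russia invades Ukraine",
    goals=["complete withdrawal of Russian troops from Ukraine",
           "no nuclear escalation"],
    strategies=[Strategy("send weapons to Ukraine",
                         ["the weapons be delivered", "Ukrainian units be trained on them"])],
)

for question in forecasting_questions(proposal, "a chosen resolution date"):
    print(question)
```

Keeping goals in a list preserves the priority order that step 2 asks for, and each generated question names its strategy so it stays a conditional forecast.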

We make no claim that this model would outperform the existing decision-making process. Instead, we merely suggest that it deserves consideration and study.

Summary

Thinking critically about the ways in which we make decisions will help us improve the quality and accuracy of our policymaking process. Excellent research is accumulating on exactly how to achieve this goal. The forecasting methods rely on decades of research in cognitive psychology and decision science that help us understand deeply ingrained human biases that have long afflicted policymaking.

Forecasts generated outside of policymaking spaces are unlikely to meaningfully impact decision-making. New models need to be developed and tested if we hope to capitalize on new research into decision-making and advance our policy process.

The goal is to improve the quality of policymaking and make the world a safer place.
 

4 comments

Thanks for posting this, I strongly upvoted it for these reasons:

  1. It's concise, with a very high information-to-padding ratio, higher than I sometimes find on the forum[1]
  2. IIDM is something I'm interested in, and seeing a high-quality post in this area is very welcome. I think it adds value to this area and brings attention to it on the Forum
  3. The structure of 'explanation - strength - weakness' was very clear, in general I thought that the whole post was well structured but this made the post very easy to follow

As for the content of the post itself, I agree with most of it and I look forward to reading the references and links! My only comment for consideration would be that all 4 models seem to have an underlying weakness: they are actually unlikely to be fully integrated into policy decision-making circles, and that acts as a bottleneck on applying any form of forecasting in the policy realm.

So my questions in that area would be:

  1. Have there been any historical examples where forecasts were explicitly integrated into decision making, either in the public or private sectors? [My assumption is very little of both, and what there has been is likely a lot more on the private than public side]
  2. What are the empirical barriers to forecasting being adopted in public policy? Are there case studies of this being attempted and shut down, and in these cases where were the key points leading to a rejection of these models?
  3. Are there any particular constituencies/polities where we might expect forecasting to have more of a foothold - where engagement by the EA IIDM community might lead to actual implementation?

And finally, I just want to end by saying again I thought it was a very good post :)

[1] Especially on my own posts....

I worked for the UK Civil Service and it was hard to push forecasting because:

  • Making markets is hard
  • Getting people to care about the numbers if they appear is hard

I think that the social problem of prediction market buy-in is a bit harder than people generally think.

Michael Story writes about it well here:

https://mwstory.substack.com/p/why-i-generally-dont-recommend-internal 

Cool models. Thanks for writing this.

Thanks for writing this up, it's very useful!

I'm curious about model 3 - the policy evaluation model.

I think this point is particularly insightful: "Conditional forecasting would also require policymakers to identify discrete and falsifiable goals of their policies, which would already be a major process improvement."

But I don't quite understand the thinking behind the following two points:

  • "generating discrete probabilities about the likely success of certain policy tools would incentivize decision-makers to engage with the logic underlying relevant forecasts." - how exactly do you see this model changing the incentives policymakers face, relative to the status quo (which includes conditional forecasts sometimes being generated on the likes of Metaculus etc)?

  • "It would also provide the foundations for more active learning in the policymaking sphere: policymakers would be able to improve their policymaking skills by studying which of their interventions succeeded and failed." - what's the process you envision here that enables active learning? If policymakers themselves are the ones that are making forecasts, and can see how their predictions compare to actual outcomes, then I can see where the learning comes in. But if policymakers are still just consumers of forecasts in this model, I don't see how the supply of conditional forecasts would itself support policymakers' learning.

Thanks in advance for any additional detail you can provide on this proposal!