What this is: A technical post about how I believe binary forecasts should be made. There are probably some minor mathematical mistakes, but I doubt there are major mistakes.
Probabilistic forecasts can rationally be updated without anything happening except the passage of time. This appears to be a violation of conservation of expected evidence but it isn't so, as the passage of time is often evidence in itself.
To analyze how the passage of time affects binary forecasts I use conditional prediction curves: your predicted forecast at a later time given the information at an earlier time. It's often possible to construct well-motivated conditional prediction curves automatically, and I provide some details in the context of the "Will happen by time ?" type of question.
The benefits of thinking with conditional prediction curves are large from the point of view of the individual forecaster, the forecast aggregator, and the scorer alike.
- The forecaster has no need to repeatedly go and update his predictions despite no new information coming in,
- The aggregator can aggregate the probabilistic forecasts at any point in time,
- The scorer can score a continuous stream of predictions instead of a dislocated bunch of them.
Some downsides of using prediction curves include:
- Different kinds of questions require different models. Questions of the sort "Will the event happen before the event ?" shouldn't be handled in the same way as the "Will happen by time ?" kind of question.
- Making informal forecasts using conditional prediction curves is harder than making point forecasts. Tutorials and non-technical explanations would probably be required.
- It would take some effort to create technical solutions for them.
But prediction curves help us in understanding forecasting too.
- Any rational forecaster's conditional prediction curve will be decreasing if the question looks like "Will happen by time ?"
- But the conditional prediction curve will be constant for questions like "Will the event happen on time ?"
- More complicated kinds of questions won't be as regular. Questions of the form "Will the event happen before the event ?" can have arbitrary conditional prediction curves.
Define the variables You can think about and using Metaculus questions. For instance, if the question is "Will a non-state actor develop their own nuclear weapon by 2030?", would be if a non-state actor develops a nuke by 2030 and otherwise. The random variable equals if and the point in time the nuke is developed if .
A prediction curve for is a random function that forecasts the outcome at every time-point . If , I'll assume that if and otherwise, so you can't make any stupid predictions after the event time to screw yourself over.
We can score prediction curves using the integrated scoring rule where is any proper scoring rule (with being best) and is the starting time. Then is a proper scoring rule for prediction curves (the exact formulation of what this means is a little technical, see the appendix for a proof), meaning it will always be beneficial to supply the prediction curve you believe in the most. The scoring rule might be strictly proper too, with the proper definitions and assumptions, but I haven't investigated it yet. One reason to use an integrated scoring rule is to incentivize honest reporting even when the event time is far away.
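To make the integrated scoring rule concrete, here is a minimal numerical sketch. It rests on several assumptions of mine: the underlying proper scoring rule is the negated Brier score (so 0 is the best possible score), the integral runs from the starting time to the question deadline, and the forecast is pinned to the outcome once the event has happened. The names `brier`, `integrated_score`, `start`, `end` and `event_time` are hypothetical.

```python
def brier(p, y):
    """Negated Brier score: 0 is best, -1 is worst (an assumed choice of rule)."""
    return -(p - y) ** 2


def integrated_score(curve, outcome, start, end, event_time=None, steps=10_000):
    """Midpoint-rule approximation of the integral of brier(curve(t), outcome)
    over the interval [start, end]."""
    dt = (end - start) / steps
    total = 0.0
    for i in range(steps):
        t = start + (i + 0.5) * dt
        if event_time is not None and t >= event_time:
            # After the event time the forecast is forced to equal the outcome.
            p = float(outcome)
        else:
            p = curve(t)
        total += brier(p, outcome) * dt
    return total


# A forecaster who always says 0.5 on a question that resolves negatively.
print(integrated_score(lambda t: 0.5, outcome=0, start=0.0, end=2.0))
```

A curve that sits closer to the truth for longer gets a score closer to 0, which is how the rule rewards being right early.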
Prediction curves are not used by forecasting sites such as Metaculus. That might be because supplying prediction curves is too much to ask of their audience. But it is possible to construct reasonable prediction curves without too much additional work.
The rational forecaster
Define the information set , A rational forecaster with information set is one who makes probabilistic forecasts at each time point using his best available evidence . Define the rational prediction curve as the stochastic process Here is random since is random. When is a proper scoring rule is the optimal prediction given the information set according to in the sense that when is a suitable class of random functions.
The conditional prediction curve
Define the conditional prediction curve as the expected forecast at time given the information at time , but conditioned on the event not yet having happened Then is the best possible prediction curve based on in the sense that You can interpret as
- the rational prediction at time of a forecaster who missed all the information from time to time .
- the rational prediction at time of a lazy forecaster who did not bother to look for any new information after time .
- the actually rational prediction at time when information arrives in bursts, not continuously, and the last bit of information became available at time .
In practice we need the conditional prediction curve because no one is able to update continuously. Call it bounded rationality if you want. The idea is to have each forecaster update their prediction curve whenever they make a forecast, yielding the final prediction curve. If a forecaster provides conditional prediction curves at the times , the final prediction curve is
The prediction curve below contains two updates, one at and one at . The curves in-between updates are conditional prediction curves: The black curve is the conditional prediction curve , the red is , the blue is . Together they form your final prediction curve. The event time is , but that is random and unknown to the forecaster. The conditional forecasts are made using the constant hazard model, discussed in a later section.
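The stitching of conditional prediction curves into a final prediction curve can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the toy linear curves stand in for whatever conditional curves a forecaster actually supplies at each update.

```python
import bisect


def stitch(update_times, curves):
    """Build the final prediction curve from conditional curves.

    update_times must be sorted; curves[i] is the conditional prediction
    curve supplied at update_times[i]. At time t we use the curve from the
    latest update made at or before t.
    """
    def final_curve(t):
        i = bisect.bisect_right(update_times, t) - 1
        if i < 0:
            raise ValueError("t is before the first forecast")
        return curves[i](t)
    return final_curve


def linear_curve(p0, t0, slope):
    """Toy conditional curve (hypothetical): starts at p0 and drifts linearly."""
    return lambda t: min(1.0, p0 + slope * (t - t0))


final = stitch(
    [0.0, 1.0, 2.0],
    [linear_curve(0.5, 0.0, 0.1), linear_curve(0.4, 1.0, 0.1), linear_curve(0.7, 2.0, 0.1)],
)
for t in (0.5, 1.0, 2.5):
    print(t, final(t))
```

The jumps at the update times are where new information arrived. Between updates the curve moves only because time passes.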
Three common question categories
Questions of the sort described above are probably too general to work with, but most can be placed into one of three categories.
Type 1: "Will the event fail to happen by time ?"
Most questions on Metaculus can be written in the form "Will happen by time ?". Examples include "Will a coup or regime change take place in Russia in 2022 or 2023?" and "Will Putin and Zelenskyy meet to discuss the peaceful resolution of the Russian-Ukrainian conflict before 2023?". We will look at the questions formulated in the form "Will the event fail to happen by time ?" as it makes the mathematics slightly cleaner.
For these questions we don't need to model the probability of at all, yielding the prediction curve and the conditional prediction curve
The conditional prediction curve is non-decreasing in for every . Moreover, is strictly increasing in under the very minor condition that the hazard rate of is strictly positive. (This means that there's a possibility of the question resolving at every time point .)
Thus a rational forecaster will always expect the probability of a positive resolution to increase over time. Expecting it to decrease is irrational.
Aside from being non-decreasing and starting in , there are no restrictions on the conditional prediction curve. There are plenty of examples of conditional prediction curves for this kind of question in the next section.
Type 2: "Will the event happen on time ?"
Some questions on Metaculus can be written in this form. Examples include "Will Ontario's Conservative Party (PC) win a majority in the election on 2022-06-02?" and "Will Volodymyr Zelenskyy be named Time Person of the Year in 2022?". In these questions the resolution date is fixed at , so time has no influence except through the information source . Thus the prediction curve is constant in , and we're in the intuitive setting that we cannot expect our prediction to change in the future.
For questions of type "Will the event happen on time ?", the conditional prediction curve is constant
Type 3: "Will the event happen before the event ?"
Questions of this nature are uncommon on Metaculus. The only example I found in my search was "Alexei Navalny to become president or prime minister of Russia in his lifetime?" This question resolves positively if Navalny becomes PM/president before he dies. Models for problems of this nature are known as competing risk models.
To model it, define the two times and together with and . Then There is no general regularity in unless we know something special about the hazard rates of and .
Let be any function with range . Then there is a model for and an information set so that .
On one hand, the additional complexity suggests that questions of the "Will the event happen before the event ?" form should be avoided. On the other hand, they are quite easy to model, provided you're willing to provide the prediction curve or hazard rate for both variables and . This can be done using the techniques in the next section.
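As a rough illustration of how a competing risk question could be handled once both hazard rates are given, the sketch below integrates the first event's hazard against the joint survival function. It assumes the two event times are independent given the forecaster's information, which is a simplification of mine and not something the question type guarantees. The function and argument names are hypothetical.

```python
import math


def prob_a_before_b(hazard_a, hazard_b, start, end, steps=100_000):
    """Approximate P(A happens before B, and before `end`), ASSUMING the two
    event times are independent.

    Integrates hazard_a(t) * exp(-(H_a(t) + H_b(t))) over [start, end],
    where H_a and H_b are the cumulative hazards counted from `start`.
    """
    dt = (end - start) / steps
    cumulative = 0.0  # combined cumulative hazard up to the left edge of the step
    prob = 0.0
    for i in range(steps):
        t = start + (i + 0.5) * dt
        ha, hb = hazard_a(t), hazard_b(t)
        # Joint survival evaluated at the midpoint of the step.
        survival = math.exp(-(cumulative + 0.5 * (ha + hb) * dt))
        prob += ha * survival * dt
        cumulative += (ha + hb) * dt
    return prob


# With constant hazards the answer approaches rate_a / (rate_a + rate_b).
print(prob_a_before_b(lambda t: 0.1, lambda t: 0.3, 0.0, 200.0))
```

With constant hazards this reduces to the familiar ratio of rates, which gives a quick sanity check before plugging in time-varying hazards.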
Parametric conditional prediction curves in "Will the event fail to happen by time ?" types of questions
Suppose we know the conditional hazard function at time and denote it Then we can write the prediction curve as see the appendix for the proof. This formulation of the conditional prediction curve is helpful as it's relatively easy to interpret hazard rates. Much of the literature in survival analysis / time-to-event data is formulated in terms of hazard rates too. If you're willing to assume a parametric form for the hazard rate you can construct conditional prediction curves semi-automatically. We'll take a closer look at three examples: Constant hazard rates, Weibull hazard, and Gompertz--Makeham hazards.
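For readers who prefer code to formulas, here is a small numerical sketch of the hazard-to-curve relation for the "fail to happen by the deadline" formulation. It assumes the curve equals the exponential of minus the hazard integrated from the current time to the deadline, which is the standard survival-analysis identity. The names `conditional_curve`, `hazard` and `horizon` are mine.

```python
import math


def conditional_curve(hazard, t, horizon, steps=10_000):
    """P(no event by `horizon` | no event by `t`) for a given hazard function.

    Uses exp(-integral of hazard from t to horizon), with the integral
    approximated by the midpoint rule.
    """
    if t >= horizon:
        return 1.0
    dt = (horizon - t) / steps
    integral = sum(hazard(t + (i + 0.5) * dt) for i in range(steps)) * dt
    return math.exp(-integral)


# Any non-negative hazard works, e.g. one that grows linearly in time.
for t in (0.0, 1.0, 2.0, 3.0):
    print(t, conditional_curve(lambda s: 0.2 * s, t, horizon=3.0))
```

Because the hazard is non-negative, the integral shrinks as the current time approaches the deadline, so the printed values can only go up.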
Suppose we may assume the hazard rate is constant, i.e., with unknown. Using the equation we see that If you know the point forecast , we may use it to derive . Solving for , we find that so the implied conditional prediction curve is
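Under a constant hazard the whole curve follows from one point forecast, and the calculation fits in a few lines. The sketch assumes the "fail to happen by the deadline" formulation, so a forecast `p0` made at `t0` for deadline `horizon` implies a rate of `-log(p0) / (horizon - t0)` and a curve of `p0 ** ((horizon - t) / (horizon - t0))`. The function and argument names are hypothetical.

```python
import math


def constant_hazard_rate(p0, t0, horizon):
    """Hazard rate implied by the point forecast p0 made at time t0."""
    return -math.log(p0) / (horizon - t0)


def constant_hazard_curve(p0, t0, horizon):
    """Conditional prediction curve implied by p0 under a constant hazard."""
    def curve(t):
        # Equivalent to exp(-rate * (horizon - t)) with the rate above.
        return p0 ** ((horizon - t) / (horizon - t0))
    return curve


curve = constant_hazard_curve(p0=0.81, t0=0.0, horizon=2.0)
print(constant_hazard_rate(0.81, 0.0, 2.0))
print([round(curve(t), 4) for t in (0.0, 0.5, 1.0, 1.5, 2.0)])
```

The curve starts at the supplied forecast and reaches 1 at the deadline, with no further input needed from the forecaster.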
Suppose the current date is in the middle of 2022 and we consider the question "Will Putin and Zelenskyy not meet to discuss the peaceful resolution of the Russian-Ukrainian conflict before 2023?". Then we can put and . In the plot below we show , where was the Metaculus prediction at the time. When is reasonably large, the conditional prediction curve is almost linear, making a reasonable approximation to .
The benefit of assuming a constant hazard rate lies in its simplicity.
- The forecaster doesn't have to put in more work in the constant hazard model, everything happens automatically.
- From the aggregator's point of view, the constant hazard model allows you to do principled aggregation using only one data point, as you can derive the most up-to-date prediction for every forecaster.
- The scorer can calculate principled scores using the scoring rule straight from the data.
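The aggregator's side of this could look roughly as follows. Each forecaster's most recent point forecast is carried forward to the present with the constant hazard curve, and the carried-forward forecasts are then combined. The plain arithmetic mean used here is only a placeholder of mine; any aggregation rule could be substituted. All names are hypothetical.

```python
def carry_forward(p0, t0, now, horizon):
    """Constant-hazard forecast at time `now` implied by forecast p0 made at t0
    (for the 'fail to happen by the deadline' formulation)."""
    return p0 ** ((horizon - now) / (horizon - t0))


def aggregate(forecasts, now, horizon):
    """forecasts: list of (p0, t0) pairs, one per forecaster.

    Brings every forecast up to date and averages them. The arithmetic mean
    is an assumed placeholder, not a recommended aggregation rule.
    """
    updated = [carry_forward(p0, t0, now, horizon) for p0, t0 in forecasts]
    return sum(updated) / len(updated)


# One forecaster last updated at time 0, another at time 1; deadline at time 2.
print(aggregate([(0.81, 0.0), (0.9, 1.0)], now=1.0, horizon=2.0))
```

In this toy case the two forecasters agree once the stale forecast is carried forward, even though their raw numbers differ.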
The Weibull hazard is usually written in the form It is used to model increasing (when ) or decreasing hazard rates. The conditional prediction curve is
To use the Weibull hazard you can provide a point estimate at the current time and then either
- visually modify the curve until you're pleased with the look,
- provide another point estimate and deduce the values of mathematically,
- provide more than two points and use e.g. least squares to find the best-fitting curve.
Take the logarithm of and solve for The conditional prediction curve can be written in terms of and
Now you can plot the hazard rate and the conditional prediction curve while sliding around. You can stop at the you're most comfortable with.
Suppose and . We need to solve This is equivalent to solving Since we have . In addition, is increasing in , as can be verified by taking its derivative, and has asymptotes at and , so the equality has a solution that can be found using root-finding.
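The two-point fit can be sketched with a simple bisection. The sketch makes several assumptions of mine: the Weibull hazard uses the common shape-scale parametrization with cumulative hazard `(t / scale) ** shape`, time is measured from the first forecast (made at time 0), and the second forecast is the value you expect to hold at a later time `t1` if the event has not happened by then. The names `fit_weibull`, `weibull_curve`, `shape` and `scale` are hypothetical.

```python
import math


def weibull_curve(t, horizon, shape, scale):
    """P(no event by horizon | no event by t) with cumulative hazard (t/scale)**shape."""
    return math.exp(-((horizon / scale) ** shape - (t / scale) ** shape))


def fit_weibull(p0, t1, p1, horizon, lo=1e-6, hi=50.0, iters=200):
    """Find (shape, scale) so that the curve equals p0 at time 0 and p1 at time t1.

    Requires 0 < t1 < horizon and 0 < p0 < p1 < 1. The shape solves
    1 / (1 - (t1/horizon)**shape) = log(p0) / log(p1), whose left-hand side
    is decreasing in shape, so bisection applies.
    """
    target = math.log(p0) / math.log(p1)
    ratio = t1 / horizon
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 1.0 / (1.0 - ratio ** mid) > target:
            lo = mid  # shape too small
        else:
            hi = mid  # shape too large
    shape = 0.5 * (lo + hi)
    scale = horizon / (-math.log(p0)) ** (1.0 / shape)
    return shape, scale


shape, scale = fit_weibull(p0=0.5, t1=0.5, p1=0.6, horizon=1.0)
print(shape, scale)
print(weibull_curve(0.0, 1.0, shape, scale), weibull_curve(0.5, 1.0, shape, scale))
```

A fitted shape above 1 corresponds to an increasing hazard, and a shape below 1 to a decreasing one.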
Example: "Will India have at least 200 nuclear warheads at the end of 2023?"
This is plausibly a question with an increasing hazard rate. The description says that "As of May 2021, the Federation of American Scientists estimated India as having 160 nuclear warheads." In order to reach 200 warheads, they first have to pass through every intermediate count, making it more likely that they finally reach 200 at a given instant as time goes on.
Due to the way I've formulated the mathematics, we have to analyze the opposite question "Will India have less than 200 nuclear warheads at the end of 2023?" instead.
Suppose I make the forecast at time equal to the last day of 2022, and suppose that on the last day of . Then The plot below shows the resulting prediction curves for . To interpret the red line, observe that the prediction barely changes when is small enough. This reflects that the probability of India obtaining nukes is small in the short term. However, as approaches and they still haven't acquired nukes, the probability of them not acquiring them increases rapidly.
The Gompertz--Makeham hazard has the form . From we find that The Gompertz--Makeham hazard has an age-dependent term (the Gompertz term) and an age-independent term (the Makeham term). We can potentially think of them independently. In some cases there are multiple sources of both age-dependent and age-independent terms, making it a multi-Gompertz--Makeham hazard. If we have Gompertz components the -Gompertz--Makeham hazard is with conditional prediction curve
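A multi-component Gompertz--Makeham curve is easy to evaluate in closed form. The sketch assumes the common parametrization in which each Gompertz component contributes `a * exp(b * t)` to the hazard and the Makeham term contributes a constant `c`. The letters `a`, `b`, `c` and the function name are my own choices and may not match the notation used in the text.

```python
import math


def gompertz_makeham_curve(t, horizon, makeham, gompertz):
    """P(no event by horizon | no event by t) for the assumed hazard
    makeham + sum of a * exp(b * s) over the (a, b) pairs in `gompertz`.

    Each b must be non-zero; a constant component belongs in `makeham`.
    """
    cumulative = makeham * (horizon - t)
    for a, b in gompertz:
        # Integral of a * exp(b * s) from t to horizon.
        cumulative += (a / b) * (math.exp(b * horizon) - math.exp(b * t))
    return math.exp(-cumulative)


# One increasing and one decreasing Gompertz component plus a constant term.
# The numbers are made up purely for illustration.
components = [(0.01, 0.1), (0.05, -0.5)]
for t in (0.0, 2.0, 4.0, 8.0):
    print(t, gompertz_makeham_curve(t, 8.0, makeham=0.02, gompertz=components))
```

Splitting the hazard into components like this lets you reason about each source of risk separately and simply add them up.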
Example: "Will Putin stay in power until August 11th 2030?"
We can divide the hazards into three parts: mortality, a time-independent hazard for a coup, and a time-dependent hazard for a coup.
- Mortality. This document estimates a Gamma--Gompertz--Makeham model on US data and finds parameters and and (this means the baseline age is , i.e., the mortality starts increasing with age only at age ). This is not the right country nor the right model but the parameters should be close enough. Since there are some rumors of Putin being sick, I'll modify the constant hazard to . Since the hazard .
- Time-independent hazard for a coup. I haven't found a good source on this, but it's probably not too hard to find following the leads in e.g. this paper. I'm guessing a yearly ambient risk of a coup.
- Time-dependent hazard for a coup. For instance, one might reasonably think this one will decrease with temporal distance from the start of the Ukraine war. Let's say the Ukraine conflict adds an annual hazard of right now, expected to decrease to in two years' time. Thus and , which implies
We end up with the hazard rate , a sum of two Gompertz components and one Makeham component.
It appears that my complicated Gompertz--Makeham modelling has been for naught, as the prediction curve is virtually identical to the constant hazard prediction curve. I don't know if we should expect this to happen in general or not. It might be because the two Gompertz components cancel each other out.
As a side effect, this analysis also yields a density for Date Putin Exits Presidency of Russia. The expression for the survival curve is which can be differentiated to find the density as seen below.
I feel quite confident that conditional prediction curves are the best option for handling the time problem in binary forecasts. There are some alternatives, such as providing the entire distribution , but that looks quite cumbersome. There are many benefits from using conditional prediction curves (for the forecasters, aggregators, and scorers), they are not too difficult to implement for forecasting platforms, and it should be possible to develop good tutorials that make forecasters comfortable with them.
It would be great to find out if the complicated hazard functions are worth the hassle -- maybe the constant hazard is enough for most purposes? The Putin example suggests a constant hazard rate might be enough, as the complicated multi-Gompertz--Makeham prediction curve plot is virtually the same as the constant hazard prediction curve based on the same !
I don't have too many hints for how to choose among the different hazard functions. But you might use empirics as a guide. For instance, the Gompertz--Makeham hazard appears to fit mortality data better than the Weibull--Makeham hazard but the difference appears to be marginal. If you're dealing with questions such as "Will Putin be ousted as president of Russia by 2030?", such observations might help you. There are also theoretical reasons to prefer one over the other in some cases, but I don't know if they are useful.
It could be reasonable to mix the Weibull and Gompertz components too, for instance following the same kind of reasoning as in the Putin example above. There are infinitely many hazard functions I haven't talked about at all, such as the log-normal hazard. Some of these may have nice interpretations that could help the forecaster.
Proof that is proper
We show that the weighted version is a proper scoring rule for any positive weighting function . Let denote the true probability , where is the information observed until time . Let be any other stochastic process adapted to . Since is non-positive, we can apply Fubini's theorem to get
where the second equality follows from iterated expectations. Since is the true probability of conditioned on , we have since is a proper scoring rule. It follows that hence is a proper scoring rule.
Comment on the scoring rule
The scoring rule has the weakness that early forecasters are penalized. If the scoring rule is bounded above, such as the Brier score, early forecasting can be incentivized by setting for all time points before the forecaster made their first forecast. Other than that, it appears to me to be a reasonable scoring rule to evaluate forecasts in time. There are other potential scoring rules, such as , which do not appear to be proper for prediction curves; but it might also be that prediction curves aren't the correct abstraction.
Proof that .
We know that Using the equality , where is the survival function and the hazard rate, we find that The equality follows from the definition of .
Proof of Proposition 2, that is non-decreasing in for every .
Suppose that . Using we find that Since , , hence . In the same way, if the hazard rate is strictly positive, we have for all , , hence .
Proof of Proposition 3
We can ignore the dependence on and work directly on probability measures. In this case , and we see that
We find that Thus we need to equate
Multiply both sides by to obtain and differentiate with respect to to get Multiply both sides by to obtain which can be rearranged to The function is a hazard function if and only if it is non-negative, hence we require i.e., Solving the equality yields , but this function is negative when is positive, hence it's not in general a hazard function.
We can fix this by defining for if is non-positive, while .