Simple Forecasting Metrics?
I've been thinking about how simple some forecasting concepts are to explain and how complex others are. Take calibration, for instance: it's simple to explain. If someone says something is 80% likely, it should happen about 80% of the time. But other metrics, like the Brier score, are harder to convey: What exactly does it measure? How well does it reflect a forecaster's accuracy? How do you interpret it? All of this requires a lot of explanation for anyone not interested in the science of forecasting.
What if we had an easily interpretable metric that could tell you, at a glance, whether a forecaster is accurate? A metric so simple that it could fit within a tweet or catch the attention of someone skimming a report—someone who might be interested in platforms like Metaculus. Imagine if we could say, "When Metaculus predicts something with 80% certainty, it happens between X and Y% of the time," or "On average, Metaculus forecasts are off by X%". This kind of clarity could make comparing forecasting sources and platforms far easier.
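To make the two candidate statements concrete, here is a minimal sketch of how they could be computed from a list of resolved binary questions. The function names, the ±5 percentage point window, and the data are all made up for illustration; this is not how Metaculus or any other platform reports its track record.

```python
def hit_rate_near(forecasts, outcomes, target=0.8, tolerance=0.05):
    """Fraction of questions that resolved YES among forecasts close to `target`.

    `forecasts` are probabilities in [0, 1]; `outcomes` are 1 (YES) or 0 (NO).
    Returns None if no forecast falls inside the window.
    """
    picked = [o for f, o in zip(forecasts, outcomes) if abs(f - target) <= tolerance]
    if not picked:
        return None
    return sum(picked) / len(picked)


def mean_absolute_error(forecasts, outcomes):
    """Average distance between the forecast probability and the 0/1 outcome."""
    return sum(abs(f - o) for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical resolved questions (illustration only, not real platform data).
forecasts = [0.8, 0.8, 0.8, 0.8, 0.8, 0.2, 0.2, 0.6, 0.9, 0.1]
outcomes = [1, 1, 1, 1, 0, 0, 0, 1, 1, 0]

rate = hit_rate_near(forecasts, outcomes, target=0.8)
mae = mean_absolute_error(forecasts, outcomes)
print(f"When the forecast was about 80%, it happened {rate:.0%} of the time.")
print(f"On average, forecasts were off by {mae:.0%}.")
```

One caveat with the second number: an "off by X%" figure penalizes honest uncertainty, since a perfectly calibrated 60% forecast is still "off" by 40 or 60 points on every question, so it would probably need some explanation after all.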
I'm curious whether anyone has explored creating such a concise metric—one that simplifies these ideas for newcomers while still being informative. It could be a valuable way to persuade others to trust and use forecasting platforms or prediction markets as reliable sources. I'm interested in hearing any thoughts or seeing any work that has been done in this direction.
‘Five Years After AGI’ Focus Week happening over at Metaculus.
Inspired in part by the EA Forum’s recent debate week, Metaculus is running a “focus week” this week, aimed at trying to make intellectual progress on the issue of “What will the world look like five years after AGI (assuming that humans are not extinct)[1]?”
Leaders of AGI companies, while vocal about some things they anticipate in a post-AGI world (for example, they are bullish on AGI making scientific advances), seem deliberately vague about other aspects: for example, power (will AGI companies have a lot of it? all of it?), whether some of the scientific advances might backfire (e.g., a vulnerable world scenario or a race-to-the-bottom digital minds takeoff), and how exactly AGI will be used for “the benefit of all.”
Forecasting questions for the week range from “Percentage living in poverty?” to “Nuclear deterrence undermined?” to “‘Long reflection’ underway?”
Those interested: head over here. You can participate by:
* Forecasting
* Commenting
  * Comments are especially valuable on long-term questions, because the forecasting community has less of a track record at these time scales.[2][3]
* Writing questions
  * There may well be some gaps in the admin-created question set.[4] We welcome question contributions from users.
The focus week will likely be followed by an essay contest, since a large part of the value in this initiative, we believe, lies in generating concrete stories for how the future might play out (and for what the inflection points might be). More details to come.[5]
1. ^
This is not to say that we firmly believe extinction won’t happen. I personally put p(doom) at around 60%. At the same time, however, as I have previously written, I believe that more important trajectory changes lie ahead if humanity does manage to avoid extinction, and that it is worth planning for these things now.
2. ^
Moreover, I personally take Nuño Sempere’s “Hurdles of using f
Not that we can do much about it, but I find the idea of Trump being president at a time when we're getting closer and closer to AGI pretty terrifying.
A second Trump term is going to have a lot more craziness and far fewer checks on his power, and I expect it would have significant effects on the global trajectory of AI.
Metaculus introduces 'Changed my mind' button
This short take is a linkpost for this Discussion Post by Metaculus's Technical Product Manager
* Do you sometimes read a comment so good that you revise your whole world model and start predicting the opposite of what you believed before?
* Do you ever read a comment and think “Huh. Hadn’t thought of that.” and then tweak your prediction by a few percentage points?
* Do you ever read a comment so clearly wrong that you update in the opposite direction?
* Do you ever wish you could easily tell other forecasters that what they share is valuable to you?
* Do you ever want to update your prediction right after reading a comment, without getting RSI in your scrolling finger?
Did a comment change your mind? Give Metaculus's new 'Changed my mind' button a click!
And for binary questions, clicking the button lets you update your prediction directly from the comment.
This December is the last month unlimited Manifold Markets currency redemptions for donations are assured: https://manifoldmarkets.notion.site/The-New-Deal-for-Manifold-s-Charity-Program-1527421b89224370a30dc1c7820c23ec
I highly recommend redeeming donations this month, since there is orders of magnitude more currency outstanding than can be donated in future months.
As someone predisposed to like modeling, the key takeaway I got from Justin Sandefur's Asterisk essay PEPFAR and the Costs of Cost-Benefit Analysis was this corrective reminder – emphasis mine, focusing on what changed my mind:
More detail:
Tangentially, I suspect this sort of attitude (Iraq invasion notwithstanding) would naturally arise out of a definite optimism mindset (that essay by Dan Wang is incidentally a great read; his follow-up is more comprehensive and clearly argued, but I prefer the original for inspiration). It seems to me that Justin has this mindset as well, cf. his analogy to climate change in comparing economists' carbon taxes and cap-and-trade schemes vs progressive activists pushing for green tech investment to bend the cost curve. He concludes:
Aside from his climate change example above, I'd be curious to know what other domains economists are making analytical mistakes in w.r.t. cost-benefit modeling, since I'm probably predisposed to making the same kinds of mistakes.
For a long time I found this surprisingly nonintuitive, so I made a spreadsheet that did it, which then expanded into some other things.
* Spreadsheet here, which has four tabs based on different views on how best to pick the fair place to bet when you and someone else disagree. (I didn't make the fourth tab at all; it was added by Luke Sabor, who was passionate about the standard deviation method!)
* People have different beliefs / intuitions about what's fair!
* An alternative to the mean probability would be to use the product of the odds ratios.
Then if one person thinks .9 and the other .99, the "fair bet" will have implied probability more than .945.
* The problem with using Geometric mean can be highlighted if player 1 estimates 0.99 and player 2 estimates 0.01.
This would actually lead player 2 to contribute ~90% of the bet for an EV of 0.09, while player 1 contributes ~10% for an EV of 0.89. I don't like that bet. In this case, mean prob and Z-score mean both agree at 50% contribution and equal EVs.
* "The tradeoff here is that using Mean Prob gives equal expected values (see underlined bit), but I don't feel it accurately reflects "put your money where your mouth is". If you're 100 times more confident than the other player, you should be willing to put up 100 times more money. In the Mean prob case, me being 100 times more confident only leads me to put up 20 times the amount of money, even though expected values are more equal."
* Then I ended up making an explainer video because I was excited about it
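The numbers quoted above can be reproduced with a small sketch. This is my reading of the methods, not the spreadsheet's actual formulas: the function names are hypothetical, "geometric" is assumed to mean the geometric mean of the two probabilities (which is what matches the ~90%/~10% figures), and "geometric_odds" is one possible interpretation of the odds-based alternative. The pot is normalized to 1, the YES bettor stakes the implied probability, and the NO bettor stakes the rest.

```python
import math


def implied_probability(p1, p2, method="mean"):
    """Combine two probability estimates into the price at which the bet is struck."""
    if method == "mean":
        return (p1 + p2) / 2
    if method == "geometric":
        # Geometric mean of the probabilities themselves.
        return math.sqrt(p1 * p2)
    if method == "geometric_odds":
        # Geometric mean of the odds, converted back to a probability.
        odds = math.sqrt((p1 / (1 - p1)) * (p2 / (1 - p2)))
        return odds / (1 + odds)
    raise ValueError(f"unknown method: {method}")


def bet_terms(p_high, p_low, method="mean"):
    """Stakes and subjective expected values for a pot of size 1.

    The player with the higher estimate bets YES and stakes q (the implied
    probability); the other player bets NO and stakes 1 - q. Each EV is
    computed under that player's own belief.
    """
    q = implied_probability(p_high, p_low, method)
    return {
        "implied_probability": q,
        "yes_stake_share": q,
        "no_stake_share": 1 - q,
        "ev_yes": p_high * (1 - q) - (1 - p_high) * q,  # simplifies to p_high - q
        "ev_no": (1 - p_low) * q - p_low * (1 - q),     # simplifies to q - p_low
    }


for method in ("mean", "geometric", "geometric_odds"):
    print(method, bet_terms(0.99, 0.01, method))
print(implied_probability(0.9, 0.99, "geometric_odds"))
```

Under these assumptions, the 0.99 vs 0.01 case with the geometric mean of probabilities gives an implied probability of roughly 0.1, so the NO bettor puts up about 90% of the pot, while the plain mean splits the stakes evenly.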
Other spreadsheets I've seen in the space:
* Brier score betting (a fifth way to figure out the correct bet ratio!)
* Posterior Forecast Calculator
* Inferring Probabilities from PredictIt Prices
These three are all by William Kiely.
Does anyone else know of any? Or want to argue for one method over another?
Hi all!
Nice to see that there is now a sub-forum dedicated to Forecasting. This seems like a good place to ask what might be a silly question.
I am doing some work on integrating forecasting with government decision making. There are several roadblocks to this, but one of them is generating good questions (see the rigor-relevance trade-off, among other things).
One way to avoid this might be to simply ask questions about the targets the government has already set for itself. A lot of these are formulated in a SMART [1] way and are thus pretty forecastable. Forecasts on whether the government will reach its target also seem like they would be immediately actionable for decision makers. This seemed like a decent strategy to me, but I have not seen it mentioned very often. So my question is simple: Is there some sort of major problem here that I am overlooking?
The one major problem I could think of is that there might be an incentive for a sort of circular reasoning: if forecasters in aggregate think that the government is not on track to achieve a certain target, then the government might announce new policy to remedy the situation. Smart forecasters might see this coming and start their initial forecast higher.
I think you can balance this by having forecasters forecast on intermediate targets as well. For example: most countries have international obligations to reduce their CO2 emissions by X% by 2030. Instead of just forecasting the 2030 target, you could forecast all the intermediate years as well.
1. ^
SMART stands for: Specific, Measurable, Assignable, Realistic, Time-related - See https://en.wikipedia.org/wiki/SMART_criteria