The Impactful Forecasting Prize contest ran from February 4 to March 11, 2022. Below we announce the results and share some reflections. Please share any feedback with us in the comments or privately.
Thank you to all participants and congratulations to the winners! The writeups below are from Eli’s perspective, as the winners were from submissions he primarily judged.
$1,500: RyanBeck for a writeup on whether genetic engineering will raise IQ by >= 10 points by 2050
I appreciated Ryan’s literature review, which was densely packed with helpful background information. His overall forecast seemed reasonable, though I found the way the scenarios were presented a little unintuitive. I updated my forecast from 56% to 50% after reading it, with significantly greater resilience than before. Kudos to Ryan for releasing an updated version of his writeup in the Metaculus Journal.
$1,000: qassiov for a writeup on whether synthetic biological weapons will infect 100 people by 2030
I liked qassiov’s decomposition into non-state actors, state actor accidents, and state actor intentional usage. I also liked their attention to detail on the resolution criteria, e.g. noticing that selective breeding would not count for a pathogen to be considered synthetic. qassiov is skeptical that technology will lower the barrier to entry much in the next 8 years, but I am more uncertain; I’m not sure how much to trust the source they cited here and it may be out of date. I updated from 41% to 35%.
$800: FJehn for a writeup on when carbon capture will cost <$50/ton
FJehn presented a brief but compelling case that scientists are much more pessimistic than a few companies about bringing carbon capture costs down. This updated me from a median of 2039 to 2055, albeit still with somewhat low resilience.
$700: rodeoflagellum for a writeup on how many gene-edited babies will be born by 2030
I found some of the background information in the writeup helpful. The reminder that gene editing to treat diseases is more widely supported than editing for enhancement was useful, with the caveat that the WHO is still fairly critical of the former. The approach of decomposing into scenarios was interesting, and thinking about the framing made me narrow the right tail of my distribution.
Level of interest
The quantity of submissions was lower than we hoped for: we received 13 submissions from 8 unique forecasters, while we aimed for 100 submissions from 25. On the other hand, the average quality of submissions was higher than we expected. Forecasters may have preferred to either spend lots of time to have a good shot at winning a prize, or not participate at all.
Another possible reason for lower quantity is the focus of the forecasting community on the Ukraine Conflict (e.g. on Metaculus) throughout much of the time the prize was open for submissions. To the extent this shift happened, much of it was likely deserved, e.g. we were heartened to see predictions on Metaculus alerting at least one Ukrainian observer to evacuate.
Some ideas for how we could have gotten more submissions, in case it’s helpful for organizers of future prizes:
- Be ambitious in who you reach out to for promotion, and do it early. We reached out to Scott Alexander and were mentioned in an ACX open thread, but we waited until over halfway through the submission window to try. We likely should have reached out to more people with large audiences, earlier.
- Launching near the end of the month and/or running the competition for at least ~2 months seems preferable, since many newsletters go out at the beginning of the month. We launched on Feb 4, then were linked in several newsletters in early March, giving people little notice before the March 11 deadline.
Let us know if you have other ideas for how we could have done better, especially if you considered submitting but didn’t. Help us improve our hypotheses on the barriers to more people forecasting, both in general and for our contest! Feel free to share either in the comments or in this private and optionally anonymous form.
Evaluating submissions took more time than budgeted for, in part because the average length of submissions was longer than expected but also because as we started judging, we realized the evaluation criteria needed to be refined.
We originally were going to judge submissions mostly based on how much they changed our best-guess forecast, but we realized that this doesn’t capture everything we care about. It may also have created misaligned incentives, in particular the incentive to compose a one-sided writeup to convince the judge to update as much as possible in one direction.
Reflecting on the value of a forecast writeup, we identified 3 components of value:
- Improving the community’s or judge’s best guess forecast
  - We approximated this by dividing the extent to which the judge’s best-guess forecast shifted by the initial resilience of the judge’s forecast.
- Increasing the resilience of the forecast: When making decisions based off of a forecast, it’s helpful to have more confidence in the answer, even if the further research needed to gain that confidence doesn’t shift the best guess forecast more.
  - We subtracted the judge’s estimated initial resilience from their updated resilience.
- Providing a foundation for others to build off of: High quality comments, especially those that provide background research, clean decompositions, and other re-usable components, are useful in that future forecasts can build off of them without as much effort.
  - We approximated this with a subjective quality rating.
To rank submissions, we took into account these 3 components with approximately equal weight.
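For organizers who want to reuse this rubric, here is a minimal Python sketch of how the three components could be combined. It is an illustration, not the spreadsheet we actually used. The `Submission` fields, the numeric resilience and quality scales, the use of the absolute size of the shift, and the z-score standardization that puts the components on a common scale before averaging are all assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Submission:
    # All field names and scales are hypothetical, chosen for illustration.
    name: str
    forecast_before: float    # judge's best guess before reading
    forecast_after: float     # judge's best guess after reading
    resilience_before: float  # judge's estimated resilience, must be > 0
    resilience_after: float
    quality: float            # subjective quality rating


def components(s: Submission) -> tuple[float, float, float]:
    """Return the three value components described in the list above."""
    # 1. Shift in the best guess, divided by the initial resilience.
    #    Assumption: only the size of the shift matters, not its direction.
    shift = abs(s.forecast_after - s.forecast_before) / s.resilience_before
    # 2. Updated resilience minus initial resilience.
    resilience_gain = s.resilience_after - s.resilience_before
    # 3. Subjective quality rating, used as-is.
    return shift, resilience_gain, s.quality


def standardize(values: list[float]) -> list[float]:
    """Z-score a list so components on different scales are comparable."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def rank(submissions: list[Submission]) -> list[tuple[Submission, float]]:
    """Rank submissions by the equal-weight mean of standardized components."""
    columns = zip(*(components(s) for s in submissions))
    standardized = [standardize(list(col)) for col in columns]
    scores = [mean(row) for row in zip(*standardized)]
    pairs = zip(submissions, scores)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

Standardizing is only one way to get "approximately equal weight". Note also that shifts in a probability and shifts in a median year are in different units, so questions of different types would need to be put on a comparable scale before applying something like this.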
A forecast is more resilient if I’d expect it to move less with more research. See https://forum.effectivealtruism.org/posts/m65R6pAAvd99BNEZL/use-resilience-instead-of-imprecision-to-communicate.
And of course also due to the ever-present planning fallacy.
We didn’t necessarily observe this during the contest, though it would be hard to know if it occurred.