
The "Forecasting Innovations Prize" was announced on the 15th of November of 2020 on the Effective Altruism Forum and on LessWrong, with the goal of incentivizing valuable research around forecasting. We received 10 submissions.

The judges (AlexRJL, Eric Neyman, Tamay Besiroglu, Linch Zhang, Ozzie Gooen and myself) each recommended an amount of money to be awarded to each submission. The next section is a short summary of each entry, the prize it was assigned, and the reasons the judges gave. This is followed by a brief discussion of the judging process and takeaways.

We will be contacting authors soon.

Crowd-forecasting COVID-19

The post describes the results of a COVID-19 crowd-forecasting project created during the author's PhD. The judges didn’t know of any other app in which human forecasters could conveniently forecast different points in a time series, with confidence intervals. The project’s forecasts were submitted to the German and Polish Forecast Hub, and they did surprisingly well in comparison with other groups. 

Judges brought up the issue that R/Shiny is probably a suboptimal technology for a web app. Further, as of the time the post was published, neither the project under consideration nor other submissions to the German and Polish Forecast Hub were able to outperform a model that simply predicts constant cases on a four-week horizon.

This post receives a prize of $250.

Incentivizing forecasting via social media

The post explores the implications of integrating forecasting functionality with social media platforms. The authors consider several important potential issues at some length, along with possible solutions and indications for next steps. The scenario they consider, if it were to occur, could have a large impact on the 'information economy'.

However, as the authors note, the feasibility of the proposal is very unclear (<1%, though note that Twitch recently added some prediction functionality). Further, the authors were not aware of Facebook’s Forecast at the time they wrote the post.

This post receives a prize of $250.

Central Limit Theorem investigation

The post visualizes how quickly the central limit theorem works in practice, i.e., how many distributions of different types one has to sum (or convolve) to approximate a Gaussian distribution in practice. The visualizations are excellent, and give the readers intuitions about how long the central limit theorem takes to apply. Judges thought that explanations of important ideas to a specific community are valuable even if they are only new to that community. 

As a caveat, the post requires understanding that the density of the sum of two independent variables is the convolution of their densities. That is, when the post mentions “the number of convolutions you need to look Gaussian”, this is equivalent to “the number of times you need to sum independent instances of a distribution in order for the result to look Gaussian”. This point is mentioned in an earlier post of the overall sequence. Judges also weren’t sure to what extent this post was “forecasting-related.” Future competitions, if they happen, will have a clearer cut-off.
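To make the equivalence between convolving and summing concrete, here is a minimal sketch in plain Python. It is not taken from the post under discussion: the `skewed` distribution, the function names, and the choice of an L1 gap as a measure of "looking Gaussian" are all assumptions made for illustration.

```python
import math

def convolve(p, q):
    """Discrete convolution of two probability mass functions (lists)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def n_fold_sum_pmf(pmf, n):
    """PMF of the sum of n independent draws from pmf (support 0..len-1)."""
    result = [1.0]  # PMF of the constant 0
    for _ in range(n):
        result = convolve(result, pmf)
    return result

def l1_gap_to_gaussian(pmf, n):
    """Sum of absolute gaps between the n-fold sum's PMF and a Gaussian
    density with the same mean and variance, evaluated on the integers."""
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    mu, sigma2 = n * mean, n * var

    def gauss(x):
        return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

    total = n_fold_sum_pmf(pmf, n)
    return sum(abs(p - gauss(k)) for k, p in enumerate(total))

# Hypothetical skewed distribution on {0, 1, 2, 3}, chosen only for illustration.
skewed = [0.7, 0.1, 0.1, 0.1]

if __name__ == "__main__":
    for n in (1, 2, 5, 20, 50):
        print(n, round(l1_gap_to_gaussian(skewed, n), 4))
```

Each call to `convolve` corresponds to adding one more independent draw, so "number of convolutions" and "number of summed instances" are the same count. The L1 gap is a crude measure; other distances would give different numbers but the same qualitative picture.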

This post receives a prize of $120.

Forecasting of Priorities (Czech Priorities)

This post explains a set of ideas by Czech Priorities to use forecasting as a method of public deliberation, in particular to identify "priorities" or "mega-trends". Judges thought that, with a less messy design, this post could have won the first prize. In particular, it seems that this group has managed to convince the Czech government to give it two large grants and to pay attention to the result.

However, the suggested implementation really was quite messy. On the one hand, they suggest predicting the result of expert deliberation on the importance of “priorities”, but the selection of those experts could be politicized. On the other hand, one of the proposed mechanisms incorporates both forecasting and preference elicitation, and might not end up producing either good elicitation or good forecasting.

This post receives a prize of $90.

One's Future Behavior as a Domain of Calibration

This post advocates for forecasting one's future actions, and presents the author's method to do so. Some judges liked that it is pretty easy for this post to have an actual impact, as long as at least one person acts on it. One small detail the judges disagreed with was the post’s assertion that calibration doesn’t transfer between domains (this somewhat conflicts with some of the judges’ own experiences).

This post receives a prize of $80.

What to do about short AI timelines?

This short sequence gathers three posts on short timelines, and asks two questions: how to bet on short AI timelines, and how one’s influence depends on the length of AI timelines. These posts are part of a longer-running investigation by Daniel Kokotajlo into short timelines.

The posts used the EA forum’s question functionality, and the author didn’t seem very satisfied with the responses, though the least forecasting-related post in the series did see more discussion on LessWrong. Judges found that other posts by the author on the topic of timelines (e.g., this one) were much stronger, whereas the particular research questions in the prize submission didn’t really pan out. Some judges thought that question creation might be underrated. 

This post receives a prize of $70.

How might better collective decision-making backfire?

The post is faithful to the title, and comes up with or elicits several pathways through which collective decision-making might backfire.

Judges found the question asked to be important, but found it hard to evaluate the answers, because there was no overall framework to do so. In particular, there was no discussion about which concerns were or would have been historically important. It is also unclear whether any practical actions will be taken as a result of the post, or whether it will be built upon.

This post receives a prize of $60.

The Fermi Paradox has not been dissolved

The post points out some flaws in Dissolving the Fermi Paradox, a paper by Sandberg et al. Among other reasons, having good probabilities around the Fermi paradox is valuable because it provides (indirect) evidence about the existence of a "Great Filter" and thus for our likelihood of extinction.

Judges disagreed substantially about how substantive the points raised in the post were, and about whether the author was overconfident or too forceful. There was also some disagreement about how closely the post was related to “human judgmental forecasting.”

This post receives a prize of $50.

The First Sample Gives the Most Information

The post concisely introduces a powerful and simple concept. Judges agreed that the post wasn’t hugely impactful, but that it probably did have a pretty great ratio of value to time spent on it. 

This post receives a prize of $50.

A tenth post was also submitted, but some flaws were identified, and the author asked us not to mention it until it is fixed.

Judging process

Judges read each submission and produced:

  • An assessment of the quality of the project (execution)
  • An estimate of how valuable the project was
  • A funding recommendation
  • Comments as to their reasoning

The reasons why the funding recommendations were not directly proportional to impact and quality were:

  • Adjusting for closeness to forecasting: more impactful projects which weren't that related to forecasting received smaller prizes.
  • Some (but not all) judges tried to think about what signals giving higher or lower prizes sends. For example, some judges gave higher prizes to projects which had higher expected values even if they didn’t pan out in the end. Similarly, some judges penalized a post which sounded very overconfident even if it was otherwise impactful or valuable.
  • A high quality project can have low value if it belongs to a less impactful domain.
  • Some judges felt higher effort posts were worth more money per unit of impact, perhaps because lower effort posts could have been written by someone else if the original author hadn't done it.

After giving their initial estimates, judges met in a Zoom call to discuss their estimates. This was done by going project by project and bringing up disagreements. Afterwards, judges updated their estimates and recommendations. The final prize is simply the mean of all judges' recommendations. 
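As a small illustration of the aggregation step, the sketch below averages a set of per-judge recommendations for one submission. The dollar figures are hypothetical and are not the judges' actual recommendations; `final_prize` is a name chosen here for illustration.

```python
from statistics import mean

# Hypothetical recommendations (in dollars) from six judges for one submission.
# These are NOT the real figures; they only illustrate the arithmetic.
recommendations = [300, 200, 250, 150, 350, 250]

def final_prize(recs):
    """Final prize as the plain mean of all judges' recommendations."""
    return mean(recs)

if __name__ == "__main__":
    print(final_prize(recommendations))  # 250
```

One property of a plain mean is that every judge's number moves the result, so a single very high or very low recommendation shifts the final prize.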

Comments and Reflections

The counterfactual impact of this prize seems uncertain. Of the 10 submissions, only three were counterfactually caused by the prize, with the other seven being submitted because I (Nuño) asked the authors to do so after finding them by browsing forecasting related content in the EA forum and LessWrong. 

Overall, it is possible that there were too many judges, who cumulatively spent too much time judging, and that the marginal value of an additional judge wasn’t very high. However, when hashing out disagreements, each judge did bring unique points.

If there is a second round for this prize before 2022, entries published after the end of the first round will be accepted, so as not to create an incentive to hold back forecasting-related content until there is a prize.

Appendix: Quality Adjusted Research Papers.

Judges also estimated the impact of these projects in terms of Quality Adjusted Research Papers (QARPs, or Qs). QARPs are intended to have both relative value (a 20 QARP project should be estimated to be twice as valuable as a 10 QARP project) and absolute meaning (0.1 QARPs, or 100 mQARPs, should correspond to "a fairly valuable paper", such as this one).

The value judges assigned to each submission was:

Note that this method of rating is highly speculative, and having judges using it was in part intended as a test. Judges brought up that they weren’t sure that the scale was well defined, and that they were much more sure about their own relative values than about the absolute magnitude. Also, note that this didn't consider the relevance of forecasting, which is the main reason why these values don't perfectly correlate with the prizes.

Comments



Thank you! I’m honored to have won a prize! (For “How might better collective decision-making backfire?”) :-D

Thank you also to the people who’ve contributed answers!

To respond to the quasi-questions about the post:

In particular, there was no discussion about which concerns were or would have been historically important.

Yeah, that’d be very interesting! I don’t know if I’ll find someone with the right expertise who I can get interested in researching this. Many of them are broadly applicable to the whole LW and rationality project, so I bet there are people with the right expertise and interests somewhere out there, though.

It is also unclear whether any practical actions will be taken as a result of the post, or whether it will be built upon.

My motivation for asking about it was the software I’m developing. I’ve started a write-up of my approach and reasoning behind it. Roughly, I categorize the risks by how urgent it is for us to address them and by how plausible it is that we can react to them rather than having to prevent them. I’ll probably continue that write-up once I’ve moved to my new place and worked on the software some more.

Congratulations also to all the other winners!

Yeah, that’d be very interesting! I don’t know if I’ll find someone with the right expertise who I can get interested in researching this.

This was also a point we discussed. Having something which builds upon someone else's work, or having something which will be built upon in the future, generally makes a project more valuable. And in practice, I get the impression that it's mostly authors themselves who build upon their own work.

Good to hear, and thanks for the thoughts!

Another way we could have phrased things would have been,
"This post was useful in ways X, Y, and Z. If it had done things A, B, and C, it would have been even more useful."

It's always possible to have done more. Some of the entries were very extensive. My guess is that you did a pretty good job per unit of time in particular. I'd think of the comments as things to think about for future work.

And again, nice work, and congratulations!

Thank you! I read it like that, and I’m happy about the feedback too!

And I agree with it. It’s not like you can peek into my Google Drive. ^.^
