Summary

(Shallow investigation of ideas outside our expertise.)

  • Most people will probably never participate on existing forecasting platforms, which limits the platforms’ effects on mainstream institutions and public discourse. (More)
  • Changes to the user interface and recommendation algorithms of social media platforms might incentivize forecasting and lead to its more widespread adoption. Broadly, we envision i) automatically suggesting questions of likely interest to the user—e.g., questions related to the user’s current post or trending topics—and ii) rewarding users with higher than average forecasting accuracy with increased visibility. (More)
  • In a best case scenario, such forecasting-incentivizing features might have various positive consequences such as increasing society’s shared sense of reality and the quality of public discourse, while reducing polarization and the spread of misinformation. (More)
  • Facebook’s Forecast could be seen as one notable step towards such a scenario and might offer lessons on how to best proceed in this area. (More)
  • However, various problems and risks would need to be overcome—e.g., lack of trust or interest, cost, risks of politicization, and potential for abuse. (More)
  • While recommendation algorithms seem to present a particularly high-leverage point, there are other ways of promoting truth-seeking. (More)
  • Similar ideas might be applied to fact-checking efforts. (More)

Existing forecasting platforms

Existing forecasting platforms, like Metaculus, Good Judgment Open, and the new Forecast, are already very valuable. Our concern is that the vast majority of people will never participate on such platforms (with the potential exception of Forecast, more on this below). This limits their effects on mainstream institutions and discourse.

We speculate that there are several reasons for this:

  1. There is little incentive to participate on these platforms aside from intrinsic enjoyment of making forecasts (which most people will never acquire).
    • There are real-money prediction markets like PredictIt, but unlike in the stock market (where people make money on average), participants lose money on average because the market is zero-sum and charges transaction fees.[1]
    • Granted, there are other (modest) monetary incentives. For example, good forecasters can make money by winning forecasting tournaments or being employed as a superforecaster. However, most people will not be able to make money with forecasting and those that do need to put in many hours.[2]
    • Another incentive is the status associated with being a good forecaster. However, most people will probably never strongly care about this type of status unless forecasting becomes more widespread. (In this regard, platforms like Forecast or ideas like creating prediction markets for internet points seem promising.)
  2. Only a small fraction of the questions that interest people are available on forecasting platforms, and especially on prediction markets. This is partly because crafting and resolving questions is hard work and the staff of Metaculus and similar platforms is small. Because the questions are few in number, they must either concentrate on specific topics or be spread too thinly over many topics. Either way, many potentially interested people will be put off by the lack of questions on topics that interest them.
  3. The user interface of some of these platforms is probably off-putting to most normal people as it makes forecasting quite complex.

Because of scaling effects, these problems compound. Even if there are a million people in the world right now who actually would use Metaculus regularly if they tried it once, very few of them will ever hear about Metaculus and those that do probably won’t try it. What if there were a way to “nudge” millions of people into trying it out?

Incentivizing forecasting on social media—a best case scenario

We first elaborate on an ambitious if unrealistic best case scenario in order to better explain the general idea. We then briefly discuss Facebook’s Forecast which could be seen as a notable step towards such a scenario. Next, we outline possible beneficial consequences of promoting forecasting more widely, and on social media specifically. Afterwards, we discuss more realistic next steps, and other objections and open problems.

Basic idea

Imagine a major social media platform, such as Twitter or Facebook[3], created a feature that incentivizes forecasting in two major ways.

First, the feature automatically suggests questions of potential interest to the user—for example, questions related to the user’s current post or trending topics.[4] This would remove the inconvenience of searching for interesting questions and might even make forecasting slightly addictive.

Second, the feature would incentivize forecasting: users who make more accurate forecasts than the community would be rewarded by the platform’s recommendation algorithm with increased visibility, something many users value highly.[5] Of course, forecasting accuracy would be only one of the recommendation algorithm’s many ranking factors.

Concrete implementation example

For the sake of illustration, we use concrete examples based on Twitter. Similar things could be done on Facebook or other social media platforms. Note that it’s practically certain that there are better ways of implementing the general vision—our specific examples should only be seen as suggestions, not as definitive answers. (It might also be best to have all of this as an opt-in feature, at least at first.)

How would forecasts be suggested?

Imagine a user is making the following Tweet:

“European countries are getting clobbered by corona. The Mainstream Media does not like reporting this!”

Then a text window below the Tweet pops up:

“Looks like you are interested in COVID-19. Show your knowledge by making forecasts! Explore some of these trending questions.”

Some of the most popular questions relating to COVID-19 are then displayed, such as, “By 31 Dec 2021, will more people have died due to COVID-19 in Europe or in the US?”, alongside the current community prediction. (Clicking on these questions could bring up more details such as the precise resolution criteria and discussion.)
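To make the suggestion step more concrete, here is a minimal sketch of how a post could be matched to open questions. It assumes (hypothetically) that each question carries a few hand-written keyword tags and a forecast count used as a popularity proxy; the `Question` class, the tags, and the second and third example questions are our own illustration. A real platform would presumably use its existing topic models rather than simple keyword overlap.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    text: str
    keywords: frozenset[str]  # hypothetical topic tags attached by question writers
    num_forecasts: int        # used here as a simple popularity proxy


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def suggest_questions(post: str, questions: list[Question], limit: int = 3) -> list[Question]:
    """Return up to `limit` questions that share at least one keyword with the post.

    Questions are ordered by keyword overlap first, then by popularity.
    """
    words = tokenize(post)
    matches = []
    for question in questions:
        overlap = len(words & question.keywords)
        if overlap:
            matches.append((overlap, question.num_forecasts, question))
    matches.sort(key=lambda m: (m[0], m[1]), reverse=True)
    return [question for _, _, question in matches[:limit]]


# Illustrative data: the first question is taken from the text, the others are made up.
QUESTIONS = [
    Question(
        "By 31 Dec 2021, will more people have died due to COVID-19 in Europe or in the US?",
        frozenset({"corona", "covid", "europe", "european"}),
        num_forecasts=5000,
    ),
    Question(
        "Will trust in news media rise next year?",
        frozenset({"media", "news"}),
        num_forecasts=800,
    ),
    Question(
        "Will the incumbent win the next election?",
        frozenset({"election", "president"}),
        num_forecasts=9000,
    ),
]

POST = ("European countries are getting clobbered by corona. "
        "The Mainstream Media does not like reporting this!")
SUGGESTIONS = suggest_questions(POST, QUESTIONS)
```

With the example Tweet from above, the COVID-19 question ranks first (it matches two keywords), the media question second, and the unrelated election question is not suggested despite being the most popular.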

Users might also explore popular questions by clicking on a header named, say, “Forecasts”. Perhaps such a header could be placed below, say, “Explore” (see the screenshot below).

Where would forecasts be displayed?

Forecasts should probably stay secret by default until they resolve, since otherwise copying the predictions of good forecasters would be too easy (but more on this below). The resolved forecasts of users might be displayed somewhere on their profile, perhaps in a column next to “Likes”, which is currently the right-most column on one’s Twitter profile.

How exactly would good forecasting be rewarded?

Twitter’s recommender systems can boost the visibility of Tweets, Twitter users, and news stories by displaying them near the top of the page—e.g., “Explore - For you” and “Who to follow” as shown below.

We don’t know how exactly Twitter’s recommendation algorithm works. But it seems technically feasible to boost the visibility of users with better forecasting accuracy by more often displaying them near the top of the relevant pages.
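Since we don’t know Twitter’s actual ranking function, the following is only a toy sketch of what “one ranking factor among many” could mean. It assumes a single existing `engagement_score` per item and a per-author `forecasting_skill`, both scaled to [0, 1]; these names and the weight of 0.1 are illustrative assumptions, not a description of any real system.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    author: str
    engagement_score: float   # stand-in for the platform's existing signals, in [0, 1]
    forecasting_skill: float  # hypothetical author-level accuracy signal, in [0, 1]


def ranking_score(candidate: Candidate, forecasting_weight: float = 0.1) -> float:
    """Blend the existing signal with forecasting skill as a weighted average."""
    return ((1 - forecasting_weight) * candidate.engagement_score
            + forecasting_weight * candidate.forecasting_skill)


def rank(candidates: list[Candidate], forecasting_weight: float = 0.1) -> list[Candidate]:
    """Sort candidates from most to least visible."""
    return sorted(
        candidates,
        key=lambda c: ranking_score(c, forecasting_weight),
        reverse=True,
    )


CANDIDATES = [
    Candidate("accurate_forecaster", engagement_score=0.60, forecasting_skill=0.9),
    Candidate("poor_forecaster", engagement_score=0.62, forecasting_skill=0.3),
    Candidate("viral_account", engagement_score=0.90, forecasting_skill=0.2),
]

WITHOUT_FORECASTING = [c.author for c in rank(CANDIDATES, forecasting_weight=0.0)]
WITH_FORECASTING = [c.author for c in rank(CANDIDATES, forecasting_weight=0.1)]
```

In this toy example, a small weight is enough to swap the two accounts with similar engagement, while the highly engaging account stays on top regardless. That matches the intent that forecasting accuracy nudges the ranking rather than dominating it.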

Last, one might also add a forecasting leaderboard—e.g., next to the current “Entertainment” header—where people with the best forecasting accuracy are displayed at the top.

Lack of connection between content and forecasting?

Much, perhaps even most content on social media has nothing to do with sharing publicly relevant information; it’s more about funny memes and cat videos. It would be a bit odd to boost the visibility of, say, Nihilist Arby’s even if the person behind the account makes highly accurate forecasts. It seems possible, however, to distinguish information-related content from other categories such as entertainment or sports. In fact, Twitter does this already. Thus, it might be possible to only boost the content of users who (primarily) post informational content.[6]

If distinguishing between information and entertainment somehow proves infeasible, maybe one could (for the time being) only reward news media outlets and/or their individual journalists with increased visibility for higher than average forecasting accuracy. It might generally be more natural (and impactful) to tie forecasting to journalism/news specifically rather than to social media content in general. See this section for more details and other ideas in this vicinity.

Who would create and resolve questions?

The default would be dedicated Twitter staff, perhaps overseeing communities of volunteer moderators.

To reduce potential bias and conflicts of interest and to increase trust among users, perhaps questions should be created, moderated, and resolved by committees of staff and members of the general public (perhaps randomly selected ones). Committee members could also be required to undergo a selection, training, and instruction process, perhaps resembling those of US juries or similar entities, which enjoy a high level of trust among the public and generally seem to make relatively fair judgments (with notable exceptions, of course).

Other

Other issues and risks (e.g., lack of trust, PR risks, and lack of interest by companies and/or users) are discussed in greater detail in the section “Problems, objections and potential solutions” below.

Facebook’s Forecast

Note that we wrote most of this post before we became aware of the existence of Forecast.

Forecast, which was developed by Facebook’s New Product Experimentation team, already goes a considerable way towards realizing the above-mentioned best case scenario. Forecast incentivizes accurate forecasting via points, leaderboards, and badges, among other things. We recommend this recent LessWrong post for more details.

Of course, the main ingredient that is missing is full integration with Facebook and its recommendation algorithms. However, the existence of Forecast suggests that the leadership of Facebook could be more open towards greater integration of forecasting than we would have initially suspected.

Forecast is also interesting since at least one of its staff members seems sympathetic towards LessWrong. This might point to the value of having insider connections at big social media companies. It might also suggest that it could be a high impact career path for effective altruists to work for such companies and try to advocate for forecasting-incentivizing features from within.

Path to impact

For the sake of illustration, imagine a best case scenario in which a major social media company (e.g., Twitter) adopts such forecasting-incentivizing features. The move would likely be discussed in several major news outlets. Similar features might be adopted in adapted forms by other social media platforms, such as Facebook, YouTube, and Reddit. The population of social media “thought leaders” would shift to have more representation by people with good forecasting skills. Within a few years, a few percent of social media users might make at least one forecast per month.

In essence, it could lead to most social media recommendation algorithms becoming ~10% more truth-rewarding. This might be about as impactful as convincing most journalists to care ~10% more about the truth. Various positive consequences could result from such a best case scenario.

Tens of millions of people might become slightly more reflective, epistemically modest and less polarized as a result of their experience with forecasting—compare Mellers, Tetlock & Arkes (2018) who found that participation in forecasting tournaments had comparable effects.[7] Some of the negative effects of social media—such as epistemic fragmentation and polarization—could be reduced, making it easier to reach compromise solutions and avoid major conflicts and political gridlock. Public discourse could become more nuanced and truth-tracking. Ultimately, the world’s shared sense of reality and its sanity waterline might go up by a noticeable amount.

Tens of thousands of people could become the equivalent of superforecasters in various domains, and thousands of good hires might be made partly on the basis of forecasting track record. Substantially more people could find their way to the rationalist and (longtermist) EA communities as a result of getting excited about forecasting. Several (governmental) institutions, corporations, and academic disciplines might reform themselves to adopt better forecasting practices.

The greater prevalence of forecasting would plausibly increase the influence of superforecasters, forecasting platforms, and prediction markets. As a result, several important world events might be successfully predicted and prepared for; events that otherwise would have taken the world by surprise due to people listening to pundits and analysts instead of forecasters. Superforecasters, forecasting platforms, and prediction markets would likely also put non-trivial probability on the dangers of transformative AI and other risks longtermists care about. As a result, more people might be recruited to these causes.

More realistic alternatives and next steps

Needless to say, the above scenarios are improbable. The base rate of major tech companies adopting one’s desired feature is low. Getting a feature rolled out is presumably very complicated and involves many considerations that we don't understand. (Other problems and objections are discussed in the next section.) That being said, the existence of Forecast suggests that it might not be that improbable that a major social media platform would be open towards a greater integration of forecasting into their platform.

Still, we outline some other more realistic alternatives and potential next steps below—some of them might even result in something resembling the best case scenario eventually.

Reputable actors convincing social media companies

One way forward would be for reputable actors—say, large grantmaking entities or public intellectuals such as Tetlock—to try to convince social media companies to adopt forecasting-incentivizing features. Grantmaking entities might even be able to cover expenses associated with the development (and/or maintenance) of forecasting-incentivizing features. In addition to such external efforts, it might also be possible to convince senior managers or teams to advocate for the idea from within the company.

Starting with smaller or more sophisticated platforms

It might be considerably easier to convince smaller (not necessarily social media) companies such as Medium or Substack to adopt forecasting-incentivizing features. (Substack might be especially promising given that they apparently made an “extremely generous offer” to Scott Alexander.) Some of these platforms might also have a more sophisticated user base. If forecasting-incentivizing features were adopted by a smaller platform with sufficient success (e.g., in terms of engagement and reception), larger social media companies might be more inclined to adopt such features as well.

Browser extension or app

Those enthusiastic about forecasting might also develop their own browser extension or app. For example, it might be technically feasible to create a browser extension that displays one’s forecasting accuracy. Other users might then deprioritize the content of those with bad forecasting track records (or those who don’t use the extension in the first place). There are many variations of this idea.

Problems, objections and potential solutions

Forecasting might be too difficult or effortful for most users

Forecasting is effortful and only a tiny fraction of the population engages in regular forecasting. However, it seems possible that this fraction might be substantially increased provided forecasting is made sufficiently easy and rewarded sufficiently strongly.

Many people seem to care strongly about how many views and likes their content gets. To increase their follower count, many people spend great effort on improving their thumbnails, video editing, and so on. There seem to be hundreds of videos on how to “game” the YouTube algorithm, many of them with more than a million views. Spending a few hours on learning how to make forecasts doesn’t seem inherently more difficult or less enjoyable.

Furthermore, forecasting can be made much more intuitive and simpler than on, say, Metaculus. For example, one could have a slider ranging from “extremely unlikely” to “extremely likely” (while displaying the precise probability below).
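As a rough illustration of such a slider, the sketch below maps a slider position to a probability and to a verbal label. The number of steps, the 1% and 99% caps, and the label thresholds are all arbitrary choices of ours; the caps merely reflect the common practice of not letting users enter 0% or 100%.

```python
def slider_to_probability(position: int, steps: int = 100, floor: float = 0.01) -> float:
    """Convert a slider position (0..steps) into a probability.

    The result is clamped to [floor, 1 - floor] so that nobody can
    express absolute certainty.
    """
    if not 0 <= position <= steps:
        raise ValueError(f"position must be between 0 and {steps}")
    probability = position / steps
    return min(max(probability, floor), 1 - floor)


# Upper bounds (exclusive) for each verbal label; purely illustrative thresholds.
LABELS = [
    (0.05, "extremely unlikely"),
    (0.35, "unlikely"),
    (0.65, "about as likely as not"),
    (0.95, "likely"),
]


def describe(probability: float) -> str:
    """Return the verbal label shown next to the precise probability."""
    for upper_bound, label in LABELS:
        if probability < upper_bound:
            return label
    return "extremely likely"
```

A user dragging the slider to the middle would thus see “about as likely as not” together with the precise value of 50%, while the two ends map to “extremely unlikely” (1%) and “extremely likely” (99%).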

The feature could be gamed

Different methods of rewarding forecasting accuracy open up different ways for gaming the system. We are therefore unsure how to best reward forecasting accuracy. In the beginning, one could use a scoring method similar to Metaculus’s to kickstart adoption of the feature. In contrast to a proper prediction market, this method rewards users for simply agreeing with the community’s consensus forecast (provided it is correct). Once more people start participating, one might want to switch to a scoring method that only rewards users who beat the community.
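The difference between the two kinds of scoring can be sketched as follows. Note that this is not Metaculus’s actual scoring rule: as a stand-in, we use the Brier score (the squared error of a probability forecast on a binary question), and the function names are our own. The “absolute” score rewards anyone who does better than an uninformative 50% forecast, whereas the “relative” score only rewards users who beat the community forecast.

```python
def brier_score(probability: float, outcome: bool) -> float:
    """Squared error of a binary forecast: 0 is perfect, 1 is maximally wrong."""
    return (probability - float(outcome)) ** 2


def absolute_score(user_p: float, outcome: bool) -> float:
    """Score relative to an uninformative 50% forecast (whose Brier score is 0.25).

    Agreeing with a correct community consensus earns a positive score.
    """
    return 0.25 - brier_score(user_p, outcome)


def relative_score(user_p: float, community_p: float, outcome: bool) -> float:
    """Score relative to the community forecast.

    Copying the community earns exactly zero; only beating it is rewarded.
    """
    return brier_score(community_p, outcome) - brier_score(user_p, outcome)


# Example: the community says 80% and the event happens.
COMMUNITY_P = 0.8
COPYCAT_ABSOLUTE = absolute_score(0.8, True)                 # positive
COPYCAT_RELATIVE = relative_score(0.8, COMMUNITY_P, True)    # exactly zero
BOLDER_RELATIVE = relative_score(0.9, COMMUNITY_P, True)     # positive
TIMID_RELATIVE = relative_score(0.6, COMMUNITY_P, True)      # negative
```

Under the absolute score, a user who simply copies a correct 80% consensus is rewarded. Under the relative score, the same user gets nothing, a user who was more confident in the right direction gains, and a user who was less confident loses.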

Other ways of manipulation include:

  • Users paying skilled forecasters to send them a list of forecasts.
  • Users simply copying the predictions of good forecasters (either from the same platform or a different forecasting platform).
  • In the worst case, malevolent actors could pay out large amounts of money to skilled forecasters to get good forecasts for their own armies of social media accounts that mostly push a malicious agenda of misinformation unrelated to these correct forecasts.

Potential ways of addressing these problems:

  • Once one moves to a scoring method that only rewards those that beat the community, simply copying the predictions of good forecasters (on the platform or elsewhere) will become difficult because many people are going to do that. In the extreme case, the community prediction on, say, Twitter will start to approximate the predictions of prediction markets or superforecasters on other forecasting platforms. However, this would actually be desirable: most people become more epistemically modest and simply defer to the best forecasters and experts. Those who are overconfident will be punished.
  • Paying others to make forecasts for them could be against the terms of service. Users who do this will be banned. (However, this is probably difficult to enforce. It might also be desirable if forecasting becomes a skill highly valued by the market.)
  • Ensuring that the predictions of (top) forecasters on the platform stay secret until they are resolved.
  • Keeping track of forecasting performance separately for different categories, and then only boosting content that seems to fit within the category the user has performed well in. (This might be very difficult to implement.)
  • Generally, it might be best if the recommendation algorithms don’t reward accurate forecasts in socially irrelevant domains such as sports—or reward them less so.
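The last two ideas in the list above (per-category track records and ignoring socially irrelevant domains) could look roughly like the sketch below. It assumes that content and questions can already be assigned to a shared set of categories, which is the hard part and is not addressed here. The class name, the minimum of 20 resolved forecasts, and the excluded `"sports"` category are illustrative assumptions.

```python
from collections import defaultdict


class CategoryTrackRecord:
    """Stores a user's per-question scores separately for each category."""

    def __init__(self) -> None:
        self._scores: dict[str, list[float]] = defaultdict(list)

    def record(self, category: str, relative_score: float) -> None:
        """Add the score of one resolved forecast (positive = beat the community)."""
        self._scores[category].append(relative_score)

    def boost_for(
        self,
        content_category: str,
        min_forecasts: int = 20,
        excluded: frozenset = frozenset({"sports"}),
    ) -> float:
        """Return the visibility boost for content in the given category.

        No boost is given for excluded categories, for categories with too
        few resolved forecasts, or for a below-community track record.
        """
        if content_category in excluded:
            return 0.0
        scores = self._scores.get(content_category, [])
        if len(scores) < min_forecasts:
            return 0.0
        mean_score = sum(scores) / len(scores)
        return max(mean_score, 0.0)


record = CategoryTrackRecord()
for _ in range(20):
    record.record("health", 0.03)   # consistently beats the community
    record.record("sports", 0.05)   # good, but in an excluded category
for _ in range(5):
    record.record("economics", 0.10)  # promising, but too few forecasts so far
```

Here the user’s health-related posts would receive a small boost, while posts about sports, economics, or any category without a track record would not. Requiring a minimum number of resolved forecasts is one simple way to avoid rewarding a few lucky guesses.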

Lack of trust

What if a substantial portion of users doesn’t trust those resolving the questions? This seems a major concern given that many people already don’t trust the fact-checking efforts by Twitter, Facebook, and others.

We hope that this problem might be reduced by:

  • Limiting questions to less (politically) controversial, more easily resolvable domains—at least in the beginning.
  • Ensuring that those who resolve the questions are properly selected, instructed, and trained. As mentioned above, it might be prudent to establish committees consisting of employees and (randomly selected) members of the public.
  • Relying on the fact that forecasting is, perhaps, less likely to evoke notions of censorship and bias than fact-checking (see more here).

Direct costs of development and maintenance

The implementation of a forecasting-incentivizing feature will be costly. However, this does not seem to be an insurmountable problem. Metaculus already features more than 1,000 questions while being a relatively small platform. Major social media companies could relatively easily employ ~100 employees that work full-time on resolving questions and related tasks for around $10 million per year.[8]

One could argue that this kind of money only buys ~10x as many questions as Metaculus, which is not enough to be interesting. However, perhaps the system could be made even more efficient by allowing users to form communities that propose questions and question resolutions, with the social media employees acting only as moderators/referees or even just overseeing community moderators/referees. These employees could also check questions for potential offensiveness or PR risks, create new questions, merge similar ones, and so on.
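The rough numbers used in this section can be spelled out as a back-of-the-envelope calculation. All inputs are the approximate figures from the text; treating “more than 1,000” as exactly 1,000 questions, and reading the ~10x figure as questions handled per year, are simplifying assumptions of ours.

```python
def staffing_estimate(
    employees: int = 100,
    annual_budget_usd: float = 10_000_000,
    metaculus_questions: int = 1_000,   # "more than 1,000", treated as a lower bound
    question_multiplier: int = 10,      # the "~10x as many questions" figure
) -> dict:
    """Derive per-employee and per-question figures from the rough inputs."""
    total_questions = metaculus_questions * question_multiplier
    return {
        "cost_per_employee_usd": annual_budget_usd / employees,
        "total_questions": total_questions,
        "questions_per_employee": total_questions / employees,
        "cost_per_question_usd": annual_budget_usd / total_questions,
    }


ESTIMATE = staffing_estimate()
```

Under these assumptions, the budget corresponds to $100,000 per employee, about 100 questions per employee, and roughly $1,000 per question. Letting user communities propose questions and resolutions would mainly aim at bringing down that last figure.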

Alternatively, social media companies could try to (partly) crowdsource the resolution of questions and related tasks to their users; there is some evidence that well-implemented crowdsourcing works for fact-checking (Allen et al., 2020).

Reduced advertising revenue and engagement

Currently employed recommendation algorithms are presumably optimized for ad revenue (or proxies thereof such as time spent on the platform). One could argue that optimizing for forecasting accuracy would automatically mean less optimizing for revenue since it is a priori unlikely that these two objectives are highly correlated. However, forecasting-incentivizing features might also increase engagement or be close to net neutral in terms of revenue. If money/lost revenue were indeed the sole limiting factor, grant-making entities might be willing to subsidize such features.

Overall, social media companies are probably willing to forgo some profits if doing so has enough societal benefits[9] or helps to reduce the probability of potentially immensely costly regulation, such as regulating social media companies as publishers.

PR

It’s not clear to us whether social media companies would be in favor of or opposed to forecasting-incentivizing features for PR reasons.

On the one hand, forecasting is often misunderstood by the public and could generate negative media coverage. For instance, the media coverage of Facebook’s Forecast is rather mixed with TechCrunch writing that “an app focused on making “guesses” about the future seems ill-advised” and that “[...] scientists [...] don’t then crowdsource voting to determine if a hypothesis is true — they test, experiment, gather supporting data [...]”. (More media coverage of Forecast is summarized in Appendix B.)

On the other hand, social media companies are already widely criticized[10], partly for (allegedly) contributing to the spread of misinformation and “fake news”. If recommendation algorithms started to reward forecasting accuracy, social media companies could justifiably claim that they are doing their part to fight such misinformation. They could also counter another criticism, namely that their recommendation algorithms are too opaque, by rewarding forecasting accuracy and making this public knowledge.

Effort

The upper management would need to prioritize the development and implementation of forecasting-incentivizing features over the many other improvements and features that could be developed instead. This is problematic given that there are presumably dozens of improvements that require less effort and would be more lucrative.[11]

The politicization of forecasting or damaging its reputation

It is crucial to ensure that such forecasting-incentivizing features won't become controversial, politicized or give forecasting a bad reputation.

Potential worrisome scenarios include:

  • Some potential questions could be extremely controversial and outrage one or both political sides and ultimately force the social media company to shut down the tool. For example, questions related to particularly sensitive culture war topics.
  • Some potential questions could concern violent events such as terrorist attacks. The history of the Policy Analysis Market suggests that this could result in extreme outrage.
  • For whatever reason, some malevolent or controversial users (e.g., on the far right or far left) could perform very well and have their content boosted. Outrage ensues.
  • The social media company could be accused of being biased (or indeed will be biased) when it comes to which questions it allows, how they are formulated, and how they are resolved.
  • People could accuse the social media company of trying to push a sinister political agenda.

It is crucial to think about how to minimize these (and other) risks. In any case, it seems advisable to be very careful, especially in the beginning when the feature is being rolled out. For example, it might be prudent to only allow questions written by carefully vetted staff, and only within uncontroversial domains where questions can be resolved with relative ease and unambiguity.

In the beginning, it also seems wise to make forecasting-incentivizing features available only to a small minority of users interested in and fond of forecasting to ensure a healthy “starter culture”. The feature could then slowly be rolled out to more and more users.

Other potential negative effects

  • Forecasting-incentivizing features might end up disproportionately promoting the voices of intellectual elites at the expense of “regular people”. Some might view this as a negative.
  • Another possibility is that such features might promote “small-picture thinkers” who focus on easily resolvable questions at the expense of thinkers focusing on big-picture questions that can’t be easily resolved.[12] (Generally, whether and how to factor in the relative difficulty of different questions probably deserves more discussion.)
  • Most of the value from forecasting might come from top/“professional” forecasters. Mass-scale forecasting might be comparatively unimportant.

Why focus on forecasting and not on other factors?

It’s plausible that increased truth-seeking is part of “a common set of broad factors which, if we push on them, systematically lead to better futures” (Beckstead, 2013). Of course, there exist many such positive broad factors—such as increased cooperation or altruistic motivation. One could thus ask: Why not adjust recommender systems such that they reward more, for example, charitable, nuanced, thoughtful, rational, or (effectively) altruistic content?

To be clear, we don’t think that lack of (accurate) forecasting is the main problem in our society. However, forecasting in particular and truth-seeking in general seem promising for at least two reasons. First, accurate forecasts are relatively easy to operationalize and measure in an uncontroversial and objective way. For example, even if everyone agreed that more nuanced, charitable, and altruistic content should be rewarded, it’s not clear how one would objectively measure those things. It seems likely that, say, Republicans and Democrats would disagree heavily on how nuanced, charitable, or altruistic a given piece of content on a politically controversial subject is.

Second, it seems that most people would (at least claim to) value truth as measured by accurate forecasts. This cannot be said for most other things that effective altruists or rationalists care about. For example, what looks like a nuanced and charitable tweet to us could easily look like covert racism or hypocritical virtue signalling to others.[13]

That being said, it seems possible that at some point in the future one could measure and reward other positive factors such as nuance as well, maybe with the help of advanced machine learning, wisdom of the crowd approaches, or combinations thereof. Maybe forecasting could just be the start of a longer process of aligning recommender systems.

General limitations

Last, let us briefly elaborate on the limitations of this proposal—and the post in general. We have put relatively little emphasis on how to actually move forward with the implementation of these (and similar) ideas even though this might be the most important and difficult part. Part of the reason for this is that practical execution is not our strength. We also have become less convinced of the importance and feasibility of our ideas and wanted to move on to other projects. For example, we learned about Forecast, an existing effort to build a forecasting community at Facebook; they—and others like them—are in a much better position to think about and advance forecasting than we are.

For this reason, the post is also short on empirical research and lacks input from experts. We simply wanted to get these ideas out there and hope that someone finds some value in them—for example, by building on them or learning how not to think about this area.

Appendix

Why focus on recommender systems?

Recommender systems seem to present a high-leverage opportunity for several reasons. Many of them are already discussed in the post Aligning Recommender Systems as a Cause Area (Vendrov & Nixon, 2019) so we will only elaborate briefly here.

First, the scale seems enormous since recommender systems affect the lives of hundreds of millions of people. (See also the section “Scale” of Vendrov & Nixon (2019) for more details.)

As discussed above, even though it is very difficult to influence major media companies, few actors seem systematically opposed to changing recommender systems such that they uprank socially beneficial values like truth, suggesting at least non-negligible tractability.

Regarding neglectedness: while many people discuss the dangers of social media, relatively few organizations seem to work on technical, non-political solutions to these problems. Forecasting-incentivizing features could thus be seen as an attempt to “pull the rope sideways”.

Other potential leverage points to promote forecasting

News media outlets incentivizing forecasting

It would seem very valuable if major newspapers encouraged their journalists to make relevant forecasts at the end of their articles. Newspapers could even have leaderboards of the journalists with the highest forecasting accuracy, potentially coupled with a substantial prize pool. This might create large social and financial incentives to entice a substantial fraction of journalists to engage in forecasting and do it well. Overconfident pundits might also lose readership and credibility if their forecasting accuracy is poor (or they repeatedly refuse to participate), whereas journalists with high forecasting accuracy might become more popular (as has happened with Nate Silver).

This idea could also be combined with our original proposal of adjusting media recommendation algorithms; perhaps one could convince social media companies to adjust the visibility of the content of news outlets based on the (average) forecasting accuracy of their journalists.

How could one achieve that? Perhaps reputable EA organizations with sufficient funding could give grants to a selected newspaper to adopt such practices. If one reputable news outlet adopts this practice successfully, others might follow suit. (Note that a few million dollars per year would already be a substantial amount of money. The Washington Post, for example, was bought for “just” $250 million by Jeff Bezos in 2013.)

Alternatively, one could try to establish a reputable prize, akin to the Pulitzer Prize, for forecasting-heavy journalism—and/or longtermist journalism, for that matter. (Note that the cash rewards of the Pulitzer Prize are very modest, only $15,000. Thus, the influence of the Pulitzer Prize might mostly exist for historical reasons which are hard to emulate; a better strategy might be to lobby the Pulitzer Prize itself to create a subcategory for accurate forecasting.)

Online courses on forecasting

Another obvious idea would be to have someone highly qualified (Philip Tetlock comes to mind) give a course about forecasting on Coursera (or a similar platform).[14] Such a course could potentially reach hundreds of thousands of people, maybe even more (the most popular Coursera courses have been completed by over a million people). An online course is plausibly also a better medium for teaching forecasting than a book; it could better incorporate various practical exercises, which could be completed in teams, and award a real certificate upon completion.

Perhaps the course’s completion should require students to have made at least ~50 forecasts on sites such as Good Judgment Open or Metaculus. Reputable grantmaking entities could offer prizes for the, say, top 1% of participants in order to incentivize further participation and increase the reputation of such a course.

What about fact-checking?

Another idea would be to adjust recommendation algorithms such that they reward content that has been labelled as true by fact-checkers—and/or downrank content that has been labelled as false. This could definitely be very valuable and is seemingly already done to some extent. It could even be done in combination with our proposal; indeed there is some non-trivial overlap between these two proposals because the resolution of questions upon which forecasts are based involves a sort of fact-checking as well. The advantages and disadvantages of fact-checking are discussed in more detail in Appendix B.

Appendix B

See Appendix: Incentivizing forecasting via social media for additional details.

Acknowledgments

For helpful comments and discussion many thanks to Pablo Stafforini, Rebecca Kossnick, Stefan Schubert, Jia Yuan Loke, Ruairi Donnelly, Lukas Gloor, Jonas Vollmer, Tobias Baumann, Mojmír Stehlík, Chi Nguyen, Stefan Goettsch, Lucius Caviola, and Ozzie Gooen.


  1. Perhaps on some subconscious level most people realize that they’d probably lose money if they played the prediction markets, and thus stay away. Perhaps an even bigger factor is that there exist more exciting alternatives (such as sports games) for people who enjoy betting: why bother with forecasting boring and complicated events, even if they are more important from a societal perspective? ↩︎

  2. Increasing the prize pool would likely increase participation but trying to incentivize a non-trivial fraction of the human population to participate seems too costly. ↩︎

  3. Other alternatives include Reddit, Medium, or potentially even Google. For example, websites that make correct forecasts (preferably those thematically related to the content of the website) could get a boost in their PageRank. ↩︎

  4. Or a "Twitter forecast" functionality, similar to Twitter polls. ↩︎

  5. In the beginning, in order to further incentivize participation, one could even consider rewarding all users for merely participating—or everyone above the, say, 20th percentile of forecasting accuracy. ↩︎

  6. Though one might want to promote users who make good forecasts about, say, sport games, partly in order to promote the feature. ↩︎

  7. The authors write: "[...] participants who actively engaged in predicting US domestic events were less polarized in their policy preferences than were non-forecasters. Self-reported political attitudes were more moderate among those who forecasted than those who did not. We also found evidence that forecasters attributed more moderate political attitudes to the opposing side." ↩︎

  8. Assuming salary costs of around $75,000 per year and another ~$2.5 million for lawyers, management, and other costs. ↩︎

  9. For example, in his 80,000 Hours podcast interview, Tristan Harris remarks: “[...Facebook has] made many changes that have actually decreased revenue and decreased engagement for the good of lowering addiction or decreasing political polarization”.

    Another example is how Twitter decided to keep the “Quote Tweet” feature because it “slowed the spread of misleading information” even though it led to an overall reduction in retweets (which seems bad in terms of revenue/engagement). ↩︎

  10. For example, the documentary “The Social Dilemma” argues that social media is doing great harm to society. It is currently the most reviewed documentary of 2020 on IMDB. In one survey, only 14% of British voters thought that social media has an overall positive effect on society while 46% thought it was negative. ↩︎

  11. For example, Twitter and Facebook don’t even have premium versions which seem relatively trivial to develop (just disable ads and add a few gimmicks) and would presumably generate tens of millions in additional revenue. For what it’s worth, they could also be socially beneficial given that the advertising model might incentivize more click-baity, sensationalist, outrage-inducing content. But there could be other reasons for the lack of premium versions. ↩︎

  12. Thanks to Mojmír Stehlík for raising these points. ↩︎

  13. Still, there might be rather obvious low-hanging fruit such as excessive vulgarity, ad hominems or clear “hate speech” (e.g., racial slurs) that could be measured more easily and punished by recommender systems (and existing platforms seem to already do this to some extent). ↩︎

  14. The existing Coursera courses on forecasting seem at best tangentially related to the type of forecasting we have in mind. ↩︎

19 comments

Overall I like this idea, appreciate the expansiveness of the considerations discussed in the post, and would be excited to hear takes from people working at social media companies.

Thoughts on the post directly

Broadly, we envision i) automatically suggesting questions of likely interest to the user—e.g., questions related to the user’s current post or trending topics—and ii) rewarding users with higher than average forecasting accuracy with increased visibility

I think some version of boosting visibility based on forecasting accuracy seems promising, but I feel uneasy about how this would be implemented. I'm concerned about (a) how this will be traded off against other qualities and (b) ensuring that current forecasting accuracy is actually a good proxy.

On (a), I think forecasting accuracy and the qualities it's a proxy for represent a small subset of the space that determines which content I'd like to see promoted; e.g. it seems likely to be loosely correlated with writing quality. It may be tricky to strike the right balance in terms of how the promotion system works.

On (b):

  1. Promoting and demoting content based on a small sample size of forecasts. In practice it often takes many resolved questions to discern which forecasters are more accurate, and I'm worried that it will be easy to increase/decrease visibility too early.
  2. Even without a small sample size, there may be issues with many of the questions being correlated. I'm imagining a world in which lots of people predict on correlated questions about the 2016 presidential election, then Trump supporters get a huge boost in visibility after he wins because they do well on all of them.

That said, these issues can be mitigated with iteration on the forecasting feature if the people implementing it are careful and aware of these considerations. 

Generally, it might be best if the recommendation algorithms don’t reward accurate forecasts in socially irrelevant domains such as sports—or reward them less so.

Insofar as the intent is to incentivize people to predict on more socially relevant domains, I agree. But I think forecasting accuracy on sports, etc. is likely strongly correlated with performance in other domains. Additionally, people may feel more comfortable forecasting on things like sports than other domains which may be more politically charged.

My experience with Facebook Forecast compared to Metaculus

I've been forecasting regularly on Metaculus for about 9 months and Forecast for about 1 month.

  1. I don't feel as pressured to regularly go back and update my old predictions on Forecast as on Metaculus, since Forecast is a play-money prediction market rather than a prediction platform. On Metaculus, if I predict 60% while the community is at 50%, then don't update for 6 months while the community moves to 95%, I'm at a huge disadvantage in terms of score relative to predictors who did update. But with a prediction market, if I buy shares at 50 cents and the price of the shares goes up to 95 cents, it just helps me. The prediction market structure makes me feel less pressured to continually update on old questions, which has both its positives and negatives but seems good for a social media forecasting structure.
  2. The aggregate on Forecast is often decent, but it is occasionally far off, more egregiously and more often than on Metaculus (e.g., this morning I bought some shares for Kelly Loeffler to win the Georgia senate runoff at as low as ~5 points, implying 5% odds, while election betting odds currently have Loeffler at 62%). The most common reasons I've noticed are:
    1. People misunderstand how the market works and bet on whichever outcome they think is most probable, regardless of the prices.
    2. People don't make the error described in (1) (as far as I can tell), but are over-confident.
    3. People don't read the resolution criteria carefully.
    4. Political biases. 
    5. There aren't many predictors so the aggregate can be swung easily.
  3. As hinted at in the post, there's an issue with being able to copy the best predictors. I've followed 2 of the top predictors on Forecast and usually agree with their analyses and buy into the same markets with the same positions.
  4. Forecast currently gives points when other people forecast based on your "reasons" (aka comments), and these points are then aggregated on the leaderboard with points gained from actual predictions. I wish there were separate leaderboards for these.
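
To make the contrast in point 1 concrete, here is a minimal sketch of the two incentive structures. It assumes a simple time-averaged log score as a stand-in for a Metaculus-style scoring rule (the real scoring rule is more involved), and the function names and the monthly numbers are hypothetical illustrations, not taken from either platform.

```python
import math

def time_averaged_log_score(forecasts, outcome):
    """Average log score over equally long periods.

    forecasts[i] is the probability held during period i.
    Scores are <= 0; closer to 0 is better.
    """
    scores = [math.log(p if outcome else 1.0 - p) for p in forecasts]
    return sum(scores) / len(scores)

def market_profit(buy_price, later_price, shares=1):
    """Profit per 'yes' position bought at buy_price and valued at later_price."""
    return shares * (later_price - buy_price)

# Six monthly periods; assume the event eventually happens.
stale = [0.60] * 6                               # predicted once, never updated
updated = [0.60, 0.70, 0.80, 0.90, 0.95, 0.95]   # followed the evidence

stale_score = time_averaged_log_score(stale, True)
updated_score = time_averaged_log_score(updated, True)

# In the market, the forecaster who bought early and never came back
# still benefits fully from the price moving from 0.50 to 0.95.
profit = market_profit(0.50, 0.95)
```

Under these assumptions the stale forecaster scores worse than the one who kept updating, while the early market buyer is not penalized for leaving the position alone.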

Thanks, great points!

would be excited to hear takes from people working at social media companies.

Yeah, me too. For what it's worth, Forecast mentions our post here.

On (a), I think forecasting accuracy and the qualities it's a proxy for represent a small subset of the space that determines which content I'd like to see promoted

Yeah, as we discuss in this section, forecasting accuracy is surely not the most important thing. If it were up to me, I'd focus on spreading (sophisticated) content on, say, effective altruism, AI safety, and so on. Of course, most people would never agree with this. In contrast, forecasting is perhaps something almost everyone can get behind and is also objectively measurable. 

I agree that the concerns you list under (b) need to be addressed. 

 

This is a very good idea. The problems in my view are biggest on the business model and audience demand side. But there are still modest ways it could move forward. Journalism outlets are possible collaborators but they need the incentive perhaps by being able to make original content out of the forecasts.

To the extent prediction accuracy correlates with other epistemological skills you could task above average forecasters in the audience with tasks like up- and down-voting content or comments, too. And thereby improve user participation on news sites even if journalists did not themselves make predictions.

Thanks!

The problems in my view are biggest on the business model and audience demand side.

I agree.

Journalism outlets are possible collaborators but they need the incentive [...]

Yeah, maybe such outlets could receive financial support for their efforts by organizations like OpenPhil or the Rockefeller Foundation—which supported Vox's Future Perfect.

To the extent prediction accuracy correlates with other epistemological skills you could task above average forecasters in the audience with tasks like up- and down-voting content or comments, too.

Interesting idea. More generally, it might be valuable if news outlets adopted more advanced commenting systems, perhaps with Karma and Karma-adjusted voting (e.g., similar to the EA forum). From what I can tell, downvoting isn't even possible on most newspaper websites. However, Karma-adjusted voting and downvotes could also have negative effects, especially if coupled with a less sophisticated user base and less oversight than on the EA forum.

Agree on both points. The Economist's World in 2021 partnership with Good Judgment is interesting here. I also think as GJ and others do more content themselves, other content producers will start to see the potential of forecasts as a differentiated form of user-generated content they could explore. (My background is media/publishing so more attuned to that side than the internal dynamics of the social platforms.) If there are further discussions on this and you're looking for participants let me know.

Thanks for writing this! :)

Another potential outcome that comes to mind regarding such projects is a self-fulfilling prophecy effect (provided the predictions are not secret). I have no idea how much of a (positive/negative) impact it would have though.

Thanks. :)

Another potential outcome that comes to mind regarding such projects is a self-fulfilling prophecy effect [...]

That's true though this is also an issue for other forecasting platforms—perhaps even more so for prediction markets where you could potentially earn millions by making your prediction come true. From what I can tell, this doesn't seem to be a problem for other forecasting platforms, probably because most forecasted events are very difficult to affect by small groups of individuals. One exception that comes to mind is match fixing.

However, our proposal might be more vulnerable to this problem because there will (ideally) be many more forecasted events, so some of them might be easier to affect by a few individuals wishing to make their forecasts come true.

Some other people have mentioned Facebook's Forecast. Have you thought about talking with them directly about these ideas? For reference, here is the main person. 

Yes, we have talked with Rebecca about these ideas. 

Thanks, stimulating ideas!

My quick take: Forecasting is such an intellectual exercise, I’d be really surprised if it becomes a popular feature on social media platforms, or will have effects on the epistemic competencies of the general population.

I think I’d approach it more like making math or programming or chess a more widely shared skill: lobby to introduce it at schools, organize prestigious competitions for high schools and universities, convince employers that this is a valuable skill, make it easy to verify the skill (I like your idea of a Coursera course + forecasting competition).

Forecasting is such an intellectual exercise, I’d be really surprised if it becomes a popular feature on social media platforms

I'd also be surprised. :) Perhaps I'm not as pessimistic as you though. In a way, forecasting is not that "intellectual". Many people bet on sport games which (implicitly) involves forecasting. Most people are also interested in weather and election forecasts and know how to interpret them (roughly).

Of course, forecasting wouldn't become popular because it's intrinsically enjoyable. People would have to get incentivized to do so (the point of our post). However, people are willing to do pretty complicated things (e.g., search engine optimization) in order to boost their views, so maybe this isn't that implausible.

As we mention in the essay, one could also make forecasting much easier and more intuitive, by e.g. not using those fancy probability distributions like on Metaculus, but maybe just a simple slider ranging from 0% to 100%.

Forecasting also doesn't have to be very popular. Even in our best case scenario, we envision that only a few percent of users make regular forecasts. It doesn't seem highly unrealistic that many of the smartest and most engaged social media users (e.g., journalists) would be open to forecasting, especially if it boosts their views.

But yeah, given that there is no real demand for forecasting features, it would be really difficult to convince social media executives to adopt such features.

I think I’d approach it more like making math or programming or chess a more widely shared skill

I agree that this approach is more realistic. :) However, it would require many more resources and would take longer.

Hm, regarding sports and election betting, I think you're right that people find it enjoyable, but then again I'd expect no effect on epistemic skills due to this. Looking at sports betting bars in my town, they don't seem to be places for people who would ever, e.g., track their performance. But I also think the online Twitter crowd is different. I'm not sure how much I'd update on YouTubers investing time into gaming YouTube's algorithms. This seems to be more a case of investing 2h watching stuff to get a recipe to implement?

Just in case you didn't see it, Metaculus' binary forecasts are implemented with exactly those 0%-100% sliders. 

I agree that this approach is more realistic. :) However, it would require many more resources and would take longer.

Not sure if I think it would require that many more resources. I was surprised that Metaculus' AI forecasting tournament was featured on Forbes the other day with "only" $50k in prizes. Also, from the point of view of a participant, the EA groups forecasting tournament seemed to go really well and introduced at least 6 people I know of into more serious forecasting (being run by volunteers with prizes in the form of $500 donation money). The Coursera course sounds like something that's just one grant away. Looking at Good Judgment Open, ~half of their tournaments seem to be funded by news agencies and research institutes, so reaching out to more (for-profit) orgs that could make use of forecasts and hiring good forecasters doesn't seem so far off, either.

I also imagined that the effect on epistemic competence will mostly be that most people learn that they should defer more to the consensus of people with better forecasting ability, right? I might expect to see the same effect from having a prominent group of people that perform well in forecasting. E.g. barely anyone who's not involved in professional math or chess or poker will pretend they could play as well as them. Most people would defer to them on math or poker or chess questions.

Not sure if I think it would require that many more resources. I was surprised that Metaculus' AI forecasting tournament was featured on Forbes the other day with "only" $50k in prizes. Also, from the point of view of a participant, the EA groups forecasting tournament seemed to go really well and introduced at least 6 people I know of into more serious forecasting (being run by volunteers with prizes in the form of $500 donation money).

Yeah, I guess I was thinking about introducing millions of people to forecasting. But yeah, forecasting tournaments are a great idea. 

I agree that a forecasting Coursera course is promising and much more realistic.

I'm not an expert on social media or journalism, but just some fairly low-confidence thoughts - it seems like this is a really interesting idea, but it seems very odd to think of it as a Facebook feature (or other social media platform):

  • Facebook and social media in general don't really have an intellectual "brand". It seems likely that if you did this as a Facebook feature, it would be more likely to get dismissed as "just another silly Facebook game." Or if most of the people using it weren't putting much effort into it, the predictions likely wouldn't be that accurate, and that could undermine the effort to convince the public of its value.
  • The part about promoting people with high prediction scores seems awkward. Am I understanding correctly that each user is given one prediction score that applies to all their content? So that means that if someone is bad (good) at predicting COVID case counts, then if they post something else it gets down- (up-) weighted, even if the something else has nothing to do with COVID? That's likely to be perceived as very unfair. Or do you have some system to figure out which forecasting questions count toward the recommender score for which pieces of content? Even then it seems weird - if someone made bad predictions about COVID in the past, that doesn't necessarily imply that content they post now is bad.
  • Presumably the purpose of this is to teach people how to be better forecasters. If you have to hide other people's forecasts to prevent abuse, then how are you supposed to learn by watching other forecasters? Maybe the idea is that Facebook would produce content designed to teach forecasting - but that isn't the kind of content that Facebook normally produces, and I'm not sure why we would expect Facebook to be particularly good at that.
  • All the comparisons between forecasting and traditional fact-checking are weird because they seem to address different issues; forecasting doesn't seem to be a replacement or alternative to fact-checking. For instance, how would forecasting have helped to fight election misinformation? If you had a bunch of prediction questions about things like vote counts or the outcomes of court cases, by the time those questions resolved everything would be already over. (That's not a problem with forecasting, since it's not intended for those kinds of cases. But it does mean that it would not be possible to pitch this as an alternative to traditional fact-checking.)
  • In general, this seems to require a lot of editorial judgment on the part of Facebook as to what forecasting questions to use and what resolution criteria. (Especially this would be an issue if you were to use a user's general forecasting score as part of the recommender algorithm - for instance, if Facebook included lots of forecasting questions about economic data, that would end up advantaging content posted by people who are interested in economics, while if the forecasting questions were about scientific discoveries instead, then it would instead advantage content posted by people who are interested in science.) My guess is that this sort of editorial role is not something that social media platforms would be particularly enthusiastic about - they were sort of forced into it by the misinformation problem, but in that case they mostly defer to reputable sources to adjudicate claims. While they could defer to reputable sources to resolve questions, I'm not sure who they would defer to to decide what questions to set up. (I'm assuming here that the platform is the one setting up the questions - is that the case?)
  • Another way to game the system that you didn't mention here: set up a bunch of accounts, make different predictions on each of them, and then abandon all the ones that got low scores, and start posting the stuff you want on the account that got a high score.
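
How worrying the multi-account strategy in the last bullet is depends heavily on how many resolved questions a high score requires. The following is a rough sketch under simplifying assumptions that are mine, not the post's: all questions are binary, each throwaway account guesses independently with a 50% hit rate, and a "high score" means at least k of n questions correct. The function names are hypothetical.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance that one guessing
    account gets at least k of n binary questions right."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def prob_any_account_lucky(num_accounts, k, n, p=0.5):
    """Chance that at least one of several independent guessing accounts
    reaches k or more correct answers."""
    single = prob_at_least(k, n, p)
    return 1 - (1 - single) ** num_accounts

# 100 throwaway accounts, 10 questions each, "high score" = 9+ correct.
few_questions = prob_any_account_lucky(100, 9, 10)

# Same 100 accounts, but 100 questions each and 90+ correct required.
many_questions = prob_any_account_lucky(100, 90, 100)
```

With only 10 questions, a lucky account among the 100 is more likely than not; with 100 questions the chance becomes negligible. An attacker who deliberately spreads predictions across accounts would do somewhat better than independent guessing, so this should be read as an illustration of the trend rather than a bound.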

 

I wonder if it might make more sense to think of this as a feature on a website like FiveThirtyEight that already has an audience that's interested in probabilistic predictions and models. You could have a regular feature similar to The Riddler but for forecasting questions - each column could have several questions, you could have readers write in to make forecasts and explain their reasoning, and then publish the reasoning of the people who ended up most accurate, along with commentary.

Thanks for your detailed comment.

but it seems very odd to think of it as a Facebook feature (or other social media platform)

Yeah, maybe all of this is a bit fantastical. :) 

Facebook and social media in general don't really have an intellectual "brand". It seems likely that if you did this as a Facebook feature, it would be more likely to get dismissed as "just another silly Facebook game." Or if most of the people using it weren't putting much effort into it, the predictions likely wouldn't be that accurate, and that could undermine the effort to convince the public of its value.

That’s certainly possible. For what it’s worth, while Facebook’s Forecast was met with some amount of skepticism, I wouldn’t say it was “dismissed” out of hand. The forecasting accuracy of Forecast’s users was also fairly good: “Forecast's midpoint brier score [...] across all closed Forecasts over the past few months is 0.204, compared to Good Judgement's published result of 0.227 for prediction markets.”
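
For readers unfamiliar with the metric, here is a minimal sketch of a Brier score for binary questions. It uses the common single-outcome form (mean squared difference between the forecast probability and the 0/1 outcome, ranging from 0 to 1); the quoted figures may be based on a different variant, such as the two-category form that ranges from 0 to 2, and the source does not say which. The example forecasts are made up.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better; 0.0 is perfect. This is the single-outcome binary
    form, which is an assumption about the variant being discussed.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equally long, non-empty forecasts and outcomes")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Always answering 50% yields 0.25 regardless of what happens.
coin_flip = brier_score([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0])

# A forecaster who is mostly right and reasonably confident does better.
decent = brier_score([0.9, 0.8, 0.3, 0.6], [1, 1, 0, 0])
```

In this form, a score of 0.25 is what uninformative 50% answers achieve, which gives a rough reference point for reading scores around 0.2.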

However, it’s true that a greater integration with Facebook would probably make the feature more controversial and also result in a lower forecasting accuracy.

Btw, Facebook is just one example—I write this because you seem to focus exclusively on Facebook in your comment. In some ways, Twitter might be more appropriate for such features.

Am I understanding correctly that each user is given one prediction score that applies to all their content? So that means that if someone is bad (good) at predicting COVID case counts, then if they post something else it gets down- (up-) weighted, even if the something else has nothing to do with COVID

That would be the less complicated option. It might be perceived as being unfair—not sure if this will be a big problem though.

I’m working under the assumption that people who make more correct forecasts in one domain will also tend to have a more accurate model of the world in other domains. This holds on average, of course; there will be (many) exceptions. I’m not saying this is ideal; it’s just an improvement over the status quo where forecasting accuracy practically doesn’t matter at all in determining how many people read your content.

Or do you have some system to figure out which forecasting questions count toward the recommender score for which pieces of content?

That would be the other, more complicated alternative. Perhaps this is feasible when using more coarse-grained domains like politics, medicine, technology, entertainment, et cetera, maybe in combination with machine learning. 

Even then it seems weird - if someone made bad predictions about COVID in the past, that doesn't necessarily imply that content they post now is bad.

Well, sure. But across all users there will likely be a positive correlation between past and future accuracy. I think it would be good for the world if people who made more correct forecasts about COVID in the past would receive more “views” than those who made more incorrect forecasts about COVID—even though it’s practically guaranteed that some people in the latter group will improve a lot (though in that case, they will be rewarded by the recommender system in the future for that) and even make better forecasts than people in the former group. 

Presumably the purpose of this is to teach people how to be better forecasters.

I wouldn’t say that’s the main purpose. 

If you have to hide other people's forecasts to prevent abuse, then how are you supposed to learn by watching other forecasters?

My understanding is that’s how other platforms, like Metaculus, work as well. Of course, people can still write comments about what they forecasted and how they arrived at their conclusions.

Also, I think one can become better at forecasting on one’s own? (I think most people get better calibrated when they do calibration exercises on their own—they don’t need to watch other people do it.)

All the comparisons between forecasting and traditional fact-checking are weird because they seem to address different issues; forecasting doesn't seem to be a replacement or alternative to fact-checking. 

I didn’t mean to suggest that forecasting should replace fact-checking (though I can now see how our post and appendix conveyed that message). When comparing forecasting to fact-checking, I had in mind whether one should design recommendation algorithms to punish people whose statements were labeled false by fact-checkers. 

In general, this seems to require a lot of editorial judgment on the part of Facebook as to what forecasting questions to use and what resolution criteria. [...] My guess is that this sort of editorial role is not something that social media platforms would be particularly enthusiastic about 

Yeah, they certainly would be reluctant to do that. But given that they already do fact-checking, it doesn’t seem impossible. 

Another way to game the system that you didn't mention here: set up a bunch of accounts, make different predictions on each of them, and then abandon all the ones that got low scores, and start posting the stuff you want on the account that got a high score.

I agree that this is an issue. In practice, it doesn’t seem that concerning though. First, the recommendation algorithm would obviously need to take into account the number of forecasts in addition to their average accuracy in order to minimize rewarding statistical flukes. (Similarly to how Yelp displays restaurants with, say, an average of 4.5 rating but 100 ratings more prominently than restaurants with an average rating of 5.0 but only 5 ratings.) Thus, you would actually need to put in a lot of work to make this worthwhile (and set up, say, hundreds of accounts) or get very lucky (which is of course always possible). 
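
One simple way to take the number of forecasts into account is to shrink each observed average toward a prior, so that small samples cannot reach extreme scores. The sketch below is only an illustration of that idea: the function name, the prior mean of 3.5 stars, and the prior weight of 20 are all assumptions of mine, and I am not claiming this is how Yelp or any social media platform actually ranks things.

```python
def shrunk_score(observed_mean, count, prior_mean, prior_weight):
    """Weighted blend of the observed mean and a prior mean.

    prior_weight acts like a number of "phantom" observations at
    prior_mean: with few real observations the prior dominates,
    with many the observed mean dominates.
    """
    return (observed_mean * count + prior_mean * prior_weight) / (count + prior_weight)

# The restaurant example: 4.5 stars from 100 ratings vs. 5.0 stars from 5.
established = shrunk_score(4.5, 100, prior_mean=3.5, prior_weight=20)
newcomer = shrunk_score(5.0, 5, prior_mean=3.5, prior_weight=20)
```

With these assumed parameters the established restaurant ends up around 4.33 and the newcomer at 3.8, so the larger sample ranks higher despite the lower raw average. The same blend could be applied to a forecasting accuracy measure, with the prior set to the score of an average or uninformed forecaster.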

It would probably also be prudent to put in some sort of decay to the forecasting accuracy boosting (such that a good forecasting accuracy, say, 10 years ago matters less than a good forecasting accuracy in this year) in order to incentivize users to continue making forecasts. Otherwise, people who achieved a very high forecasting accuracy in year 1 would be inclined to stop forecasting in order to avoid a regression to the mean.  
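
A rough sketch of such a decay, under assumptions of mine: accuracy is measured by a Brier-style squared error, each forecast's weight halves every two years, and the result is blended with the score of an uninformative forecaster (0.25). The blending matters because decay alone leaves a weighted average unchanged when a user simply stops forecasting; all weights shrink together. All names and parameter values here are hypothetical.

```python
def decayed_score(records, current_year, half_life_years=2.0,
                  prior_score=0.25, prior_weight=5.0):
    """Time-decayed squared-error score; lower is better.

    records: iterable of (year_resolved, probability, outcome) with
    outcome in {0, 1}. Older forecasts get exponentially less weight,
    and the total is blended with prior_score so that an inactive
    user's score drifts back toward the uninformative baseline.
    """
    total_weight = 0.0
    weighted_error = 0.0
    for year, prob, outcome in records:
        age = current_year - year
        weight = 0.5 ** (age / half_life_years)
        total_weight += weight
        weighted_error += weight * (prob - outcome) ** 2
    return (weighted_error + prior_score * prior_weight) / (total_weight + prior_weight)

# Ten good forecasts, all resolved in 2011, and nothing afterwards.
history = [(2011, 0.9, 1)] * 10

score_then = decayed_score(history, current_year=2011)
score_later = decayed_score(history, current_year=2021)
```

Under these assumptions the user's score is 0.09 right after the forecasts resolve and drifts to roughly 0.24 ten years later, close to the 0.25 baseline, so resting on an early lucky streak stops paying off.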

I wonder if it might make more sense to think of this as a feature on a website like FiveThirtyEight that already has an audience that's interested in probabilistic predictions and models. 

Yeah, that’s an interesting idea. On the other hand, FiveThirtyEight is much smaller and its readers are presumably already more sophisticated, so the potential upside seems smaller.

That being said, I agree that it might make more sense to focus on platforms with a more sophisticated user base (like, say, Substack). Or focus on news outlets like, say, the Washington Post. That might even be more promising. 

For what it’s worth, while Facebook’s Forecast was met with some amount of skepticism, I wouldn’t say it was “dismissed” out of hand.

 

To clarify, when I made the comment about it being "dismissed", I wasn't thinking so much about media coverage as I was about individual Facebook users seeing prediction app suggestions in their feed. I was thinking that there are already a lot of unscientific and clickbait-y quizzes and games that get posted to Facebook, and was concerned that users might lump this in with those if it is presented in a similar way.

 

Yeah, they certainly would be reluctant to do that. But given that they already do fact-checking, it doesn’t seem impossible. 

I agree, and I definitely admit that the existence of the Facebook Forecast app is evidence against my view. I was more focused on the idea that if the recommender algorithm is based on prediction scores, that would mean that Facebook's choice of which questions to use would affect the recommendations across Facebook. 

The forecasting accuracy of Forecast’s users was also fairly good: “Forecast's midpoint brier score [...] across all closed Forecasts over the past few months is 0.204, compared to Good Judgement's published result of 0.227 for prediction markets.”

For what it's worth, as noted in Nuño's comment, this comparison holds little weight when the questions aren't the same or on the same time scales; I'd take it as fairly weak evidence from my prior that real-money prediction markets are much more accurate.

as noted in Nuño's comment this comparison holds little weight when the questions aren't the same or on the same time scales

Right, definitely, I forgot to add this. I wasn't trying to say that Forecast is more accurate than real-money prediction markets (or other forecasting platforms for that matter) but rather that Forecast's forecasting accuracy is at least clearly above the this-is-silly level.

Also, I think one can become better at forecasting on one’s own? (I think most people get better calibrated when they do calibration exercises on their own—they don’t need to watch other people do it.)

You also get feedback in the form of the community median prediction on Metaculus and GJOpen, which in my experience is usually useful. I do think that following the reasoning of competent individuals is generally very useful, but the comments and the helpful people who enjoy teaching their skills do a solid job of covering that.