Research Engineer at Ought. Interested in all things EA but especially cause prioritization, forecasting, and AI safety. More at https://www.elilifland.com/
Appreciate the compliment. I am interested in making it a Forum post, but might want to do some more editing/cleanup or writing over next few weeks/months (it got more interest than I was expecting so seems more likely to be worth it now). Might also post as is, will think about it more soon.
Hi Lizka, thanks for your feedback! I think it touched on some of the sections that I'm most unsure about / could most use revision, which is great!
[Bottlenecks] You suggest "Organizations and individuals (stakeholders) making important decisions are willing to use crowd forecasting to help inform decision making" as a crucial step in the "story" of crowd forecasting’s success (the "pathway to impact"?) --- this seems very true to me. But then you write "I doubt this is the main bottleneck right now but it may be in the future" (and don't really return to this).
I'll say up front it's possible I'm just wrong about the importance of the bottleneck here, and I think it also interacts with the other bottlenecks in a tricky way. E.g. if there were a clearer pipeline for creating important questions which get very high quality crowd forecasts which then affect decisions, more organizations would be interested.
That being said, my intuition that this is not the bottleneck comes from some personal experiences I've had with forecasts solicited by orgs that already are interested in using crowd forecasts to inform decision making. Speaking from the perspective of a forecaster, I personally wouldn't have trusted the forecasts produced as an input into important decisions.
Some examples: [Disclaimer: These are my personal impressions. Creating impactful questions and incentivizing forecaster effort is really hard, and I respect OP/RP/Metaculus a lot for giving it a shot, and would love to be proven wrong about the impact of current initiatives like these.]
So my argument is: given that AFAIK we haven't had consistent success using crowd forecasts to help institutions make important decisions, the main bottleneck seems to be helping the institutions that are already interested rather than getting more institutions interested.
If, say, the CDC (or important people there, etc.) were interested in using Metaculus to inform their decision-making, do you think they would be unable to do so due to a lack of interest (among forecasters) and/or a lack of relevant forecasting questions? (But then, could they not suggest questions they felt were relevant to their decisions?) Or do you think that the quality of answers they would get (or the amount of faith they would be able to put into those answers) wouldn't be sufficient?
[Caveat: I don't feel too qualified to opine on this point, since I'm not a stakeholder nor have I interviewed any, but I'll give my best guess.]
I think for the CDC example:
Overall, I'd expect somewhat decent forecasts on good-but-not-great questions, and I don't think that's enough to move the needle, so to speak. I also think reasoning would need to be given behind the forecasts for stakeholders to understand them, and trust in crowd forecasts would need to be built up over time.
Part of the reason it seems tricky to have impactful forecasts is that there are often competing people/"camps" with different world models, and a person with whom the crowd forecast disagrees may be reluctant to change their mind unless (a) the question is well targeted at cruxes of the disagreement and (b) they have built up trust in the forecasters and their reasoning process. To the extent this is true within the CDC, it seems that much harder for forecasting questions to be impactful.
2. [Separate, minor confusion] You say: "Forecasts are impactful to the extent that they affect important decisions," and then you suggest examples a-d ("from an EA perspective") that range from career decisions or what seem like personal donation choices to widely applicable questions like "Should AI alignment researchers be preparing more for a world with shorter or longer timelines?" and "What actions should we recommend the US government take to minimize pandemic risk?" This makes me confused about the space (or range) of decisions and decision-makers that you are considering here.
Yeah I think this is basically right, I will edit the draft.
[Side note] I loved the section "Idea for question creation process: double crux creation," and in general the number of possible solutions that you list, and really hope that people try these out or study them more. (I also think you identify other really important bottlenecks).
I hope so too, appreciate it!
I wrote a draft outline on bottlenecks to more impactful crowd forecasting that I decided to share in its current form rather than clean up into a post.
A third perspective roughly justifies the current position; we should discount the future at the rate current humans think is appropriate, but also separately place significant value on having a positive long term future.
I feel that EA shouldn't spend all or nearly all of its resources on the far future, but I'm uncomfortable with incorporating a moral discount rate for future humans as part of "regular longtermism" since it's very intuitive to me that future lives should matter the same amount as present ones.
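To make that intuition concrete, here's a small sketch (my own illustration, not from the original comment) of how strongly even a modest moral discount rate devalues far-future lives:

```python
# Present value under a constant annual discount rate.
# Illustrative only; the rate and horizon are my own assumptions.
def discounted_value(value: float, rate: float, years: float) -> float:
    """Value today of `value` realized `years` from now, discounted at `rate`."""
    return value / (1 + rate) ** years

# A life 1,000 years from now, discounted at just 1% per year:
v = discounted_value(1.0, 0.01, 1000)
print(f"{v:.1e}")  # ~4.8e-05, i.e. devalued roughly 21,000-fold
```

Even a 1% rate makes a far-future life nearly worthless, which is what clashes with the intuition that future lives matter the same as present ones.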
I prefer objections from the epistemic challenge, which I'm uncertain enough about to feel that various factors (e.g. personal fit, flow-through effects, gaining experience in several domains) mean that it doesn't make sense for EA to go "all-in". An important aspect of personal fit is comfort working on very low probability bets.
I'm curious how common this feeling is, vs. feeling okay with a moral discount rate as part of one's view. There's some relevant discussion under the comment linked in the post.
Overall I like this idea, appreciate the expansiveness of the considerations discussed in the post, and would be excited to hear takes from people working at social media companies.
Thoughts on the post directly
Broadly, we envision i) automatically suggesting questions of likely interest to the user—e.g., questions related to the user’s current post or trending topics—and ii) rewarding users with higher than average forecasting accuracy with increased visibility
I think some version of boosting visibility based on forecasting accuracy seems promising, but I feel uneasy about how it would be implemented. I'm concerned about (a) how it will be traded off against other qualities and (b) ensuring that current forecasting accuracy is actually a good proxy.
On (a), I think forecasting accuracy and the qualities it's a proxy for represent a small subset of the space that determines which content I'd like to see promoted; e.g. it seems likely to be only loosely correlated with writing quality. It may be tricky to strike the right balance in how the promotion system works.
That said, these issues can be mitigated with iteration on the forecasting feature if the people implementing it are careful and aware of these considerations.
Generally, it might be best if the recommendation algorithms don’t reward accurate forecasts in socially irrelevant domains such as sports—or reward them less so.
Insofar as the intent is to incentivize people to predict on more socially relevant domains, I agree. But I think forecasting accuracy on sports, etc. is likely strongly correlated with performance in other domains. Additionally, people may feel more comfortable forecasting on things like sports than other domains which may be more politically charged.
My experience with Facebook Forecast compared to Metaculus
I've been forecasting regularly on Metaculus for about 9 months and Forecast for about 1 month.
The forecasting accuracy of Forecast’s users was also fairly good: “Forecast's midpoint brier score [...] across all closed Forecasts over the past few months is 0.204, compared to Good Judgement's published result of 0.227 for prediction markets.”
For what it's worth, as noted in Nuño's comment this comparison holds little weight when the questions aren't the same or on the same time scales; I'd take it as only fairly weak evidence against my prior that real-money prediction markets are much more accurate.
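For reference, a minimal sketch of the standard binary Brier score underlying the numbers quoted above (lower is better; 0 is perfect, and always guessing 50% scores 0.25). The quote uses Forecast's "midpoint" variant, which this doesn't reproduce exactly:

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities (in [0, 1])
    and binary outcomes (0 or 1). Lower is better."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Three forecasts of 80%, 30%, 90% on questions resolving yes, no, yes:
print(round(brier_score([0.8, 0.3, 0.9], [1, 0, 1]), 3))  # 0.047
```

The 0.204 vs. 0.227 gap is small on this scale, which is part of why the comparison across different question sets carries little weight.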
My forecast is pretty heavily based on the GoodJudgment article How to Become a Superforecaster. According to it they identify Superforecasters each autumn and require forecasters to have made 100 forecasts (I assume 100 resolved), so now might actually be the worst time to start forecasting. It looks like if you started predicting now the 100th question wouldn't close until the end of 2020. Therefore it seems very unlikely you'd be able to become a Superforecaster in this autumn's batch.
[Note: alexrjl clarified over PM that I should treat this as "Given that I make a decision in July 2020 to try to become a Superforecaster" and not assume he would persist for the whole 2 years.]
This left most of my probability mass given you becoming a Superforecaster eventually on you making the 2021 batch, which requires you to both stick with it for over a year and perform well enough to become a Superforecaster. If I were to spend more time on this I would refine my estimates of how likely each of those are.
I assumed that if you didn't make the 2021 batch you'd probably call it quits before the 2022 batch, or not be outperforming the GJO crowd by enough to make it; and even if you did make that batch, you might not officially become a Superforecaster before 2023.
Overall I ended up with a 36% chance of you becoming a Superforecaster in the next 2 years. I'm curious to hear if your own estimate would be significantly different.
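The estimate can be decomposed roughly like this (the component probabilities are my own illustrative assumptions chosen to match the stated 36%, not numbers from the comment):

```python
# P(become a Superforecaster within 2 years), decomposed by batch.
p_stick_through_2021 = 0.60    # keeps forecasting for over a year (assumed)
p_qualify_given_stick = 0.55   # performs well enough for the 2021 batch (assumed)
p_2021_batch = p_stick_through_2021 * p_qualify_given_stick  # 0.33

p_2022_batch = 0.03  # small residual mass on later qualification (assumed)

p_total = p_2021_batch + p_2022_batch
print(round(p_total, 2))  # 0.36
```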
Here's my forecast. The past is the best predictor of the future, so I looked at past monthly data as the base rate.
I first tried to tease out whether the months with more activity in 2019 correlated with those in 2020. There seemed to be a weak negative correlation, so I figured my base rate should be based on just the past few months of data.
In addition to the past few months of data, I considered that part of the catalyst for the record-setting July activity might be Aaron's "Why you should post on the EA Forum" EAGx talk. Due to this possibility, I gave August a 65% chance of exceeding the base rate of 105 posts with >=10 karma.
My numerical analysis is in this sheet.
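The base-rate reasoning above can be sketched as follows (the monthly counts other than July's 105 are placeholders I made up; the real numbers are in the linked sheet):

```python
# Hypothetical counts of >=10-karma Forum posts in recent months;
# only July's record (105) matches a number stated in the comment.
monthly_posts = [80, 90, 97, 105]

# A weak negative year-over-year correlation argued against using 2019
# data, so the recent record month anchors the base rate.
base_rate = monthly_posts[-1]

# Shade above 50% because part of July's spike (the EAGx talk) may
# carry over into August, but not far above, since spikes often revert.
p_august_over_base = 0.65
print(base_rate, p_august_over_base)  # 105 0.65
```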
I've recently gotten into forecasting and have also been a strategy game enthusiast at several points in my life. I'm curious about your thoughts on the links between the two:
A relevant Metaculus question, about whether the Effective Altruism movement will still be picked up by Google Trends in 2030 (specifically, whether it will have at least 0.2 times its 2017 total interest), has a community prediction of 70%.