elifland

Research Engineer at Ought. Interested in all things EA but especially cause prioritization, forecasting, and AI safety. More at https://www.elilifland.com/


Comments

elifland's Shortform

Appreciate the compliment. I am interested in making it a Forum post, but might want to do some more editing/cleanup or writing over the next few weeks/months (it got more interest than I was expecting, so it seems more likely to be worth it now). I might also post it as is; will think about it more soon.

elifland's Shortform

Hi Lizka, thanks for your feedback! I think it touched on some of the sections that I'm most unsure about / that could most use revision, which is great.

  1. [Bottlenecks] You suggest "Organizations and individuals (stakeholders) making important decisions are willing to use crowd forecasting to help inform decision making" as a crucial step in the "story" of crowd forecasting’s success (the "pathway to impact"?) --- this seems very true to me. But then you write "I doubt this is the main bottleneck right now but it may be in the future" (and don't really return to this).

I'll say up front that it's possible I'm just wrong about the importance of the bottleneck here, and I think it also interacts with the other bottlenecks in a tricky way. E.g. if there were a clearer pipeline for creating important questions that get very high-quality crowd forecasts that then affect decisions, more organizations would be interested.

That being said, my intuition that this is not the bottleneck comes from some personal experiences I've had with forecasts solicited by orgs that already are interested in using crowd forecasts to inform decision making. Speaking from the perspective of a forecaster, I personally wouldn't have trusted the forecasts produced as an input into important decisions. 

Some examples: [Disclaimer: These are my personal impressions. Creating impactful questions and incentivizing forecaster effort is really hard, and I respect OP/RP/Metaculus a lot for giving it a shot; I would love to be proven wrong about the impact of current initiatives like these]

  1. The Open Philanthropy/Metaculus Forecasting AI Progress Tournament is the most well-funded initiative I know of [ETA: potentially besides those contracting Good Judgment superforecasters], but my best guess is that the forecasts resulting from it will not be impactful. An example is the "deep learning" longest-time-horizon round, where despite Metaculus' best efforts most questions have few to no comments, and at least to me it felt like the bulk of the forecasting skill was forming a continuous distribution from trend extrapolation. See also this question, where the community failed to update appropriately on record-breaking scores. Also note that each question attracted only 25-35 forecasters.
  2. I feel less sure about this, but most of the comments on RP's animal welfare questions authored by Neil Dullaghan seem to be by Neil himself. I'm intuitively skeptical that most of the 25-45 forecasters per question are doing more than skimming and making minor adjustments to the current community forecast, and this feels like an area where getting up to speed on domain knowledge is important for accurate forecasts.

So my argument is: given that AFAIK we haven't had consistent success using crowd forecasts to help institutions make important decisions, the main bottleneck seems to be helping the institutions that are already interested rather than getting more institutions interested.

If, say, the CDC (or important people there, etc.) were interested in using Metaculus to inform their decision-making, do you think they would be unable to do so due to a lack of interest (among forecasters) and/or a lack of relevant forecasting questions? (But then, could they not suggest questions they felt were relevant to their decisions?) Or do you think that the quality of answers they would get (or the amount of faith they would be able to put into those answers) wouldn't be sufficient?

[Caveat: I don't feel too qualified to opine on this point since I'm not a stakeholder nor have I interviewed any, but I'll give my best guess.]

I think for the CDC example:

  1. Creating impactful questions seems relatively easier here than in e.g. the AI safety domain, though it still may be non-trivial to identify and operationalize cruxes for which predictions would actually lead to different decisions.
  2. I'd on average expect the forecasts to be a bit better than CDC models / domain experts, and perhaps substantially better on tail risks. I don't think we have a lot of evidence here; we have some from Metaculus tournaments, but with a small sample size.
    1. I think with better incentives to allocate more forecaster effort to this project, it's possible the forecasts could be much better.

Overall, I'd expect somewhat decent forecasts on good but not great questions, and I think that isn't really enough to move the needle, so to speak. I also think the forecasts would need to come with reasoning for stakeholders to understand, and trust in crowd forecasts would need to be built up over time.

Part of the reason it seems tricky to have impactful forecasts is that often there are competing people/"camps" with different world models, and a person the crowd forecast disagrees with may be reluctant to change their mind unless (a) the question is well targeted at cruxes of the disagreement and (b) they have built up trust in the forecasters and their reasoning process. To the extent this is true within the CDC, it seems harder for forecasting questions to be impactful.

2. [Separate, minor confusion] You say: "Forecasts are impactful to the extent that they affect important decisions," and then you suggest examples a-d ("from an EA perspective") that range from career decisions or what seem like personal donation choices to widely applicable questions like "Should AI alignment researchers be preparing more for a world with shorter or longer timelines?" and "What actions should we recommend the US government take to minimize pandemic risk?" This makes me confused about the space (or range) of decisions and decision-makers that you are considering here. 

Yeah I think this is basically right, I will edit the draft.

  1. [Side note] I loved the section "Idea for question creation process: double crux creation," and in general the number of possible solutions that you list, and really hope that people try these out or study them more. (I also think you identify other really important bottlenecks).

I hope so too, appreciate it!

elifland's Shortform

I wrote a draft outline on bottlenecks to more impactful crowd forecasting that I decided to share in its current form rather than clean up into a post.

Link

Summary:

  1. I have some intuition that crowd forecasting could be a useful tool for important decisions like cause prioritization but feel uncertain
  2. I’m not aware of many example success stories of crowd forecasts impacting important decisions, so I define a simple framework for how crowd forecasts could be impactful:
    1. Organizations and individuals (stakeholders) making important decisions are willing to use crowd forecasting to help inform decision making
    2. Forecasting questions are written such that their forecasts will affect the important decisions of stakeholders
    3. The forecasts are good + well-reasoned enough that they are actually useful and trustworthy for stakeholders
  3. I discuss 3 bottlenecks to success stories and possible solutions:
    1. Creating the important questions
    2. Incentivizing time spent on important questions
    3. Incentivizing forecasters to collaborate
Towards a Weaker Longtermism

A third perspective roughly justifies the current position; we should discount the future at the rate current humans think is appropriate, but also separately place significant value on having a positive long term future.


I feel that EA shouldn't spend all or nearly all of its resources on the far future, but I'm uncomfortable with incorporating a moral discount rate for future humans as part of "regular longtermism" since it's very intuitive to me that future lives should matter the same amount as present ones.

I prefer objections from the epistemic challenge, which I'm uncertain enough about to feel that various factors (e.g. personal fit, flow-through effects, gaining experience in several domains) mean that it doesn't make sense for EA to go "all-in". An important aspect of personal fit is comfort working on very low probability bets.

I'm curious how common this feeling is, vs. feeling okay with a moral discount rate as part of one's view. There's some relevant discussion under the comment linked in the post.

Incentivizing forecasting via social media

Overall I like this idea, appreciate the expansiveness of the considerations discussed in the post, and would be excited to hear takes from people working at social media companies.

Thoughts on the post directly

Broadly, we envision i) automatically suggesting questions of likely interest to the user—e.g., questions related to the user’s current post or trending topics—and ii) rewarding users with higher than average forecasting accuracy with increased visibility

I think some version of boosting visibility based on forecasting accuracy seems promising, but I feel uneasy about how it would be implemented. I'm concerned about (a) how forecasting accuracy will be traded off against other qualities and (b) ensuring that measured forecasting accuracy is actually a good proxy.

On (a), I think forecasting accuracy and the qualities it's a proxy for represent a small subset of the space that determines which content I'd like to see promoted; e.g. it's likely only loosely correlated with writing quality. It may be tricky to strike the right balance in how the promotion system works.

On (b):

  1. Promoting and demoting content based on a small sample size of forecasts. In practice it often takes many resolved questions to discern which forecasters are more accurate, and I'm worried that it will be easy to increase/decrease visibility too early.
  2. Even without a small sample size, there may be issues with many of the questions being correlated. I'm imagining a world in which lots of people predict on correlated questions about the 2016 presidential election, and then Trump supporters get a huge boost in visibility after he wins because they do well on all of them (a rough simulation of this effect is sketched below).
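To illustrate the second point, here's a minimal toy sketch (my own, not from the post) of how correlated questions can make accuracy-based promotion mostly a matter of luck. Everything in it is made up for illustration:

```python
# Toy sketch: when questions are highly correlated, one outcome decides a
# forecaster's score on all of them at once, so the "most accurate"
# forecasters are largely determined by luck.
import random

random.seed(0)

def brier(p, outcome):
    # Brier score for one binary forecast; lower is better.
    return (p - outcome) ** 2

def avg_brier_correlated(p_true, forecaster_p, n_questions):
    # All n questions resolve off the same underlying event.
    outcome = 1 if random.random() < p_true else 0
    return sum(brier(forecaster_p, outcome) for _ in range(n_questions)) / n_questions

# Hypothetical forecasters: one calibrated at 55%, one overconfident at 95%,
# on 50 questions that all hinge on the same ~55% event (think: one election).
for label, p in [("calibrated 55%", 0.55), ("overconfident 95%", 0.95)]:
    scores = [round(avg_brier_correlated(0.55, p, 50), 3) for _ in range(5)]
    print(label, scores)
# Whenever the event happens to occur, the overconfident forecaster looks far
# better across all 50 questions; when it doesn't, far worse.
```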

That said, these issues can be mitigated with iteration on the forecasting feature if the people implementing it are careful and aware of these considerations. 

Generally, it might be best if the recommendation algorithms don’t reward accurate forecasts in socially irrelevant domains such as sports—or reward them less so.

Insofar as the intent is to incentivize people to predict on more socially relevant domains, I agree. But I think forecasting accuracy on sports, etc. is likely strongly correlated with performance in other domains. Additionally, people may feel more comfortable forecasting on things like sports than on other domains, which may be more politically charged.

My experience with Facebook Forecast compared to Metaculus

I've been forecasting regularly on Metaculus for about 9 months and Forecast for about 1 month.

  1. I don't feel as pressured to regularly go back and update my old predictions on Forecast as on Metaculus, since Forecast is a play-money prediction market rather than a prediction platform. On Metaculus, if I predict 60% and the community is at 50%, then don't update for 6 months while the community moves to 95%, I'm at a huge disadvantage in terms of score relative to predictors who did update. But with a prediction market, if I buy shares at 50 cents and the price of the shares goes up to 95 cents, it just helps me (a rough sketch of this difference is below, after this list). The prediction market structure makes me feel less pressured to continually update on old questions, which has both positives and negatives but seems good for a social media forecasting structure.
  2. The aggregate on Forecast is often decent, but it goes badly wrong more often and more egregiously than on Metaculus (e.g. this morning I bought some shares for Kelly Loeffler to win the Georgia Senate runoff at as low as ~5 points, implying 5% odds, while election betting odds currently have Loeffler at 62%). The most common reasons I've noticed are:
    1. People misunderstand how the market works and bet on whichever outcome they think is most probable, regardless of the prices.
    2. People don't make the error described in (1) (that I can tell), but are over-confident.
    3. People don't read the resolution criteria carefully.
    4. Political biases. 
    5. There aren't many predictors so the aggregate can be swung easily.
  3. As hinted at in the post, there's an issue with being able to copy the best predictors. I've followed 2 of the top predictors on Forecast and usually agree with their analyses and buy into the same markets with the same positions.
  4. Forecast currently gives points when other people forecast based on your "reasons" (aka comments), and these points are then aggregated on the leaderboard with points gained from actual predictions. I wish there were separate leaderboards for these.
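On the first point, here's a toy comparison of the two incentive structures, using the numbers from my example above. This is not Metaculus' actual scoring rule; the only property I'm leaning on is that scores there are effectively averaged over the question's lifetime:

```python
# Toy comparison: time-averaged scoring vs. a play-money market position.

def time_averaged_brier(prob_path, outcome):
    # A stale prediction keeps counting against you month after month.
    return sum((p - outcome) ** 2 for p in prob_path) / len(prob_path)

months = 6
stale_path = [0.60] * months                                   # I never update
community_path = [0.50 + (0.95 - 0.50) * t / (months - 1) for t in range(months)]

outcome = 1  # suppose the event happens
print("stale forecaster:", round(time_averaged_brier(stale_path, outcome), 3))       # ~0.16
print("community follower:", round(time_averaged_brier(community_path, outcome), 3)) # ~0.10

# In a play-money market, buying at 50 cents and watching the price drift to
# 95 cents is pure upside even if I never touch the position again.
shares, buy_price, current_price = 100, 0.50, 0.95
print("market paper profit:", shares * (current_price - buy_price))  # 45.0
```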
Incentivizing forecasting via social media

The forecasting accuracy of Forecast’s users was also fairly good: “Forecast's midpoint brier score [...] across all closed Forecasts over the past few months is 0.204, compared to Good Judgement's published result of 0.227 for prediction markets.”

For what it's worth, as noted in Nuño's comment this comparison holds little weight when the questions aren't the same or on the same time scales; I'd take it as only weak evidence against my prior that real-money prediction markets are much more accurate.
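For readers less familiar with the metric, here's a minimal sketch of how a Brier score is computed on binary questions; the toy data is made up, and the 0.204/0.227 figures above are Forecast's and Good Judgment's, not derived from this:

```python
# Toy illustration of what the Brier scores quoted above mean.

def brier_score(forecasts, outcomes):
    # Mean squared error between probabilities and 0/1 outcomes:
    # 0 is perfect, 0.25 is what "always say 50%" gets you.
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical forecasts on five resolved yes/no questions.
forecasts = [0.9, 0.7, 0.2, 0.6, 0.1]
outcomes  = [1,   1,   0,   0,   0]
print(round(brier_score(forecasts, outcomes), 3))  # 0.102
print(brier_score([0.5] * 5, outcomes))            # 0.25 baseline
```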

Delegate a forecast

My forecast is pretty heavily based on the GoodJudgment article How to Become a Superforecaster. According to it, they identify Superforecasters each autumn and require forecasters to have made 100 forecasts (I assume 100 resolved), so now might actually be the worst time to start forecasting. It looks like if you started predicting now, the 100th question wouldn't close until the end of 2020. Therefore it seems very unlikely you'd be able to become a Superforecaster in this autumn's batch.

[Note: alexrjl clarified over PM that I should treat this as "Given that I make a decision in July 2020 to try to become a Superforecaster" and not assume he would persist for the whole 2 years.]

This left most of my probability mass, conditional on you becoming a Superforecaster eventually, on you making the 2021 batch, which requires you to both stick with it for over a year and perform well enough to become a Superforecaster. If I were to spend more time on this, I would refine my estimates of how likely each of those is.

I assumed that if you didn't make the 2021 batch you'd probably call it quits before the 2022 batch, or not be outperforming the GJO crowd by enough to make it; and even if you did make that batch, you might not officially become a Superforecaster before 2023.

Overall I ended up with a 36% chance of you becoming a Superforecaster in the next 2 years. I'm curious to hear if your own estimate would be significantly different.
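To give a sense of the shape of the calculation: the intermediate numbers below are illustrative placeholders rather than the exact ones I used; only the ~36% matches my actual forecast.

```python
# Rough shape of the estimate; intermediate numbers are placeholders.

p_make_2021 = 0.60 * 0.50
# = P(stick with it long enough to be eligible) * P(perform well enough | stuck with it)

p_make_2022_given_miss_2021 = 0.30 * 0.40 * 0.75
# = P(keep going after missing 2021) * P(good enough) * P(officially named before the 2-year mark)

p_superforecaster_within_2_years = p_make_2021 + (1 - p_make_2021) * p_make_2022_given_miss_2021
print(round(p_superforecaster_within_2_years, 2))  # ~0.36
```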

Delegate a forecast

Here's my forecast. The past is the best predictor of the future, so I looked at past monthly data as the base rate.

I first tried to tease out whether monthly activity in 2020 was correlated with monthly activity in 2019. There seemed to be a weak negative correlation, so I figured my base rate should be based on just the past few months of data.

In addition to the past few months of data, I considered that part of the catalyst for record-setting July activity might be Aaron's "Why you should post on the EA Forum" EAGx talk. Due to this possibility, I gave August a 65% chance of exceeding the base rate of 105 posts with >=10 karma.

My numerical analysis is in this sheet.
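Roughly, the reasoning in the sheet has the shape below. The monthly counts here are placeholders, not the actual Forum data (that's in the linked sheet); only the 105 base rate and the 65% are from my forecast.

```python
# Rough shape of the base-rate reasoning; monthly counts are placeholders.

months_2020 = [80, 90, 105]   # hypothetical counts of >=10 karma posts, e.g. May-Jul
months_2019 = [60, 75, 55]    # same months the year before

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Step 1: if the year-over-year correlation is weak or negative, lean on the
# most recent 2020 months rather than the 2019 pattern for the base rate.
print("year-over-year correlation:", round(pearson_r(months_2020, months_2019), 2))

# Step 2: take July's count as the base rate, then adjust the probability of
# August exceeding it for one-off factors like the EAGx talk boost.
base_rate = months_2020[-1]   # 105 posts with >=10 karma
p_august_exceeds = 0.65       # my all-things-considered estimate
print(f"P(August > {base_rate}) = {p_august_exceeds}")
```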

I'm Linch Zhang, an amateur COVID-19 forecaster and generalist EA. AMA

I've recently gotten into forecasting and have also been a strategy game addict/enthusiast at several points in my life. I'm curious about your thoughts on the links between the two:

  • How correlated is skill at forecasting and strategy games?
  • Does playing strategy games make you better at forecasting?
Problem areas beyond 80,000 Hours' current priorities

A relevant Metaculus question about whether the impact of the Effective Altruism movement will still be picked up by Google Trends in 2030 (specifically, whether it will have at least .2 times the total interest from 2017) has a community prediction of 70%.
