Thank you - I will update accordingly.
I didn't get the impression from this transcript that Rory Stewart has just heard of cash transfers - is there any part which implied that? It felt to me more like "bringing the listener along with him" kind of speech, to convey a weird but exciting idea.
Reading the transcript cold, maybe it doesn't give that impression. If you're willing to listen to the episodes (there are two of them and the topic comes up a few times interspersed throughout) I'd be interested whether your view changes given his joke. (He certainly gives off a tone of surprise). I also think this:
...
I feel like I'm going to upvote both? There seem to be some significant (specific) errors, but the message is broadly correct.
To be clear - I think that this is on net a good thing. This podcast will probably introduce both GiveDirectly and EA ideas to a wider audience. Having written up this transcript, I am also less disappointed about how this came across than I was when I first heard this at 2x-speed. That said, I still find two things fairly depressing:
Capital market investors would be attracted to these financial products because they are not correlated with developed world asset prices. As mentioned before, these investments can also hedge against climate risks and GCRs.
Lots of products aren't correlated with financial markets (betting on sports, for example). That doesn't mean investors want to put money in.
Another point is that if they hedge against climate risk, and you think climate risk will materially affect the world, then you should expect these products to be correlated to the market. (But at least then they might have some excess return).
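To make the logic explicit (my framing - a textbook CAPM-style relation, not anything from the report being discussed): expected excess return comes from covariance with the market,

$$E[R_i] - R_f = \beta_i \left(E[R_m] - R_f\right), \qquad \beta_i = \frac{\operatorname{Cov}(R_i, R_m)}{\operatorname{Var}(R_m)}$$

so a product that is genuinely uncorrelated with the market ($\beta_i \approx 0$) shouldn't be expected to return much above the risk-free rate, while one that loses money in the same states the market does ($\beta_i > 0$) is exactly the one that earns a premium.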
Capital market investors would be attracted to these finance products due to high returns and a lack of correlation with developed world asset prices. As mentioned before, these investments can also hedge against climate risks and GCRs.
Why should we expect high returns? ILS / "Cat Bonds" (insurance-linked securities) don't seem to have especially high returns, and I'm not sure what the economic justification for them having high returns would be.
My general take on this space is:
Re 1: the disconnect between decision makers and forecasting platforms. I think the problem runs in both directions.
You might be interested in both the "Most Likes" and "h-Index" metrics on MetaculusExtras, which does have a visible upvote score. (Although I agree it would be nice to have it on Metaculus proper)
Some nitpicks:
Forecasts have been more accurate than random 94% of the time since 2015
This is a terrible metric since most people looking at most questions on Metaculus wouldn't think they are all 50/50 coin flips.
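To spell out why that's such a low bar (my reading of the metric, not Metaculus's exact code): "more accurate than random" just means the community forecast scored better than a flat 50% forecast on that question, which any forecast on the right side of 50% achieves.

```python
def beats_random(p, outcome):
    # Brier score of this forecast vs. the Brier score of a flat 50% ("random") forecast
    return (p - outcome) ** 2 < (0.5 - outcome) ** 2   # right-hand side is always 0.25

# A lazy 70% on a question everyone knew would resolve YES still "beats random"
print(beats_random(0.70, 1))   # True
```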
Augur’s solution to this issue is to provide predictors with a set of outcomes on which predictors stake their earnings on the true outcome. Presumably, the most staked-on outcome is what actually happened (such as Biden winning the popular vote being the true outcome). In turn, predictors are rewarded for staking on true outcomes.
This doesn't ac...
Looking at the rolling performance of your method (optimize on last 100 and use that to predict), median and geo mean odds, I find they have been ~indistinguishable over the last ~200 questions. If I look at the exact numbers, extremized_last_100 does win marginally, but looking at that chart I'd have a hard time saying "there's a 70% chance it wins over the next 100 questions". If you're interested in betting at 70% odds I'd be interested.
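For anyone who wants to reproduce that comparison, the rolling scheme I'm describing is roughly the following (a sketch under my assumptions: `pooled` is a chronologically ordered array of pooled community forecasts, `outcomes` the 0/1 resolutions, and extremizing means scaling the forecast's log-odds by a factor d):

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def expit(x):
    return 1 / (1 + np.exp(-x))

def extremize(p, d):
    # scale the log-odds of the pooled forecast by d (d > 1 pushes it away from 50%)
    return expit(d * logit(p))

def rolling_extremized_forecasts(pooled, outcomes, window=100, grid=np.linspace(1.0, 3.0, 21)):
    """For each question, pick the extremizing factor d that minimised Brier score
    over the previous `window` resolved questions, then apply it to the current question."""
    pooled, outcomes = np.asarray(pooled), np.asarray(outcomes)
    forecasts = []
    for i in range(window, len(pooled)):
        past_p, past_o = pooled[i - window:i], outcomes[i - window:i]
        best_d = min(grid, key=lambda d: np.mean((extremize(past_p, d) - past_o) ** 2))
        forecasts.append(extremize(pooled[i], best_d))
    return np.array(forecasts)
```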
...
There seems to be a long tradition of extremizing in the academic literature (see the reference in the post above).
This has restored my faith on extremization
I think this is the wrong way to look at this.
Metaculus was way underconfident originally (prior to 2020: 22%, using their metric). Recently it has been much better calibrated (2020 to now: 4%, using their metric).
Of course if they are underconfident then extremizing will improve the forecast, but the question is what is most predictive going forward. Given that before 2020 they were 22% underconfident and more recently only 4% underconfident, it seems foolhardy to expect them to stay underconfident going forward.
I would NOT...
It's not clear to me that "fitting a Beta distribution and using one of its statistics" is different from just taking the mean of the probabilities.
I fit a beta distribution to Metaculus forecasts and looked at:
Scattering these 5 values against each other I get:
We can see fitted values are closely aligned with the mean and mean-log-odds, but not with the median. (Unsurprising when you consider the ~parametric formula for the mean / median).
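For concreteness, a minimal sketch of what "fit a beta distribution and compare its statistics" could look like (illustrative, not my exact code; `forecasts` is a stand-in for the individual probability forecasts on a single question, and these five statistics are the kind of values being scattered against each other):

```python
import numpy as np
from scipy import stats

def pooled_estimates(forecasts):
    """Pool one question's individual probability forecasts several ways."""
    forecasts = np.clip(np.asarray(forecasts, dtype=float), 0.01, 0.99)  # avoid 0/1 degeneracies
    a, b, _, _ = stats.beta.fit(forecasts, floc=0, fscale=1)  # fit Beta(a, b) by maximum likelihood
    log_odds = np.log(forecasts / (1 - forecasts))
    return {
        "beta_mean": a / (a + b),
        "beta_median": stats.beta.median(a, b),
        "mean": forecasts.mean(),
        "median": float(np.median(forecasts)),
        "mean_log_odds": 1 / (1 + np.exp(-log_odds.mean())),  # geometric mean of odds, as a probability
    }
```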
The performan...
I investigated this, and it doesn’t look like there is much evidence for herding among Metaculus users to any noticeable extent, or if there is herding, it doesn’t seem to increase as the number of predictors rises.
1. People REALLY like predicting multiples of 5
2. People still like predicting the median after accounting for this (eg looking at questions where the median isn't a multiple of 5)
(Another way to see how much forecasters love those multiples of 5)
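A minimal sketch of how one could quantify this (illustrative; `predictions_pct` is a hypothetical flat list of predicted probabilities in whole percentage points):

```python
import numpy as np

def multiple_of_5_share(predictions_pct):
    """Share of predictions sitting exactly on a multiple of 5% (5, 10, ..., 95)."""
    p = np.asarray(predictions_pct)
    # if forecasters picked uniformly among 1..99%, you'd expect ~19% on multiples of 5
    return np.mean(p % 5 == 0)
```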
If one had access to the individual predictions, one could also try to take 1000 random bootstrap samples of size 1 of all the predictions, then 1000 random bootstrap samples of size 2, and so on and measure how accuracy changes with larger random samples. This might also be possible with data from other prediction sites.
I discussed this with Charles. It's not possible to do exactly this with the API, but we can approximate this by looking at the final predictions just before close.
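A minimal sketch of that approximation (illustrative; `final_preds` stands for each user's final prediction just before close on one question, `outcome` for its 0/1 resolution, and I pool with the median here):

```python
import numpy as np

def accuracy_vs_crowd_size(final_preds, outcome, max_n=50, n_boot=1000, seed=0):
    """Brier score of the pooled median as a function of how many forecasters you sample."""
    rng = np.random.default_rng(seed)
    preds = np.asarray(final_preds, dtype=float)
    scores = {}
    for n in range(1, max_n + 1):
        # bootstrap: sample n forecasters (with replacement), pool with the median, score it
        samples = rng.choice(preds, size=(n_boot, n), replace=True)
        pooled = np.median(samples, axis=1)
        scores[n] = np.mean((pooled - outcome) ** 2)
    return scores
```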
We can see that:
I created a question series on Metaculus to see how big an effect this is and how the community might forecast this going forward.
If I were to summarise your post in another way, it would be this:
The biggest problem with pooling is that a point estimate isn't the end goal. In most applications you care about some transform of the estimate. In general, you're better off keeping all of the information (ie your new prior) rather than just a point estimate of said prior.
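A toy example of the point (my numbers, purely illustrative, using a simple mixture over experts just for concreteness): say two experts put the probability of a component failing at 1% and 50%, you give them equal weight, and what you actually care about is the chance that two such (conditionally independent) components both fail:

```python
import numpy as np

expert_ps = np.array([0.01, 0.5])   # two experts' probabilities that a component fails
weights = np.array([0.5, 0.5])      # equal credence in each expert

# (a) pool to a point estimate first, then apply the transform you care about
pooled_p = np.average(expert_ps, weights=weights)           # 0.255
both_fail_a = pooled_p ** 2                                 # ~0.065

# (b) keep the full mixture over experts, transform, then pool
both_fail_b = np.average(expert_ps ** 2, weights=weights)   # ~0.125

print(both_fail_a, both_fail_b)   # the point estimate understates the risk by roughly 2x
```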
I disagree with you that the most natural prior is "mixture distribution over experts". (Although I wonder how much that actually ends up mattering in the real world).
I also think something "interesting" is being sai...
```python
import requests, json
import numpy as np
import pandas as pd

def fetch_results_data():
    # Page through the Metaculus API until there are no more pages of resolved questions
    response = {"next": "https://www.metaculus.com/api2/questions/?limit=100&status=resolved"}
    results = []
    while response["next"] is not None:
        print(response["next"])
        response = json.loads(requests.get(response["next"]).text)
        results.append(response["results"])
    return sum(results, [])  # flatten the pages into one list of questions

all_results = fetch_results_data()

# keep only binary questions which resolved unambiguously YES (1) or NO (0)
binary_qns = [q for q in all_results if q['possibilities']['type'] == 'binary' and q['resolution'] in [0, 1]]
```
...
... (read more)"more questions resolve positively than users expect"
Users expect 50 to resolve positively, but actually 60 resolve positive.
"users expect more questions to resolve positive than actually resolve positive"
Users expect 50 to resolve positive, but actually 40 resolve positive.
I have now edited the original comment to be clearer?
but the average predictor improving their ability also fixed that underconfidence
What do you mean by this?
I mean that in the past people were underconfident (so extremizing would make their predictions better). Since then they've stopped being underconfident. My assumption is that this is because the average predictor is now more skilled, or because having more predictors improves the quality of the average.
Doesn't that mean that it should be less accurate, given the bias towards questions resolving positively?
The bias isn't that more questions resolve pos...
I find it very interesting that the extremized version was consistently below by a narrow margin. I wonder if this means that there is a subset of questions where it works well, and another where it underperforms.
I think it's actually that historically the Metaculus community was underconfident (see track record here before 2020 vs after 2020).
Extremizing fixes that underconfidence, but the average predictor improving their ability also fixed that underconfidence.
One question / nitpick: what do you mean by geometric mean of the probabilities?
Met...
Yes - copy and paste fail - now corrected
| pooling method | brier | -log |
|---|---|---|
| metaculus_prediction | 0.110 | 0.360 |
| geo_mean_weighted | 0.115 | 0.369 |
| extr_geo_mean_odds_2.5_weighted | 0.116 | 0.387 |
| geo_mean_odds_weighted | 0.117 | 0.371 |
| median_weighted | 0.121 | 0.381 |
| mean_weighted | 0.122 | 0.393 |
| geo_mean_unweighted | 0.128 | 0.409 |
| geo_mean_odds_unweighted | 0.130 | 0.410 |
| extr_geo_mean_odds_2.5_unweighted | 0.131 | 0.431 |
| median_unweighted | 0.134 | 0.417 |
| mean_unweighted | 0.138 | 0.439 |
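For reference, a minimal sketch of the pooling methods and scoring rules named in the table (illustrative; `ps` stands for the individual final forecasts on one question, `outcome` for its 0/1 resolution, and the recency/track-record weighting behind the "_weighted" rows is omitted):

```python
import numpy as np

def pool(ps, method, d=2.5):
    """Pool individual probability forecasts into a single community estimate."""
    ps = np.clip(np.asarray(ps, dtype=float), 0.01, 0.99)   # avoid 0/1 blow-ups
    log_odds = np.log(ps / (1 - ps))
    if method == "mean":
        return ps.mean()
    if method == "median":
        return float(np.median(ps))
    if method == "geo_mean":            # geometric mean of the probabilities
        return float(np.exp(np.log(ps).mean()))
    if method == "geo_mean_odds":       # geometric mean of the odds, mapped back to a probability
        return float(1 / (1 + np.exp(-log_odds.mean())))
    if method == "extr_geo_mean_odds":  # as above, with the pooled log-odds scaled by d (here 2.5)
        return float(1 / (1 + np.exp(-d * log_odds.mean())))
    raise ValueError(method)

def brier(p, outcome):
    return (p - outcome) ** 2                        # smaller is better

def log_score(p, outcome):
    return -np.log(p if outcome == 1 else 1 - p)     # the "-log" column; smaller is better
```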
tl;dr The conclusions of this article hold up in an empirical test with Metaculus data
Looking at resolved binary Metaculus questions, I pooled the community estimate using 5 different methods and scored each with two different scoring rules (Brier and log). The rankings I find are given in my table (smaller is better).
You might be interested in my empirical look at this for Metaculus