This is a special post for quick takes by Will Howard.

Tyler Cowen has this criticism of prediction markets which is like (paraphrased, plus slightly made up and mixed with my own opinions): "The whole concept is based on people individually trying to maximise their wealth, and this resulting in wealth accruing to the better predictors over time. But then in real life people just bet these token amounts that add up to way less than the money they get from their salary or normal investments. This completely defeats the point! You may as well just take the average probability at that point rather than introducing this overcomplicated mechanism".

Play money can fix this specific problem, because you can make it so everyone starts with the same amount, whereas real money is constantly streaming in and out for reasons other than your ability to predict esoteric world events. I think this is an underrated property of play-money markets, as opposed to the usual arguments about risk aversion. (Of course, if you can buy play money with real money, this muddies the waters quite a bit.)

Most deaths in war aren’t from gunshots

This is an edited version of a memo I shared within the online team at CEA. It’s about the forum, but you could also make it about other stuff. (Note: this is just my personal opinion)

There's this stylised fact about war that almost none of the deaths are caused by gunshots, which is surprising given that for the average soldier war consists of walking around with a gun and occasionally pointing it at people. Whether or not this is actually true, the lesson that quoters of this fact are trying to teach is that the possibility of something happening can have a big impact on the course of events, even if it very rarely actually happens.

[warning: analogy abuse incoming]

I think a similar thing can happen on the forum, and trying to understand what’s going on in a very data-driven way will tend to lead us astray in cases like this.

A concrete example of this is people being apprehensive about posting on the forum, and saying this is because they are afraid of criticism. But if you go and look through all the comments, there aren’t actually that many examples of well-intentioned posts being torn apart. At this point, if you’re being very data-minded, you would say “well I guess people are wrong, posts don’t actually get torn apart in the comments; so we should just encourage people to overcome their fear of posting (or something)”.

I think this is probably wrong because something like this happens: users correctly identify that people would tear their post apart if it were bad, so they either don’t write the post at all, or they put a lot of effort into making it good. The result of this is that the amount of realised harsh criticism on the forum is low, and the quality of posts is generally high (compared to other forums, Facebook, etc).

I would guess that criticising actually-bad posts even more harshly would in fact lower the total amount of criticism, for the same reason that hanging people for stealing bread probably lowered the theft rate among Victorian street urchins (this would probably also be bad for the same reason).

A complaint about using average Brier scores

Comparing average Brier scores between people only makes sense if they have made predictions on exactly the same questions, because making predictions on more certain questions (such as "will there be a 9.0 earthquake in the next year?") will tend to give you a much better Brier score than making predictions on more uncertain questions (such as "will this coin come up heads?"). This is one of those things that lots of people know, but then everyone (including me) keeps using average Brier scores anyway because they give you a nice simple number to look at.

To explain:

The Brier score for a binary prediction is the squared difference between the predicted probability $p$ and the actual outcome $o$ (1 if the event happened, 0 if not): $(p - o)^2$. For a given question, predicting the true probability gives you the lowest possible expected Brier score (lower is better, so this is what you want). But this minimum possible score varies depending on the true probability of the event happening.
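A short derivation may help show why the minimum depends on the question. The symbols are just labels chosen for this sketch: $q$ is the true probability of the event, $p$ is the forecast, and $o \in \{0, 1\}$ is the outcome. Taking the expectation over outcomes:

```latex
\begin{align}
\mathbb{E}\left[(p - o)^2\right] &= q(1 - p)^2 + (1 - q)p^2 \\
                                 &= q(1 - q) + (p - q)^2
\end{align}
```

The second term is zero when $p = q$, so a perfect forecaster is left with $q(1 - q)$. That floor is largest at $q = 0.5$ and shrinks towards zero as the question becomes more certain in either direction.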

For the coin flip the true probability is 0.5, so if you make a perfect prediction you will get a Brier score of 0.25 ($0.5^2$). For the earthquake question maybe the correct probability is 0.1, so the best expected Brier score you can get is 0.09 ($0.1 \times 0.9^2 + 0.9 \times 0.1^2$), and it's only if you are really badly wrong (you think the probability is above 0.5) that you can get an expected score higher than the best score you can get for the coin flip.
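These numbers are easy to check directly. The snippet below is a minimal sketch; the function name `expected_brier` is made up for illustration, and the 0.1 earthquake probability is the same guess used in the text.

```python
def expected_brier(forecast: float, true_prob: float) -> float:
    """Expected Brier score of `forecast` when the event really happens
    with probability `true_prob` (lower is better)."""
    # The event happens with probability true_prob -> error is (1 - forecast).
    # It doesn't happen with probability 1 - true_prob -> error is forecast.
    return true_prob * (1 - forecast) ** 2 + (1 - true_prob) * forecast ** 2


coin_best = expected_brier(0.5, 0.5)    # perfect forecast on the coin flip
quake_best = expected_brier(0.1, 0.1)   # perfect forecast on the earthquake
quake_bad = expected_brier(0.5, 0.1)    # badly wrong forecast on the earthquake

print(f"coin, perfect:        {coin_best:.3f}")
print(f"earthquake, perfect:  {quake_best:.3f}")
print(f"earthquake, says 0.5: {quake_bad:.3f}")
```

A forecast of 0.5 on the earthquake question only just matches the best possible score on the coin flip, so the earthquake forecaster has to be further off than that to do worse.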

So if forecasters have a choice of questions to make predictions on, someone who mainly goes for things that are pretty certain will end up with a (much!) better average Brier score than someone who predicts things that are genuinely more 50/50. This also acts as a disincentive for predicting more uncertain things, which seems bad.
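To get a feel for the size of the effect, here is a rough sketch comparing average expected Brier scores across two sets of questions. Everything in it is hypothetical: the true probabilities are invented, and the "sloppy" forecaster is simply assumed to be off by 0.2 on every question.

```python
def expected_brier(forecast: float, true_prob: float) -> float:
    """Expected Brier score of `forecast` given the true probability."""
    return true_prob * (1 - forecast) ** 2 + (1 - true_prob) * forecast ** 2


def average_expected_brier(forecasts, true_probs):
    """Mean expected Brier score over a list of questions."""
    scores = [expected_brier(f, q) for f, q in zip(forecasts, true_probs)]
    return sum(scores) / len(scores)


# Hypothetical true probabilities for two sets of questions.
near_certain = [0.02, 0.05, 0.10, 0.95]
toss_ups = [0.40, 0.50, 0.50, 0.60]

# A perfect forecaster who only answers the toss-up questions.
perfect_on_toss_ups = average_expected_brier(toss_ups, toss_ups)

# A sloppy forecaster (off by 0.2 every time) who only answers the
# near-certain questions.
sloppy_forecasts = [0.22, 0.25, 0.30, 0.75]
sloppy_on_near_certain = average_expected_brier(sloppy_forecasts, near_certain)

print(f"perfect on toss-ups:    {perfect_on_toss_ups:.3f}")
print(f"sloppy on near-certain: {sloppy_on_near_certain:.3f}")
```

Under these assumptions the sloppy forecaster averages roughly 0.09 while the perfect one averages roughly 0.245, so the ranking reflects question choice rather than skill.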

We've just added Fatebook (which is great!) to our Slack and I've noticed this putting me off making forecasts for things that are highly uncertain. I'm interested in whether there is some lore around dealing with this among people who use Metaculus or other platforms where Brier scores are an important metric. I only really use prediction markets, which don't suffer from this problem.

Note: this also applies to log scores etc.
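As a quick check of the log score case, here is a sketch using one common convention: the score is the negative natural log of the probability assigned to what actually happened, so lower is better. That convention, the function name, and the 0.1 earthquake probability are all assumptions for illustration; platforms differ in sign and scaling.

```python
import math


def expected_log_score(forecast: float, true_prob: float) -> float:
    """Expected negative-log score of `forecast` (must be strictly between
    0 and 1) when the event happens with probability `true_prob`."""
    return -(
        true_prob * math.log(forecast)
        + (1 - true_prob) * math.log(1 - forecast)
    )


# Perfect forecasts on the coin flip and on the (assumed 10%) earthquake.
coin_best_log = expected_log_score(0.5, 0.5)
quake_best_log = expected_log_score(0.1, 0.1)

print(f"coin, perfect:       {coin_best_log:.3f}")
print(f"earthquake, perfect: {quake_best_log:.3f}")
```

The perfect coin forecaster is stuck at about 0.69 and the perfect earthquake forecaster gets about 0.33, so the same question-choice effect shows up.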

Yeah, I'm starting to believe that a severe limitation on Brier scores is this inability to use them in a forward-looking way. Brier scores reflect the performance of specific people on specific questions and using them as evidence for future prediction performance seems really fraught...but it's the best we have as far as I can tell.