Will Howard

Software Engineer @ Centre for Effective Altruism
146 karma · Joined Aug 2022 · London, UK


I'm a developer on the EA Forum (the website you are currently on). You can contact me about forum stuff at will.howard@centreforeffectivealtruism.org or about anything else at w.howard256@gmail.com


Topic Contributions

A complaint about using average Brier scores

Comparing average Brier scores between people only makes sense if they have made predictions on exactly the same questions, because making predictions on more certain questions (such as "will there be a 9.0 earthquake in the next year?") will tend to give you a much better Brier score than making predictions on more uncertain questions (such as "will this coin come up heads or tails?"). This is one of those things that lots of people know, but everyone (including me) keeps using average Brier scores anyway because they're a nice simple number to look at.

To explain:

The Brier score for a binary prediction is the squared difference between the predicted probability and the actual outcome (0 or 1): if you forecast probability q and the outcome is o, your score is (q - o)². For a given forecast, predicting the true probability minimises your expected Brier score (lower is better, which is what you want). But this minimum possible expected score varies depending on the true probability of the event happening.

For the coin flip the true probability is 0.5, so if you make a perfect prediction you will get an expected Brier score of 0.25 (0.5 × 0.5² + 0.5 × 0.5² = 0.25). For the earthquake question maybe the correct probability is 0.1, so the best expected Brier score you can get is 0.09 (0.1 × 0.9² + 0.9 × 0.1² = 0.09), and it's only if you are really badly wrong (you forecast a probability above 0.5) that you can get a worse score than the best score you can get for the coin flip.
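To make the arithmetic above concrete, here is a small sketch of the expected Brier score for forecasting q on an event whose true probability is p (this is just the expectation of (q - o)² over the two outcomes, not any particular platform's implementation):

```python
def expected_brier(true_p: float, forecast_q: float) -> float:
    """Expected Brier score for forecasting q on a binary event with
    true probability p: p*(1-q)^2 + (1-p)*q^2."""
    return true_p * (1 - forecast_q) ** 2 + (1 - true_p) * forecast_q ** 2

# A perfect coin-flip forecast can't beat 0.25:
print(round(expected_brier(0.5, 0.5), 4))  # 0.25
# A perfect earthquake forecast (true p = 0.1) scores 0.09:
print(round(expected_brier(0.1, 0.1), 4))  # 0.09
# Only a badly wrong earthquake forecast (q = 0.5) is as bad as the perfect coin flip:
print(round(expected_brier(0.1, 0.5), 4))  # 0.25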

So if forecasters have a choice of questions to make predictions on, someone who mainly goes for things that are pretty certain will end up with a (much!) better average Brier score than someone who predicts things that are genuinely more 50/50. This also acts as a disincentive for predicting more uncertain things, which seems bad.

We've just added Fatebook (which is great!) to our Slack, and I've noticed this putting me off making forecasts for things that are highly uncertain. I'm interested in whether there is some lore around dealing with this among people who use Metaculus or other platforms where Brier scores are an important metric. I only really use prediction markets, which don't suffer from this problem.

Note: this also applies to log scores and other scoring rules.

Yes, this applies to all requests, including /graphql. If the user agent of the request matches a known bot we will return a redirect to the forum-bots site. Some libraries (such as python requests, and fetch in javascript) automatically follow redirects, so hopefully some things will magically keep working, but this is not guaranteed.
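As a self-contained sketch of the behaviour described above (the `/forum-bots` path and bot user-agent list here are made up for illustration, and this is not the forum's actual implementation), a local server that 302-redirects bot-like user agents, plus a stdlib client that silently follows the redirect:

```python
import http.server
import threading
import urllib.request

# Hypothetical substrings used to flag bot-like user agents.
BOT_AGENT_SUBSTRINGS = ("bot", "python-requests", "curl")

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get("User-Agent", "").lower()
        is_bot = any(s in ua for s in BOT_AGENT_SUBSTRINGS)
        if is_bot and self.path != "/forum-bots":
            # Bot detected: redirect instead of serving the request.
            self.send_response(302)
            self.send_header("Location", "/forum-bots")
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep the demo quiet

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# urllib (like python requests, or fetch in javascript) follows the 302
# automatically, so the client transparently lands on the bots page:
req = urllib.request.Request(
    f"http://127.0.0.1:{port}/graphql",
    headers={"User-Agent": "my-bot/1.0"},
)
with urllib.request.urlopen(req) as resp:
    final_url, status = resp.geturl(), resp.status

server.shutdown()
print(final_url)  # ends in /forum-bots
print(status)     # 200
```

Clients that disable redirect following (e.g. `allow_redirects=False` in requests) would instead see the 302 directly, which is why "magically keep working" is not guaranteed.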

I appreciate that this is annoying, and we didn't really want to do it. But the site was being taken down by bots (for a few minutes) almost every day a couple of weeks ago so we finally felt this was necessary.

Thanks for the suggestion! We'll add a user setting for this 👍

This is a good post. I think that in practice the inflation/opportunity cost consideration is by far the biggest effect here. Some reasons:

  • It applies a definite bias in the same direction to all long-term markets (it pushes them away from the extremes). Hedging might result in a one-sided distortion in some cases, e.g. if really cold temperatures are bad for crops but unusually warm ones aren't. But not every hedge-able market has the exact same distortion.
  • It affects the decisions of all bettors on all markets, whereas other distortions only apply to a niche subset, such as Spanish farmers or the overly risk averse.
  • Importantly, it impacts the decisions of more skilled predictors more strongly. This is because they can expect a higher return on average, so they have a higher opportunity cost.

    E.g. an unskilled predictor looking at a market that resolves in 1 year might only bet if they expect to make over a 10% return; so if their probability estimate falls outside even a narrow band close to the market price it still makes sense for them to bet. But a skilled predictor might be making 50% a year on average, so a much wider range of probabilities is wiped out.

    This is a big effect on Manifold, because the top predictors tend to double their money every few months, so there is not much incentive for them to bet on markets longer than a year[1] unless they have a very large edge. I'm not sure how much this effect applies to real-money markets.
  1. ^

    There is a loan system which helps with this somewhat, but it also increases your risk exposure, so it's not clear what the overall effect is.
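The "narrow band" reasoning above can be sketched numerically. Assuming a simple binary market with no fees, a bettor who only bets when the expected return beats their hurdle rate will leave a band of probabilities untraded around the market price (the function name and parameters here are mine, for illustration):

```python
def no_bet_band(market_price: float, required_return: float) -> tuple:
    """Range of subjective probabilities p for which neither YES nor NO
    clears the bettor's required return, on a binary market with no fees."""
    m, r = market_price, required_return
    # Betting YES at price m returns p/m - 1 in expectation; require > r.
    yes_threshold = m * (1 + r)
    # Betting NO at price (1 - m) returns (1-p)/(1-m) - 1; require > r.
    no_threshold = 1 - (1 - m) * (1 + r)
    return round(max(no_threshold, 0.0), 4), round(min(yes_threshold, 1.0), 4)

# Unskilled bettor needing a 10% return vs a skilled one whose time is
# worth 50% a year, on a market priced at 50%:
print(no_bet_band(0.5, 0.10))  # (0.45, 0.55)
print(no_bet_band(0.5, 0.50))  # (0.25, 0.75)
```

So the skilled predictor's beliefs have to be much further from the market price before betting is worth their while, which is the sense in which "a much wider range of probabilities is wiped out".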

Most deaths in war aren’t from gunshots

This is an edited version of a memo I shared within the online team at CEA. It’s about the forum, but you could also make it about other stuff. (Note: this is just my personal opinion)

There's this stylised fact about war that almost none of the deaths are caused by gunshots, which is surprising given that, for the average soldier, war consists of walking around with a gun and occasionally pointing it at people. Whether or not this is actually true, the lesson that quoters of this fact are trying to teach is that the possibility of something happening can have a big impact on the course of events, even if it very rarely actually happens.

[warning: analogy abuse incoming]

I think a similar thing can happen on the forum, and trying to understand what’s going on in a very data driven way will tend to lead us astray in cases like this.

A concrete example of this is people being apprehensive about posting on the forum and saying this is because they are afraid of criticism. But if you go and look through all the comments, there aren't actually that many examples of well-intentioned posts being torn apart. At this point, if you're being very data-minded, you would say "well, I guess people are wrong, posts don't actually get torn apart in the comments; so we should just encourage people to overcome their fear of posting (or something)".

I think this is probably wrong because something like this happens: users correctly identify that people would tear their post apart if it was bad, so they either don’t write the post at all, or they put a lot of effort into making it good. The result of this is that the amount of realised harsh criticism on the forum is low, and the quality of posts is generally high (compared to other forums, facebook, etc).

I would guess that criticising actually-bad posts even more harshly would in fact lower the total amount of criticism, for the same reason that hanging people for stealing bread probably lowered the theft rate among Victorian street urchins (it would probably also be bad for the same reason).

Do you feel like shortform achieves this to some extent, or are you thinking of something different?

Thanks for the suggestion! We're planning some changes to user profiles soon and we'll consider adding something like this

Answer by Will Howard · Apr 19, 2023

An AI pause debate would be interesting (maybe @Matthew_Barnett would be interested in debating someone?)

Update on the home page algorithm

As mentioned above, we are now testing out some changes designed to make the home page algorithm better suited for users who visit more or less often. This will make it so that people who visit every day will see a home page similar to the existing algorithm, while less frequent users will see a “slower” home page which is weighted less towards recency and more towards karma.

We are rolling this out as an A/B test, so initially only 1/3rd of users will get the new algorithm. You can deliberately opt in (or out) by going here and selecting “New ‘slower’ frontpage algorithm” in the final dropdown. Some more details about the changes:

  • This will apply to both logged-in and logged-out users. For logged-out users it's calculated per device, so if you visit on your phone and your laptop the home page may be different.
  • (shh don’t tell anyone) If you’re really a power user you can set the exact speed of the algorithm by adding “?algoActivityFactor=[number]” to the home page URL. “algoActivityFactor” is a number between 0 and 1 which we calculate for each user based on their visit frequency. A value of 1 indicates daily visits, 0 means you’ve never visited, and a value around 0.5 corresponds to visiting every ~3 days. Keep in mind that this is mainly intended for debugging, so we can’t guarantee it’ll work forever.
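For example, pinning the home page to the ~3-day-visit speed would look like this (using the forum's URL; as noted above, the parameter is a debugging feature and may stop working):

```
https://forum.effectivealtruism.org/?algoActivityFactor=0.5
```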

We’d love to hear your feedback on this, such as “I’m a daily user but I still think the slower algorithm is way better for me” (or the opposite). You can reply here or reach out to us at forum@centreforeffectivealtruism.org

Answer by Will Howard · Mar 03, 2023

Not a book but @Kirsten has some good simple recipes in this post

I like this one and the use of the word "thrice":

DHALL (onions, red lentils, rice)

- fry onions
- add red lentils and thrice that much water and boil until cooked
- add garlic paste and spices
- whilst cooking, cook rice to serve with
- optionally, whilst cooking, slow fry (caramelise) some onions to serve with
