You can now import posts directly from Google docs
Plus, internal links to headers[1] will now be mapped over correctly. To import a doc, make sure it is public or shared with "eaforum.posts@gmail.com"[2], then use the widget on the new/edit post page:
Importing a doc will create a new (permanently saved) version of the post, but will not publish it, so it's safe to import updates into posts that are already published. You will need to click the "Publish Changes" button to update the live post.
Everything that previously worked on copy-paste[3] will also work when importing, with the addition of internal links to headers (which only work when importing).
There are still a few things that are known not to work:
Nested bullet points (these are working now)
Cropped images get uncropped (also working now)
Bullet points in footnotes (these will become separate un-bulleted lines)
Blockquotes (there isn't a direct analog of this in Google docs unfortunately)
There might be other issues that we don't know about. Please report any bugs or give any other feedback by replying to this quick take; you can also contact us in the usual ways.
Appendix: Version history
There are some minor improvements to the version history editor[4] that come along with this update:
You can load a version into the post editor without updating the live post. Previously you could only hard-restore versions.
The version that is live[5] on the post is shown in bold.
Here's what it would look like just after you import a Google doc, but before you publish the changes. Note that the latest version isn't bold, indicating that it is not showing publicly:
[1] Previously the link would take you back to the original doc; now it will take you to the header within the Forum post, as you would expect. Internal links to bookmarks (where you link to a specific text selection) are also partially supported, although the link will only go to the paragraph that the text selection is in.
[2] Sharing with this email address means that anyone who has the URL can access the contents of your doc, because they could go to the new post page and import it. At least it means they can't access the comments.
[3] I'm not sure how widespread this knowledge is, but previously the best way to copy from a Google doc was to first "Publish to the web" and then copy-paste from the published version. In particular, this handles footnotes and tables, whereas pasting directly from a regular doc doesn't. The new importing feature should be equivalent to this publish-to-web copy-pasting, so it will handle footnotes, tables, images etc., and it additionally supports internal links.
[4] Accessed via the "Version history" button in the post editor.
[5] For most intents and purposes you can think of "live" as meaning "showing publicly". There is a bit of a sharp corner in this definition, in that the post as a whole can still be a draft.
To spell this out: there can be many different versions of a post body, but only one of these is attached to the post, and this is the "live" version. The live version is what shows on the non-editing view of the post. Independently of this, the post as a whole can be a draft or published.
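The footnotes describe a small data model: many stored versions of a post body, one "live" pointer, and a draft/published flag that is independent of both. Below is a minimal Python sketch of that model, purely for illustration. The class `Post` and the methods `save_version`, `publish_changes` and `publicly_visible_body` are hypothetical names, not the Forum's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Post:
    """Hypothetical sketch: a post keeps every saved body version, points at
    exactly one of them as "live", and separately tracks draft status."""

    versions: list[str] = field(default_factory=list)
    live_index: Optional[int] = None  # which saved version is attached to the post
    is_draft: bool = True             # draft/published, independent of live_index

    def save_version(self, body: str) -> int:
        """Permanently store a new version (e.g. from an import) without
        changing which version is live. Returns the new version's index."""
        self.versions.append(body)
        return len(self.versions) - 1

    def publish_changes(self, index: int) -> None:
        """Attach an existing version to the post, making it the live one."""
        if not 0 <= index < len(self.versions):
            raise IndexError(f"no version {index}")
        self.live_index = index

    def live_body(self) -> Optional[str]:
        """The body shown on the non-editing view of the post."""
        if self.live_index is None:
            return None
        return self.versions[self.live_index]

    def publicly_visible_body(self) -> Optional[str]:
        """Treat "live" as "showing publicly" only once the post is published."""
        return None if self.is_draft else self.live_body()


# Usage: an import creates a version but does not change the live post.
post = Post(is_draft=False)
post.publish_changes(post.save_version("original body"))
imported = post.save_version("body imported from a Google doc")
print(post.publicly_visible_body())  # original body
post.publish_changes(imported)       # like clicking "Publish Changes"
print(post.publicly_visible_body())  # body imported from a Google doc
```

Keeping `live_index` and `is_draft` as separate fields mirrors the "sharp corner" above: a version can be live while the post as a whole is still a draft.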
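Oh wow, actually so happy about this; getting the formatting right had definitely been an annoying challenge!
Omg what, this is amazing (though nested bullets not working does seem to make this notably less useful). Does it work for images?
Ok, nested bullets should be working now :)
Yep, images work, and I agree that nested bullet points are the biggest remaining issue. I'm planning to fix that in the next week or two.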
Edit: Actually, I just noticed the cropping issue: images that are cropped in Google Docs get uncropped when imported. That's pretty annoying. There is no way to carry over the cropping, but we could flag these to make sure you don't accidentally submit a post with the uncropped images.
Although I'm no expert, maybe next you could try to make it possible to convert/download posts into Google Docs? Super cool btw.
I have thought this might be quite useful to do. I would guess (people can confirm/correct me) a lot of people have a workflow like:
Edit post in Google doc
Copy into Forum editor, make a few minor tweaks
Realise they want to make larger edits, go back to the Google doc to make these, requiring them to either copy over or merge together the minor tweaks they have made
In this case, being able to import/export both ways would be useful. That said, it's much harder to go the other way (we would likely have to build up the Google doc as a series of edits via the API, whereas in our direction we can handle the whole post exported as HTML quite naturally), so unfortunately I wouldn't expect us to do this in the near future.
Thanks, Will!
2 nitpicks:
The title of the doc is imported as the first paragraph of the EA Forum post, instead of being imported as the title.
Blank lines without spacing before and after in the doc are not imported, although I personally think this is a feature! Blank lines without spacing before and after in the footnotes of the doc are imported, but I would rather not have them imported.
Thanks for reporting!
I'll think about how we could handle this one better. It's tricky because the doc itself has a title, and then people often rewrite the title as a heading inside the doc, so there isn't an obvious choice for what to use as the title. But it may be that the heading case is a lot more common, in which case we should make that the default.
That was indeed intended as a feature, because a lot of people use blank lines as paragraph breaks. We can apply the same rule to footnotes too.
I'll set a reminder to reply here when we've done these.
PSA: You can now add buttons to posts and comments:
Click here to maximise utility
We would love it if people started adding these to job opportunities to nudge people to apply. To add a button, select some text to make the toolbar appear and then click the "Insert button" icon (see below).
As always, feedback on the design/implementation is very welcome.
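Breakdown of Open Philanthropy grants to date
I came across this spreadsheet buried in a comment thread. I don't know who made it (maybe @MarcusAbramovitch knows), but it's great. It shows a breakdown of all OP grants by cause area and organisation, updated automatically every day:
This is the post introducing the spreadsheet: EffectiveAltruismData.com is now a spreadsheet from @Hamish McDoodles
oh hey
cool to see people are finding this useful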
This is great. Do we know if all grants are assigned to only one area? In other words, if I want to know the total amount spent on farmed animal welfare more broadly, is it appropriate to add up the sums for "Farmed Animal Welfare" + "Broiler Chicken Welfare" + "Cage-Free Reforms" + ...?
You can see the raw data in the final tab. Every grant is given only one "Focus Area", and there are some obviously less-than-ideal codings (e.g. "F.R.E.E. — Broiler and Cage-Free Reforms in Romania" is coded as "Farm Animal Welfare" but not "Broiler Chicken Welfare").
Edit: Sorry, I didn't read your question properly. I think the answer to whether it's appropriate to add up "Farmed Animal Welfare" + "Broiler Chicken Welfare" + "Cage-Free Reforms" etc. is yes.
Thanks! In your example, one could argue for placing F.R.E.E. in "Cage-Free Reforms" as well. That might explain the use of "Farmed Animal Welfare" for that grant, since there may have been a decent argument for multiple more specific categories.
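Because the raw data gives each grant exactly one "Focus Area", summing several areas cannot double-count a grant. A minimal Python sketch of that sum, assuming the rows have been exported as (grant, focus area, amount) tuples; that layout is an assumption about the spreadsheet, and the rows below are made-up placeholders, not real Open Philanthropy data.

```python
# Hypothetical placeholder rows: (grant name, focus area, amount in USD).
# These are NOT real Open Philanthropy figures.
grants = [
    ("Grant A", "Farm Animal Welfare", 100_000),
    ("Grant B", "Broiler Chicken Welfare", 50_000),
    ("Grant C", "Cage-Free Reforms", 25_000),
    ("Grant D", "Some Other Area", 200_000),
]


def total_for_areas(rows, areas):
    """Sum grant amounts over a set of focus areas.

    Each row carries exactly one focus area, so a grant is counted at most
    once even when several areas are combined.
    """
    wanted = set(areas)
    return sum(amount for _name, area, amount in rows if area in wanted)


farmed_animal_areas = ["Farm Animal Welfare", "Broiler Chicken Welfare", "Cage-Free Reforms"]
print(total_for_areas(grants, farmed_animal_areas))  # 175000
```

The coding caveat from the thread still applies: a grant like the F.R.E.E. one is counted under whichever single area it was given, so the per-area subtotals can mislead even when the combined total is right.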
Thanks for re-sharing! Unfortunately, these categories make it quite unclear how much they've given to EA. (I assume it's a large chunk of 'GCR Capacity Building'.)
I'd be interested in any feedback people have about the analytics page we added recently. You can get to it via "Post stats" in the user menu.
Specific areas of feedback that would be helpful:
Are the stats displayed useful to you and easy to understand? Are there other stats that you think would be more useful?
Feedback on the design/layout
Do you endorse it as a concept (e.g. you might think this incentivises engagement bait or something; if you notice that motivation in yourself, that would be useful to know)?
Is it broken in any way? Is it annoyingly slow?
We also updated the analytics page for individual posts at the same time so feedback on that would be helpful as well (you can get to this via the three dot menu on posts you have published, or via the "Post stats" page).
I would like to see how many people read to the end. I have an 18-minute read post, and I can see that many people read for 1m 30s, so they probably read the TL;DR or summary, but it also shows that a number of people read for much longer. I'd like to know how many read it to the end so I can tell whether posts need to be much shorter.
This may not be possible, but a wishlist idea would be to know a bit more about who these users are in terms of interest demographic. If my post is about AI Safety but most of those who click away fast have animal welfare listed as their primary interest, then that's not an issue. But if my post is about AI Safety and lots of AI Safety people didn't read or engage, then that's an issue I need to fix. This may not be feasible to implement, however.
All in all I like it, and it feels like it gives insight into developing better engagement habits rather than encouraging clickbait.
wishlist idea would be to know a bit more about who these users are in terms of interest demographic
Interesting, I hadn't thought of that. It would be possible to implement something like this. One issue is that a lot of people don't fill out the topics they are interested in, but we could get round this by basing it on the other posts they read rather than their stated interests.
The thing about reading to the end is something I was planning to add. I have been envisioning a graph like this on the post analytics page (see below; a random example from Google). Would that fit what you're looking for?
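A graph like this can be approximated from per-reader reading times alone: for each point in time, plot the share of readers who were still reading. A minimal Python sketch, assuming reading time is a usable proxy for how far someone got (it is only a rough one); the function name `read_through_curve` and the sample times are hypothetical, not the Forum's implementation.

```python
def read_through_curve(read_seconds, checkpoints):
    """Share of readers whose reading time reached each checkpoint (in seconds)."""
    if not read_seconds:
        return [0.0] * len(checkpoints)
    n = len(read_seconds)
    return [sum(1 for s in read_seconds if s >= t) / n for t in checkpoints]


# Hypothetical reading times (seconds) for an 18-minute (1080 s) post.
times = [30, 90, 90, 600, 1080]
checkpoints = [0, 60, 300, 1080]
print(read_through_curve(times, checkpoints))  # [1.0, 0.8, 0.4, 0.2]
```

The last value is a crude "read to the end" rate; since people read at different speeds, a threshold somewhat below the full estimated reading time would probably be more informative.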
Perhaps there could be a forum option to add tags to your profile, if you're happy for this data to be collected?
For reading to the end, yes, that would be fantastic. Obviously it can't tell what people were looking at when they left, but based on average reading time it would help pinpoint how long posts on certain topics should be.
Great to see the stuff you guys are up to with the forum though!
Thanks for doing this, Will (and EA Forum team), and for asking for feedback!
Are the stats displayed useful to you and easy to understand? Are there other stats that you think would be more useful?
Yes, I think the displayed stats are useful. I would also be happy to see:
The bounce rate and mean reading time in the general analytics page too, as opposed to just in the analytics page of each post.
Some metrics related to the distribution of the reading time. For example, the 10th and 90th percentile reading time. It would also be good to have these in the general page, and that of each post.
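The 10th and 90th percentiles suggested above are straightforward to compute with Python's standard `statistics` module (`quantiles` and `fmean` need Python 3.8+). A minimal sketch with hypothetical reading times; `reading_time_summary` is an illustrative name. It uses `method="inclusive"` so the percentiles stay within the observed range; the default "exclusive" method can extrapolate beyond the smallest and largest values on small samples.

```python
import statistics


def reading_time_summary(read_seconds):
    """Mean, 10th percentile, median and 90th percentile reading time.

    Expects at least two readings.
    """
    deciles = statistics.quantiles(read_seconds, n=10, method="inclusive")  # 9 cut points
    return {
        "mean": statistics.fmean(read_seconds),
        "p10": deciles[0],
        "median": statistics.median(read_seconds),
        "p90": deciles[-1],
    }


times = [30, 90, 90, 600, 1080]  # hypothetical reading times in seconds
print(reading_time_summary(times))
```

Percentiles are less sensitive than the mean to a handful of readers who leave a tab open for hours, which is one reason to show them alongside it.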
Feedback on the design/layout
I like it.
Do you endorse it as a concept (e.g. you might think this incentivises engagement bait or something; if you notice that motivation in yourself, that would be useful to know)?
I endorse it, but I think it would be worth adding a warning reminding people to be mindful of what they are optimising for, something like: "Note that karma and reads may not correlate well with impact, and maximising the impact of your EA Forum posts may not be your best strategy for maximising the overall impact of your life/career". Alternatively, you could publish a post on the dangers of non-ideal incentives in the context of EA Forum engagement, and then just link to it from the analytics page.
Is it broken in any way? Is it annoyingly slow?
It looks good to me.
The neutral point of wellbeing is often associated with a state of not much going on in terms of sensory stimulus, e.g. the Parfit “muzak and potatoes” vision of lives barely worth living. This seems natural, because it matches up zero (valenced) sensory input with zero net wellbeing. But there is actually no reason for these two to coincide exactly; it’s possible for the mere lack of stimulation to feel mildly pleasant or unpleasant.
If the mere lack of stimulation feels pleasant, then the neutral point of wellbeing would correspond to what common sense might recognise as experiencing mild suffering, such as sitting on an uncomfortable chair but otherwise having all your needs attended to (and not sitting for long enough to become bored). And vice versa if the lack of stimulation feels unpleasant by default.
For me, recognising that these two types of neutralness aren’t coupled pushes me towards thinking of the neutral point of wellbeing as mild suffering, rather than “true neutralness” or mild pleasure. If I imagine a situation that is maximally neutral, like walking around a bland city not thinking of anything in particular, that feels comfortably inside life-worth-living territory (at least for a short time). If I try to imagine a situation that is borderline not worth experiencing, I find it hard to do so without including some fairly bad suffering. Sitting in an aeroplane is the thing that springs to mind, but that is actively very uncomfortable.
Equating stimulation-neutralness and wellbeing-neutralness leads to being quick to declare lives as net negative, helped along by the fact that the extremes of suffering seem more intense than the extremes of pleasure.
You look at a gazelle and say “Well, it spends 80% of its time just wandering around grazing on grass (0 wellbeing points), 10% starving, being chased by predators, or being diseased in some way (-1000 wellbeing points), and 10% doing whatever gazelles do for fun (+500 wellbeing points)”, so its life is net negative overall. But it could be that the large amount of time animals spend doing fairly low-key activities is quite positive, and I find this more intuitive than the other way around (where neutral activities are slightly negative).
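The gazelle arithmetic is a time-weighted average, and its sign is quite sensitive to the value assigned to low-key time. A minimal Python sketch using the illustrative numbers from the example (they are made-up points, not measurements); the helper names are hypothetical, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Illustrative numbers from the gazelle example: share of time and
# wellbeing points for grazing, bad times, and fun, in that order.
SHARES = [Fraction(80, 100), Fraction(10, 100), Fraction(10, 100)]
POINTS = [0, -1000, 500]


def net_wellbeing(shares, points):
    """Time-weighted average wellbeing: sum of share * points."""
    return sum(s * p for s, p in zip(shares, points))


def break_even_baseline(shares, points):
    """Value of the first (baseline) activity that makes net wellbeing exactly 0,
    holding every other activity fixed."""
    rest = sum(s * p for s, p in zip(shares[1:], points[1:]))
    return -rest / shares[0]


print(net_wellbeing(SHARES, POINTS))        # -50
print(break_even_baseline(SHARES, POINTS))  # 125/2, i.e. 62.5 points
```

So with these numbers, if everyday grazing were worth more than +62.5 points (an eighth of the +500 assigned to the best times), the gazelle's life would come out net positive.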
When searching just now I came across this quick take which argues for the exact opposite position in the Parfit example:
A life of just muzak and potatoes isn’t even close to being worth living. … Parfit’s general idea that a life that is barely worth living might be one with no pains and only very minor pleasures seems reasonable enough, but he should have realised that boredom and loneliness are severe pains in themselves.
It’s surprising how people’s intuitions differ on this! That said, I could salvage agreement with @JackM by saying that he’s supposing the boredom and loneliness are noticeably unpleasant, and so this isn't a good example of a neutral state.
I think the intuition behind the muzak-and-potatoes example is thrown off by supposing you experience exactly the same things for your whole life; even imagining much more exciting music and tastier food as your only experience feels grotesque in a different way. But imagining being in a room with muzak and potatoes for a couple of hours seems fine.
Most deaths in war aren’t from gunshots
This is an edited version of a memo I shared within the online team at CEA. It’s about the forum, but you could also make it about other stuff. (Note: this is just my personal opinion.)
There's this stylised fact about war that almost none of the deaths are caused by gunshots, which is surprising given that for the average soldier war consists of walking around with a gun and occasionally pointing it at people. Whether or not this is actually true, the lesson that quoters of this fact are trying to teach is that the possibility of something happening can have a big impact on the course of events, even if it very rarely actually happens.
[warning: analogy abuse incoming]
I think a similar thing can happen on the forum, and trying to understand what’s going on in a very data driven way will tend to lead us astray in cases like this.
A concrete example of this is people being apprehensive about posting on the forum, and saying this is because they are afraid of criticism. But if you go and look through all the comments, there aren’t actually that many examples of well-intentioned posts being torn apart. At this point, if you’re being very data-minded, you would say “well I guess people are wrong, posts don’t actually get torn apart in the comments; so we should just encourage people to overcome their fear of posting (or something)”.
I think this is probably wrong, because something like this happens: users correctly identify that people would tear their post apart if it was bad, so they either don’t write the post at all, or they put a lot of effort into making it good. The result is that the amount of realised harsh criticism on the forum is low, and the quality of posts is generally high (compared to other forums, Facebook, etc.).
I would guess that criticising actually-bad posts even more harshly would in fact lower the total amount of criticism, for the same reason that hanging people for stealing bread probably lowered the theft rate among Victorian street urchins (this would probably also be bad for the same reason).
A complaint about using average Brier scores
Comparing average Brier scores between people only makes sense if they have made predictions on exactly the same questions, because making predictions on more certain questions (such as "will there be a 9.0 earthquake in the next year?") will tend to give you a much better Brier score than making predictions on more uncertain questions (such as "will this coin come up heads or tails?"). This is one of those things that lots of people know, but everyone (including me) keeps using average Brier scores anyway because they're a nice simple number to look at.
The Brier score for a binary prediction is the squared difference between the predicted probability and the actual outcome, $(O - p)^2$, where $O$ is 1 if the event happens and 0 if it doesn't, and $p$ is the predicted probability. For a given question, predicting the correct probability will give you the minimum possible expected Brier score (lower is better). But this minimum possible score varies depending on the true probability of the event happening.
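Writing $\theta$ for the true probability of the event (a symbol introduced here; $p$ is still your forecast), the expected Brier score splits into a term you control and a term you don't:

```latex
\mathbb{E}\big[(O - p)^2\big]
  = \theta (1 - p)^2 + (1 - \theta)\, p^2
  = \theta - 2\theta p + p^2
  = (p - \theta)^2 + \theta (1 - \theta)
```

The first term is zero when $p = \theta$; the second, $\theta(1 - \theta)$, is the floor set by the question itself, which is largest ($0.25$) for a 50/50 question.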
For the coin flip the true probability is 0.5, so if you make a perfect prediction you will get a Brier score of 0.25 ($= 0.5(1 - 0.5)^2 + 0.5(0 - 0.5)^2$). For the earthquake question maybe the correct probability is 0.1, so the best expected Brier score you can get is 0.09 ($= 0.1(1 - 0.1)^2 + 0.9(0 - 0.1)^2$), and it's only if you are really badly wrong (you predict $p > 0.5$) that your expected score will be higher than the best score you can get for the coin flip.
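The two worked examples can be checked numerically. A minimal Python sketch; `expected_brier` is a hypothetical helper name, with `forecast` playing the role of $p$ and `true_prob` the true probability of the event.

```python
def expected_brier(forecast: float, true_prob: float) -> float:
    """Expected Brier score of `forecast` for an event with probability `true_prob`."""
    return true_prob * (1 - forecast) ** 2 + (1 - true_prob) * forecast ** 2


print(round(expected_brier(0.5, 0.5), 4))  # 0.25: perfect forecast on the coin flip
print(round(expected_brier(0.1, 0.1), 4))  # 0.09: perfect forecast on the earthquake

# On the earthquake question, your expected score only exceeds the coin-flip
# optimum once your forecast passes 0.5.
for forecast in (0.3, 0.5, 0.7):
    print(forecast, round(expected_brier(forecast, 0.1), 4))
```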
So if forecasters have a choice of questions to make predictions on, someone who mainly goes for things that are pretty certain will end up with a (much!) better average Brier score than someone who predicts things that are genuinely more 50/50. This also acts as a disincentive for predicting more uncertain things which seems bad.
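To make the selection effect concrete: below, two hypothetical forecasters are equally skilled by construction (each always forecasts the true probability), but one picks near-certain questions and the other picks near-50/50 ones. The question probabilities and helper names are made up for illustration.

```python
from statistics import fmean


def expected_brier(forecast, true_prob):
    """Expected Brier score of `forecast` for an event with probability `true_prob`."""
    return true_prob * (1 - forecast) ** 2 + (1 - true_prob) * forecast ** 2


def average_expected_brier(true_probs):
    """Average expected Brier score for a perfectly calibrated forecaster."""
    return fmean([expected_brier(p, p) for p in true_probs])


near_certain = [0.02, 0.05, 0.1, 0.95, 0.98]  # hypothetical question set
near_coin_flip = [0.4, 0.5, 0.5, 0.6, 0.5]    # hypothetical question set

print(round(average_expected_brier(near_certain), 4))    # 0.0448
print(round(average_expected_brier(near_coin_flip), 4))  # 0.246
```

Despite identical skill, the first forecaster's average is more than five times lower, which is why average Brier scores only compare fairly across the same set of questions.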
We've just added Fatebook (which is great!) to our Slack, and I've noticed this putting me off making forecasts for things that are highly uncertain. I'm interested in whether there is some lore around dealing with this among people who use Metaculus or other platforms where Brier scores are an important metric. I only really use prediction markets, which don't suffer from this problem.
Yeah, I'm starting to believe that a severe limitation on Brier scores is this inability to use them in a forward-looking way. Brier scores reflect the performance of specific people on specific questions and using them as evidence for future prediction performance seems really fraught...but it's the best we have as far as I can tell.
Tyler Cowen has this criticism of prediction markets which is like (paraphrased, plus slightly made up and mixed with my own opinions): "The whole concept is based on people individually trying to maximise their wealth, and this resulting in wealth accruing to the better predictors over time. But then in real life people just bet these token amounts that add up to way less than the money they get from their salary or normal investments. This completely defeats the point! You may as well just take the average probability at that point rather than introducing this overcomplicated mechanism".
Play money can fix this specific problem, because you can make it so everyone starts with the same amount, whereas real money is constantly streaming in and out for reasons other than your ability to predict esoteric world events. I think this is an underrated property of play money markets, as opposed to the usual arguments about risk aversion. (Of course, if you can buy play money with real money, this muddies the waters quite a bit.)
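The wealth-accrual mechanism in Cowen's argument can be made concrete with a well-known idealised model: if every trader bets the Kelly-optimal (log-utility) amount of their own bankroll on a binary question, the market price equals the wealth-weighted average of their beliefs, and after resolution each trader's wealth is multiplied by belief/price if the event happens, or (1 − belief)/(1 − price) if it doesn't. The Python sketch below assumes that model (real and play-money platforms differ in detail); the function names and numbers are hypothetical. It illustrates both points from the text: with equal starting bankrolls the first price is just the average belief, and outside money can swamp forecasting skill.

```python
def market_price(wealth, beliefs):
    """Clearing price for Kelly bettors: wealth-weighted average belief."""
    return sum(w * b for w, b in zip(wealth, beliefs)) / sum(wealth)


def settle(wealth, beliefs, outcome):
    """Each Kelly bettor's wealth after a question resolves.

    Beliefs must be strictly between 0 and 1. Total wealth is conserved:
    money only moves between traders.
    """
    price = market_price(wealth, beliefs)
    if outcome:
        return [w * b / price for w, b in zip(wealth, beliefs)]
    return [w * (1 - b) / (1 - price) for w, b in zip(wealth, beliefs)]


# Play money: both traders start with the same bankroll, so the first
# price is just the plain average of their beliefs (0.65 here).
wealth = [100.0, 100.0]
beliefs = [0.8, 0.5]  # trader 0 is better calibrated on events that do happen
for _ in range(5):
    wealth = settle(wealth, beliefs, outcome=True)
print([round(w, 1) for w in wealth])            # trader 0 now holds most of the money
print(round(market_price(wealth, beliefs), 3))  # price has moved towards 0.8

# Real money: a large outside inflow (e.g. salary) to the worse predictor
# drags the price back towards their belief, regardless of track record.
print(round(market_price([100.0, 1100.0], beliefs), 3))  # 0.525
```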
Most deaths in war aren’t from gunshots
This is an edited version of a memo I shared within the online team at CEA. It’s about the forum, but you could also make it about other stuff. (Note: this is just my personal opinion)
There's this stylised fact about war that almost none of the deaths are caused by gunshots, which is surprising given that for the average soldier war consists of walking around with a gun and occasionally pointing it at people. Whether or not this is actually true, the lesson that quoters of this fact are trying to teach is that the possibility of something happening can have a big impact on the course of events, even if it very rarely actually happens.
[warning: analogy abuse incoming]
I think a similar thing can happen on the forum, and trying to understand what’s going on in a very data-driven way will tend to lead us astray in cases like this.
A concrete example of this is people being apprehensive about posting on the forum, and saying this is because they are afraid of criticism. But if you go and look through all the comments there aren’t actually that many examples of well-intentioned posts being torn apart. At this point if you’re being very data-minded you would say “well I guess people are wrong, posts don’t actually get torn apart in the comments; so we should just encourage people to overcome their fear of posting (or something)”.
I think this is probably wrong because something like this happens: users correctly identify that people would tear their post apart if it was bad, so they either don’t write the post at all, or they put a lot of effort into making it good. The result of this is that the amount of realised harsh criticism on the forum is low, and the quality of posts is generally high (compared to other forums, Facebook, etc).
I would guess that criticising actually-bad posts even more harshly would in fact lower the total amount of criticism, for the same reason that hanging people for stealing bread probably lowered the theft rate among Victorian street urchins (this would probably also be bad for the same reason).
A complaint about using average Brier scores
Comparing average Brier scores between people only makes sense if they have made predictions on exactly the same questions, because making predictions on more certain questions (such as "will there be a 9.0 earthquake in the next year?") will tend to give you a much better Brier score than making predictions on more uncertain questions (such as "will this coin come up heads or tails?"). This is one of those things that lots of people know, but then everyone (including me) keeps using average Brier scores anyway because it's a nice simple number to look at.
To explain:
The Brier score for a binary prediction is the squared difference between the predicted probability and the actual outcome, $(O - p)^2$, where $O$ is 1 if the event happened and 0 if it didn't, and $p$ is the predicted probability. For a given question, predicting the correct probability will give you the minimum possible expected Brier score (which is what you want). But this minimum possible score varies depending on the true probability of the event happening.
For the coin flip the true probability is 0.5, so if you make a perfect prediction you will get a Brier score of 0.25 ($= 0.5 \cdot (1 - 0.5)^2 + 0.5 \cdot (0 - 0.5)^2$). For the earthquake question maybe the correct probability is 0.1, so the best expected Brier score you can get is 0.09 ($= 0.1 \cdot (1 - 0.1)^2 + 0.9 \cdot (0 - 0.1)^2$), and it's only if you are really badly wrong (you think $p > 0.5$) that your expected score will be higher than the best score you can get for the coin flip.
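A minimal Python sketch of the expected Brier score calculation, assuming a single binary question whose true probability is q and a forecast p (the function name is just for illustration). It reproduces the 0.25 and 0.09 figures and checks where the earthquake forecast starts doing worse than a perfect coin-flip forecast.

```python
def expected_brier(p, q):
    """Expected Brier score for forecasting probability p when the true probability is q.

    With probability q the event happens (O = 1, error (1 - p)^2);
    with probability 1 - q it doesn't (O = 0, error (0 - p)^2).
    """
    return q * (1 - p) ** 2 + (1 - q) * (0 - p) ** 2

coin = expected_brier(0.5, 0.5)        # 0.25
earthquake = expected_brier(0.1, 0.1)  # about 0.09

# A forecast on the 10% earthquake question only does worse (in expectation)
# than a perfect coin-flip forecast once p goes above 0.5.
for p in [0.3, 0.5, 0.6]:
    print(p, expected_brier(p, 0.1))
```

Expanding the expression gives $q(1-q) + (p-q)^2$, so the best achievable expected score is $q(1-q)$ (0.25 for $q = 0.5$, 0.09 for $q = 0.1$) and any error in $p$ adds $(p-q)^2$ on top. For the earthquake this exceeds 0.25 exactly when $(p - 0.1)^2 > 0.16$, i.e. $p > 0.5$.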
So if forecasters have a choice of questions to make predictions on, someone who mainly goes for things that are pretty certain will end up with a (much!) better average Brier score than someone who predicts things that are genuinely more 50/50. This also acts as a disincentive for predicting more uncertain things, which seems bad.
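To see the selection effect on averages, here is a small simulation sketch. It assumes two hypothetical forecasters who are both perfectly calibrated (they always predict the true probability) and differ only in which questions they pick: one answers questions with a 10% chance of happening, the other answers 50/50 questions. The question count and random seed are arbitrary.

```python
import random

def brier(p, outcome):
    """Brier score for a single binary forecast: (O - p)^2."""
    return (outcome - p) ** 2

def average_brier_of_calibrated_forecaster(true_prob, n_questions, rng):
    """Average Brier score of a forecaster who always predicts the true probability.

    Each question independently resolves YES (1) with probability true_prob.
    """
    total = 0.0
    for _ in range(n_questions):
        outcome = 1 if rng.random() < true_prob else 0
        total += brier(true_prob, outcome)
    return total / n_questions

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical setup: equally skilled forecasters, different question choices.
confident_picker = average_brier_of_calibrated_forecaster(0.1, 10_000, rng)
coin_flip_picker = average_brier_of_calibrated_forecaster(0.5, 10_000, rng)

print(f"near-certain questions: {confident_picker:.3f}")  # close to 0.09
print(f"50/50 questions:        {coin_flip_picker:.3f}")  # 0.250
```

Neither forecaster is more skilled, yet the one picking near-certain questions ends up with an average Brier score around 0.09 against 0.25 for the other.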
We've just added Fatebook (which is great!) to our Slack and I've noticed this putting me off making forecasts for things that are highly uncertain. I'm interested in whether there is some lore around dealing with this among people who use Metaculus or other platforms where Brier scores are an important metric. I only really use prediction markets, which don't suffer from this problem.
Note: this also applies to log scores etc.
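A quick check that the log score has the same problem, assuming the convention where the score is the natural log of the probability assigned to the outcome that happened (higher is better; some platforms negate it or use a different base, which rescales or flips the numbers but leaves the gap between the two cases in place). A perfect forecast on a 10% question scores much better in expectation than a perfect forecast on a 50/50 question.

```python
import math

def expected_log_score(p, q):
    """Expected log score (natural log, higher is better) for forecasting p
    when the true probability of the event is q."""
    return q * math.log(p) + (1 - q) * math.log(1 - p)

print(expected_log_score(0.5, 0.5))  # about -0.693 (= ln 0.5)
print(expected_log_score(0.1, 0.1))  # about -0.325
```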
Yeah, I'm starting to believe that a severe limitation on Brier scores is this inability to use them in a forward-looking way. Brier scores reflect the performance of specific people on specific questions and using them as evidence for future prediction performance seems really fraught...but it's the best we have as far as I can tell.
Tyler Cowen has this criticism of prediction markets which is like (paraphrased, plus slightly made up and mixed with my own opinions): "The whole concept is based on people individually trying to maximise their wealth, and this resulting in wealth accruing to the better predictors over time. But then in real life people just bet these token amounts that add up to way less than the money they get from their salary or normal investments. This completely defeats the point! You may as well just take the average probability at that point rather than introducing this overcomplicated mechanism".
Play money can fix this specific problem, because you can make it so everyone starts with the same amount, whereas real money is constantly streaming in and out for reasons other than your ability to predict esoteric world events. I think this is an underrated property of play money markets, as opposed to the usual arguments about risk aversion. (Of course, if you can buy play money with real money, this muddies the waters quite a bit.)