Can't think of anything better than a t-test, but open for suggestions.
If a forecaster is consistently off by, say, 10 percentage points, I think that is a difference that matters. But even in that extreme scenario, where the (simulated) difference between two forecasters is in fact quite large, we have a hard time picking it up using standard significance tests.
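To illustrate the power problem, here's a minimal simulation (a sketch for illustration, not the analysis from the post; the question count, probability range, and rough 5% critical value are all assumptions): forecaster A predicts the true probability, forecaster B is off by 10 percentage points, and we run a paired t-test on the per-question Brier score differences.

```python
import math
import random

random.seed(42)

def brier(p, outcome):
    """Brier score for a single binary question (lower is better)."""
    return (p - outcome) ** 2

def paired_t_statistic(n_questions=30):
    """Simulate two forecasters on n questions and return the paired
    t-statistic of their per-question Brier score differences.
    Forecaster A predicts the true probability; B is off by 10pp."""
    diffs = []
    for _ in range(n_questions):
        p_true = random.uniform(0.1, 0.9)
        outcome = 1 if random.random() < p_true else 0
        p_a = p_true
        p_b = min(0.99, max(0.01, p_true + random.choice([-0.1, 0.1])))
        diffs.append(brier(p_a, outcome) - brier(p_b, outcome))
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

t = paired_t_statistic()
# Roughly, |t| > ~2.05 would be significant at the 5% level with 29 df;
# with only ~30 questions the simulated 10pp gap often isn't detected.
```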
Interesting, thanks for sharing the paper. Yeah agree that using the Brier score / log score might change results and it would definitely be good to check that as well.
In principle yes. In practice also usually yes, but the specifics depend on whether the average user who predicted on a question gets a positive amount of points. So if you predicted very late and your points are close to zero, but the mean number of points forecasters on that question received is positive, then you will end up with a negative update to your reputation score.
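If it helps, here's a toy sketch of that mechanism (the function and formula are hypothetical, not the actual Metaculus reputation formula): the update is your score relative to the question's average, so near-zero points on a question where the average is positive gives a negative update.

```python
def reputation_update(your_points, all_points):
    """Hypothetical sketch, NOT the real Metaculus formula: your update
    is your score relative to the average score on that question."""
    mean_points = sum(all_points) / len(all_points)
    return your_points - mean_points

# Predicting very late and earning ~0 points on a question where the
# average forecaster earned positive points yields a negative update:
reputation_update(0.0, [10.0, 20.0, 0.0])  # negative
```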
Completely agree that a lot hinges on that reputation score. It seems to work decently for the Metaculus Prediction, but it would be good to see what results look like for a different metric of past performance.
Not sure how to quantify that (open for ideas). But intuitively I agree with you and would suspect it's at least a sizable part.
It should be possible to fully automate the bot and just run a CRON job that regularly checks the Metaculus API for new questions, right?
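Something like the sketch below could work (the endpoint, query parameters, and response schema here are assumptions about the Metaculus API, not verified): a script run from cron that fetches open questions and diffs them against the ids it has already seen.

```python
import json
import urllib.request

# Assumed endpoint and query parameters; check the Metaculus API docs.
API_URL = "https://www.metaculus.com/api2/questions/?status=open"

def fetch_open_questions(url=API_URL):
    """Fetch currently open questions (assumes a JSON
    {'results': [...]} response shape)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response).get("results", [])

def new_question_ids(questions, seen_ids):
    """Ids the bot hasn't processed yet. A cron job would call this
    each run, persist seen_ids, and post forecasts for anything new."""
    return {q["id"] for q in questions} - set(seen_ids)
```

A crontab entry such as `*/30 * * * * python3 check_metaculus.py` (hypothetical script name) would then poll every 30 minutes.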
I slightly lean towards yes, but that's mere intuition. As someone on Twitter put it, "Metaculus has a more hardcore user base, because it's less fun" - I find it plausible that the Metaculus and Manifold user bases differ. But I think higher trading volume would have helped.
For this particular analysis I'm not sure correcting for the number of forecasters would really be possible in a sound way. It would be great to get the MetaculusBot more active again to collect more data.
For Metaculus there are lots of ways to drive engagement: prioritise making the platform easier to use, increase cash prizes, community building and outreach, etc.
But as mentioned in the article the problem in practice is that the bootstrap answer is probably misleading, as increasing the number of forecasters likely changes forecaster composition.
However, one specific example where the analysis might actually be applicable is when you're thinking about how many Pro Forecasters to hire for a job.
In principle yes, but you'll still always have the problem that people predict at different points in time. If the best and second-best forecasters predict weeks or months apart, that changes the results.
Ah snap! I forgot to remove that paragraph... I did subsampling initially, then switched to bootstrapping. Results remained virtually unchanged. Thanks for pointing that out, will update the text.
Hi Simon, I'm working on a follow-up to this post that uses individual-level data. Could you please give some detail on how you "sampled" k predictors? As in, did you have access to individual data and could actually do the sampling? I'm not entirely sure what the x-axis in your plot means and what the difference between ">N predictors" and "k predictors" is. Thank you!
I acknowledge that transparency is complex, that there are trade-offs and that it isn't clear what the correct amount of transparency is. I also acknowledge that it is normal that regular grants are published with a delay. So I'm not making a general claim or demand that everything needs to be public (I even explicitly say that). What I say is that
a) I'm in favour of valuing transparency highly by default.
b) I feel that in this specific case more communication would have helped.
My intuition is that Open Phil overall is quite transparent. I was less s...
Yeah, I think we basically agree on all of the points here, and I apologize that my characterization of your claim was, in fact, uncharitable.
I think your criticism of bikeshedding somewhat misses the point people are raising. Of course the amount of money spent on WA is tiny compared to other things. The reason it's worth talking about is that it tells you something about EA culture and how EA operates.
This is in large part a discussion about what culture the movement should have, what EA wants to be, and how it wants to communicate to the world. The reason you care about how someone builds a bike shed is that it carries information about what kind of person they are, how trustwor...
Agreed. Effective Altruism embodies a set of values. I agree with these values. I was incredibly worried that CEA/EVF was making a big decision (15 million remains a large amount! It's millions of bednets!) that didn't embody these values. This is why I made the "Why did CEA purchase Wytham Abbey?" post. We shouldn't put too much weight on PR, spin and appearance. But we should care a lot about not losing track of what EA is all about. How EVF went about purchasing Wytham Abbey might translate to how they spend money in other areas as well, including high-...
$1000 per person per day just for the venue seems pretty expensive for a 30-person conference...
I'm confused why you wouldn't feel concerned about EA potentially wasting 15M pounds (talking about your hypothetical example, not the real purchase). I feel that would mean that EA is not living up to its own standards of using evidence and reasoning to help others in the best possible way.
Since EA isn't optimizing the goal "flip houses to make a profit", I expect us to often be willing to pay more for properties than we'd expect to sell them for. Paying 2x is surprising, but it doesn't shock me if that sort of thing is worth it for some reason I'm not currently tracking.
MIRI recently spent a year scouring tens of thousands of properties in the US, trying to find a single one that met conditions like "has enough room to fit a few dozen people", "it's legal to modify the buildings or construct a new one on the land if we want to", and "near b...
Currently there are 4 people (including me) working on the project. I focus on coordination, the other three are professional forecasters and focus on the data collection. At the moment we're aiming for wide feedback from anyone who would be interested in certain base rates, but we're not actively crowd-sourcing the collection process.
Also, there is now a bot (@effective_jobs) that retweets high-impact job offers that may be relevant to the EA community.
Also there is this Twitter bot (@EAForumPosts) that tweets top posts from the EA Forum: https://forum.effectivealtruism.org/posts/29zReRacRD6RyLxuz/a-twitter-bot-that-regularly-tweets-current-top-posts-from
Ah yes, I had seen the @ealtruist account - but as far as I can tell someone did that manually (it doesn't look like a bot) and then stopped.
We could also merge the two - use the old account with this code or something. In principle I'm open to that.
Good comment, thank you!