Project Idea: Profiles Aggregating Forecasting Performance Metrics

Damien Laird

4 min read · Apr 17, 2023

Comments 2

niplav

I like this idea :-)

I think that there are some tricky questions about comparing across different forecasters and their predictions. If you simply take Brier score, this can be Goodharted: people can choose the "easiest" questions and get way better scores than the ones taking on difficult questions.
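To make the Goodharting concern concrete, here is a minimal sketch using the standard binary Brier score (mean squared error between the forecast probability and the 0/1 outcome, lower is better). The two forecasters and all of their numbers are made up for illustration.

```python
def brier_score(forecasts):
    """Mean squared error between probabilities and binary outcomes (lower is better)."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# Hypothetical data: (forecast probability, outcome as 0 or 1).
# The cherry-picker only answers near-certain questions.
cherry_picker = [(0.95, 1), (0.97, 1), (0.03, 0), (0.96, 1)]
# The generalist also takes on genuinely uncertain questions.
generalist = [(0.95, 1), (0.70, 1), (0.40, 0), (0.60, 0), (0.65, 1)]

print(f"cherry-picker: {brier_score(cherry_picker):.4f}")
print(f"generalist:    {brier_score(generalist):.4f}")
```

The cherry-picker ends up with a far lower (better) average purely through question selection, which is the failure mode described above.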

I can think of some attempts to go at this:

Ranking forecasters:
- For two forecasters, they get ranked according to their Brier scores on questions they have both forecasted on. I fear that this will lead to cyclical rankings, which could be dealt with using the Smith set or Hodge decomposition.
- Forecasters are ranked according to their performance relative to all other forecasters on each question, making easier questions less impactful on a forecaster's score.
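The first option, and the cycle worry that comes with it, can be sketched roughly as follows. This is only an illustration under assumed data: the forecaster names, questions, and probabilities are hypothetical, and `pairwise_wins` is a helper invented here, not part of any platform's API.

```python
from itertools import combinations

def pairwise_wins(forecasts, outcomes):
    """Return a set of (winner, loser) pairs, compared only on shared questions."""
    wins = set()
    for a, b in combinations(sorted(forecasts), 2):
        shared = forecasts[a].keys() & forecasts[b].keys()
        if not shared:
            continue  # no overlap, so no comparison is possible
        score_a = sum((forecasts[a][q] - outcomes[q]) ** 2 for q in shared) / len(shared)
        score_b = sum((forecasts[b][q] - outcomes[q]) ** 2 for q in shared) / len(shared)
        if score_a < score_b:
            wins.add((a, b))
        elif score_b < score_a:
            wins.add((b, a))
    return wins

# Hypothetical data: every question resolved "yes", and each pair of
# forecasters overlaps on exactly one question.
outcomes = {"q1": 1, "q2": 1, "q3": 1}
forecasts = {
    "alice": {"q1": 0.9, "q3": 0.6},
    "bob": {"q1": 0.7, "q2": 0.9},
    "carol": {"q2": 0.7, "q3": 0.9},
}

wins = pairwise_wins(forecasts, outcomes)
print(sorted(wins))
```

With this data alice beats bob, bob beats carol, and carol beats alice, so no consistent ordering exists. That is the situation something like the Smith set or a Hodge decomposition would have to resolve.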
I'd like to look into credibility theory to see whether it has some insights into ranking with different sample sizes since IMDb uses it for ranking movies.
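As a rough idea of what credibility theory might offer here, a Bühlmann-style credibility weight shrinks each forecaster's average toward the population average, with less shrinkage as the sample grows. The sketch below assumes mean Brier score as the metric; the constant `k` and all numbers are hypothetical tuning choices, not values from any platform.

```python
def credibility_adjusted(own_mean, n_questions, population_mean, k=20):
    """Shrink a forecaster's mean Brier score toward the population mean.

    z = n / (n + k) is the credibility weight; k is a hypothetical
    constant controlling how many questions count as "enough".
    """
    z = n_questions / (n_questions + k)
    return z * own_mean + (1 - z) * population_mean

population_mean = 0.20  # assumed average Brier score across all forecasters

# A newcomer with a great score on 5 questions vs. a veteran with a
# good score on 200 questions (both hypothetical).
newcomer = credibility_adjusted(0.05, 5, population_mean)
veteran = credibility_adjusted(0.10, 200, population_mean)

print(f"newcomer: {newcomer:.3f}")
print(f"veteran:  {veteran:.3f}")
```

Under these assumptions the veteran ranks ahead despite the worse raw average, because five questions carry little weight. Whether this kind of shrinkage is the right fit for forecasting track records is an open question.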

Damien Laird

I agree with your concerns on using a pure Brier score with open platforms. I expect that currently it makes the most sense within "tournaments" where participants are answering every question. Technically, I think some sort of objective, proper scoring rule is a prerequisite to a more advanced scoring system that conveys more useful information in open contexts.

I've seen some sort of a "relative Brier score" referenced frequently in associated research (definitely in the Good Judgment Project papers, at a minimum) that scored forecasters based on the difficulty of each question, as determined by the performance of others who forecasted it. This seems promising, and I expect there are a lot of options in that direction.
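One possible version of such a relative score, sketched under my own assumptions rather than taken from the research: for each question, subtract the median Brier score of the other forecasters on that question from your own, then average the differences. The data and the `relative_brier` helper are hypothetical.

```python
from statistics import median

def relative_brier(forecaster, forecasts, outcomes):
    """Average of (own Brier - median Brier of others) per question.

    Negative values mean better than the typical forecaster. Questions
    nobody else answered are skipped; returns None if none remain.
    """
    diffs = []
    for q, p in forecasts[forecaster].items():
        others = [
            (f[q] - outcomes[q]) ** 2
            for name, f in forecasts.items()
            if name != forecaster and q in f
        ]
        if not others:
            continue
        diffs.append((p - outcomes[q]) ** 2 - median(others))
    return sum(diffs) / len(diffs) if diffs else None

# Hypothetical data: bob answers only the easy question.
outcomes = {"easy": 1, "hard": 1}
forecasts = {
    "alice": {"easy": 0.95, "hard": 0.7},
    "bob": {"easy": 0.95},
    "carol": {"easy": 0.9, "hard": 0.4},
    "dave": {"easy": 0.9, "hard": 0.5},
}

for name in ("alice", "bob"):
    print(name, relative_brier(name, forecasts, outcomes))
```

Here bob has the lower raw Brier score because he skipped the hard question, but alice comes out ahead on the relative score, since beating the crowd on a hard question counts for more than matching it on an easy one.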

Summary

What I’m Proposing

Technical Challenges

Why Do This?

The Current State of Platform Metrics

Metaculus

Manifold Markets

Good Judgement Open

INFER