Thanks for writing this!
Since your decision seems to come down to the expected positive effect on your happiness, I'm curious whether you considered even cheaper happiness-boosting interventions. For example, hundreds (thousands?) of hours of meditation might give you the "love, belonging, connection" and "personal growth" benefits with fewer downsides, though this might work less reliably than having kids.
To your questions:
That's right. When defined using a base 2 logarithm, the score can be interpreted as "bits of information over the maximally uncertain (uniform) distribution". Forecasts assigning less probability mass to the true outcome than the uniform distribution result in a negative score.
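To make the "bits over uniform" reading concrete, here is a minimal sketch. The function name and the choice to pass the number of outcomes explicitly are my own assumptions, not anything from the post.

```python
import math


def relative_log_score(p_true: float, n_outcomes: int) -> float:
    """Base-2 log score of a forecast, measured against a uniform forecast.

    p_true     -- probability the forecast assigned to the outcome that occurred
    n_outcomes -- number of mutually exclusive outcomes (hypothetical parameter)
    """
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must be in (0, 1]")
    uniform_p = 1.0 / n_outcomes
    # Positive: more mass on the truth than uniform. Negative: less.
    return math.log2(p_true) - math.log2(uniform_p)
```

On a binary question, a forecast of 0.5 scores zero bits, anything above 0.5 on the true outcome scores positive, and anything below scores negative.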
Have you considered contacting the authors of the original QF paper? Glenn and Vitalik seem quite approachable. You could also post the paper on the RxC Discord or (if you're willing to go for a higher-effort alternative) submit it to their next conference.
Thanks for writing this up!
I think your (largely negative) results on QF under incomplete information should be more widely known. I consider myself relatively "plugged in" to the online communities that have discussed QF the most (RxC, crypto, etc.), and I only learned about your paper a couple of months ago.
Here are a few more scattered thoughts prompted by the post:
I've been thinking about regranting on and off for about a year, specifically about whether it makes sense to use bespoke mechanisms like quadratic funding or some of its close cousins. I still don't know where I land on many design choices, so I won't say more about that now.
I'm not aware of any retrospective on FTXFF's program, but it might be a good idea to do one once we have enough information to evaluate performance (so in 6-12 months?). Another thing in this vein that I think would be valuable, and could happen right away, is looking into SFF's S-process.
Are you pulling data from Manifold at all, or is the backend "just" a Squiggle model? If the latter, did you embed the markets by hand, or are you automating it by searching the node text on Manifold and pulling the first market that pops up, or something like that?
Thanks! That's a reasonable strategy if you can choose question wording. I agree there's no difference mathematically, but I'm not so sure that's true cognitively. Sometimes I've seen asymmetric calibration curves that look fine >50% but tend to overpredict <50%. That suggests it's easier to stay calibrated in the subset of questions you think are more likely to happen than not. This is good news for your strategy! However, note that this is based on a few anecdotal observations, so I'd caution against updating too strongly on it.
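If anyone wants to check their own track record for this asymmetry, a rough sketch of the comparison follows. The function name and the rule of putting forecasts of exactly 50% in the upper half are my own assumptions, and this is a coarse two-bucket check rather than a full calibration curve.

```python
def calibration_by_half(forecasts, outcomes):
    """Compare mean forecast with observed frequency below and above 50%.

    forecasts -- probabilities assigned to "yes"
    outcomes  -- matching resolutions, 1 for yes and 0 for no
    Returns a dict mapping each half to (mean_forecast, observed_freq, n),
    or None for a half that has no forecasts.
    """
    halves = {"below_50": [], "above_50": []}
    for p, y in zip(forecasts, outcomes):
        # Assumption: a forecast of exactly 0.5 is counted in the upper half.
        key = "below_50" if p < 0.5 else "above_50"
        halves[key].append((p, y))

    summary = {}
    for key, pairs in halves.items():
        if not pairs:
            summary[key] = None
            continue
        n = len(pairs)
        mean_forecast = sum(p for p, _ in pairs) / n
        observed_freq = sum(y for _, y in pairs) / n
        summary[key] = (mean_forecast, observed_freq, n)
    return summary
```

A mean forecast well above the observed frequency in the below-50% half, with the two roughly matching in the above-50% half, would be the pattern I described. With few resolved questions the gap can easily be noise.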
Glad you brought up real money markets because the real choice here isn't "5 unpaid superforecasters" vs "200 unpaid average forecasters" but "5 really good people who charge $200/h" vs "200 internet anons that'll do it for peanuts". Once you notice the difference in unit labor costs, the question becomes: for a fixed budget, what's the optimal trade-off between crowd size and skill? I'm really uncertain about that myself and have never seen good data on it.
I wonder what would happen if you were to do the same exercise with the fixed-year predictions under a 'constant risk' model, i.e. P(t) = 1 - exp(-l*t) with l = -log(1 - P(year)) / year (natural log, with year measured as time elapsed from the forecast date), to get around the problem that we're still 3 years away from 2026. Given that timelines are systematically longer with a fixed-year framing, I would expect the Brier score of those predictions to be worse. OTOH, the constant risk model doesn't seem very reasonable here, so the results wouldn't have a straightforward interpretation.
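For concreteness, here is a minimal sketch of that rescaling. The function names are hypothetical, time is measured in years from the forecast date, and the rate is obtained by solving P(year) = 1 - exp(-l*year) for l.

```python
import math


def hazard_rate(p_by_horizon: float, horizon_years: float) -> float:
    """Constant rate l implied by a forecast P(horizon) = p_by_horizon."""
    if not 0.0 <= p_by_horizon < 1.0:
        raise ValueError("p_by_horizon must be in [0, 1)")
    return -math.log(1.0 - p_by_horizon) / horizon_years


def prob_by(t_years: float, rate: float) -> float:
    """P(t) = 1 - exp(-l*t): probability the event has happened by time t."""
    return 1.0 - math.exp(-rate * t_years)


def brier_score(p: float, outcome: int) -> float:
    """Squared error between a probability and a 0/1 resolution."""
    return (p - outcome) ** 2
```

For example, a forecast of 50% by a date 10 years out implies l = log(2) / 10, which rescales to about 75% by year 20 and about 29% by year 5. Those rescaled numbers are what you would feed into the Brier score at an earlier resolution date.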