Javier Prieto

Program Assistant, AI Governance & Policy @ Open Philanthropy
231 karma · Joined Nov 2021 · Working (0-5 years)


Cool app!

Are you pulling data from Manifold at all or is the backend "just" a squiggle model? If the latter, did you embed the markets by hand or are you automating it by searching the node text on Manifold and pulling the first market that pops up or something like that?

Thanks! That's a reasonable strategy if you can choose question wording. I agree there's no difference mathematically, but I'm not so sure that's true cognitively. Sometimes I've seen asymmetric calibration curves that look fine >50% but tend to overpredict <50%. That suggests it's easier to stay calibrated in the subset of questions you think are more likely to happen than not. This is good news for your strategy! However, note that this is based on a few anecdotal observations, so I'd caution against updating too strongly on it.

Glad you brought up real money markets because the real choice here isn't "5 unpaid superforecasters" vs "200 unpaid average forecasters" but "5 really good people who charge $200/h" vs "200 internet anons that'll do it for peanuts". Once you notice the difference in unit labor costs, the question becomes: for a fixed budget, what's the optimal trade-off between crowd size and skill? I'm really uncertain about that myself and have never seen good data on it.

Great analysis!

I wonder what would happen if you were to do the same exercise with the fixed-year predictions under a 'constant risk' model, i.e. P(t) = 1 - exp(-l*t) with l = -log(1 - P(T)) / T, where T is the number of years until the fixed year, to get around the problem that we're still 3 years away from 2026. Given that timelines are systematically longer with a fixed-year framing, I would expect the Brier score of those predictions to be worse. OTOH, the constant risk model doesn't seem very reasonable here, so the results wouldn't have a straightforward interpretation.
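To make the suggestion concrete, here is a minimal sketch of the rescaling, assuming the constant risk model holds. Solving P(t) = 1 - exp(-l*t) for the rate at a horizon of T years gives l = -log(1 - P(T)) / T. The function names and the example numbers are hypothetical and are not taken from the post being discussed.

```python
import math


def hazard_from_fixed_year(p_at_horizon: float, horizon_years: float) -> float:
    """Constant hazard rate l implied by P(T) = p under P(t) = 1 - exp(-l * t)."""
    if not 0.0 <= p_at_horizon < 1.0:
        raise ValueError("p_at_horizon must be in [0, 1)")
    if horizon_years <= 0:
        raise ValueError("horizon_years must be positive")
    return -math.log(1.0 - p_at_horizon) / horizon_years


def prob_by(t_years: float, rate: float) -> float:
    """Probability that the event has happened within t_years at a constant rate."""
    return 1.0 - math.exp(-rate * t_years)


def brier(p: float, outcome: int) -> float:
    """Single-outcome Brier score: squared error against a 0/1 outcome."""
    return (p - outcome) ** 2


# Hypothetical forecast: 30% by a date 10 years after the survey.
rate = hazard_from_fixed_year(0.30, 10.0)
# Implied probability 7 years in, which can be scored against what has happened so far.
p_now = prob_by(7.0, rate)
score_if_not_yet = brier(p_now, 0)
```

The rescaled probability is only as meaningful as the constant risk assumption, so any scores computed this way inherit the caveat above.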

This is really cool! As someone who's been doing these calculations in a somewhat haphazard way using a mix of pen and paper, spreadsheets, and Python scripts for years, it's nice to see someone put in the work to create a polished product that others can use.

Something that I've been meaning to incorporate into my estimates and that would be a killer feature for an app like this is a reasonable projection of future earnings, under the assumption that you'll get promoted / switch career paths at the average rate for someone in your current position. Sprinkle a bit of uncertainty on top, and you can get out a nice probability distribution over "time at FI" and "total money donated".

A product I would personally like to see because it'd be tremendously useful to me is "personal finance for nomadic EAs" i.e. if location is at most a minor constraint for you, where should you move to maximize the resources available to the effective charities of your choice? I expect that, for most people without such constraints, packing up and leaving is probably much more effective than fine-tuning the strategy to the place where they currently reside.

Your likelihood_pool method is returning Brier scores >1. How is that possible? Also, unless you extremize, it should yield the same aggregates (and scores) as regular geometric mean of odds, no?
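For reference, a minimal sketch of the two quantities in question. It does not reproduce the post's likelihood_pool, whose internals I haven't seen; the function names and the extremize parameter are my own. It shows that the single-outcome Brier score is bounded by 1, while the original two-category form sums the squared error over both outcomes and can reach 2, which is one possible (unconfirmed) source of scores above 1.

```python
import math


def geo_mean_odds(probs, extremize=1.0):
    """Pool probabilities via the geometric mean of odds.

    extremize=1.0 is the plain geometric mean of odds; values above 1
    push the pooled log-odds away from zero.
    """
    if not probs or any(not 0.0 < p < 1.0 for p in probs):
        raise ValueError("probs must be non-empty and strictly between 0 and 1")
    log_odds = [math.log(p / (1.0 - p)) for p in probs]
    pooled = extremize * sum(log_odds) / len(log_odds)
    return 1.0 / (1.0 + math.exp(-pooled))


def brier_single(p, outcome):
    """Squared error on the event itself; always in [0, 1]."""
    return (p - outcome) ** 2


def brier_two_category(p, outcome):
    """Original two-category form: sums over 'yes' and 'no'; in [0, 2]."""
    return (p - outcome) ** 2 + ((1.0 - p) - (1 - outcome)) ** 2


pooled = geo_mean_odds([0.2, 0.8])           # odds 0.25 and 4 average out to even odds
single = brier_single(0.9, 0)                # cannot exceed 1
two_cat = brier_two_category(0.9, 0)         # exactly twice the single-outcome score
```

If the scores above 1 persist under the single-outcome definition, the problem is presumably elsewhere in the pooling step.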

Thanks for posting this! I think this topic is extremely neglected and the lack of side effects among natural short-sleepers strongly suggests that there could be interventions with no obvious downsides.

My main concern with your drug-centered approach is: what if the causal path from short-sleeper genes to a short-sleeper phenotype flows through neurodevelopmental pathways, such that once neural structures are locked in by adulthood it's not possible to induce the desired phenotype by mimicking the direct effects of the genes? If this is true, then reaping the benefits of short-sleeper genes would seem to require genetic engineering (I doubt embryo selection would scale given the low frequency of the target alleles). This would obviously be politically problematic and I'm not sure it'd be technically feasible right away (last time I checked, CRISPR people were worried about off-target mutations, but I'm not up to date with that literature so this may not be an issue anymore).

Have you considered holding out some languages at random to assess the impact of the program? You could e.g. delay funding for some languages by 1-2 years and try to estimate the difference in some relevant outcome during that period. I understand this may be hard or undesirable for several reasons (finding and measuring the right outcomes, opportunity costs, managing grantee expectations).
