jackc

9 karma

Comments
3

Thanks for the update. I think the rest of the calculations and parameters look about right actually (I didn't look at all the other sources, but the ones you use as defaults in the spreadsheet look reasonable at least). And I was doing an independent estimate that came to about the same ballpark as what I get using your spreadsheet with the corrected Samotsvety figures. So I think the post basically still holds up except that the numbers shift more in favor of AI safety work if you use the Samotsvety x-risk estimates - and the original parameters still serve as a conservative estimate.

jackc

This is a great article! But I am unsure where you got your default estimates attributed to Samotsvety (of which I am a member):

You cite a Samotsvety forecast of 0.75% probability of AI catastrophe this century - but https://forum.effectivealtruism.org/posts/EG9xDM8YRz4JN4wMN/samotsvety-s-ai-risk-forecasts puts it at 25%. I expect our current forecasts would be in that ballpark too - 0.75% is lower than even the lowest individual's forecast!

And you cite a 2033 AGI timeline from Samotsvety (Jan 2026), but I am pretty sure Samotsvety didn't do an AGI timelines forecast in Jan 2026. I found a bunch of articles on the internet that incorrectly referred to a Jan 2026 forecast but actually linked to the 2023 forecast, so perhaps that's where you got it from? The 2023 forecast is now quite outdated - we've done more recent informal forecasting exercises but only with a subset of the team. I personally think the best timelines forecasts are from the AI 2027 folks: https://www.aifuturesmodel.com/

Thanks for the post, great analysis!

I'd be interested to see more on how accuracy on Manifold changes with the number of traders and overall trading volume.

In my anecdotal experience, Manifold accuracy improves a lot with more trading, just as you'd expect. Some of the markets in this dataset probably only ever got a single-digit number of trades and were obviously mispriced (with nobody seeing them to correct the mispricing). I occasionally see this happen on Metaculus too, but much less often in my experience - I would guess this is in large part because the question set on Metaculus is highly curated and much smaller.

I'd be interested in what the analysis looks like if restricted to questions with at least a certain number of forecasts or forecasters.
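To make the suggestion concrete, here is a minimal sketch of what that restriction could look like: compute the mean Brier score over only those questions that meet a minimum trader count. The records, the field layout, and the thresholds are all made up for illustration and are not taken from the post's dataset.

```python
from statistics import mean

# Hypothetical records: (number of traders, final probability, outcome 0 or 1).
# These values are invented for illustration only.
questions = [
    (2, 0.90, 0),
    (3, 0.20, 1),
    (15, 0.70, 1),
    (40, 0.10, 0),
    (120, 0.85, 1),
]


def brier(prob, outcome):
    """Squared error between a probability and a binary outcome."""
    return (prob - outcome) ** 2


def mean_brier(records, min_traders):
    """Mean Brier score over questions with at least `min_traders` traders.

    Returns None when no question meets the threshold.
    """
    scores = [brier(p, o) for n, p, o in records if n >= min_traders]
    return mean(scores) if scores else None


# Compare accuracy as the minimum trader count is raised.
for threshold in (1, 10, 100):
    print(threshold, mean_brier(questions, threshold))
```

Running the same filter on both platforms, with the same threshold, would show how much of the gap comes from thinly traded markets.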

The curve for Manifold looks more spiky and less smooth. I expect this to be largely a function of the number of forecasters and the trading volume. To me, the spikes mostly look like noise.

Yeah, the narrow spikes on Manifold are mostly noise from inexperienced traders who don't understand liquidity and price slippage, and they are quickly corrected. The platform has changed a lot over the last year and liquidity has generally improved, so I think the number of those noise spikes has decreased. I'd be curious whether, if you ran the comparison on earlier vs later questions, we'd see a significant difference in relative performance (although it would be hard to distinguish that from the random noise of looking at different questions in a different time period).

it looks like on the set of questions I analyzed Metaculus forecasts tended to update faster.

This is a very interesting observation, and I think it's largely coming from Metaculus having more predictors update their predictions over time than Manifold, on this particular question set. Well-traded prediction markets should update much faster than Metaculus because there is a large profit incentive for being the first to update the market price with new information, which typically happens within minutes on the most popular markets, whereas Metaculus only rewards an update to the extent that it affects your time-averaged score. Metaculus's recency weighting works fairly well at updating quickly, but we're usually talking about days, not minutes.
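A toy example of why a time-averaged score gives only a weak incentive to update fast. This is a simplified stand-in (a time-weighted Brier score over the question's lifetime), not Metaculus's actual scoring rule, and the 100-day question and the forecasts are hypothetical.

```python
def time_averaged_brier(updates, close_time, outcome):
    """Time-weighted average Brier score of one forecaster.

    updates: list of (time, probability), sorted by time; each probability
    is held until the next update or until close_time.
    """
    total = 0.0
    # Pair each update with the time at which it stops being the live forecast.
    ends = [t for t, _ in updates[1:]] + [close_time]
    for (start, prob), end in zip(updates, ends):
        total += (end - start) * (prob - outcome) ** 2
    return total / (close_time - updates[0][0])


# Hypothetical 100-day question; decisive news arrives on day 50 and it resolves YES.
a = time_averaged_brier([(0, 0.5), (50, 1.0)], close_time=100, outcome=1)  # updates at once
b = time_averaged_brier([(0, 0.5), (51, 1.0)], close_time=100, outcome=1)  # updates a day late
print(a, b, b - a)
```

In this sketch, being a full day late costs 0.0025 on a score of about 0.125, so there is little pressure to react within minutes. In a market, whoever moves the price first captures the profit from the news.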