As uncertainty grows around how AI development will affect culture and society, it becomes more valuable to compare track records of predictions about technological progress.
I've recently been working on automating parts of the methodology from Arb's Scoring The Big 3's Predictive Performance report[1], and have had some promising preliminary results. I hope to automate most of the steps in the original report, making it feasible to analyse many more track records and publish the results.
I am particularly interested in the following questions:
- Which track record(s) would you find it valuable to have evaluated in the way the Arb report evaluated those of Asimov, Clarke and Heinlein?
- What would you want to see from an LLM-based evaluation that would give you confidence that the results are meaningful and accurate?
[1] See also the original Cold Takes post explaining why such evaluations are valuable.
Great, thanks!
We have two outputs in mind with this project:
1. Reports on the predictions of a specific thinker (e.g. Gwern) or body of work. These would probably be published individually or alongside interesting comparisons, similar to the Futurists track record in Cold Takes (based on Arb's Big Three research).
2. A dashboard ranking the track records of lots of thinkers
For (2), I agree that cherry-picking would be bad, and we'd want it to cover a good range.
For our initial outputs from (1), though, I'm excited about specifically picking thinkers whose track record people would find especially useful to understand (or to have a good-quality assessment of that they can cite). Curious whether you have specific people in mind who fit the bill for you?