This post was drafted with substantial LLM assistance (Claude) — research synthesis, drafting, and editing — then reviewed, fact-checked, and revised by me. The claims and judgments are mine; I take responsibility for them.
Epistemic status: Describing a tool we've built and the reasoning behind it. The core claim, that timeline statements from frontier labs are systematically incentive-compromised, is one I hold with high confidence and think is straightforwardly evidenced. The specific instrument (the Singularity Index) is a first attempt with real limitations, laid out below. The first full elicitation wave is still being assembled, so the current public reading is preliminary. I'm posting in part to recruit critique and qualified respondents. Conflict of interest disclosed below.
"AGI by [year]" is usually read as a forecast. A forecast is falsifiable: it names a definition you can check and a date you can hold against the calendar, and it costs the forecaster something to be wrong. Strip those properties out and what remains is a marketing claim in a forecast's clothing — and the fastest tell is to ask who profits when the date is believed.
The track record is not subtle:
None of this is an argument that timelines are unknowable. It's an argument that the people with the largest financial stake in the answer are the worst placed to issue it, and that we currently have no widely cited reading produced by anyone without that stake. (The full argument, with sources for each claim, is linked at the bottom.)
The Singularity Index (SI) is one number on a 0.0–1.0 scale, where 1.0 (Ω) is the superintelligence threshold and the score is an estimate of the share of the distance already closed — not a raw capability rating.
It's a hybrid measure: quantitative signals from observable frontier capability, combined with structured expert judgment. Specifically:
The same operation also tracks what the labs do between elicitation waves (changes to their terms, privacy, and data practices) and writes those changes up in plain language. A recent worked example is a sourced look at how identity and age verification across the six labs has consolidated into two vendors (linked below). That monitoring is what keeps the Index grounded in what is actually shipping, not just what is announced. The monitoring is already live; the dashboard and these write-ups are public today.
I'd rather state the limitations than have them found in the comments:
- It is not a capability benchmark or a dated point-prediction. "Distance to superintelligence" is a contestable construct; Ω is not crisply operationalized, and the score is a structured collective judgment, not a measurement in the physical sense.
- Expert elicitation has well-documented failure modes (overconfidence, correlated priors, poor calibration on novel regimes). We don't claim to escape them. Reporting the spread rather than a false-precision point estimate is a partial mitigation, not a solution.
- Expert selection is the live problem. Who counts as qualified, and how a non-representative pool biases the median, is the part I most want red-teamed. The current pool is small; early readings should be treated as preliminary and wide.
- Independence removes one bias, not all of them. Being unpaid by the labs removes the financing incentive. It does not make the respondents right.
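To make "reporting the spread" and the pool-sensitivity worry concrete, here is a minimal sketch of how one wave of responses could be summarized. It is an illustration under stated assumptions, not the Index's actual aggregation: the function names are hypothetical, the interquartile range is just one possible choice of spread, and the leave-one-out check is one simple way to see how far a single respondent can move the median in a small pool.

```python
from statistics import median, quantiles


def summarize_wave(scores):
    """Summarize one wave as a median plus an interquartile spread.

    Assumption: each score is one respondent's 0.0-1.0 judgment.
    The IQR is an illustrative choice of spread, not a documented one.
    """
    scores = list(scores)
    if len(scores) < 4:
        raise ValueError("need at least 4 responses to report a spread")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must lie on the 0.0-1.0 scale")
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return {"median": median(scores), "q1": q1, "q3": q3, "n": len(scores)}


def leave_one_out_medians(scores):
    """Return (lowest, highest) median seen when each respondent is dropped in turn.

    A wide range signals that the reading depends heavily on who is in the pool.
    """
    scores = list(scores)
    if len(scores) < 2:
        raise ValueError("need at least 2 responses")
    medians = [median(scores[:i] + scores[i + 1:]) for i in range(len(scores))]
    return min(medians), max(medians)
```

With a hypothetical pool such as `[0.2, 0.3, 0.4, 0.5, 0.6]`, `summarize_wave` reports the median alongside the quartiles, and `leave_one_out_medians` shows how much that median shifts if any one respondent is removed. With a pool this small the shift is large relative to the scale, which is the reason early readings should be read as wide.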
Frontier Watch is published by Q16 PBC, a public benefit corporation. Q16 has no frontier model and no funding round riding on the timeline; that independence is the design rationale. But Frontier Watch is a commercial product: the baseline reading is free, and there's a paid tier ("Pro") for the underlying analysis. I have a financial interest in the project, which is the reason I'm disclosing it plainly rather than burying it. I don't think it compromises the measure, since the incentive runs toward accuracy rather than toward any particular date, but you should weigh it yourselves.
This is where I'd genuinely value EA input:
1. Critique the methodology — especially domain selection and weighting, the aggregation approach, and expert-pool construction. Comments welcome; I'll engage with substantive objections.
2. Qualified experts: take part in the first wave. If you have relevant expertise in frontier AI capabilities, governance, or forecasting, you can request the elicitation instrument. Respondents may be acknowledged or remain anonymous; individual scores are never attributed.
3. Tell me what would make this decision-relevant for you. If there's a cut of the data or a validity check that would move it from "interesting" to "useful," I want to hear it.
- The full argument, with the sourced timeline of how the AGI definition has been bent to fit financing: https://q16pbc.com/blog/how-far-are-we-from-superintelligence
- The live dashboard and current (preliminary) baseline reading: https://watch.q16pbc.com
- Q16 blog — the monitoring write-ups: https://q16pbc.com/blog
- Worked example — how identity and age verification across the six labs has consolidated into two vendors: https://q16pbc.com/blog/ai-labs-identity-verification
- Request the elicitation instrument (qualified experts): [email protected]
I'll be in the comments. The labs will keep announcing; the goal here is to keep score from outside the process — and to do it transparently enough that you can tell me where it's wrong.