Situational Awareness: A Two-Year Scorecard

Edison

Comments 2

I think you could argue that "open source fades, proprietary moat" is a miss but not a clear miss. Yes, there is a proliferation of decent open source models, but they are way behind the frontier, and considering the money that's pouring into the frontier companies, investors clearly think proprietary has a moat.

If the arms race argument is the central argument, I would argue that there is a clear moat. The open source models are not close enough to the frontier to reduce the urgency of an arms race.

Edison

1mo

This is a fair push, and it splits into two questions that I think have different answers.

On "miss vs clear miss": I take the point. My "Wrong" is graded against the specific claim that open source would fade — and 3-6 months behind frontier, with the pricing collapse, is the opposite of fading. But you're right that "fade" and "no durable moat" aren't the same proposition, and a reader could reasonably hold that the first is wrong while the second is open. I'd defend "clear miss" on the narrow wording but I won't pretend the margin is huge — which is exactly why it's the one verdict with an explicit flip condition (back to Open if the gap re-widens past ~18 months for two straight generations).
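
To make that flip condition concrete, here is a minimal sketch of how it could be checked mechanically. The function name, the list-of-gaps input, and the reading of "two straight generations" as "the two most recent generations" are my assumptions for illustration, not part of the scorecard itself; the sample gap values are hypothetical.

```python
def should_flip_to_open(gap_months, threshold_months=18.0, streak=2):
    """Return True if the flip condition is met.

    gap_months: open-vs-proprietary gap per model generation, oldest first
                (hypothetical input format).
    threshold_months: the ~18-month threshold from the flip condition.
    streak: number of consecutive most-recent generations that must
            exceed the threshold (assumed reading of "two straight").
    """
    if len(gap_months) < streak:
        return False  # not enough generations observed yet
    # Every one of the latest `streak` gaps must be strictly past the threshold.
    return all(gap > threshold_months for gap in gap_months[-streak:])


# Hypothetical gap histories, in months:
print(should_flip_to_open([4, 20, 24]))  # latest two both past 18 months
print(should_flip_to_open([20, 24, 5]))  # gap narrowed again in the latest generation
```

One wide generation followed by a narrow one does not trigger the flip under this reading, which matches the intent of requiring the gap to stay wide.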

On the arms-race / moat point — this is the more interesting one, and I think we're partly talking past each other. "Investors think proprietary has a moat" and "there is a moat that reduces arms-race urgency" can both be true or both be false independently. Capex pouring in is consistent with a capability lead (frontier labs ship first) without implying a diffusion moat (the lead staying scarce). Aschenbrenner's geopolitical argument needs the second, not the first — the worry was that locking down weights/algorithms denies adversaries the capability. If a near-frontier open model is downloadable months later, the lockdown buys time, not denial. So I'd actually frame your closing line as the open question rather than the settled one: are the open models close enough to change arms-race urgency? I think mid-2026 evidence leans yes more than the 2024 essay assumed, but I hold that loosely and it's the part I'd most like to be wrong about.

Where would you put the gap that would make you say the moat is real — months, or capability tiers?

Prediction	Evidence as of mid-2026	Verdict
Models outpace college graduates across knowledge work by 2025/26	Frontier models score ~83% on knowledge-work benchmarks (GPT-5.4 on GDPval); agentic coding ~80% on SWE-Bench Pro (Claude Fable 5); agent products in production use	On track
Effective compute scales ~0.5 OOM/yr (compute) + ~0.5 OOM/yr (algorithms)	Delisle's audit found the pace "roughly supported," with individual launches scattered ±0.5 OOM around the trendline	On track
Massive capex acceleration toward trillion-dollar scale	Investment has run ahead of Aschenbrenner's projections; this is his most clearly vindicated claim	On track (exceeded)
Open source fades; proprietary algorithms create a durable US moat	DeepSeek V4 and Qwen 3.7 Max sit ~3–6 months behind the proprietary frontier, at a fraction of the price, with genuine architectural innovation	Wrong
AGI by 2027: models do the work of an AI researcher/engineer	Agentic coding is strong, but autonomous end-to-end research remains undemonstrated; 18 months left on the clock	Open
US government launches a formal AGI project by 2027/28	National-security involvement is growing, but no formal Project has been launched; the deadline has not yet elapsed	Open
Intelligence explosion 2027–29	—	Pending
Superintelligence and decisive advantage, 2030s	—	Pending
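
The effective-compute row combines two rates, so a quick arithmetic sketch may help. It assumes the two ~0.5 OOM/yr figures simply add in log space, and it applies the ±0.5 OOM scatter reported for individual launches as a band around the trendline; the function names and the band treatment are illustrative assumptions, not the audit's method.

```python
def effective_compute_multiplier(years, compute_oom_per_year=0.5, algo_oom_per_year=0.5):
    """Trendline multiplier in effective compute after `years`.

    OOMs (orders of magnitude) add, so the multiplier is
    10 ** ((compute rate + algorithm rate) * years).
    """
    total_oom = (compute_oom_per_year + algo_oom_per_year) * years
    return 10 ** total_oom


def scatter_band(years, scatter_oom=0.5):
    """Low and high multipliers if a launch lands `scatter_oom` off the trendline."""
    center = effective_compute_multiplier(years)
    return center / 10 ** scatter_oom, center * 10 ** scatter_oom


# Two years at the table's headline rates: 2 OOM on the trendline.
print(effective_compute_multiplier(2))
print(scatter_band(2))
```

At the headline rates, two years works out to 2 OOM, or about 100x, with a single launch plausibly landing anywhere from roughly 30x to roughly 300x under the assumed band.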

Forecaster	AGI timeline	Note
Elon Musk (xAI)	By end of 2026	Most aggressive public claim
Leopold Aschenbrenner	2027 "strikingly plausible"	The subject of this scorecard
Demis Hassabis (DeepMind)	~50% by 2030	Cautious lab leader
Samotsvety	~28% by 2030	Strongest forecasting track record
Metaculus community	25% by 2029 · 50% by 2033	Feb 2026 update, ~2,000 forecasters
Andrej Karpathy	~A decade out	Architecture skeptic
AI researcher survey (n=2,778)	50% by 2040	Academic median

Why another retrospective

Grading framework

The scorecard (June 2026)

The one clear miss, and why it matters more than it looks

Calibration context: his 2027 vs everyone else

Pre-registered: what would flip each verdict

Known weaknesses of this exercise