The AI Eval Singularity is Near
* AI capabilities seem to be doubling every 4-7 months
* Humanity's ability to measure capabilities is growing much more slowly
* This implies an "eval singularity": a point at which capabilities grow faster than our ability to measure them
* It seems like the singularity is ~here in cybersecurity, CBRN, and AI R&D (supporting quotes below)
* It's possible that this is temporary, but the people involved seem pretty worried
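The crossover argument above is just exponential arithmetic. A minimal sketch (all growth rates hypothetical except the 4-7 month capability figure from the bullets above; the 24-month eval-capacity doubling time is an assumption for illustration):

```python
# Illustrative only: if capabilities double every ~5 months while our
# ability to measure them doubles every ~24 months, the gap between the
# two grows without bound -- the "eval singularity" framing.

def doublings(months: float, doubling_time: float) -> float:
    """Number of doublings accumulated after `months` at a given doubling time."""
    return months / doubling_time

CAP_DOUBLING = 5     # months (within the 4-7 month range claimed above)
EVAL_DOUBLING = 24   # months (hypothetical, for illustration)

for months in (12, 24, 36):
    gap = doublings(months, CAP_DOUBLING) - doublings(months, EVAL_DOUBLING)
    print(f"after {months} months, capabilities lead evals by {gap:.1f} doublings")
```

The point is only that under any pair of rates like these, the lead compounds: waiting longer never closes the gap.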
Appendix - quotes on eval saturation
Opus 4.6
* "For AI R&D capabilities, we found that Claude Opus 4.6 has saturated most of our automated evaluations, meaning they no longer provide useful evidence for ruling out ASL-4 level autonomy. We report them for completeness, and we will likely discontinue them going forward. Our determination rests primarily on an internal survey of Anthropic staff, in which 0 of 16 participants believed the model could be made into a drop-in replacement for an entry-level researcher with scaffolding and tooling improvements within three months."
* "For ASL-4 evaluations [of CBRN], our automated benchmarks are now largely saturated and no longer provide meaningful signal for rule-out (though as stated above, this is not indicative of harm; it simply means we can no longer rule out certain capabilities that may be pre-requisites to a model having ASL-4 capabilities)."
* It also saturated ~100% of the cyber evaluations
Codex-5.3
* "We are treating this model as High [for cybersecurity], even though we cannot be certain that it actually has these capabilities, because it meets the requirements of each of our canary thresholds and we therefore cannot rule out the possibility that it is in fact Cyber High."
@Ryan Greenblatt and I are going to record another podcast together (see the previous one here). We'd love to hear topics that you'd like us to discuss. (The questions people proposed last time are here, for reference.) We're most likely to discuss issues related to AI, but a broad set of topics other than "preventing AI takeover" are on topic. E.g. last time we talked about the cost to the far future of humans making bad decisions about what to do with AI, and the risk of galactic scale wild animal suffering.
Scrappy note on the AI safety landscape. Very incomplete, but probably a good way to get oriented to (a) some of the orgs in the space, and (b) how the space is carved up more generally.
(A) Technical
(i) A lot of the safety work happens in the scaling-based AGI companies (OpenAI, GDM, Anthropic, and possibly Meta, xAI, Mistral, and some Chinese players). Some of it is directly useful, some of it is indirectly useful (e.g. negative results, datasets, open-source models, position pieces etc.), and some is not useful and/or a distraction. It's worth developing good assessment mechanisms/instincts about these.
(ii) A lot of safety work happens in collaboration with the AGI companies, but by individuals/organisations with some amount of independence and/or different incentives. Some examples: METR, Redwood, UK AISI, Epoch, Apollo. It's worth understanding what they're doing with AGI cos and what their theories of change are.
(iii) Orgs that don't seem to work directly with AGI cos but are deeply technically engaging with frontier models and their relationship to catastrophic risk: places like Palisade, FAR AI, CAIS. These orgs maintain even more independence, and are able to do/say things which the previous tier maybe cannot. A recent cool thing was CAIS finding that models succeed on only 2.5% of remote work tasks, in contrast to OpenAI's GDPval findings, which suggest models have an almost 50% win-rate against industry professionals on a suite of "economically valuable, real-world tasks".
(iv) Orgs that are pursuing other* technical AI safety bets, different from the AGI cos: FAR AI, ARC, Timaeus, Simplex AI, AE Studio, LawZero, many independents, some academics at e.g. CHAI/Berkeley, MIT, Stanford, MILA, Vector Institute, Oxford, Cambridge, UCL and elsewhere. It's worth understanding why they want to make these bets, including whether it's their comparative advantage, an alignment with their incentives/grants, or whether they
Mildly against the Longtermism --> GCR shift
Epistemic status: Pretty uncertain, somewhat rambly
TL;DR: Replacing longtermism with GCRs might get more resources to longtermist causes, but at the expense of non-GCR longtermist interventions and broader community epistemics.
Over the last ~6 months I've noticed a general shift amongst EA orgs to focus less on reducing risks from AI, bio, nukes, etc. based on the logic of longtermism, and more on Global Catastrophic Risks (GCRs) directly. Some data points on this:
* Open Phil renaming its EA Community Growth (Longtermism) Team to GCR Capacity Building
* This post from Claire Zabel (OP)
* Giving What We Can's new Cause Area Fund being named "Risk and Resilience," with the goal of "Reducing Global Catastrophic Risks"
* Longview-GWWC's Longtermism Fund being renamed the "Emerging Challenges Fund"
* Anecdotal data from conversations with people working on GCRs / X-risk / Longtermist causes
My guess is these changes are (almost entirely) driven by PR concerns about longtermism. I would also guess these changes increase the number of people donating to / working on GCRs, which is (by longtermist lights) a positive thing. After all, no one wants a GCR, even when thinking only about people alive today.
Yet, I can't help but feel something is off about this framing. Some concerns (no particular ordering):
1. From a longtermist (~totalist classical utilitarian) perspective, there's a huge difference between ~99% and 100% of the population dying, if humanity recovers in the former case, but not the latter. Just looking at GCRs on their own mostly misses this nuance.
* (see Parfit's Reasons and Persons for the full thought experiment)
2. From a longtermist (~totalist classical utilitarian) perspective, preventing a GCR doesn't differentiate between "humanity prevents GCRs and realises 1% of its potential" and "humanity prevents GCRs and realises 99% of its potential"
* Preventing an extinction-level GCR might move u
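Point 1 above can be made concrete with toy expected-value arithmetic (all numbers hypothetical, chosen only to illustrate the asymmetry):

```python
# Toy illustration of the Parfit-style asymmetry: a catastrophe killing
# 99% of people (with some chance of recovery) vs. outright extinction.
# All values are hypothetical and normalised, not estimates.
FUTURE_VALUE = 1.0          # value of humanity realising its full potential
P_RECOVERY_AFTER_99 = 0.9   # hypothetical chance civilisation recovers

ev_99_percent_die = P_RECOVERY_AFTER_99 * FUTURE_VALUE
ev_extinction = 0.0

# A bare "GCR prevented / not prevented" framing treats these outcomes
# as similar; by totalist lights the difference is nearly everything.
print(ev_99_percent_die - ev_extinction)  # prints 0.9
```

On these toy numbers, almost all the longtermist stakes sit in the gap between the two outcomes, which is exactly what a pure GCR framing obscures.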
AI governance could be much more relevant in the EU, if the EU was willing to regulate ASML. Tell ASML they can only service compliant semiconductor foundries, where a "compliant semiconductor foundry" is defined as a foundry which only allows its chips to be used by compliant AI companies.
I think this is a really promising path for slower, more responsible AI development globally. The EU is known for its cautious approach to regulation. Many EAs believe that a cautious, risk-averse approach to AI development is appropriate. Yet EU regulations are often viewed as less important, since major AI firms are mostly outside the EU. However, ASML is located in the EU, and serves as a chokepoint for the entire AI industry. Regulating ASML addresses the standard complaint that "AI firms will simply relocate to the most permissive jurisdiction". Advocating this path could be a high-leverage way to make global AI development more responsible without the need for an international treaty.
I recently created a simple workflow to allow people to write to the Attorneys General of California and Delaware to share thoughts + encourage scrutiny of the upcoming OpenAI nonprofit conversion attempt.
Write a letter to the CA and DE Attorneys General
I think this might be a high-leverage opportunity for outreach. Both AG offices have already begun investigations, and AGs are elected officials who are primarily tasked with protecting the public interest, so they should care what the public thinks and prioritizes. Unlike e.g. congresspeople, AGs don't seem to often receive grassroots outreach (I found ~0 examples of this in the past), and an influx of polite and thoughtful letters may have some influence — especially from CA and DE residents, although I think anyone impacted by their decision should feel comfortable contacting them.
Personally I don't expect the conversion to be blocked, but I do think the value and nature of the eventual deal might be significantly influenced by the degree of scrutiny on the transaction.
Please consider writing a short letter — even a few sentences is fine. Our partner handles the actual delivery, so all you need to do is submit the form. If you want to write one on your own and can't find contact info, feel free to dm me.
The economist Tyler Cowen linked to my post on self-driving cars, so it ended up getting a lot more readers than I ever expected. I hope that more people now realize, at the very least, that self-driving cars are not an uncontroversial, uncomplicated AI success story. In discussions around AGI, people often say things along the lines of: "deep learning solved self-driving cars, so surely it will be able to solve many other problems". In fact, the lesson to draw is the opposite: self-driving is too hard a problem for the current cutting edge in deep learning (and deep reinforcement learning), and this should make us think twice before cavalierly proclaiming that deep learning will soon be able to master even more complex, more difficult tasks than driving.
PSA: If you're doing evals things, every now and then you should look back at OpenPhil's page on capabilities evals to check against their desiderata and questions in sections 2.1-2.2, 3.1-3.4, 4.1-4.3 as a way to critically appraise the work you're doing.