Is There An AI Safety GiveWell?

Michaël Trazzi

1 min read · Sep 5, 2025

Comments 12

MichaelDickens

10mo*

The simple answer is no, there is no AI safety GiveWell.

For two reasons:

It is not possible to come up with a result like "donating $10M to this org would reduce extinction risk from AI by 2035 by 0.01-0.1%", at anywhere close to the same level of rigor as GiveWell does. (But it sounds like you already knew that.)
It would be possible to come up with extremely rough cost-effectiveness estimates. To make the estimates, you'd have to make up some numbers for highly uncertain inputs, readers would widely disagree about what those inputs should be, and changing the inputs would radically change the results. Someone could still make that estimate if they wanted to, ~~but nobody has done it.~~ Nuno Sempere created some cost-effectiveness models in 2021. This is the only effort like this that I'm aware of.

In spite of the problems with cost-effectiveness estimates in AI safety, I still think they're underrated and that people should put more work into quantifying their beliefs.
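To make the point about made-up inputs concrete, here is a minimal Monte Carlo sketch in Python of the kind of extremely rough estimate described above. The structure (baseline risk × fraction of that risk the field could remove × fraction of the field's impact a given grant buys) and every input range are hypothetical placeholders chosen for illustration, not numbers anyone in this thread proposed. Drawing each input log-uniformly is also an assumption.

```python
import math
import random

# Hypothetical input ranges (low, high). Every number is a made-up placeholder,
# not an estimate from this thread. Each input is sampled log-uniformly.
INPUT_RANGES = {
    "baseline_risk": (0.01, 0.5),        # P(AI catastrophe) by some target date
    "field_tractability": (1e-4, 1e-1),  # fraction of that risk the field could remove
    "grant_share": (1e-3, 1e-1),         # fraction of the field's impact this grant buys
}


def sample_log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Draw x such that log(x) is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def simulate(ranges: dict, n: int = 10_000, seed: int = 0) -> list[float]:
    """Monte Carlo samples of the absolute risk reduction bought by the grant."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        value = 1.0
        for low, high in ranges.values():
            value *= sample_log_uniform(rng, low, high)  # inputs multiply
        samples.append(value)
    return samples


def percentiles(samples: list[float]) -> dict[str, float]:
    """Rough 5th/50th/95th percentiles by sorting (no interpolation)."""
    s = sorted(samples)
    n = len(s)
    return {"p5": s[int(0.05 * n)], "p50": s[n // 2], "p95": s[int(0.95 * n)]}


if __name__ == "__main__":
    summary = percentiles(simulate(INPUT_RANGES))
    print(summary)
    print("p95 / p5 =", summary["p95"] / summary["p5"])
```

Because the inputs multiply, disagreement about each range compounds: with these placeholder ranges the 95th percentile comes out well over 100 times the 5th. A reader who thinks the grant buys 100x less of the field's impact can rerun it with `dict(INPUT_RANGES, grant_share=(1e-5, 1e-3))`, and the median drops by that same factor of 100.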

MichaelDickens

10mo*

BTW I am open to commissions to do this sort of work; DM me if you're interested. For example I recently made a back-of-the-envelope model comparing the cost-effectiveness of AnimalHarmBench vs. conventional animal advocacy for improving superintelligent AI's values with respect to animal welfare. That model should give a sense of what kind of result to expect. (Note for example the extremely made-up inputs.)

Artūrs Kaņepājs

10mo

Thanks for the estimate, very helpful! The cost of AnimalHarmBench in particular was approximately 10x lower than assumed: 30k is my quick central overall estimate, mostly the time cost of the main contributors. This excludes previous work, described in the article, and downstream costs. In-house implementation in AI companies could add a lot to the cost, but I don't think this should enter the cost-effectiveness calculation for comparison vs. other forms of advocacy.

Mo Putera

10mo

"Someone could still make that estimate if they wanted to, but nobody has done it."

My impression was Nuno Sempere did a lot of this, e.g. here way back in 2021?

NunoSempere

10mo

You might also enjoy https://forum.effectivealtruism.org/s/AbrRsXM2PrCrPShuZ and https://github.com/NunoSempere/SoGive-CSER-evaluation-public

MichaelDickens

10mo

My mistake, yes he did do that. I'll edit my answer.

aog

9mo*

Agreed with the other answers on the reasons why there's no GiveWell for AI safety. But in case it's helpful, I should say that Longview Philanthropy offers advice to donors looking to give >$100K per year to AI safety. Our methodology is a bit different from GiveWell’s, but we do use cost-effectiveness estimates. We investigate funding opportunities across the AI landscape from technical research to field-building to policy in the US, EU, and around the world, trying to find the most impactful opportunities for the marginal donor. We also do active grantmaking, such as our calls for proposals on hardware-enabled mechanisms and digital sentience. More details here. Feel free to reach out to [email protected] or [email protected] if you'd like to learn more.

ClaireZabel

10mo

In addition to what Michael said, there are a number of other barriers:

Compared to many global health interventions, AI is a more rapidly-changing field and many believe we have less time to have an impact, leading to a lot more updates-per-time about cost-effectiveness, and making each estimate less useful. E.g. interventions like research on mechanistic interpretability can come into and out of fashion in a small number of years. Organizations focused on working with one political party might drop vastly in expected effectiveness after an election, etc. In contrast, GiveWell relies on studies that took longer to conduct than most of the AI safety field has existed (e.g. my understanding is that Cisse et al. 2016 took 8 years from start to publication; 8 years is about 2.5x longer than ChatGPT has existed in any form)
There is probably a much smaller base of small-to-mid-sized donors responsive to these estimates, making them less valuable
There are a large number of quite serious philosophical and empirical complexities associated with comparing GiveWell and longtermist-relevant charities, like your views about population ethics, total utilitarianism vs preference utilitarianism (vs others), the expected number of moral patients in the far future, acausal trade, etc.
[I work at Open Phil on AI safety and used to work at GiveWell, but my views are my own]

Austin

9mo

We've been considering an effort like this on Manifund's side, and will likely publish some (very rudimentary) results soon!

Here are some of my guesses why this hasn't happened already:

As others mentioned, longtermism/xrisk work has long feedback loops, and the impact of different kinds of work is very sensitive to background assumptions
AI safety is newer as a field -- it's more like early-stage venture funding (which is about speculating on unproven teams and ideas) or academic research, rather than public equities (where there's lots of data for analysts to go over)
AI safety is also a tight-knit field, so impressions travel by word of mouth rather than through public analyses
It takes a special kind of person to be able to do GiveWell-type analyses well; grantmaking skill is rare. It then takes some thick skin to publish work that's critical of people in a tight-knit field
OpenPhil and Longview don't have much incentive to publish their own analyses (as opposed to just showing them to their own donors); they'll get funded either way, and on the flip side, publishing their work exposes them to downside risk

Hillary

2mo

Did you/Manifund publish anything on this yet?

Austin

2mo

Yes, Marcus Abramovitch and I put out this piece analyzing cost-effectiveness for AI safety youtubers specifically.

Manifund doesn't have other pieces in the pipeline, but I would love for more work of this kind to exist, and I know other initiatives like https://grantmaking.ai/ are interested in finding qualified folks to do this kind of analysis at scale.

Neel Nanda

9mo

Agreed with the other comments on why this is doomed. The closest thing to this that I think might make sense is something like: "conditioned on the following assumptions/worldview, we estimate that an extra million dollars for this intervention can have the following effect". I think that anything that doesn't acknowledge the enormous fundamental cruxes here is pretty doomed, but there might be something productive about clustering the space of worldviews and talking about what makes sense by the lights of each.
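A minimal Python sketch of the worldview-conditioned framing described above. The two worldviews, the two interventions, the cutoff rule (work that pays off only after transformative AI arrives is assumed to count for nothing), and all numbers are hypothetical placeholders, not claims from this thread. The point is only structural: the same pair of interventions can rank differently depending on which worldview you condition on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worldview:
    name: str
    p_catastrophe: float  # hypothetical P(AI catastrophe) absent extra funding
    years_to_tai: float   # hypothetical years until transformative AI


@dataclass(frozen=True)
class Intervention:
    name: str
    years_to_payoff: float         # hypothetical delay before the work affects outcomes
    risk_fraction_per_musd: float  # hypothetical fraction of risk removed per extra $1M


def effect_per_million(w: Worldview, i: Intervention) -> float:
    """Absolute risk reduction from an extra $1M, conditional on worldview w.

    Assumption for illustration: work that pays off after TAI arrives is worth 0.
    """
    if i.years_to_payoff > w.years_to_tai:
        return 0.0
    return w.p_catastrophe * i.risk_fraction_per_musd


def best_by_worldview(worldviews, interventions) -> dict[str, str]:
    """Name of the highest-effect intervention under each worldview."""
    return {
        w.name: max(interventions, key=lambda i: effect_per_million(w, i)).name
        for w in worldviews
    }


# All values below are made-up placeholders.
WORLDVIEWS = [
    Worldview("short timelines", p_catastrophe=0.3, years_to_tai=5),
    Worldview("long timelines", p_catastrophe=0.1, years_to_tai=20),
]
INTERVENTIONS = [
    Intervention("quick-payoff intervention", years_to_payoff=2, risk_fraction_per_musd=3e-5),
    Intervention("slow-payoff intervention", years_to_payoff=8, risk_fraction_per_musd=1e-4),
]

if __name__ == "__main__":
    for w in WORLDVIEWS:
        for i in INTERVENTIONS:
            print(f"{w.name:16} {i.name:26} {effect_per_million(w, i):.1e}")
    print(best_by_worldview(WORLDVIEWS, INTERVENTIONS))
```

With these placeholders the quick-payoff intervention comes out ahead under short timelines and the slow-payoff one under long timelines, so reporting a result per worldview cluster says more than a single blended number would.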